Last Thursday we met with Ariel Rokam to discuss reproducibility in scientific research. Ariel is a neurobiologist from the University of Washington and a data scientist at the eScience Institute, where he collaborates with scientists from various fields to promote research that is reproducible and open. Reproducibility is fundamental to scientific progress: to conduct good science, we need to be able to demonstrate reliable results that are consistent across replications. We talked with Ariel about what it actually means to be reproducible and open, why it’s important, what some of the challenges are, and what tools we can use to make our own research reproducible and open.
So, what does it mean to be reproducible and open? How can we foster reproducibility in our own work and promote an open science community? Ariel outlined three main components of reproducible open research:
- Automation and computational reproducibility: Although it is, of course, important to keep a thorough record of our experimental methods, such as in a laboratory notebook, we can also work towards reproducibility by documenting computational analyses. Automation is, in a sense, a way to script your methods from start to finish. You could also think of it as a form of provenance tracking, where each step, from the input and management of the raw data to the final output, is documented in a reusable medium. This allows you (and others) to recreate the process at a later time. Being able to reproduce an experiment, its data analyses, and its figures provides transparency and accountability for your work. Can you produce all the analyses or figures in your paper with a single press of a button? If you’ve automated everything, then the answer should be yes. You should be able to recreate your output simply by running your scripted code, unchanged, in the appropriate software. One way you can track your research and make it available to others is by using a publicly accessible online repository, such as one hosted on GitHub. If you use literate programming (i.e., code interwoven with plain-language explanation of what each step does) you will make it easier for others (and your future self!) to read and reuse your code, and to see what transformations were made from the raw data to the final figures.
- Availability of data and code: In tandem with automating everything, it’s also useful to make your data and code available in a format that others can easily access. Shared community resources, like data, are becoming increasingly common in fields such as genomics and astronomy. Making your data and code available accelerates scientific progress and allows others to build on your work without having to collect and analyze new data. Research is expensive and time consuming! Making your data available for others to reuse saves both money and time. Sharing data and code can also help improve the quality of scientific research by keeping research open and honest. Not driven by the prospect of helping others and bettering the scientific community? Well, you can benefit too! By using tools used by the broader community, such as open source tools, you will reach more people. And the more people you reach and share your data with, the more citations you can generate. Sharing your data and code can also speed up the review process, since reviewers can check your work directly.
- Open access to publication: Publishing a manuscript in an open access journal ensures that your work can be viewed, and potentially used, by almost anyone (keep in mind that different journals have different use restrictions). It is important to consider open access publication at the start of a project, in part because of usage rights and restrictions, but especially when data sharing requires approval from others (e.g., collaborators or participants). Thinking about sharing from the beginning also saves time later, because you can format your files for sharing as you work.
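To make the "single press of a button" idea from the first component more concrete, here is a minimal Python sketch of a scripted analysis. It is only an illustration under stated assumptions, not a method Ariel presented: the column names (`year`, `count`) and all function names are hypothetical, and a real project would likely use dedicated analysis and plotting libraries. The point is that every transformation from raw data to final output lives in code that can be rerun.

```python
import csv
import statistics
from pathlib import Path


def load_raw(path):
    """Read the raw CSV into a list of dict rows; the raw file is never modified."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def clean(rows):
    """Drop rows with a missing count and convert counts to integers.

    The 'year' and 'count' column names are hypothetical.
    """
    cleaned = []
    for row in rows:
        if row["count"].strip() == "":
            continue  # the exclusion rule is recorded in code, not done by hand
        cleaned.append({"year": row["year"], "count": int(row["count"])})
    return cleaned


def summarize(rows):
    """Compute the mean count for each year."""
    by_year = {}
    for row in rows:
        by_year.setdefault(row["year"], []).append(row["count"])
    return {year: statistics.mean(counts) for year, counts in sorted(by_year.items())}


def write_summary(summary, path):
    """Write the per-year means to a CSV file, creating the output folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["year", "mean_count"])
        for year, mean_count in summary.items():
            writer.writerow([year, mean_count])


def run_pipeline(raw_path, out_path):
    """The 'single button': raw data in, final summary out, every step scripted."""
    summary = summarize(clean(load_raw(raw_path)))
    write_summary(summary, out_path)
    return summary
```

Calling `run_pipeline` with the path to a raw data file and the path for the output would regenerate the summary from scratch. Because the cleaning rule sits in `clean` rather than in a manual spreadsheet edit, anyone reading the script can see exactly which rows were excluded and why.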
Open science is one way to address some of the concerns regarding reproducibility. However, there are many fields (such as entomology) where open science is not common practice. This is in part because our current model encourages publishers to close access to papers and require subscriptions in order to view them. We’re still operating under the publish-or-perish model, where our careers hinge on our publication records. Resistance from the scientific community, partially driven by fear of having data stolen or being scooped, hinders data sharing. Another perspective, however, is that data sharing can actually help prevent scooping. Making your data available first establishes priority. One way you can do this is by using preprint servers. In addition, if another lab group finds the same results as you, isn’t this a good thing? Reproducibility ensures that your results are sound. Although scooping is a cause for concern, for me personally it would actually be more frightening to think that I was working among so many cheaters who would potentially want to scoop me than to be scooped itself. Scientists are often held by their institutions (and should also hold themselves) to some sort of code of responsible conduct of research. Are there really so many scientists out there compromising their research ethics that we have to be so secretive about our research? If we can’t be open and trust each other, how can we expect others (like the public and policymakers) to trust us?
In short, science is constantly growing and thrives by building on the work of others. There is never truly an end point. Publication of a research finding is the foundation for new research to begin, and reliable results are critical to the development of new ideas. Making your research reproducible and open saves the time, money, and effort invested in research. As we continue to work through our lampyrid dataset as a class, we are doing so with the intention of keeping our research transparent and reproducible, both to lay the groundwork for future research and to contribute to the open science community.