Reproducible Research Insights with Aaron T. Hess and Mark Chiew


By Mathieu Boudreau

Screenshot of the GitHub repository where the code for this paper was shared, and is available here.

The October 2021 MRM Highlights Reproducible Research Insights interview is with Aaron T. Hess and Mark Chiew, researchers at the University of Oxford in the United Kingdom. Their paper is entitled “Accelerated calibrationless parallel transmit mapping using joint transmit and receive low-rank tensor completion”. It was chosen because the authors demonstrated exemplary reproducible research practices; in particular, they shared all the scripts and data required to reproduce every figure published in the paper. To learn more about the work done by these two researchers, check out our recent interview with them.

To discuss this blog post, please visit our Discourse forum.

General questions

1. Why did you choose to share your code/data?

There are a number of reasons. First, to increase the impact of the work, by reducing the barriers that might otherwise prevent others from building on what we have done. When our methods are used by other people, the value of the work we have put in is multiplied. It also allows for independent validation of our results. Second, this approach has educational value for people who are trying to learn how these reconstruction methods work, or how they are practically implemented. We’ve all benefited from learning by example, so releasing code/data allows us, too, to contribute back to the community. Third, preparing code and data for release forces you to adhere to good practices with respect to reproducibility and documentation, and this, in addition to the external-facing benefits, also has value internally when it comes to archiving the work for your own group.

2. What is your lab or institutional policy on code/data sharing?

The Wellcome Centre for Integrative Neuroimaging (WIN) is an open science community with a positive culture for sharing data, tasks, tools and protocols, and research practices that improve the transparency, reproducibility and impact of our outputs and accelerate translation to the clinic. We have a dedicated institutional open science community engagement coordinator as well as open science ambassadors. You can find out more about all this here: https://www.win.ox.ac.uk/open-win.

(left) Mark Chiew’s research group. From left to right: Charlie Millard, Xi Chen, Mark Chiew and Mo Shahdloo. (right) Aaron and his son outside the Radcliffe Camera.

3. At what stage did you decide to share your code/data? Is there anything you wish you had known or done sooner?

Our intention was to share the code and data from the very beginning. This allowed us to be proactive about things like designing our paper figures to be programmatically generated, rather than scrambling to make everything presentable at the last minute. However, we do wish we had thought a little earlier about data protection regulations and proper compliance, as we ended up having to re-organize the brain dataset we released to prevent reconstruction of any de-anonymizing facial features.

4. How do you think we might encourage researchers in the MRI community to contribute more open-source code along with their research papers?

We could try to ensure that papers cite relevant open-source code or research whenever these are used, and it would also be useful to acknowledge and reward the release of such resources. MRM could follow the path that some other journals have taken by creating a “data/resource” category for papers that specifically detail the release of a community resource; such papers count as publications and give others an easy way to cite the resource. Another idea is to badge papers that have provided open-source code, thereby flagging them in the same way that open-access papers are.

Questions about the specific reproducible research habit

1. What practical advice do you have for people who would like to write code that creates reproducible figures to be shared along with their paper?

Screenshot of the website where the data for this paper was shared, and is available here.

If you choose to do this (and it does involve some upfront cost in time), you should be fully committed to it and generate every figure exclusively in code. It saves so much time later on, when you’re editing the paper and making rapid changes, and also when you come back to the paper months later for revisions. You can and should also use a version control system to track every change you make to the figures, which you cannot easily do when figures are made by hand.
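As a rough illustration of what “exclusively programmatically generated” can look like, here is a minimal Python sketch (the authors’ own code is in MATLAB). Every name in it, including `bar_chart_svg`, `figure_1`, the `FIGURES` registry and the `figures/` output directory, is hypothetical, and the plotted numbers are placeholders rather than results from the paper. Each figure is a function of the shared data, and one command regenerates them all. Writing SVG, which is plain text, also means a version control system can show meaningful diffs when a figure changes.

```python
"""Regenerate every paper figure from the shared data with one command.

Hypothetical sketch: figure names, data and output paths are illustrative
placeholders, not the authors' actual repository layout.
"""
from pathlib import Path


def bar_chart_svg(labels, values, width=300, height=200):
    """Return a minimal SVG bar chart as a string (plain text diffs well in git)."""
    max_value = max(values)
    bar_width = width / len(values)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, value) in enumerate(zip(labels, values)):
        bar_height = (height - 20) * value / max_value  # reserve 20 px for labels
        x = i * bar_width
        y = height - 20 - bar_height
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_width * 0.8:.1f}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
        parts.append(f'<text x="{x:.1f}" y="{height - 5}" font-size="10">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


def figure_1():
    # Placeholder values standing in for numbers computed from the released data.
    labels = ["A", "B", "C"]
    values = [3.0, 5.0, 4.0]
    return bar_chart_svg(labels, values)


# Registry with one entry per figure, so no figure is ever edited by hand.
FIGURES = {"figure_1.svg": figure_1}


def main(out_dir="figures"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, make in FIGURES.items():
        (out / name).write_text(make())


if __name__ == "__main__":
    main()
```

Because the output is deterministic, rerunning the script after a change to the data or analysis and committing the result leaves a precise record of how each figure evolved.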

2. What questions did you ask yourselves while you were developing the code that would eventually be shared?

Will I understand what I’ve done here in 6 months’ time? Will people judge me for using single-letter variable names? Should I feel bad that this code relies on a commercial, non-free software platform (MATLAB)? Will anyone even look at this?

3. What considerations went into ensuring that this software can be used, maintained and/or improved in the long term?

I think the biggest consideration is that the code was written to be as clear and easy to follow as possible, making it easy for others to dive into, or for ourselves to get back into for future development. In contrast, code optimized for convenience at the time of writing, or for speed or density (packing the logic into as few lines of code as possible), can make future maintenance and improvement difficult. This applies particularly when it’s nobody’s full-time job to look after this specific codebase.
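To make the contrast between dense and clear code concrete, here is a small hypothetical Python example that is not taken from the authors’ codebase. Both functions compute the same normalized root-mean-square error between an estimate and a reference; the function names and the choice of metric are assumptions made purely for illustration.

```python
import math


# Dense: correct, but hard to read, check or modify months later.
def f(a, b): return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / sum(y ** 2 for y in b))


# Clear: same result, with descriptive names, a docstring and an input check.
def normalized_rmse(estimate, reference):
    """RMS error of `estimate` divided by the RMS of `reference`.

    The 1/N factors of the two means cancel, so sums are used directly.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    squared_error = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    reference_power = sum(r ** 2 for r in reference)
    return math.sqrt(squared_error / reference_power)


print(normalized_rmse([0.0, 0.0], [3.0, 4.0]))
```

The clear version is longer, but someone returning to it later can see at a glance what it computes and what inputs it expects.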

4. Are there any other reproducible research habits that you didn’t use for this paper but might be interested in trying in the future?

One thing that I’ve never done is make use of container tools like Docker or Singularity. While the software that we develop typically isn’t that complex, in future work it might be interesting to explore the use of containers to manage and package dependencies in a robust way.