By Mathieu Boudreau
This MRM Highlights Reproducible Research Insights interview is with Joseph Woods and Divya Bolar, researchers at the University of California San Diego (UCSD). Their paper is entitled “VESPA ASL: VElocity and SPAtially Selective Arterial Spin Labeling”. It was chosen because of the exemplary reproducible research practices implemented by the authors, who shared their data, simulation code, and analysis code on Zenodo.
To learn more, check out our recent MRM Highlights Q&A interview with them.
1. Why did you choose to share your code/data?
JGW: I chose to share our code and data for three reasons. One was to allow readers of our paper to check the validity of the results we present, should they wish to do so. This transparency can only improve the trust people have in our results and it is sometimes an option I wish I had with other papers I read. The second reason is so that readers can run further analyses on the data. There may be a question someone has that we didn’t answer, or they may want to explore a novel question unrelated to the main aims of our paper. Freely sharing the data enables them to do this and could potentially lead to novel ideas which benefit us all. The third reason for sharing was in case anyone might find the code I wrote for the project useful. This thought is partly inspired by researchers like Brian Hargreaves who has made lots of useful code available from various projects, from which I have learnt a lot. There are very few reasons I can think of not to share code, whereas sharing data can be trickier, because of privacy concerns.
DSB: I concur with all Joe’s points – Dr. Hargreaves’ code in particular (both his Bloch simulator and his spiral code) has been so useful for my own work in the past. I would also emphasize the importance of sharing MR pulse sequences themselves; if the ultimate goal is to translate innovative MR technology to the clinic, it first needs to be rigorously vetted by a broad user base beyond just a few institutions.
2. What is your lab or institutional policy on sharing research code and data?
JGW: As far as I’m aware, the University of California San Diego encourages the sharing of code and data but doesn’t require it. We didn’t have any funder requirements for this work.
3. At what stage did you decide to share your code/data? Is there anything you wish you had known or done sooner?
JGW: From my perspective, I have always intended to share as much as possible, while respecting volunteer privacy and institutional policies. When I published my first paper four years ago, I shared the data and code to generate the figures in the paper and I’ve tried to share more each time, as well as improving the setup, quality and documentation of the code. It’s a continual learning curve to improve your own process, but taking the time to comment your code while you’re writing it definitely reduces the workload later! I think that starting off with the intention of releasing your code provides a very good incentive to do things carefully and reproducibly.
DSB: One thing I have learned is the importance of verifying that shared software runs as expected on as many different platforms as possible. Our goal is that code should run on a variety of different systems without modification. If the user has to modify scripts, MATLAB .m files, etc., they may be less enthusiastic about using the software. We have now successfully tested the code on several systems to ensure compatibility.
4. How do you think we might encourage researchers in the MRI community to contribute more open-source content along with their research papers?
JGW: Initiatives like MRM Highlights, which showcases people’s efforts to share content, will hopefully encourage others to do the same! Funder and journal requirements are helping, but it will probably need to become clearly beneficial or essential for researchers’ careers to convince many people to put in the effort.
DSB: Practical workshops at ISMRM and other meetings also provide an incentive for content sharing. At a Member Initiated Training (MIT) session at this past ISMRM, Joe walked a room of a hundred people through code to generate velocity-selective ASL pulses (he really is a champion of open source!). Not only did this benefit the participants, it also provided Joe with well-deserved exposure as an expert in the field.
Questions about the specific reproducible research habit
1. Why did you choose Zenodo to host your article’s code and data?
JGW: It was a case of habit more than anything. Zenodo is a prominent and easy-to-use repository that I came across four years ago when submitting my first paper, and I’ve now uploaded four datasets to it. A DOI is provided for every submission on the website, making it easy for others to cite the repository if they use the code or data (I’m still waiting for my first citation from that though!). I was also reassured that it was linked to CERN and so would likely be around for a long time.
2. What considerations went into ensuring that the code and data you shared can be used, maintained and/or improved in the long term (on the user or the developer side)?
JGW: In terms of writing code, I generally try to comment it well and make it easy to understand. How successful I am at this is for others to judge! In terms of ensuring that it could be used by others, we did various tests to check that the analysis code could be successfully rerun on other computers with minimal setup efforts beyond installing the required external software. The main obstacle to using our code is that it uses MATLAB, which not everyone has access to.
3. What practical advice do you have for people who would like to write code that creates reproducible figures to be shared along with their paper?
JGW: Make the code as easy to understand as possible and get other people to test it (and follow good programming practices)!
4. Are there any other reproducible research habits that you didn’t use for this paper but might be interested in trying in the future?
JGW: I have definitely started to think about how I can make my code more widely accessible in the future, and this might include switching to an alternative programming language like Python and using a platform like Docker to streamline the setup process. It would be terrific if, in the future, we could open any paper online and simply click a link to take us to the data and analysis/simulation code, which could be easily rerun or reused.