Reproducible Research Insights with Hannah Scholten

By Mathieu Boudreau

The GitHub repository where the authors shared their source code.

This MRM Reproducible Research Insights interview is with Hannah Scholten, a researcher at the University Hospital of Würzburg in Germany. Her paper, entitled “Fast measurement of the gradient system transfer function at 7 T”, was chosen as a Highlights pick because it demonstrated exemplary reproducible research practices: the authors shared their code in a well-formatted GitHub repository, along with scripts that reproduce the figures and the underlying data.

To learn more, check out our recent MRM Highlights Q&A interview with Hannah Scholten and Herbert Köstler.

General questions

1. Why did you choose to share your code/data?

When I started my PhD, I myself had to implement a method that was described in a paper, where the authors did not share any code or data. I think it would have saved me a lot of time had they done so. I therefore decided I wanted to share my code and data to make it easier for other researchers to try out my method.

2. What is your lab or institutional policy on sharing research code and data? 

There is no general policy, but we increasingly share code and data where possible.

3. How do you think we might encourage researchers in the MRI community to contribute more open-source content along with their research papers?

I think we need to make people see the benefits of opening up their research. I often wish I had the code someone used in their paper, so I could try out my own ideas on it, or check something that I didn’t understand from just reading the paper. That said, I believe that I can only expect others to share their code if I also do so myself. Highlighting reproducible research practices, as you are doing by conducting these interviews, is certainly a step in the right direction.

4. Are there any other reproducible research habits that you didn’t use for this paper, but might be interested in trying in the future?

I was not able to share my sequence code for this paper, as it is written in the proprietary sequence coding framework from Siemens. In the future, I would like to try open sequence programming frameworks, such as Pulseq or gammaSTAR, which would allow me to share the sequences I use as well.

Questions about the specific reproducible research habit

1. Your code repository is really well documented. What advice would you give to people preparing to share their first repository?

Thank you. I would recommend keeping the code as clean as possible from the start (a piece of advice I did not follow myself, by the way). I had to spend quite a bit of time cleaning up the code, i.e., removing useless pieces I had only commented out before, renaming variables meaningfully, or adding explanatory comments. The last two points, in particular, are not too hard to do right from the start, I think, and they can be helpful to you, too, even if you don’t choose to share your code. Also, if you plan to share data along with your code, choose your platform accordingly. I realized too late that the amount of data I wanted to share was too much for GitHub, so I had to upload the data separately.

The Zenodo repository where the authors shared their data.

2. What questions did you ask yourselves while you were developing the code that would eventually be shared?

When I named my variables and wrote my explanatory comments, I always tried to put myself in the position of someone who did not write this code, but nevertheless has some understanding of the subject. I also asked myself how to make the code executable on a different computer without having to change too much, for example the data paths. I thereby discovered some MATLAB commands that I hadn’t known before, for example how to change the working directory to the one where the currently open script is located.
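For readers who have not met such commands, the following is a minimal MATLAB sketch of the idea. It is not taken from the authors' repository. The variable names and the 'data' folder name are assumptions made for illustration only.

```matlab
% Illustrative sketch only; names below are hypothetical, not from the shared repository.

% Full path of the running script, without extension.
% This is empty when the code is run from the command line or as a section.
scriptPath = mfilename('fullpath');

if isempty(scriptPath)
    % Fallback: the file currently open in the MATLAB Editor (needs the desktop).
    scriptPath = matlab.desktop.editor.getActiveFilename;
end

if isempty(scriptPath)
    % Nothing could be resolved (e.g., an unsaved file): stay where we are.
    scriptDir = pwd;
else
    scriptDir = fileparts(scriptPath);
end

% Make the script's folder the working directory.
cd(scriptDir);

% Build data paths relative to the script instead of hard-coding absolute paths.
dataDir = fullfile(scriptDir, 'data');   % 'data' is an assumed folder name
```

With paths built this way, the same script should run on another computer as long as the folder layout next to the script is preserved.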

3. How do you recommend that people use the project repository you shared?

I think the most interesting use case is for other people to try and run my code for calculating the GSTF (gradient system transfer function) with their own data. I have included a demo script and explained how the data have to be structured for that purpose. I was very happy when someone emailed me a few months ago with a question about the GSTF, and said my code had helped them to validate their own implementation. So, I also see code sharing as a small networking opportunity.