Reproducible Research Insights with Sang Chung, Yueh Lee, and Jennifer Goralski

By Mathieu Boudreau

The Carolina Digital Repository page where the authors shared their data and code.

The February 2021 MRM Highlights Reproducible Research Insights interview is with Sang Hun Chung, Yueh Lee, and Jennifer Goralski, researchers at the University of North Carolina in Chapel Hill, North Carolina. Their paper is entitled “Comparison of single breath hyperpolarized 129Xe MRI with dynamic 19F MRI in cystic fibrosis lung disease” and it was chosen as this month’s Reproducible Research pick because they shared code and data that reproduce several of their figures. To learn more about this team and their research, check out our recent interview with them.

To discuss this blog post, please visit our Discourse forum.

General questions

1. Why did you choose to share your code/data?

We hadn’t really considered it until it came up during the submission process for MRM. Once the recommendation was raised with the team, we couldn’t see any reason not to share the code. We thought that sharing, especially this early in the development process, would be a great way to get feedback and make connections with other researchers studying similar processes.

2. What is your lab or institutional policy on code/data sharing?

This was the first collaborative project by our two labs, so while we didn’t have a formal policy in place, Dr. Branca (a co-author) had participated in data sharing in the past and was able to help us understand the process. UNC offers its employees access to the Carolina Digital Repository, which allows researchers to upload work so that it becomes accessible, indexed, and searchable. 

3. At what stage did you decide to share your code/data? Is there anything you wish you had known or done sooner?

As just mentioned, we didn’t make this decision until late in the process, when we reached the submission stage. Now that we have participated in the process, we are motivated to do the same for future articles. Once we had decided to share our code, we had to clean it up to make it readable to people outside the group. Moving forward, we plan to write code with more explanatory comments, better formatting (line indentation, for example), self-explanatory variable names, logically grouped sub-sections, and so on.

4. How do you think we might encourage researchers in the MRI community to contribute more open-source code along with their research papers?

We think it should become an expectation rather than a suggestion, as long as all data is de-identified. Researchers should have to opt out of data sharing and be required to provide a reasonable explanation of why the data/code cannot be shared in its current form.

Questions about the specific reproducible research habit

1. Why did you choose MATLAB as the software language for your project? Did you consider any other languages?

Our team had prior experience with MATLAB. The combination of good documentation and simple code structure makes it easy for potential new lab members to understand the code.

2. How did you decide on the license you chose for your shared code/data (CC0 1.0 Universal License)?

The example data we provided was de-identified and the code uses functions provided by MATLAB, so we saw no reason to put limits on the license.

3. What questions did you ask yourself while preparing your code for sharing?

The most important thing for us was to create code that would be as automated as possible. We wanted something simple, efficient, and easy to understand. Lastly, we added comments to help the reader understand some key lines. Our goal in sharing was to actually provide value to the reader. 

4. Are there any other reproducible research habits that you haven’t used for this paper but might be interested in trying in the future?

Since sharing media is easy these days, we think it would be interesting to add short explanatory videos. We could quickly demonstrate our test set-up and provide a summary of the research. This way we could reach a wider audience and better communicate ideas that might be lost in text.