By Mathieu Boudreau
The May 2021 MRM Highlights Reproducible Research Insights interview is with William T. Clarke and Saad Jbabdi, researchers at Oxford University. Their paper is entitled “FSL‐MRS: An end‐to‐end spectroscopy analysis package”, and it was chosen because their software and algorithms were shared as open source, and integrated as a package for the FSL software. To learn more about this team and their research, check out our recent interview with them.
To discuss this blog post, please visit our Discourse forum.
General questions
1. Why did you choose to make your software open source?
FSL-MRS was always intended to be part of FSL, for which the code has been available for free academic use and distribution ever since its birth, over 20 years ago. We can never see any good reason for *not* sharing code.
Making an academic tool open source means that others can get involved easily and build upon what we have done so far. With regard to FSL-MRS specifically, recently there have been a large number of keen and highly capable students using MRS in our lab. By making the code available and understandable to them, they are able to adapt it to their own specific uses. We hope that this can continue both internally and externally.
2. What is your lab or institutional policy on code/data sharing?
As we are funded by Wellcome, all journal publications from our Centre (Wellcome Centre for Integrative Neuroimaging (WIN)) must be open access. There is currently no requirement from Wellcome that code or data generated by WIN must be openly shared, although this may change in the future. WIN is, however, really committed to sharing all of our research outputs as openly as possible, and is investing a great deal in developing the community, training and infrastructure to support us in doing so. This will be great for meeting publisher or funder requirements, but it will also make the life of the individual researcher much easier. Figuring out where and how to share code and data is hard (what servers to use?, what format?, what license?, subject consent?, anonymization?). Over the next year, WIN will be setting up public sites for sharing all the stages of our research outputs (MRI scanning protocols, imaging data, and experimental tasks), which will sit alongside our public repositories for code (for example the institutional GitLab site where FSL-MRS is primarily hosted. All this is detailed at https://www.win.ox.ac.uk/open-win.
Within the wider institutional context of the University of Oxford, we are able to apply open-source licenses to code and software products as long as we have approval. In some instances, the University works to also leverage the commercial value of the tools we build, which is crucial to generating a sustainable income stream to support the development of tools such as FSL and support free use for non-commercial users.
3. What questions did you ask yourselves when preparing your software to be shared as open source?
Whilst FSL-MRS presents some (hopefully!) user-friendly command line tools, we wanted the underlying code to be usable and adaptable by others. Different modules of the code do different things, e.g., processing, fitting, etc. are all separate modules and sub-modules. These modules can be combined together or with other tools to create pipelines. Next, we planned to create good documentation and examples from the outset. This is extremely important and often lacking in many open-source tools, particularly bits of code published with papers.
The decision was easy when it came to where we would host the code for sharing, as it was always intended to be distributed with the main FSL package. Being part of FSL, there are also existing resources we can benefit from to support our users in accessing and adapting our code: 1) a very active mailing list where users can post questions on the software or the methods, 2) the FSL training course which runs annually (~160 attendees each year for the past 20+ years). For both of these resources we thought it was important that FSL-MRS should be well integrated with the existing materials so that users had an easy route to access training and ask questions. FSL-MRS is already getting a few queries on the mailing list and we hope to have content on using FSL-MRS as part of the FSL course material in the near future.
One thing that I wish we had got sorted sooner was the implementation of code testing. FSL-MRS was a learning exercise in Python for both of us, and as such, there wasn’t much code testing at the start. So, our message is, sort this at an early stage!
4. How do you think we might encourage researchers in the MRI community to contribute more open-source code along with their research papers?
We think there are two things that need to be achieved to do this: one practical, one behavioral.
The first practical step is that we need better tools and resources for the sharing of data, use cases, documentation and examples. We also need to ease the code packaging process, i.e., everything that goes alongside the open-source code that makes it usable. The problem of sharing the source code is almost solved; institutional GitLab sites or GitHub fulfill most requirements. However, finding a place to host large data, ensuring it is anonymized, making working examples, and hosting it all sensibly somewhere online, remains difficult. For Saad, the lack of documentation in many open-source projects remains a real irritation. There are tools out there which can help with the onerous task of creating documentation (Read the Docs and autodocstring are two that Will uses.)
The second part is that researchers need to be convinced of the benefits to themselves of sharing their code in an open-source way. I don’t think this can come from imposition of open-source requirements from funding bodies, journals or institutions. By going through the open-source process, researchers will find that it forms a fantastic archive of their own work, makes sharing the tools they have created with collaborators really simple, and makes extending or modifying their own tools much faster and easier.
Questions about the specific reproducible research habit
1. What advice do you have for people who would like to contribute to large software projects like FSL?
Do seek advice from those developers already working on the tool. We have been helped hugely by the expert developers already working on the FSL project. They can guide you on all aspects of coding, packaging, documentation and testing. There might also be hoops you need to jump through when contributing.
Also take advantage of the development and distribution tools that are already in place. Shamelessly copy the practices and tools of good projects — that’s the whole point of open-source practices! But make sure full credit is given to those whose work you build on.
Talk to the users. Even in the early stages of development, it is important to get input from end users. This can reveal problems with the software design that a developer might struggle to anticipate without talking to the end user. We are happy with some of our decisions, but some bits of FSL-MRS (basis spectra simulation) are still too hard to use.
2. Can you share some resources to help the audience get started in creating an add-on for FSL?
FSL-MRS is written in Python, and whilst there are many great resources on the net for learning Python in general, the resources from our own internal ‘Pytreat’ organized by Saad are available online. These tutorials have a particular focus on imaging and neuroimaging.
Perhaps the easiest bit of FSL to extend is FSLeyes, the FSL viewer. It has powerful capabilities for incorporating sophisticated plugins. The Spinal Cord Toolbox developed by Julien Cohen-Adad’s team in Montreal, for example, has an FSLeyes plugin for looking at spinal cord imaging. FSL-MRS also has a plugin for MRS data under development.
We’d love to help anyone who wants to extend FSL-MRS. Perhaps the area most ripe for development is post-fitting analysis. Currently the toolbox stops quite abruptly after generating the metabolite concentrations. We try to provide users with the full information from the MCMC fits, but we’d be happy to make changes to allow for more inventive ideas.
3. What considerations went into ensuring that this software can be used, maintained and/or improved in the long term (e.g., on the user or developer side)?
This project has been a test case for the new FSL distribution and release infrastructure. The plan is to move FSL to a continuous release paradigm, one where each FSL tool can quickly iterate developments or fixes, and release new versions. This is being achieved by using an automatic Conda packaging pipeline integrated into the GitLab continuous integration (CI) infrastructure. When developers wish to release a new version of the software, a tag in the Git repository will cause the CI system to automatically build and distribute the updated software.
What all this means is that if we have sufficient automatic code testing, releasing a new version of FSL-MRS and making incremental improvements becomes easy. That means that either of us, or one of the FSL developers, or even someone new to the project can keep FSL-MRS maintained and up-to-date.
There are some interesting scientific opportunities as well. We’ve had discussions with the Osprey developers (Georg and Helge) about automatic benchmarking of the fitting algorithm for new releases. (Osprey is another great new open-source MRS analysis tool box.) Using the ISMRM MRS Study Group Fitting Challenge data as validation in the paper was part of this intention.
4. Are there any other reproducible research habits that you haven’t used for this paper but might be interested in trying in the future?
Even though the FSL-MRS code is definitely open source, the FSL-MRS paper certainly doesn’t tick all of the reproducible boxes. We had difficulty in releasing some of the data that wasn’t ours to share, and the validation work isn’t hosted anywhere accessible (even though some of it is now integrated into testing).
Using the FSL-MRS paper as a learning process, Will has just published a new paper on the uncertainty of low-rank denoising methods in spectroscopy (https://www.biorxiv.org/content/10.1101/2021.05.15.444311v1). In this paper all the code and data are made accessible. This includes the core denoising code, released as a Python package, and the figure generation code, released as a Git repository. It would have been incredibly difficult to do this without the experience of making FSL-MRS open source. And going through the process made it very easy to rerun the analysis when some embarrassing mistakes were found. We encourage everyone to just try and take just the first steps in making their work reproducible. You don’t need to do it all at once!