When should you turn your ugly research script into a reproducible package?

Blog status: I haven’t been spending much time on the blog. I admit I don’t have much to say at the moment; we’ll see how it goes in the future. This article is mostly a journal-like question for myself, a chance to reflect a bit, but maybe it helps others as well.

Lately I’ve been re-assigned to a research project, and instead of writing ‘production’ software I spend my days preparing experiments and analyzing data with ‘throw-away’ scripts. Plenty of times these scripts are little code snippets that I quickly write to investigate something and then discard again. But often I do need to re-use code, or want to share the code with others, and then the question becomes whether I should improve the quality. So when is it the right time to turn a script into something more reproducible and easier to maintain?

When you are writing ‘real’ software this question doesn’t pop up. Your code is important and will be deployed to some automated system, so you add all the required engineering quality, like unit tests and documentation and everything else. But in a research setting there’s a lot more gray area. On the one extreme there’s that script that you are certain will be used only once. And on the other extreme there’s code that you are certain will be re-used by your future self and colleagues and should thus be turned into a high-quality package. In between there is a whole spectrum of other cases. Maybe you have a script that you have used a couple of times now. Or maybe a co-worker wants to reproduce your data analysis and asks for the code you used for it. You are not very comfortable with this code: you didn’t really test it well, you didn’t put much effort into it, you are not proud of it, you are not certain it’s correct, and it’s still rapidly evolving. What to do with this code?

Some people say you should immediately turn any code into a unit-tested package. But I personally do not believe that is feasible. That first script is an experiment in finding out what code you even need to write. You don’t even know what results to expect yet. You have to just fiddle with code and data and plots until things start to make sense. There’s nothing wrong with that (unreproducible) exploration.

Other people never ever write packages, or unit tests, or documentation. They may refuse to share their code with others, maybe out of fear they’ll be judged, or because it’s too much work to share the code, or for any other reason. I disagree with this approach as well. If you are a scientist/researcher working with data and code, and you stumble upon a presentable insight, or find yourself repeating the same tasks, then part of the job is to make your code legible and reproducible.

(Nowadays I notice people are often insecure about sharing their code with me. They will do so with a lot of comments like “I’m not a good coder”, “this is not very good code, please don’t expect too much”, “please don’t share this code with anyone else”. This happened less when I was a junior coder myself. I understand the sentiment. I should probably spend more time reassuring people up front that I’m not there to judge their code; I just want to understand how they did their analysis and figure out how to continue together. There’s some lesson in here for senior programmers reviewing scripts of researchers.)

Currently I have some code that’s in the gray area. I have a few scripts that I keep re-using. I already made the scripts fully reproducible (they are in a git repo and have a well-defined environment). Every time I run the scripts on new data I have to update the functions to work on that new case, so they are becoming more and more generic and abstract. This is actually quite nice; those functions are becoming very useful to me! But at the same time, I’m also beginning to forget what each function is actually doing. And when I change one of these functions, I might actually break a previous analysis I performed (and I’m lazy, I don’t want to re-run all previous analyses to find out). Some code that I expected to be re-used by others I already placed inside a package. But the code that’s still left in the scripts is just so specific to my analysis. No one else wants to use that code yet. I really feel this barrier, this question of whether I want to put in the effort to write a package, with the risk that no one will use it while I will have to keep updating failing unit tests just to do my research.
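
A cheap middle ground for the worry about silently breaking an earlier analysis might look like the sketch below. It is an illustration only, not code from the scripts discussed here: `summarize` and `check_against_reference` are hypothetical names, and the choice of Python and of JSON files for the stored results is an assumption made for the example. The idea is to save a small result from an old analysis once, and to compare against it whenever a shared function changes.

```python
import json
import math
import tempfile
from pathlib import Path


def summarize(values):
    """Hypothetical analysis function: mean and range of a list of numbers."""
    return {
        "mean": sum(values) / len(values),
        "range": max(values) - min(values),
    }


def check_against_reference(name, result, ref_dir, rel_tol=1e-9):
    """Compare `result` with a stored reference; store it on the first run.

    Returns "created" when no reference existed yet and "ok" on a match.
    Raises AssertionError when the keys or a value differ from the reference.
    """
    ref_path = Path(ref_dir) / f"{name}.json"
    if not ref_path.exists():
        ref_path.write_text(json.dumps(result, indent=2, sort_keys=True))
        return "created"
    reference = json.loads(ref_path.read_text())
    assert reference.keys() == result.keys(), f"{name}: keys changed"
    for key, expected in reference.items():
        assert math.isclose(result[key], expected, rel_tol=rel_tol), (
            f"{name}: {key} changed from {expected} to {result[key]}"
        )
    return "ok"


if __name__ == "__main__":
    # A temporary directory keeps the demo self-contained; in practice the
    # reference files would live next to the scripts in the git repo.
    with tempfile.TemporaryDirectory() as ref_dir:
        result = summarize([1.0, 2.0, 3.0, 6.0])
        print(check_against_reference("old_analysis", result, ref_dir))
        print(check_against_reference("old_analysis", result, ref_dir))
```

This is far less than a unit-tested package, and that is the point: one stored result per old analysis is enough to notice that something changed, without committing to maintain a full test suite.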

I just want to say that I understand the desire not to write high-quality code for your research. This is the source of the two-culture problem between scientists and engineers.

At some point I will pass the threshold. I will keep re-doing the same analysis for months and just want quick reproducible functions. Or others will want to run my code and I don’t want to explain the functions to them all the time. Then the answer is clear, the time has come, I will turn my scripts into a package. (Even at this point some people will still refuse to turn their code into an easily installable, reproducible, well-documented package. Please don’t be like those people.)

In the meantime, it’s good to recognize this gray area between ‘ugly script’ and ‘high-quality code’. It’s good to ask yourself often whether it’s time to turn that script into a package. If you find yourself asking this question, that’s not a sign of insecurity; it’s a sign you are growing as a professional scientific coder.