The Nebulous Mysteries of Scientific Coding

A reflection into the divide between scientists and software engineers. Warning: slightly philosophical.

There is a concept in meta-rationality called “nebulosity”. I will look up the definition later, but in my own words nebulosity means the following:

Nebulosity: a concept or problem is ill-defined. You cannot describe it perfectly. The boundaries of the concept are unclear.

Nebulosity drives rational people crazy; it’s worse than NP-hard. Rational people need well-defined problems. Even if you can prove that the problem cannot be solved, at least the problem itself should be known. But is this always possible?

You may have a problem that you can barely describe to yourself. You may feel some shape of it, intuitively in your mind, but you cannot explain it perfectly. You notice that it is especially difficult to explain the problem to people unfamiliar with the domain around the problem. There is only some vague shape you can gesture at. After wrestling with the problem for a long time, you may even begin to wonder whether there is a problem at all. This can be challenging if you built an identity or career around such a nebulous concept.

My nebulous problem

The problem I have been wrestling with for the last few years has such nebulosity. It started simple. Software development is slow in our organization and in many organizations around us. One part of the problem, which many people complained about, is that the organizations contain many scientists who do not know how to develop software properly, while the professional software engineers do not understand the scientific domain. This causes lots of errors, both in the communication and in the software itself.

(This is a nebulous problem that can be generalized to any profession that involves people who focus on learning the domain instead of building the products, say a business analyst, a financial quant or a mechatronics designer. Generalizing a nebulous problem makes it more nebulous and even harder to solve. More people will feel the shape of the problem, but it applies less to their exact case. This is a nebulous problem faced by high-level thinkers in general, leading to proposed solutions that do not apply to the context. We have a potentially recursive nebulosity growth here.)

The scientist vs engineer problem seems easy enough to fix. Simply teach the scientists the good practices of software engineering. Give them the right tools for their domain. Then they write better code and create better scientific software, or at least they learn to communicate better with software engineers. But this is a nebulous problem. It turns out many of the scientists do not want to learn software engineering skills. It will take too much time away from their real ‘science’ work. They are also not rewarded for getting better at coding; they are rewarded for finding insights and writing articles. This simple problem just became some kind of complex resource allocation problem: how much time should scientists spend on software skills so that it pays off in their career, without becoming a non-scientist? Then there is the fact that all their scientist friends around them are not great coders either. Why should they change first? Is it a peer-pressure problem? Or maybe they believe they are actually amazing coders, never having met better coders in professional settings. “Look at how quickly I wrote these thousands of lines! I have been successfully working like this for 20 years! You cannot teach me anything.” Never mind that their colleagues cannot understand the code, nor reproduce any of the results. Maybe it’s even a status thing: unlike engineers, the scientists may look down upon building and coding? There are so many possible root causes.

Virtually all these problems are interpersonal human problems, not hard science puzzles like we find in math or physics. Interpersonal problems are virtually always nebulous and multi-faceted. Many rational-oriented people shy away from interpersonal problems, thus worsening the problem instead of tackling it head-on. This is another nebulous problem. You cannot see the cloud from the inside. Sure, it’s a little foggy around here, but that’s always been the case. Yet the frustration remains.

Or is there really a problem? It's good to question your own beliefs from time to time. Can we argue the problem away?

Scientists focus on understanding the universe, and occasionally build something for that reason. Engineers focus on building stuff, and use their understanding of the universe for that. Perhaps these activities should be kept separate? Or perhaps separation of these types of people happens naturally in large organizations and we should accept that fact of life? Or maybe we should allow a third group to arise, scientific coders, an elite group of people who help bridge the gap between the two cultures? Problems can become opportunities, right?

I have spoken with managers who believe there is no problem. They are quite satisfied with the two-culture separation. They prefer the scientists to communicate their findings to the software engineers only via a medium other than code. Maybe math-like pseudocode, written in ambiguous text documents, or haphazardly explained in a few meetings. Or perhaps the scientists share incomprehensible throw-away example code. "Ambiguous", "incomprehensible" and "irreproducible" are keywords here, because the documents are never clear to the engineers, and the example code is complex and doesn't reproduce. The software engineers are quickly confused and give up on understanding altogether. The scientists become frustrated with the miscommunication and perceived apathy of the software engineers. The product development is delayed and the resulting code behaves incorrectly.

This doesn't seem like an acceptable situation to me. Yet the proponents of improving scientific software engineering also seem confused. (That includes me.) No one knows the exact solution that can finally resolve the matter effectively. After many years of wrestling with this cloudy issue myself, I have learned a great deal, but have not succeeded in pinpointing the exact problem. Most of my success has come from finding other people who also experience this nebulous problem. People who cannot accurately articulate the root causes either, yet feel the pain and want to solve the matter. I started calling them “scientific coders”, but even finding the right words to name these people is nebulous.

This cloudiness has become a growing part of my career and professional curiosity. With this blog I hope to clarify my thoughts, to better describe the shape of the problem, and to identify possible solution directions. The uncertainty around the problem definition does not reduce my confidence in moving forward.

Nebulous conclusion

So, for now: breathe in, breathe out. Embrace this journey through the cloud. We can neither define nor solve the problem quickly. There is no shortcut that I know of.

If you are interested, here is the original definition of nebulosity that I referred to: metarationality.com/nebulosity. It describes nebulosity in far more depth than I did. Actually the entire meta-rationality blog seems to revolve around nebulosity.

The concept of nebulosity is fascinating in itself. A big step in your personal development may come from the conscious choice to stare nebulosity in the face. To accept its existence. A lot of that personal development is dealing with uncertainty, because many people struggle with uncertainty in life. Once you see nebulosity, you cannot un-see it. You may notice that all concepts are a little nebulous. Nothing is perfectly defined.

Edsger W. Dijkstra, famous in many ways, seems to defy nebulosity by noting that "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." This is interesting on several levels. First of all, I slightly disagree, since abstractions are leaky, so their precision will fail under the right circumstances. Secondly, you should read the context of his thoughts. This quote comes from a lengthy lecture where he discusses all the misconceptions around programming. While the quote itself is about code, I can already see the two-culture problem emerging in his talk as he laments scientists who do not appreciate computers and programming. Observe how great thinkers struggle with this nebulosity, even as they confidently announce precision in some intellectual areas.

Here we come to the end of my introspection. I questioned whether to publish this blog post here on The Scientific Coder or on my personal website Functional Noise. Since I've applied nebulosity to scientific coding, this blog seemed like the right place. I believe it can help any of you deal with the stress and difficulties of being stuck inside this nebulous problem. Know that you are not alone and that it is no shame to struggle within this field of work.