'The Code Has Already Been Written'
theodp writes "John D. Cook points out there's a major divide between the way scientists and programmers view the software they write. Scientists see their software as a kind of exoskeleton, an extension of themselves. Programmers, on the other hand, see their software as something they will hand over to someone else, more like building a robot. To a scientist, the software soup's done when they get what they want out of it, while professional programmers give more thought to reproducibility, maintainability, and correctness. So what happens when the twain meet? 'The real tension,' says Cook, 'comes when a piece of research software is suddenly expected to be ready for production. The scientist will say 'the code has already been written' and can't imagine it would take much work, if any, to prepare the software for its new responsibilities. They don't understand how hard it is for an engineer to turn an exoskeleton into a self-sufficient robot.'"
You can often tell whether someone is "programming as a means to an end (of your own)" versus "programming to build a tool for someone else". For instance, I have experience in the financial industry. Quite a lot of traders see coding as a means to implement their cool new model. Looking at their code, you can often tell. It's as if everything was built to just exactly fulfil the requirement, with no thought to the fact that those requirements might change. But of course, they do change. So you get hacks and workarounds, and cut'n'paste cargo cult code. Kinda like what those Orks in Warhammer 40K might make. And of course the problem with spaghetti code is that if you write it, nobody can ever help you solve problems/improve it. It's the coding equivalent of painting yourself into a corner. There's loads of smart traders out there with an excel spreadsheet that actually is an extension of their personalities (In fact it's their Magnum Opus. Everywhere they go, they try to take this quirky little file with them). Every little hack is something only they can explain (comments, yeah right. Do your body parts have explanatory comments?) and only they can fix if wrong.
On the other hand, you sometimes hire a guy who is a programmer, but knows nothing about the domain. Very good with OO models and that kind, but you have to teach them everything about finance. What's a settlement date, what kinds of options exist, etc. You get what you ask for, because they know how to turn problems into object models, but you have to ask VERY carefully. And teach. Unfortunately, not everyone has time for that, and so you end up with something that still doesn't quite do what it's supposed to.
So you often end up gettings guys who understand the problems, but can't program, programming. And guys who can program, writing the wrong program.
It isn't stupid at all. Lots and lots of scientific software is written by grad students worried about the results, and don't care about the quality of the code itself. Their idea of what is "good code" has no relation to what a programmer who's worked in a production environment would call "good code". And invariably they decide to include libraries from other grad students at other institutions of equal or lesser value. And don't even get me started about documentation...
There's a nice concept devised by Ward Cunningham which captures this issue nicely: "Technical debt".
Failing to put in the effort that makes code maintainable during its construction incurs a notional "debt" which the software carries with it. Future developers working on the code "pay interest" on this debt in the form of time wasted on understanding and modifying the crappy, undocumented code, or on fixing bugs that wouldn't have been present if the code were better. Sometimes, those future developers may decide to spend time refactoring, building tests or documenting, and those cleanups pay down the "principal" on the "debt". After their cleanup work is done, future work has smaller interest payments (less effort for the same results).
Startups often deliberately decide to incur great amounts of technical debt on the theory that if the revenue starts flowing in they'll have the money to fund paying it down, but if they don't start getting some money the whole company will evaporate.
For scientific research, it's pretty clear that it also makes sense to incur lots of technical debt in most cases, because there's little expectation that the code will be used at all once the research is complete. Even when that's not the case, I think few scientists really know how to create maintainable software, because it normally is the case. I don't see a lot of scientists spending time reading about code craftsmanship, or test-driven design, or patterns and anti-patterns, or... lots of things that at least a sizable minority of full-time software engineers care a lot about.
I guess the bottom line, to me, is that this article is blindingly obvious, and exactly what I'd expect to see, based on rational analyses of the degree of technical debt it makes sense for different organizations to incur.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
That's a great post, of the kind that saves me a lot of typing. You covered the first-order considerations brilliantly.
What you missed was technical debt blindness, which has been around since forever. Books I read around the time of the Mythical Man Month talked a lot about maintenance syndrome: that the original development team would be regarded as brilliant for producing working functionality at tremendous speed (undocumented, with no error handling for edge cases), then the first maintenance team would all be fired as underachievers for adding hardly any new functionality in the first year or two.
Turns out it's hard to erect a machine shop over top of adobe mud brick construction without adding some reinforcement to the structure, which usually takes a lot longer than the entire original edifice.
You can instead take a wrecking ball to the first iteration, but this rarely works out as well as hoped. You end up with far more ambitious adobe mud construction built with a whole new generation of unproven tools. At some point you have to bite the bullet and ferment what you began with.
People hide debt blindness behind widely divergent construals of simplicity, where "simple" usually turns out to be a euphemism for any decision that sidesteps paying down debt in the short term.
For professional software engineers, there is one true simplicity to rule them all: generativity and compositionality. Can you build the next layer on top with any hope of having it work and able to support an ongoing stack? For us, it's a long term game of pass the baton. For everyone else (management, scientists) the endgame is to cash out, and take credit elsewhere (e.g. publication biography).
Unfortunately, a citation is not a formal linkage that the compiler either accepts or rejects. By the standards of compositionality, citation is payment in dubious coin. Citation is not falsifiable. Scientists still count their citations even when they come from papers that are full of crap, peer review notwithstanding. For a professional software engineer, when you start instantiating objects from one library inside an abstract expression template library, you come face to face with compositionality in a way that few scientists can even imagine, having weened at the outrage of being improperly cited.
Technical debt blindness on the part of management quickly turns a software engineering shop into a highly non-linear fiasco. We've all seen this.
Somehow this game works out better (for the participants) when played by bankers with leverage debt. But now it's my turn to pass the baton, since that deserves a whole lot more typing and I've done my bit.
"I'm working on commercializing NASA software and this couldn't be more true."
I don't think is just NASA. Wasn't an IBM engineer the one that said: X time for a valid prototype; 10*X for an in-house product; 100*X for a commercial packaged application?
Given that what a scientist does for her calculations is in the "valid prototype" stage (why the hell would she be interested in anything else?) you can do the math about how much it will cost to put it on a shelf, so to say.