'The Code Has Already Been Written'
theodp writes "John D. Cook points out there's a major divide between the way scientists and programmers view the software they write. Scientists see their software as a kind of exoskeleton, an extension of themselves. Programmers, on the other hand, see their software as something they will hand over to someone else, more like building a robot. To a scientist, the software soup's done when they get what they want out of it, while professional programmers give more thought to reproducibility, maintainability, and correctness. So what happens when the twain meet? 'The real tension,' says Cook, 'comes when a piece of research software is suddenly expected to be ready for production. The scientist will say 'the code has already been written' and can't imagine it would take much work, if any, to prepare the software for its new responsibilities. They don't understand how hard it is for an engineer to turn an exoskeleton into a self-sufficient robot.'"
Yes I can: at least 5 times longer than it took to write the first time. That's why I will use it myself to do research and not try to make it usable by others; I do not have that long between needing to publish. It is just not worth doing, and with the joys of specialization it probably would not be worth it for someone else anyway.
I'm working on commercializing NASA software and this couldn't be more true. When talking to the inventors, they inevitably say "Oh yeah, the software is done, anyone can write code for this, should be easy to sell," even if it's coded in Fortran and has no GUI or documentation of any sort. It definitely is functional, but it has hardly any of the features consumers demand.
The abstract in Slashdot is pretty much the whole text in the linked post. The other 3 paragraphs repeat the same idea.
Most managers think the prototype is the product.
To a scientist, their software is simply a tool, a means to an end. Their results and discoveries are what they really care about. When it comes to reproducing scientific results for verification, it is actually advantageous that another group not use existing software. Another research group using the same faulty software, with the same hidden bugs, will likely come to the same incorrect result.
Productization of software is a completely different exercise. You have to make your software work for a larger crowd on a plethora of devices. You actually have to consider how your software fits into the larger product lifecycle. The key difference here is that you have customers that you need to keep happy.
Can I have a pilsner, please?
If I'm getting this right, scientists view software as nothing more than a specialized calculator. They don't want a program that spoon-feeds them information let alone set the premise for how data should be calculated and organized.
Programmers on the other hand feel the opposite and think their users should do nothing more than input data and record the results. The research being already incorporated into the software and all that.
Life is not for the lazy.
This assumes people are very clearly an engineer/programmer OR a scientist. But I would consider most software engineers to be computer scientists as well. It's a fairly nonsensical distinction. The analogy to Spider-Man and Doc Ock is fun, but ultimately metaphors don't prove anything.
"Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision. Scientists need to understand that prototype code may need a complete rewrite before it can be used in production."
This is just an extreme generalization, to the point of stereotyping.
GCS/MU/P d- s:- a-- C++++$ UL++ P+ L++ E+ W++ N o K- w--- O M+ V- PS+++ PE Y+ PGP t+ 5- X R++ tv+ b++ DI++ D++ G+ e++ h-
When I learned to program, my programs would only run for me. Every program would only do something useful if the user (me) adhered to one "obvious" way of interacting with the program. Then I observed that other people would not understand the obvious way of using my programs. I believe that over time, experience teaches programmers to avoid this problem: first by retrofitting existing programs, then by looking for ways of writing less personal programs from the start. Eventually this leads to an appreciation for design guidelines, standards and programming-in-the-large concepts generally.
Most "software engineers" in the commercial world are exactly the same. As soon as it's written and minimally functional, it's off to production. The idea that software in the commercial environment is engineered is a pipe dream.
Posting anonymously for a reason... The software we use to manage data flows from a certain big experiment in Switzerland (Globus, Bestman) is excruciatingly bad. Software gets promoted from version N to N+1, not because they have made major bug fixes or functionality upgrades, but because it's grant-writing time again and they need to show progress.
Software written by scientists often barely works, and heaps misery upon misery upon the hapless admins who have to maintain the service. Exposure to Grid software, in particular, should be regulated by OSHA.
I work with Monte Carlo code and statistical analysis software. I use CERN's ROOT package for the stats analysis, CERN's GEANT4 for the MC code, and *nix scripting when I need to handle multiple files. Every single piece of code I write is written for a purpose. That purpose is generally to generate data and then analyze it. The only other people who are going to see it? Maybe my supervisor, and, if I'm just in on a contract, maybe the guy who has to work on my code later. But to be blunt, that doesn't matter. All that matters is that I know what's going on.
That being said, sometimes I write software for my own personal use. There, I tend to write more robust code, trying to follow various programming standards. Because I figure, if I write something for myself that turns out to be fairly useful, someone might want to use it, or adapt it. But professionally, all my code needs to do is get out that table or prepare that figure. Is it sloppy? Yes. Does it get the job done? Also yes. Fortunately, not only is my field esoteric, it's also government work, so it's practically a guarantee that my code will never have commercial release.
Cynical Idealist
You can often tell whether someone is "programming as a means to an end (of your own)" versus "programming to build a tool for someone else". For instance, I have experience in the financial industry. Quite a lot of traders see coding as a means to implement their cool new model. Looking at their code, you can often tell. It's as if everything was built to just exactly fulfil the requirement, with no thought to the fact that those requirements might change. But of course, they do change. So you get hacks and workarounds, and cut'n'paste cargo cult code. Kinda like what those Orks in Warhammer 40K might make. And of course the problem with spaghetti code is that if you write it, nobody can ever help you solve problems/improve it. It's the coding equivalent of painting yourself into a corner. There's loads of smart traders out there with an excel spreadsheet that actually is an extension of their personalities (In fact it's their Magnum Opus. Everywhere they go, they try to take this quirky little file with them). Every little hack is something only they can explain (comments, yeah right. Do your body parts have explanatory comments?) and only they can fix if wrong.
On the other hand, you sometimes hire a guy who is a programmer, but knows nothing about the domain. Very good with OO models and that kind, but you have to teach them everything about finance. What's a settlement date, what kinds of options exist, etc. You get what you ask for, because they know how to turn problems into object models, but you have to ask VERY carefully. And teach. Unfortunately, not everyone has time for that, and so you end up with something that still doesn't quite do what it's supposed to.
So you often end up getting guys who understand the problems, but can't program, programming. And guys who can program, writing the wrong program.
This is hardly unexpected. The code needed to process data from science experiments can be years in the making by one or few persons sculpting it to do the job they need done. It might be a bit much to say that it's throw-away code, but once the paper is out the door it probably won't see much use again.
All of this, combined with the fact that the coders are scientists and thus aren't concerned with UI issues and whatnot, means it may take a lot of manual intervention at various steps to use the software, but in the end the science gets done... and you make a neat gun for the people who are still alive </portalroll>.
________
Entranced by anime since late summer 2001 and loving it ^_^
And I gotta say - the linked blog post makes me think the author just got in an argument with his scientist boss, and he lost.
#DeleteChrome
As a university researcher in applied game development I pretty much work on abstracting and generalizing *finished* software.
I usually do this: I spend between six months and a year building a game according to some technique, framework or new language I am researching. The game is then finished, published and even sold. Then a paper is written describing the technique and its impact. Lather, rinse, repeat.
This is just anecdotal experience, but in this day and age of shrinking research budgets it is not uncommon to find scientists who also package and sell their research.
So this whole "programmers are cool, they develop finished stuff while the other a-hole scientists quit halfway" is just a stupid generalization based on a superficial stereotype of academia. Also, THIS IS NOT NEWS, and even if it were it wouldn't matter.
My book: Friendly F#, fun with game development and XNA; my game: Galaxy Wars by VSTeam; my gamedev language: Casanova.
I recently worked with two software companies on the same project. One wrote software for any old client, the other was a specialised scientific software house.
The scientific software house wrote some really appalling code, but boy, did they care that they understood what the software was supposed to do, and that it delivered the right results.
The generic software house wrote clean, maintainable code using industry best practices... and wrote code that was almost useless, as they gave no thought to what it really needed to do. They billed us for a huge number of changes just to get it to functional (as opposed to well written) status.
The days of long-lived software are pretty much gone. There are a handful of companies that still maintain the programs they've written a long time ago, but most programs written today are written quickly and dirtily, to spring up one day and fall into oblivion the next. "Apps" are little more than short fads that come and go, easy to implement due to having little functionality, and just as easy to discard for the next one.
Scientists write software to do science. Programmers write software because they've spent their lives learning how to perfect the art.
It is indeed true that code written by scientists often needs polish. But the other truth is that it's often necessary for a scientist to write the code. Too many programmers think that all they have to do is write code to spec. But when you're writing code that supports the needs of a subject that takes a decade to master, only someone with that mastery can understand what the specs mean.
Often, the best results emerge when a scientist writes the code, and a programmer reviews and polishes. But that can cause a lot of friction: scientists don't like criticism, and programmers would rather program than review and polish. It's a challenge for project management.
This has nothing to do specifically with scientists, this is more about the difference between code you write for your own use versus code you write for others to use. Scientists aren't the only people who write code for their own use!
Conversely, scientists often do write code that needs to be shared, sometimes among large groups. I used to work in the field of experimental high energy physics, which typically has collaborations of hundreds or even thousands of people. Some of the software I worked on was to be used by the wider collaboration, and there were many coding practices we were expected to adhere to, in order to ensure the code worked properly on all the different systems in use. (We supported about 6 different OSes: VMS and several flavors of Unix.) Other software was written for my own personal analysis, and it wasn't meant to be shared, although it was expected that I at least run some consistency checks to ensure the code was giving reasonable results.
On the other extreme are general purpose tools, written by scientists for use by scientists on many different collaborations, such as CERNLIB, ROOT, Minuit, GEANT, etc. And let's not forget, the World Wide Web was created at a high energy physics lab (CERN) to facilitate online collaboration! It seems to have proven robust enough for a somewhat wider use!
If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
The issues surrounding transitioning research S/W written by scientists into honest-to-goodness production systems are ones I'm very familiar with.
At my company, a lot of energy has been put into bridging the gap over the years with varying results. I believe that the root cause of the problem is that research S/W is not an end-product; typically for scientists the end-product is a research paper, white paper, proposal paper, etc., for which the S/W is only a tool for getting to the end-product. As soon as the experimental (or proof-of-concept) S/W returns the desired results, the software is considered "done".
In contrast, production S/W is often THE end-product for developers, so a lot more attention is given to robustness, re-usability, etc. All the standard thinking that you want to go into your production S/W.
One big issue for us is that the research S/W is almost always written in Matlab, while the production code is written in C++ and Java. The single largest source of bugs in our systems is porting S/W from Matlab to C++ or Java. (As an aside, please let's not talk about the Matlab 'compiler' or Octave; we've already tried them both, and they're both performance hogs and also create SCM and CM nightmares.)
We experimented with requiring that the research S/W be written in C++, but it was a disaster. The scientists couldn't get anything done, and the code was just awful. So, back to Matlab it was.
And, my experience is that people who I have a great deal of respect for, who I consider brilliant in their fields, holding PhD's, etc., have produced the crappiest Matlab code I've ever had the sorrow to read. My favorite instance was the use of these local variable names within a single function of research S/W that was considered "done" (true story):
i
ii
iii
iiii
iiiii
iiiiii
And, of course, little documentation as to the mechanics of the code. And believe me, it gets worse from there. Bear in mind that the code does indeed work for its particular purpose, and may well be ground-breaking in that particular research domain. But "done"? Ready for production? Not without a major porting effort (which is really a re-writing effort). The most mysterious thing to me, though, is that the scientists, for all their intellectual firepower, don't understand that it's a problem.
The solution we've converged on is to require our bizdev to be responsible for funding efforts to rewrite the research code and get it integrated into the product baseline. And the bizdev types can't proclaim a particular capability "done" (e.g., sell it to customers) until they've funded and executed those efforts. It took years of education to get to this point, but things are moving along much better than before.
In the course of every project, it will become necessary to shoot the scientists and begin production.
Most scientists (e.g. physicists, chemists, mathematicians, geo-*) solve their problems with formulas. Then they code these formulas in a language which is most likely C, Fortran, Algol68 (not really) or Matlab. While programmers often also only code, software engineers try to design software and have to incorporate different aspects. This is even true when writing software for the sciences. However, the same applies to all the other fields we write software for.
There are programmers and there are Software Engineers.
The two things are different, and people who don't know any better equate them.
There's nothing wrong with being a programmer at all, but programming is a subset of Software Engineering.
It's akin to the difference, imho, between a construction worker and an architect. One can be a hack or a craftsman, but tends to have a smaller overall picture of the where/what/when/why behind decisions that often seem unimportant or superfluous. The other can be incompetent or a good engineer, and tends to have (or is supposed to have) the background to know why things are or should be done a certain way, at the very least being able to understand the impact of short term decisions on the long term.
There are, of course, exceptions to everything.
In software development, things have become more and more planned, predicted and tested over the last decades. Something which was more or less an art is becoming a set of established techniques. So software development is becoming more and more an engineering task. On the other hand, software developers and designers are always trying to use new stuff because the problems of today cannot be solved with the technology from 10 years ago.
I was at a software engineering workshop on modeling and domain specific languages. After my presentation I said that I think we are not engineers, because of the situation described above. However, we use engineering methods/techniques to handle complexity and to get the information we need to model, write and deploy software.
Punching down the logic was the easy and fun part. Exception handling is the main challenge. Then come middleware issues.
It's easy to disassociate yourself and to become a patronising git and to claim to have done the hard work. Making software maintainable, supportable and well performing is never a matter of course.
Generally speaking, it is hard for a non-programmer to imagine a programmer's job. The tedious thing is that scientists will always have significant influence and may not appreciate the hard programming work. This is even more so with clueless managers.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
I always like the Numerical Recipes quote: Scientists solve next year's problem on last year's computer. Computer programmers solve last year's problem on next year's computer.
I've lived on both sides of this divide but mainly on the scientific side. I become apoplectic with software engineers who just don't vest themselves in the science. They perpetually want a set of requirements. And they get upset if a new requirement is added later. I see software as a way to explore a space. Model it. Determine what more modeling is needed. You are constantly trying to do something that usually is beyond what is computationally possible, so you have to figure out what approximation is going to work. What has to be done at full scale and what can be done at lower resolution. Mock up stuff.
The engineers who don't see it as a process just are impediments. Scientists want lots of simple things fast then see what is working and add new simple extensions. They don't want to wait 4 months for some delivered code based on specs it took 2 months to write.
Hence scientists tend to write their own code.
Some drink at the fountain of knowledge. Others just gargle.
Yes, I have seen some bad code come from scientists and engineers. I have also seen simple but ugly code, unnecessarily reengineered by OO design zealots and broken and ruined with complexity. It depends on who did the writing and the rewriting. The best policy is for software engineers to give scientists simple interfaces to write to and then stay out of their way.
an ill wind that blows no good
"The Code Has Already Been Written" sounds like scientists don't believe in evolution while programmers do. Really?
This is also true in medical devices and in biologics (medicines) - i.e., with respect to scientific and engineering knowledge, not just computer programs.
As an engineer and manager in product development and tech transfer roles, I'm continually amazed at how little R&D biologists know or care about tech transfer, manufacturing, marketing, logistics, etc.
I started in R&D, and was guilty of that once, but still ...
the old INTJ vs INTP battleground...
"All programmers are optimists. Perhaps this modern sorcery especially attracts those who believe in happy endings and fairy godmothers... But however the selection process works, the result is indisputable: 'This time it will surely run,' or 'I just found the last bug.' So the first false assumption that underlies the scheduling of systems programming is that all will go well, i.e., that each task will take only as long as it 'ought' to take. The pervasiveness of optimism among programmers deserves more than a flip analysis..." [Fred Brooks, Mythical Man-Month, p. 14-15]
So this observation is really just a slightly different flavor of something that is characteristic of all programmers.
I had one job where I was assigned a particular game feature. So I took a week and I did it. My technical manager took it and played with it and came back to me completely shocked: "I couldn't find any bugs!" he says. (Also: I was continually being chastised for long schedules, and having my estimates arbitrarily cut in half by the manager.) That job convinced me I had to get out of the industry.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
The site Software Carpentry aims to teach scientists and engineers key programming tools and approaches to write better code. There are many, many resources to help non-programmers write better code. The fellow who runs it, Greg Wilson, has done yeoman work in this regard. I was so impressed that I invited him to an academic conference and we were really pleased.
My entry into this problem is "Where's the Real Bottleneck in Scientific Computing?" (from the American Scientist). It says everything that the article here does and much more. Highly recommended.
In all the (big-pharma) shops i worked at, i'd write and test the command-line number cruncher inside (until my boss could get a paper or two out of it) then hand it to "two guys" that would slap on a stunningly restrictive (in terms of functionality) GUI (itself a third party tool set based on Qt) and it'd sell just fine ...no sweat [shrug]
I write all my code for someone else: me a year from now.
I'm proud to say I can pick up something I wrote ten years ago and be quickly up to speed. The basic principle is this: something that is obvious now will not be so a month from now. Writing robust software now will save a whole pile of misery later. It isn't just a case of having to re-write a piece of software; when it comes time to write a paper, you have to go back through your software and figure out precisely what you have done.
The other thing is that, these days, you may face an FOI request. You really don't want crap programming exposed for public ridicule.
This issue is actually getting less bad in some cases. In my area of biology, there is a strong trend towards journals requiring that analysis code be archived along with data, as a condition for a paper being accepted. Also, grant proposals are looked upon more favorably when they promise to develop, document, release, maintain, and support software that implements their new ideas. These requirements make me plan more for future use and clean up my code so that it's not an embarrassment when others see it. I think this is all contributing to a healthier scientific and software-ific environment.
"DIY fixers do a hack job of wiring their routers in their home basements to their computers in second floor bedroom. They drill a hole and take the cable clearly marked "indoor use only" outside the home hanging in a lazy lopsided catenary curve up to the bedroom window, take it through the window into the house. The window sash does not close properly and allows bugs to get inside.
Professional electricians on the other hand use flexible drills to make nice access holes, wire the cables, patch up the drywall, clean up all the debris, and charge you $90 an hour."
P.S: Exercise to the reader: Make it a car analogy.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I've experienced almost the same thing, but with engineers instead of scientists. I attributed the engineers' disdain for software quality to a different motivation: namely, the disparity in status and pay between engineers and programmers. I believe that they felt that spending extra time on making the software readable, maintainable and all those other ables was beneath them.
The proof of the pudding came when I happened to hire an inexperienced guy as a programmer. He was so smart that he soon learned not only to program but even to be a better engineer than the real engineers. The company culture turned against him viciously. They ran him out of town on a rail. Fortunately for him, he moved to Santa Clara, got a job with a software company not involved with engineering, and soon earned 600% more than those engineers.
Is that little story off topic? I think not. I believe that engineers and probably scientists feel very threatened when they write software in the presence of lower paid programmers who might be able to do it better. Their defense is to downplay all aspects of the software other than its ability to calculate the right answer in their own hands.
The cure? In scientific or engineering cultures, give non-science non-engineer talented tech people as much pay and status. In hospitals some nurses should earn more than doctors. When will that happen? Not in my lifetime.
That's a great post, of the kind that saves me a lot of typing. You covered the first-order considerations brilliantly.
What you missed was technical debt blindness, which has been around since forever. Books I read around the time of the Mythical Man Month talked a lot about maintenance syndrome: that the original development team would be regarded as brilliant for producing working functionality at tremendous speed (undocumented, with no error handling for edge cases), then the first maintenance team would all be fired as underachievers for adding hardly any new functionality in the first year or two.
Turns out it's hard to erect a machine shop over top of adobe mud brick construction without adding some reinforcement to the structure, which usually takes a lot longer than the entire original edifice.
You can instead take a wrecking ball to the first iteration, but this rarely works out as well as hoped. You end up with far more ambitious adobe mud construction built with a whole new generation of unproven tools. At some point you have to bite the bullet and ferment what you began with.
People hide debt blindness behind widely divergent construals of simplicity, where "simple" usually turns out to be a euphemism for any decision that sidesteps paying down debt in the short term.
For professional software engineers, there is one true simplicity to rule them all: generativity and compositionality. Can you build the next layer on top with any hope of having it work and able to support an ongoing stack? For us, it's a long term game of pass the baton. For everyone else (management, scientists) the endgame is to cash out, and take credit elsewhere (e.g. publication biography).
Unfortunately, a citation is not a formal linkage that the compiler either accepts or rejects. By the standards of compositionality, citation is payment in dubious coin. Citation is not falsifiable. Scientists still count their citations even when they come from papers that are full of crap, peer review notwithstanding. For a professional software engineer, when you start instantiating objects from one library inside an abstract expression template library, you come face to face with compositionality in a way that few scientists can even imagine, having been weaned on the outrage of being improperly cited.
Technical debt blindness on the part of management quickly turns a software engineering shop into a highly non-linear fiasco. We've all seen this.
Somehow this game works out better (for the participants) when played by bankers with leverage debt. But now it's my turn to pass the baton, since that deserves a whole lot more typing and I've done my bit.
The real distinction has nothing to do with "Scientist" vs. "Programmer". It is actually "Researcher" vs. "Engineer"/"Maintainer". When I'm in complete investigative mode (aka researcher) I don't care much about the code quality, so long as I can get it to do what needs to happen to collect information related to the problem I'm investigating. Not "the answer" because if the answer was already known and understood, I wouldn't have to research it. When I'm developing product, it _has_ to be maintainable by someone who comes after me. And also has to be maintainable by _me_ even after a hiatus. If I'm spending 100% of my time maintaining the same code over the course of a year, it means I was completely inept. Perhaps this is a bit of a pride issue, but I want to be able to move forward and not be a "Wally" type engineer (as in Dilbert.com)
I had a similar experience when one of the engineers from an architectural company told me to port a program from Fortran to CBASIC. When I explained to him that I would have to rewrite it for all users to be able to use it successfully, he disagreed. It took 10 months and 10K for him to accept the reality.
This problem is not just present in these two domains. You see this dichotomy elsewhere, specifically in IT.
I do IT in a scientific research-oriented organization, having taken over for previous staff members who were very much of the "IT should be done like research" school of thought. The result was that each problem was addressed quickly and without any consideration for the whole. Being as they were working with physical assets and not just software (though there was a lot of that, too), the end result is similar to a large, monolithic application with plenty of places where it can break, being almost completely unmaintainable in its current incarnation without a complete overhaul.
I consider them to have sysadmin'd like programmers. From my experience with the programmers I've worked with (people who are self-ascribed programmers, both good and bad at their job), the quick fix or solution is the most desired result. Documentation? What's that? Don't do something elegant and atomic, looking at the larger picture: kludge it to work with what's there currently, pull from misc. other things you're aware of, etc., since you built the original and are aware of its intricacies. (Again, by no means document these inter-relationships, why they're important, and what something disappearing might do somewhere else. Don't use common topology/framework practices for your designs/implementations, just make a giant fucking spider web. Don't, by any means, be consistent. "Ship it, it's done".)
I've seen the same thing in IT as a whole. There are entirely too many 'administrators' who are basically glorified technicians who make well above their performance grade. They are negligent in their responsibilities to plan roll-outs. No, a "complete rebuild and reorganization, trash the original" is no more a viable option in most cases than "a complete rewrite" is, due to the time (and money) it'd take to do so.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
This is a really common problem in the academy. So common in fact, that one particular academician has come up with a special license, the Community Research Academic Programming License (aka, the CRAPL). It's worth a look and good for a chuckle:
http://matt.might.net/articles/crapl/
The first principle is that you must not fool yourself - and you are the easiest person to fool. -Richard Feynman
better would be to find a good common vertical point and then
1 drill into the floor of the second floor and drop the cable down
2 drill into the floor of the first floor (and repeat)
bonus points if you can do a wall fish (tape a poker chip to a cr2032 and an LED then attach that to fishing line)
Any person using FTFY or editing my postings agrees to a US$50.00 charge
The job of a scientist is to come up with new ideas and test them.
In that job, code is a tool, like a hammer or a mass spectrometer. If the tool works well enough for the job at hand, why on earth would you spend time making it work better? It is just crazy.
The other problem is that scientists are arrogant (so they think what works for them is ok for others) and non scientists are stupid, as they expect scientists to do their work for them - in this case, production code.
sigh
It is not a scientist's job to make a product, which, roughly speaking, is something that works more than twice, and can be used by someone without extensive training.
It is a scientist's job to produce enough data to test their idea; you only have to do this once (ok, twice if you want to be a stickler about reproducibility). The code destructs on the 3rd run, you don't care, cause it ain't your job, and your boss ain't gonna be happy, you doing something that ain't your job.
We have all come across prototypes - they work well enough for the 1 use case (the same as the science scenario) - but when you go to productionize it - it is a pain in the ass. If anything is written for a single purpose it will be hard to turn into a product and become generic.
Scientists are not professional programmers - they are like intermediate / junior programmers:- just as we are not scientists - I would not expect a professional programmer to be able to analyse a set of scientific data.
We're right and they're wrong.
BTW I didn't RTFA.
This is true, and starting to become a problem given the increasing expectation that as much as possible of our scientific work should now be open source. While this is great in theory, in practice, it means I (as a scientist) am under pressure to make my scientific "exoskeleton" code publicly available. I'm not qualified (and don't have the time) to polish it up into a product that is really suitable for distribution, and my employer doesn't have the funds to hire programmers to do this for every piece of code that I and my colleagues write. If half-baked scientific code is released, though, there is a real risk that it will be misunderstood and misused by others.
I work for a group at NASA. One of our group's tasks is to take scientist-written code and wrap it for distribution to hundreds of remote sites around the world. We try our damnedest to run the code as-is, but fairly often have to modify it to remove stuff like:
* Hard-coded input and output file and directory names
* Small and arbitrary length limitations on file pathnames - I've run into buffers that were declared as 53 characters in length, probably because that was what they needed on their system
* Large arrays being allocated on the stack - Linux distros have different default stack sizes
Most of the problems stem from the picky crap that C makes users go through for simple stuff like string manipulation. Both the scientists and the downstream developers like me would be MUCH happier if the scientists worked in something more forgiving (Matlab/IDL/Mathematica, or a flexible scripting language with a decent interface to heavy duty math libraries).
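For what it's worth, the first two items on that list mostly disappear once paths come in as arguments instead of being baked into the source. A minimal sketch in Python, assuming a script that reads one input file and writes one output file; the names `parse_args`, `output_path` and the `.out` suffix are made up for illustration and are not from any real NASA code:

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Take input/output locations from the command line instead of hard-coding them."""
    parser = argparse.ArgumentParser(description="Hypothetical processing script")
    parser.add_argument("input_file", type=Path, help="data file to read")
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path.cwd(),
        help="directory for results (default: current directory)",
    )
    return parser.parse_args(argv)


def output_path(args):
    """Derive the output name from the input name; no fixed-length buffer involved."""
    return args.output_dir / (args.input_file.stem + ".out")
```

Calling `parse_args(["/data/run1/granule.dat", "--output-dir", "/tmp/results"])` and passing the result to `output_path` gives a path under `/tmp/results`, however long the directory names get.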
To a Lisp hacker, XML is S-expressions in drag.
Bullshit. False issue. Move along. There is nothing interesting here.
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
actuaries, engineers, accountants and anyone else that does code as a process component rather than as an end product. I've dealt with this for 30 years...
As a computational physicist who has written plenty of research oriented code, as well as a principal programmer I can say that this is simply bogus. I've worked with plenty of scientists that generate excellent software, and many computer scientists and programmers who do not. This is just the standard BS against scientists...
Mark this article as bogus BS...
I wish it were as simple as this thread implies. The truth of the matter is that most commercial developers who are paid to worry about maintainability don't understand how to do it much better than their academic counterparts. Managers notice this and put all kinds of process in place to enforce good practice--requirements and design docs that are practically books, compile-time coding standard tests, smoke tests, regression test suites, automated tests and so on and on and on. These do not, however, turn developers into good programmers. They only turn them into safe ones.
Another thing the thread ignores is that 90% of all robust mission-critical code is in error paths. Academics rarely put those in, and great developers count on great code structure to save themselves much of that trouble. Let a few mediocre-but-safe programmers at that great well-structured code though, and the error paths multiply and must be addressed (usually by more mediocre-but-safe programmers). So for large systems, the starting point doesn't matter very much.
See, 2,000 safe programmers can write systems that enable a company that writes mission-critical applications to reach billions in sales, regardless of whether the initial code base was academic or extraordinarily good. Twenty genius programmers cannot do that, as a rule, even if they are 100 times as productive as the worker bees. Managers and executives understand this and go with it. At some point the weight of poor-but-safe code overwhelms the system's ability to grow and evolve, and it's time to start over.
------------
A hundred buggy lines in the code, a hundred buggy lines.
Fix a line and recompile, a hundred one buggy lines in the code.
Seen far too many Excel spreadsheet macros. Please make the hurting stop.
Actuarial specs, on the other hand, are sweet. Pure math.
BS! name the university where a PhD student only takes 4 classes. Most universities require that u r enrolled, even if that enrollment is only research credits.
U must be talking about credits beyond the MS requirements...but it still sounds slim to me.
At Caltech, my sciences PhD program requires a whopping whole 6 classes (which does indeed work out to 1-2 a term for the first couple years). Exceptions are occasionally made for students who have already pioneered new fields of study during their undergraduate years in particular areas. The GP poster is certainly not out of line in describing the typical program of top-tier universities --- might I venture a guess that "u r" not personally familiar with the operations of higher-end educational institutions?
I've seen this problem go both ways in trying to create commercial scientific software. For every scientist who can't believe that the code needs to be re-written, there is a software engineer who has read a survey paper and thinks they don't need to consult the domain experts. The reality is somewhere between--both groups have spent years learning their own craft and a little time picking up enough of the other's craft to get by. As I've told many scientists, "I wouldn't let them loose in the lab to do research, why would I think I can write commercial software?" (For background, I'm a former computational/structural biologist turned non-coding software professional.)
"You can immediately look at code and tell who was trained to program by/as a scientist and who actually learned as a programmer"
Would a scientist and a professional programmer be equally good at detecting the difference?
My own twisted career started out in physics, electronics and math, then got into video production, photography and fine art, but for most of my life, software work paid the rent despite flat zero formal education in software/CS. I'm not quite either category, scientist or programmer, but really in a third category - artsy creative - and of course there are many other styles of thinking.
Looking at a scientist's code and professional software engineer's code, they'd be just "different kinds of messes" to my right-brained spatial-artsy-impressionistic way of mentation. I certainly don't see professional programmer's code as any more "readable" than other code. OTOH I'd expect anyone with a real software/CS education to see a big difference.
Maybe it would be a good career-fit test to ask a candidate if they discern any difference between the work of talented people in their desired field and the work of talented practitioners of other fields who must attempt work in that field. Someone who doesn't see a difference should look for work in a different field...
Although, no matter what else, I will agree, most physicists write atrocious code!
It's naive to consider either class of software as being sufficient, or either kind of programming to be superior. Like most problems there is a strong management component to assigning resources to each in appropriate scales.
A computer scientist/software engineer delivered a well-phrased summation of half of this discussion during a 5-minute talk at a recent lightning software session at a science meeting. (Note that there are rarely science sessions at software meetings.) A domain scientist/software engineer then delivered a well-phrased summation of the other half of this discussion. Both were right and both were wrong.
My five minutes? Pointing out that the real issue was that management rarely supplies sufficient resources to coherently accomplish software projects of any type. Typically projects are underscoped by a factor of three or more, whether the particular project is to build a robot or an exoskeleton. This is true whatever software process is followed, but in terms of the Mythical Man Month, it's like omitting the nurses and anesthesiologist from the surgical team.
As pointed out by others, examples of the reverse can also be found in practice. However, I agree that this generalization is true for coding practices. I'd also like to add another related generalization. Autodidact developers tend to code quick and dirty (with a lot of experience of how the actual code runs in daily practice under heavy loads etc.), while people with a heavy academic background (this also depends on the specific university) tend to code slower and cleaner.
That said, in business I often hear that some developer is very skilled: "he had written software X on his own faster than three others combined". That always makes me very skeptical, because 9 out of 10 times the coder coded so quick and dirty that it is still alpha quality and will probably take a lot of extra work to get it maintainable after three years, when this coder is already working somewhere else (and has a lot of referrals for being so good). Meanwhile the guy who has to fix it and make it maintainable would say something completely different.
Of course, generally speaking, business people also have themselves to blame, as they use nonsense criteria to evaluate the productivity of software developers (counting lines etc.). In the other cases, both autodidact and academic, there is some really good developer who also has "The Art of Computer Programming" on his desk or has a good background in functional programming.
They should oblige those scientists to maintain and extend their code; they will immediately learn what it means to look at one-year-old code and try to debug a problem occurring in a case that was not previously envisioned. Or to extend the functionality to work on different data. The abstract is right, scientists don't see the code as something reusable. For them it is just a means to an end.
I'm a programmer who spent the last 8 years developing a commercial language tutoring AI. The original prototype was written in Prolog, consisting of about 5,000 lines of code in a Hypercard-like environment. The current system contains back-end authoring tools in C++ (15,000 lines), various tools and utilities in C/C++/Java/SQL (20,000 lines), Java/JSP web application (75,000 lines), a 50,000 word dictionary, 500,000+ entry grammar/morphology, plus 5,000 audio and video clips. The original software was developed by 2 people; we've had 20 contractors and contributors (graphic designers, voice talent, audio engineers, GUI designers, testers, programmers, editors and linguists) involved for a total of 40,000 man-hours. So, yeah, the academic version was a bit simplistic.
All the software engineers I know are perfectly capable of emitting a dense chunk of spaghetti code to solve one task one time, the same as the scientists, but they generally don't, because they know it is difficult to prove that spaghetti code will behave correctly, even just the one time it's needed. Unlike non-computer-field-related scientists, they also know better than to call such code "production ready."
They probably wouldn't even call such code a program. We all write scripts for one-off tasks when that seems less expensive than doing things manually and there's no readily available preexisting application for it. But after completing the task, we either throw the script away or refactor it into something maintainable. People who don't code for a living won't even know what refactoring means -- unless they have experience in collaborative coding. Maybe when more scientists realize the power of Open Data and github that might bring the two worlds closer together?
Same as the difference between the developer and the end user. The developer reckons the program is done when it executes without segfaulting, regardless of the fact that the interface can only be understood by another developer. Making a system correct and maintainable is only part of the job; making it usable by the rest of the world is a whole other business, and requires a knowledge of cultural linguistics and human-computer interaction which are not yet part of most developers' skillsets. Writing stuff for other developers to use is only a start.
And the code isn't the science. You're meant to do the code yourself from the science displayed in the paper.
And, really, for much the same reason as you have three different coding teams writing FBW systems for commercial jets: independently written good (but not provably correct) programs are far more likely to disagree on an obscure pathway than to all get it wrong in the same way. And then the programs vote. Sigma clipping in signal processing does the same thing.
For a scientist, the code is just to do the numbers. The paper shows the science and the result of the number-crunching. Do your own program and if it disagrees, then there's something worth looking at in the disagreement. The code isn't really important, the results are.
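To make the voting idea concrete, here is a toy sketch in Python. It is only an illustration of the principle, not how any real fly-by-wire system votes: it assumes each independent implementation returns a single number, takes the median as the agreed answer, and flags any implementation that strays from it by more than a tolerance. The function name `vote` and the default tolerance are invented for the example.

```python
from statistics import median


def vote(results, tolerance=1e-6):
    """Return the median of independently computed results and the dissenters.

    results   -- one number per independent implementation
    tolerance -- how far from the median a result may be before it is flagged
    """
    agreed = median(results)
    dissenters = [i for i, r in enumerate(results) if abs(r - agreed) > tolerance]
    return agreed, dissenters
```

With `vote([1.0, 1.0, 5.0])` the agreed value is `1.0` and the third program is flagged, which is exactly the disagreement worth looking at.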
The part between the twain is the taint. ;) It t'aint here and it t'aint there.
Having done both, I just want to say: it's not as simple as that. In academia you are mostly writing code for yourself, and you also have a lot more time and resources at your disposal. Sure, in a company you have more money for the really important stuff, but more often than not that leaves much to be desired elsewhere. Both tasks are similar in many ways, albeit at different levels.
The reason why I went back to academia for a while is the same one that leads to wanting more requirements lists. In a company you're expected to perform a certain task; you quite often loathe the tasks but nevertheless they need to be done. Now the bosses set up these tasks, and the bosses might not even understand what it is they set up as a task. The requirements list is a way for you to shift blame; you need it because a lot of corporate culture is about covering your ass. Otherwise you'll notice quite quickly that you will be sucking up all the mistakes of others into yourself.
Example:
A quick implementation is needed; the company knows that the code will need to be changed later but takes the money up front with an easy implementation. It gets implemented by a guy who isn't all that good at engineering code, so it's not readily compartmentalized. Said person decides engineering is not for him, so he gets an MBA instead of the MScE that he planned. Said person moves to management. The code works but badly, and the client has constant problems with it.
So the client says now is a good time for phase 2, and a contract is drawn up. This contract includes tasks that MUST work. The code goes around for a while until eventually the company has burned 2 engineers with this code. The problem lands on my table, at which point I say "ok I can do this because you have nobody else going for it, but don't expect it to get spectacular results". I work on it for 3 months (18 months past deadline, by the way, so pressure is up), only to discover what the actual problem is:
1. The original coder is now boss of the division, and has sold his code as the best code the company has ever done. So the requirement is to keep as much of the original code as possible. However, the original code doesn't do at all what the client wants; no matter how good or bad it is, it's the wrong tool.
2. Original contract contains a logical error, one can not under any circumstances fulfill the contract, I can mathematically prove this fact. As a consequence something has to give.
So you'd now think that since I wasn't the original author of any of these I would be off the hook. No way. The original architecture screw-up, the blown deadline and the contract are suddenly my fault, even though I inherited all these problems. And this is why for many the answer is: if it isn't in the requirements list, it's not going to be implemented, case closed. It's a way to formalize where the blame lies.
The code hasn't "been written", someone WROTE IT. Someone is responsible for it. It didn't materialize out of the ether. Someone wrote it!
Subject... verb... object.
What did it, what was done and what did they do it to.
Passive voice obfuscates and deflects responsibility.
"The decision has been made."
BULLSHIT! "We/I decided...." Someone made the decision, that someone should be the SUBJECT of the sentence.
How is it that i remembered 5th grade English class but journalists and so many other supposed adults can't?
Utilizing the synergization of benchmark e-solutions to pre-workaround action items!
My first coding job was at NIST (then NBS). I took code written by engineers, and made it usable by someone other than the engineer who wrote it. In most cases the input UI consisted of a '?' prompt at which you were to enter a comma separated list of floating point numbers, and the output UI consisted of a block of comma separated floating point numbers. The more interesting part of the assignment was eliminating inputs that could be derived from other inputs. You would have thought that an engineer would have done that without thinking about it - mostly not.
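Roughly what that cleanup amounts to, as a sketch in Python rather than whatever the original engineers used. The field names (`width`, `height`, `area`) are invented for illustration; the point is only that each value gets a name and a readable error, and that anything computable from the other inputs is computed instead of asked for:

```python
def parse_floats(line, names):
    """Parse a comma-separated line into named floats, with readable errors."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(names):
        raise ValueError(
            f"expected {len(names)} values ({', '.join(names)}), got {len(parts)}"
        )
    values = {}
    for name, text in zip(names, parts):
        try:
            values[name] = float(text)
        except ValueError:
            raise ValueError(f"{name}: {text!r} is not a number") from None
    return values


def with_derived(values):
    """Compute area from width and height rather than prompting for it too."""
    return {**values, "area": values["width"] * values["height"]}
```

So `with_derived(parse_floats("2, 3.5", ["width", "height"]))` yields all three values from two inputs, and a line like `"2, abc"` fails with a message that names the offending field.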
Contentment is the greatest wealth
- Sukhavagga Dhammapada
Contentment is the goal behind all goals.
My personal observation was that when I got my BS way back in 1990, I knew everything that I needed to succeed in the software world except for handling non-sunny day cases. Sure, we talked about stuff like error handling, validating user input, and so forth in various classes, but it didn't really sink in. It wasn't until I had a job and worked on a system that had to stay up and run for months at a time that I learned those lessons. Most school projects only last one semester, and really only have to work once, so no one really gets much exposure to the necessity of bullet proof code.
Those scientists seem to have the same mind set. It works in a few sunny day cases, so it must be ready to ship. Management can think like that too, especially if some other group is tasked with support and bug fixes. But those of us who have had to pick up the pieces know better than that. Isn't that part of the value-add that professional software people bring to a project? Coding really isn't that hard for anyone who can handle the symbolic manipulation (mostly algebra) and can pay attention to details. But there is a world of difference between toys and serious applications.
As a software engineer, if you find yourself in that situation, your road is simple: look at the source, find a few corner cases that will break it, and then you can demonstrate that the code is not production ready. Then you should be able to get the green light to harden it. If you do it right, you can earn the respect of whoever cobbled together the original code, and then you can work with them next time. That is kinda the Holy Grail, isn't it? You get to add your software experience and they get to add their domain knowledge.
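A sketch of what that corner-case demonstration might look like, in Python. Everything here is hypothetical: `mean_sunny_day` stands in for the research code, and the cases are just the usual suspects (empty input, a missing value):

```python
import math


def mean_sunny_day(values):
    """Research-style helper: fine for the inputs its author happened to use."""
    return sum(values) / len(values)


def find_breaking_cases(func, cases):
    """Run func on each named case; collect the ones that raise or return NaN."""
    failures = {}
    for name, data in cases.items():
        try:
            result = func(data)
        except Exception as exc:
            failures[name] = type(exc).__name__
        else:
            if isinstance(result, float) and math.isnan(result):
                failures[name] = "NaN result"
    return failures


CASES = {
    "typical": [1.0, 2.0, 3.0],
    "empty": [],
    "missing value": [1.0, float("nan")],
}
```

Running `find_breaking_cases(mean_sunny_day, CASES)` reports the empty list (a `ZeroDivisionError`) and the missing value (a silent NaN) while the typical case passes, which is usually enough to start the conversation about hardening.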
And if they are jerks about it, at least you get to rub it in their faces how bad they are at writing code. While that isn't really a "win" of any sort, it can be amusing to knock someone down who has put himself on a pedestal that he hasn't earned.
- doug
The problem is fairly common. What works in the lab would at best be cost-prohibitive in the real world. Here's a classic example: In Solidworks, you can design anything but you can't build it because the tooling prevents it. You can also design stuff but it's impractical to assemble. You can design stuff that could be built but would cost so much that the market can't support the price tag.
Lab-based software rarely has a user-interface that works for the general population. Embedded systems are fairly easy to design and cheap to prototype but packaging it for production is a major expense. Making plastic molds is still cost prohibitive for very small quantities.
> 1 UOW = program for yourself
> 3 UOW = give it to someone else
> (you install, you copy, etc)
> 9 UOW = give it to local group
> (howto, platform change)
> 27 UOW = shareware/open source
> (configure/make/make install)
> 81 UOW = product
> (real docs, slick UI, support teams)
> 243 UOW = business
> (lawyers, CEO, sales, marketing)
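The scale above is just powers of three: each step toward a business triples the work. A throwaway Python sketch, with the stage names copied from the list and the function name `units_of_work` made up here:

```python
STAGES = [
    "program for yourself",
    "give it to someone else",
    "give it to local group",
    "shareware/open source",
    "product",
    "business",
]


def units_of_work(stage):
    """Each stage costs three times the previous one, starting from 1 UOW."""
    return 3 ** STAGES.index(stage)
```

So going from "program for yourself" to "product" is four steps, i.e. 3**4 = 81 UOW, matching the list.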
I wrote an inventory system for my govt agency that didn't actually need it.
Because I was an employee/analyst instead of a contractor, I could say, "No," and force them to explain why the other 5 inventory systems we had did not meet their needs (lack of data entry was not a suitable reason for me.)
When they finally convinced me they needed two pieces of info for tracking computernames on the local wan, I built them one but since we only buy certain types of machines each year (maybe 3 or 4 different models), I built no way to enter new machines. Call me and I do it by inserting new vals into an array in the source code.
That seems sort of silly but think of what's involved in creating a new item_entry_screen, validating the input, restricting it to a few users, handling editing of poorly entered records. It's a lot of work. It's only worth doing under certain conditions. Right now I meet their needs within a 24-hour turnaround that takes me about 15 minutes of work/year.
Twice upon a time...
Once - I shared an office with a PhD physicist. Our shared task was to implement some very complex Einsteinian physics into a very large system. He developed the algorithms, I designed the software to be an integral part of the other 500K+ lines of code being developed, using the C++/OOP/Programming by Contract model of our large software development effort. He developed FORTRAN "pseudo-code" that actually ran using his PC FORTRAN compiler, then demanded I implement it exactly as written at the preliminary design review. At the CDR, when I showed him the actual code framework and how I'd had to divide up his code modules to meet our standards and actually reuse some of the libraries other folks had written, he threw a fit - ended up complaining up the chain of command until I was justifying my design to senior management. I won :)
Another time - I was asked to review code. 4.5 million lines of code. Written over 15-20 years by a variety of scientists, grad students, and other "subject matter experts." Over 10 languages, no standards, duplicated data repositories, obscure libraries, hand-rolled interfaces... there was even a library someone had written to allow FORTRAN programs to draw maps using XLib. Some hard analysis, and I figured that the whole system of systems could be reduced by over 50%, and down to 4 languages, and if we standardized on some open-source tools, we could save millions in maintenance.
Domain scientists are supposed to be the experts. Let them design algorithms and do their jobs. Computer scientists are supposed to turn these things into robust, maintainable, extensible, supportable systems. Let's do ours right and keep our jobs.
That is the core problem. They feel the problem is solved, which leads to something even worse - they feel the software guys are idiots because they keep fucking up their work. We use a lot of Matlab and Simulink too, and the Mathworks uses this to absolutely drive a wedge into the organization. If it can't be done in Simulink, you use the "legacy code tool" to hook in some of that antiquated C-code. They also preach that a "model" is self-documenting, so those guys don't have to write docs or explain anything. And I'm with you on the SCM issues. Management of course listens to the expensive guy with the PhD who wants a $30K tool chain that doesn't work with anything else. All this because of ignorance (a word which unfortunately carries negative connotations).
Often in a large corporation, code may end up being used at many sites around the world and have to interface with other programs. Other programmers are likely to have to write programs that work with the original code and may even be incorporated into the original software. Without internal documentation, those programs can take many times longer to write, due to the shortsightedness of the original programmer. Writing a program to do one thing, or a piece of science, is fine *IF* there is no likelihood of that software ever being enhanced or integrated with or into other software.
I spent months deciphering undocumented code written by people who thought they knew how to code. To give them credit, the code did do what they wanted, but there were major problems when it had to interface with a commercial program. Unfortunately the code was written by those located far up the food chain and word was, "It will not be changed," as the whole corporation used it. My only comment was "It will, sooner or later." Some years later the whole corporation went to a commercial database. They had to assign a team to convert the data and thought they'd simply export the original data into a spreadsheet and then import it into the new system. Unfortunately, when they tried this they ended up with hundreds of thousands of non-unique records. My guess would be it cost them a minimum of several million dollars to do something that would have only taken minutes had the original program been written following good programming practices...
IOW, non-professional programmers such as engineers and professionals in other fields who learn how to write a program approach it from an entirely different perspective. What they do works... usually, more or less, but they write to do a specific job, period, with no thought as to future use of the code. They rarely follow good programming practices, write spaghetti code... lack of documentation, ...
Earlier I saw a comment about not having time to do all that which is not normally considered a valid excuse, but a programmer in that field would be able to write good code that would do the job, be adaptable, do it faster, AND once finished others would be able to follow and understand the code. To top it off the programmer would likely get the job done in far less time.
That could partly explain the difference between the passion I felt for computer science, and what I felt was the 'lack of caring about software' of many programmers around me....(it's just a job).
I was always increasingly horrified at those who did coding just
to churn it out and hand off to others, whereas I'd be regarded as overly
cautious or slow to call something 'finished'....
Definitely caused a crimp, as those who like to throw things 'over the wall', don't take it as personally when things don't work 'just right'....
is a perfect example of how scientific code should not be written. Mad rush, tweaked to get the results they wanted and no way to replicate results.
Furthermore many of the tweaks to gain the expected results made no sense whatsoever - they were simply fudges.
Writing software is a science not an art form. If you treat it as some form of black art - as the UK folks did then you deserve the ridicule you get.
If you reply with "but non-deterministic systems are non-bounded and hence cannot be proven", then you need to read Dijkstra's A Discipline of Programming - this puts forward a very simple seven state bounded model for ND systems mathematical provability, and if you cannot understand this book then don't try and build modelling systems.
IMHO unless you apply the science of programming, all you are doing is hacking something together - and if it does what you expect, that is more luck than anything.
The example is a stupid set-up. The scientist has one set of requirements, "test theory"; this proves successful, and therefore it is decided to produce a product from this. This is probably less than 10% of research work (just a guess from working in a research to first-of-a-kind product development lab).
Tell this scientist that he has hit pay dirt, but these are the requirements for the product solution.
A sufficiently well written set of requirements should convince the scientist that there is more work to be done; maybe get him to review the requirements too.
Inform the programmer of the current state of the software, the requirements, and the understanding of the scientist.
If you really need the buy-in of the scientist to complete the project, write the tests first, fail the existing software and productise it!