Ask Slashdot: How To Encourage Better Research Software?

Posted by Soulskill on Friday April 29, 2011 @07:13AM from the mo'-money-mo'-problems dept.

An anonymous reader writes "There is a huge amount of largely overlapping but often incompatible medical imaging research software, funded by the US taxpayer (see, e.g., NITRC or I Do Imaging). I imagine the situation may be similar in other fields, but it is pronounced here because of the glut of NIH funding. One reason is historical: most of the well-funded, big, software-producing labs/centers have been running for 20 or more years, since long before the advent of git, hg, and related sites promoting efficient code review and exchange, so they have established codebases. Another reason is probably territorialism and politics. As a taxpayer, I find this situation wasteful. It's great that the software is being released at all, but the duplication of effort means quality is much lower than it could be given the large number of people involved (easily in the thousands, just counting a few developer mailing list subscriptions). No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"

12 of 104 comments

Pragmatism? by gilgongo · 2011-04-29 07:21 · Score: 2

"No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"
When I think about this, I'd rather have that than one single package, if only for the reason that without competition, I'd not be able to know if it was doing anything well or not.
Pragmatism here says plurality is probably better than some kind of Stalinist central control.

--
"And the meaning of words; when they cease to function; when will it start worrying you?"
1. Re:Pragmatism? by goombah99 · 2011-04-29 07:37 · Score: 5, Insightful
  
  The original article is clueless about the difference between research products and production software. In research there is no a priori omniscience about what is best. What you see at the end is the few survivors of an evolutionary competition among zillions of efforts. You don't see three planned outcomes that could have been written from a well-thought-out requirements document.
  There is a decades-old saying that scientists develop the next generation of algorithms using last year's computers. Computer scientists write last year's algorithms on next year's computers. It is still true.
  
  --
  Some drink at the fountain of knowledge. Others just gargle.
Not going to happen by robbyjo · 2011-04-29 07:21 · Score: 5, Insightful

Not only are most researchers not proficient in programming languages, they also shape their code more like prototypes so that they can modify it easily as the science progresses. Conventional programmers will be frustrated with this approach since they want every single spec set in stone, which will never happen in a research setting, since research progresses very rapidly and specs can change dramatically. If you can set the spec in stone, it is usually a sign that the field has matured and is transitioning to engineering-type problems. Once the transition happens, it's no longer research, it's engineering. Then you can "make the code better".

--
Error 500: Internal sig error
1. Re:Not going to happen by sockman · 2011-04-29 07:41 · Score: 2
  
  Didn't you just describe why agile came about? Because we, as software professionals, realize that specifications are not set in stone and the system should be easy to adapt and modify for future requirements.
2. Re:Not going to happen by Anonymous Coward · 2011-04-29 08:15 · Score: 3, Informative
  
  I do medical imaging as my day job. The parent understates the "spec" problem -- it's just as much a testing problem. The typical spec I work against is "create a tool that distinguishes this disease state from some other disease state and from healthy normals with optimal power". Optimal power is, of course, only defined by the results you get or against other software (which probably measures different facets of disease). Moreover, the spec gets driven by log10 increases in image numbers -- that is, 1:10:100:1000:10000 images. So the original spec is generally an idea for a few images -- then as the idea gels the sample battery size is increased. A lot of places don't have 100+ image sets -- particularly for cutting-edge imaging methods. There's also a catch-22 -- in general, if you know how to detect algorithm failure you'd build that into the code. By the time you get to testing on 1000 subjects there's enough code in place that it's hard to justify a rewrite using "proper SDLC". (Go off and re-read Joel on Software about the value in "rewriting" software!) Besides, do you want the creative people managing software development or do you want them moving on to the next great idea?
  As for the original poster's whine -- I don't buy the "didn't have git and hg" argument. SCCS, for example, wasn't pretty but certainly worked in the particle physics community, which was globally distributed from the 1980s. There was a lack of sharing for two reasons: 1) if you are competing for customers and/or grant money you publish the idea but don't give away the code (it's your competitive edge); 2) if you have a new idea it's often the case that the available code you could find isn't worth the effort to merge. Now one of the problems is that there is a huge buy-in for most of the toolkits -- it's hard, for example, to simply lift a function out of ITK to use elsewhere. If you want to use ITK you have to buy in and create ITK apps. It's also non-trivial to drag a function from some other framework into ITK. (This is not to pick on ITK, it's a good toolkit; it applies to most other frameworks too.) Moreover, there are a couple of different classes of image processing users -- those who are worried about whether software works (or seems to work) and those who worry about whether it's right. Ideally you want both, but testing for "works" is different than testing for "right".
  Heck, even up until 6-7 years ago many labs had their own image format used in processing. DICOM data comes off the imaging device -- but DICOM is a very flexible standard. (Here flexibility means that about half the stuff you need to know to really do large-scale processing is stored in well-defined locations; the rest is vendor specific and vendor software revision specific.) So most toolkits munge the incoming data into some standard format -- simple formats sound great, but can often lack sufficient detail for a particular analysis. The Mayo Analyze 7.5 format, for example, spent years as a ubiquitous standard, but couldn't sanely store oblique images. It's at least settling down to a handful of decent storage formats, which helps with interop.
  Medical image research is not software engineering.
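
A minimal sketch of the "works" versus "right" distinction drawn in the comment above, not taken from any toolkit mentioned in the thread. Everything here is an assumption made for illustration: `threshold_segment` is a hypothetical stand-in for a real algorithm, and the synthetic phantom with a known answer stands in for whatever ground truth a lab actually has.

```python
# Hypothetical example: none of these names come from ITK or any real toolkit.

def threshold_segment(image, level):
    """Return a binary mask: 1 where the pixel value is at or above level."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def make_phantom(size, lo, hi):
    """Build a synthetic image with a bright centred square, plus its true mask."""
    q = size // 4
    truth = [[1 if q <= r < size - q and q <= c < size - q else 0
              for c in range(size)] for r in range(size)]
    image = [[hi if truth[r][c] else lo for c in range(size)]
             for r in range(size)]
    return image, truth


def check_works(segment, image, level):
    """'Works': the call completes and returns a mask shaped like the input."""
    mask = segment(image, level)
    return (len(mask) == len(image)
            and all(len(m) == len(i) for m, i in zip(mask, image)))


def dice(a, b):
    """Dice overlap between two binary masks (1.0 means identical)."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 2.0 * inter / total if total else 1.0


def check_right(segment, size=16, level=50):
    """'Right': score the result against known ground truth on a phantom."""
    image, truth = make_phantom(size, lo=10, hi=100)
    return dice(segment(image, level), truth)
```

A segmenter that marks every pixel as foreground would pass `check_works` and still score poorly on `check_right`, which is the gap the comment describes. On real data the ground truth is rarely available, so the second kind of test is the expensive one.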
3. Re:Not going to happen by Puff_Of_Hot_Air · 2011-04-29 13:15 · Score: 2
  
  I think the research environment is fundamentally different from a commercial environment. In many software projects the requirements are continually changing. This is not a result of poor planning by the people requesting the software, but rather the desire to take best advantage of new scientific information as it becomes available. The resulting informal code development is very efficient for the project, but produces code that is difficult to transport to other projects.
  Your situation is not different to many commercial environments. In fact, this is one of the largest problems in software development (notice I use the word development, not engineering). There are ways to write quality, flexible, extendible, maintainable programs in these environments, but it is much harder. I'm not talking out of my arse here; I've been in this game for many years now, and have seen approaches that work and ones that fail. If the resultant program is truly "use once, then throw away", then continue as you are. If you find that you want to build on it later, or give it to others, then there are existing techniques that can assist. The smart approach is to add the ideas that look as though they would help, one at a time, and keep them only if they do. You're right when you say that your environment is "fundamentally different"; my experience has been that everyone's situation is unique, but there are certain tools, techniques, and strategies that pre-exist, and may save you time if you spend a little time investigating whether they're right for your environment.
Terms of grant must specify coding standards by diabolicalrobot · 2011-04-29 07:29 · Score: 2

This problem is widespread in almost every discipline that uses any form of computation. I think the best way is for major funding sources like the NIH, NSF, etc. to build into the grant terms which coding language and existing libraries are to be used. Alternatively, how the software will be developed, and what software, should be used as an additional metric for deciding which proposals to accept. Proposals that are strong otherwise but do not state in clear terms how software will be built should be sent back for modification to include such information. Pre-existing, well-designed, modular software architectures should be extended rather than building architectures from scratch, which is a waste of funds and time. Funding organizations must also recognize that developing good software takes time and money, and set aside budgets in the grant for hiring dedicated programmers. (Scientists are very often not good software engineers; they are more interested in trying things out quickly to see if they work at all.) Such programmers can then take hacky research code from the scientists and turn it into great reusable code.
Because researchers aren't programmers by AdmiralXyz · 2011-04-29 07:42 · Score: 3, Insightful

I'm a computer scientist in the middle of getting my BA, but for research experience or in the process of taking an elective, I've spent time with grad students in other departments- mostly biology and linguistics- and the software they write. Smart people? Absolutely- they're experts in their field. But they can't write code to save their lives. I've seen things that make me want to run screaming to TheDailyWTF and the quality software engineering on display there ;)

I don't think this is a bad thing, myself. Most of this code is single-use only, being written for a specific purpose (or a specific thesis paper), and will never be used again. Not to mention they're taking enough time to get their degrees as it is- I don't think it's reasonable to ask them to become expert software engineers as well. OP claims that taxpayer dollars are being wasted, but think how much waste there'd be if every researcher had to get a CS degree before they started in their own field, too.

--
Dislike the Electoral College? Lobby your state to join the National Popular Vote Interstate Compact.
Convert research into useful by gr8_phk · 2011-04-29 07:47 · Score: 3, Insightful

If you're not happy with what's out there, you need to roll your own. If what's out there is open source, you can pick the best of each of them and build the solid system you're looking for. With research projects, once the stated goal has been reached they are done - until a follow-up grant for further work is awarded. That seems to be what research is about - showing that things can be done or done a different way - not producing a useful software product. Once they show what and how, it's up to someone else to take that and make something great from all the pieces. Unfortunately that means sifting through all the duplicate stuff and finding the best approach and possibly reimplementing it to fit in with everything else you're doing.

For example, you may find Kalman filters, genetic algorithms, neural networks, GPU implementations, etc. all able to solve a particular problem. For real-world software you really don't care about all that, you just want the ONE that works best in your application. Of course then there will be papers on "extensible frameworks" with "plugins" that can handle any of those implementations... Again, for real software you pick the one that works "best" for your definition of best and go with that. To make this happen, you need to get an ego-less (read non-PhD) software team to pull it all together.
Re:What the hell are you talking about? by blueg3 · 2011-04-29 07:55 · Score: 2

Are you seriously trying to tell us that these big labs are not using version control while developing their systems?
That's a lot more common than any sane programmer would suspect.
Intel and Microsoft by goombah99 · 2011-04-29 07:55 · Score: 2

There is a legend that this is what happens at Intel and Microsoft. It used to be said that every odd-numbered Intel was not much of an improvement. It's still true since Windows 1.0 that every other release of Windows has sucked. It was perfectly predictable that Vista would tank. (No, I don't hate Microsoft. Even people that love Microsoft can see this has become a "law".)
In both cases the supposed explanation is that there are two different teams working at the same time. The better one gets the first release and the second one patches their changes into it for the sucky intervening release.
No idea if that is true in practice.

--
Some drink at the fountain of knowledge. Others just gargle.
Not really. by jd · 2011-04-29 08:07 · Score: 2

I track a lot of scientific software on Freshmeat. You'd be amazed at the redundancy. Medical stuff isn't as bad as some areas.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)