Ask Slashdot: Successful Software From Academia?
An anonymous reader writes "A lot of masters and PhD theses are about development of software targeting the solution or the automation of a specific problem. Bioinformatics, for example, has a lot of journals about software tools that are coded in academic environments; some of this software is the final result of a four-year PhD. But my question is, how much of this software will see the light outside the universities? I know of some examples, like BSD, but they are an exception, right? Is there any list of successful software created entirely inside universities' labs that became widely used?"
That work for you?
PostgreSQL
"I use a Mac because I'm just better than you are."
In this day and age, most good software developed in academia tends to get spun into a business venture that makes its academic developers very, very rich. See Google, for example.
#DeleteChrome
The problem with software in academia is that it is often devoted to a sole purpose. It is not a generalized solution -- conversely -- it's often a demonstration of a solution so specific that it's never been done. Hence the awarding of a title to the creator. On top of that the teams are usually small and time is usually tight. It's also usually a side effect of the greater thing, the thesis. It will always take a backseat to the theory.
... if it had depended on hardware or the constant change of text files like PDF and DOC, I think you can understand how hard it would be for academia -- let alone the originating researcher(s) -- to maintain and support it for the community. An open source effort could pick up that slack but then who deserves credit for that work?
When software is widely adopted, it is because it has been widely supported and is a more generalized solution to a problem. If it uses hardware, it supports all kinds. If it reads or writes files, it covers all formats. This leads to widespread adoption but also takes a lot of time and a lot of contributions. If you're also working on your thesis, this is a daunting task to work on the side.
Nobody gets their PhD by making a predecessor's implementation support more file formats or hardware. So this is left to the licensing of the originator and the community -- who are often recognized as the real workhorses that go from prototype to actual usable software. That's why you don't find many PhD projects turned instant open source hit.
In bioinformatics, a relatively young field, most consumers of the software work in a lab and the input is fairly simple. But even with simple input they first had to agree on a format (those are just a few of what used to be many). BLAST and FASTA go back to the 1990s and 1980s respectively.
My work here is dung.
* Kerberos (Widely used, part of Active Directory)
* X11
* AFS (Andrew File System)
* MACH (Used by GNU HURD and OS X)
And that's just a starting sample.
-- Sometimes you have to turn the lights off in order to see.
And valgrind
I think most of the finite element/multiphysics packages started as research projects, either in university or government labs (some military, some conventional). They are used for studying, e.g., electromagnet design or heat deposition by currents/EM radiation (e.g. Microwave Studio). Most of the radioactivation and nuclear shielding simulations used by the nuclear industry for designing radiation shielding are or were academic projects (e.g. MARS, FLUKA, MCNPX).
and LLVM
Subject of several theses:
http://www.tug.org/docs/liang/
http://www.pragma-ade.com/pdftex/thesis.pdf
https://www.tug.org/docs/plass/plass-thesis.pdf
(John Hobby's on METAPOST http://ect.bell-labs.com/who/hobby/thesis.pdf )
Probably others. More information at
http://www.tug.org/
and
http://www.latex-project.org/
and
http://wiki.contextgarden.net/Main_Page
William
Sphinx of black quartz, judge my vow.
It started out as someone's graduate research project in the late 80s/early 90s, and today it is the #1 aircraft design software tool in the world. It's installed in universities, aircraft manufacturers, aerospace consulting firms, and government and military institutions across the planet.
Disclaimer: I worked on the software after it went commercial.
The backend for quite a few compilers, and a few shader compilers...
Care about electronic freedom? Consider donating to the EFF!
From Univ of Illinois - it arguably changed the internet from a tool for techies to a new way to do business. One of the problems is that if something is really good, commercial companies may morph it into products that eclipse the original; but their contribution, when thought of as basic research, was invaluable. So the definition of success should not be limited to widely used, popular, or well known; it should also include having defined a new industry or way of approaching a problem.
I'm a consultant - I convert gibberish into cash-flow.
The title of Linus' thesis is: "Linux: a Portable Operating System" - so yes, it counts.
The real question is whether it is enough that a project can trace its roots back to academia - even if >90% was added later and/or by developers outside academia. I bet many products considered purely commercial started out in the back of the head of students during their studies. Many of those dropped out to build a company rather than stay and write a thesis about it. If you include those, and even consider some studying other majors than CS - you're probably looking at the bulk of all software in existence.
Run with the lemmings, and you'll get your feet wet.
Frequently the software doesn't start in a given academic lab, so much as it starts somewhere in a given research community and propagates to the academic labs as research needs dictate. ImageJ, for example, started at NIH, but now it's available to all and in use all over the place (including my lab).
Other software is developed cooperatively, and then academic contributions are added as they're needed to enable someone's research. If you run R (the statistical program) and start looking through all the extensions available in CRAN, you'll see tons of additions that have been generated in academic labs and released for use by the wider research community.
I work in biomechanics, and I've seen a few programs come out in that field through largely academic development. AnimatLab began (I think) at Georgia Tech, and I think Cofer et al. are still developing it within the university. OpenSim started at Stanford as an open source musculoskeletal simulation program, and is vastly preferable to the godawfully expensive SIMM, which does pretty much the same kinds of things. OpenSim is still alive and well at Stanford, although the developer network spans multiple institutions, academic and otherwise.
Much as I might wish that I could spend more of my time developing programs and playing with software within the academic sandbox, more often it's simply more practical to cast the nets for software from someone, somewhere doing somehow similar research, and then using the software you find if it's useful to your work, rather than reinventing the wheel in favor of advancing academic software development.
"What's the use in being grown up if you can't be childish sometimes?" --Fourth Doctor, "Robot"
FWIW, I'm a PhD student at a reasonably large institution in the US.
Very little of this stuff sees the light of day. The vast majority of software is written simply as a proof of concept for some particular method/system/algorithm in order to get published. Good conferences/journals will typically want not only a well thought out idea, but an idea that you can implement, have implemented to some extent, and have shown to work. That having been said, most of what gets produced is complete and total garbage -- typically just enough code to be able to prove that something runs correctly and in a given amount of time.
Personally, I have written a bunch of junk code during my time here. I'd like to think I know more or less how to write good code after all these years, but writing good, well documented, well tested code takes time we don't have -- writing code is simply a means to an end (publication) -- and so most of the code I write is hasty and ugly. This even applies to code that people say is for "wide distribution".
Before you go hounding on academia however, I'd warn that writing "good code" isn't really the point of what we're doing -- the point is to produce a reasonable method of solving some particular problem or type of problem. Going into bioinformatics for example, there are a whole bunch of problems that involve performing more efficient analysis of certain types of graphs. If a researcher discovers something along these lines, he/she will likely write some junk code to prove that the bare algorithm works, perform some analysis of it, publish it and move on. This may or may not end up actually being a useful improvement -- if it is however, then some implementer whose actual job it is to code whatever medical software might be using this algorithm then has a basic blueprint of how to proceed.
As for some examples of software from academia that have made it out, let me think...
Coverity - static code analysis tool, started at Stanford then moved into being a startup and is now quite successful
PostgreSQL - Originally from Berkeley
Bro (Intrusion Detection System) -- written by a researcher from Berkeley/ICSI -- is still somewhat "in academia", but I have heard of several production deployments
That's all I feel like coming up with right now, but I think the general pattern here is that if/when some piece of software produced in academia is seen to have value in its own right (e.g., away from the original research/publication that spawned it), it typically gets spun off in a start-up or a more concerted effort is given to its development, at which point one can actually spend the time to write good code.
BIND
BIND was written by Douglas Terry, Mark Painter, David Riggle and Songnian Zhou in the early 1980s at the University of California, Berkeley as a result of a DARPA grant. Versions of BIND through 4.8.3 were maintained by the Computer Systems Research Group (CSRG) at UC Berkeley.
Battlemaster--Game with friends in medieval realms
http://moodle.org/
The 'problem' with bioinformatics is that the field is extremely broad. Unless you write BLAST or one of the big sequence assemblers, your software is only going to appeal to a tiny fragment of an already small bioinformatics community.
I wrote software as part of my Ph.D. that is now distributed world wide. I guarantee you've never heard of it - it sets the standard for how to do certain types of phylogenetic analysis, but almost no one does that analysis.
During my time as a postdoc, I wrote a very simple curve fitting routine and put a minimal GUI on top of it. I am now getting requests from multiple countries to modify it to read in files from their instrumentation. Once again, only the tiniest handful of people care, but for those people, this is revolutionary stuff.
The question here is, how do you define success? Like a lot of the responses to this thread, I wrote a small script here or there to solve my own problem. Turns out, it solved a problem for someone else, too. My best known piece of software was a hack, a one-off script, written in an afternoon, that I got yelled at for even bothering to spend time on, and was only ever intended for my own use. It turned out to be the lynchpin for our project, got published in a peer reviewed journal, and has since gone global. I found out later that one of my undergrad computer science profs had solved the same problem 20 years before I did, in a more elegant way, and published it in a good, but non-science, journal - no one has ever heard of it.
Neither of us had the expectation that our software would amount to much. I would define the prof's work as 'successful' - he published a paper on an interesting academic topic. I would define my software as 'wildly successful' - I got an unexpected publication and a global (if small) user base, along with a reputation for fixing problems that would later get me a good postdoc position.
This isn't really an academia question. The most common advice in the open source community is 'scratch an itch'. Write something to fix a problem you see. If you write good stuff, maybe your code will become 'successful'. Or, maybe your afternoon worth of hacking will just turn into an afternoon worth of experience you can apply to the next problem.
-V-
Who can decide a priori? Nobody.
-Sartre
Isn't the first one that comes to mind the world wide web? CERN is definitely academia. I'd imagine many other protocols originate in academia. Any idea about SMTP, Usenet, etc.?
BSD, X11, Mach, PostgreSQL, and SSH were all explicitly academic projects.
There is also a question about what qualifies as academia beyond simply universities and government labs. Linus Torvalds started Linux while a university student but later landed in industry. Bjarne Stroustrup worked at AT&T Research when he started C++ but he landed at Texas A&M shortly after.
Virtually all programming languages originate in or near academia: Lisp was MIT. Python was started at CWI. Haskell. OCaml. etc. Among the non-academic languages, most originate within huge organizations whose research departments start to resemble academia: Smalltalk was PARC. Fortran and COBOL were IBM. C was AT&T. Erlang was Ericsson. etc. Java and Perl were seemingly further from academia, but academia's influences upon them abound.
Afaik, all computational libraries used for serious numerical programming, like stock trading, computational fluid dynamics, etc., were developed in academia.
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
archie -- Princeton?
CAP (appletalk for Unix) -- Columbia
cops/tripwire -- Purdue
GNU everything -- MIT
Gopher -- Minnesota
Kerberos -- MIT
Khoros -- New Mexico
Mach -- CMU
NNTP -- UC San Diego
Mosaic -- Illinois
sendmail -- UC Berkeley
BSD -- UC Berkeley
RCS -- Purdue
Usenet -- Duke/UNC
tcl/tk -- UC Berkeley
multi-CPU Unix -- Purdue
cu-seeme -- Cornell
I'm sure I'm forgetting quite a few. And of course not all of these are STILL successful, but in their day they made their mark, and often paved the way for other projects.
As I note upthread, virtually all important programming languages originated in academic-like environments, even if they are officially corporate.
There are, I think, two revolutionary non-academic programming languages:
- Smalltalk was developed by Xerox PARC, but ultimately created object oriented programming, which certainly used academia to gain traction.
- C was developed by AT&T, but completely revolutionized our world. It's almost surely the most important language ever written. There had been structured languages before. I think Fortran and COBOL were developed by IBM. And academia had all its research and teaching languages. Yet, it was C that brought structured programming and type safety to system level programming, previously dominated by assembler. Imho, const is pure genius. C could not help but succeed with or without academia, but AT&T was still a fairly academic environment at that time.
In other words, your classification of generalized academic project doesn't include either afaik, but clearly both can fall under some generalized academia. You could not design C, and maybe Smalltalk too, without thinking deeply about languages from a hybrid academic and industrial perspective. If you pursue a blind industry perspective, you create garbage like PHP or VB.
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell