Ask Slashdot: Successful Software From Academia?
An anonymous reader writes "A lot of master's and PhD theses are about developing software that solves or automates a specific problem. Bioinformatics, for example, has a lot of journals about software tools that are coded in academic environments; some of this software is the final result of a four-year PhD. But my question is, how much of this software will see the light of day outside universities? I know of some examples, like BSD, but they are exceptions, right? Is there any list of successful software created entirely inside universities' labs that became widely used?"
That seems silly. When I worked in a bioinformatics group as an undergrad, we used a *LOT* of software that was only used inside universities, partly because the kind of research it targeted wasn't necessarily popular in commercial areas yet, and partly because what we used was OSS and many commercial organizations preferred closed-source alternatives (sometimes for speed optimizations, sometimes for support reasons).
Maybe you should define your criteria as widespread use in the context of the target field, rather than outside of a university?
That being said, I think a lot of it does make it out, either directly or indirectly (through a third-party reimplementation).
That work for you?
PostgreSQL
"I use a Mac because I'm just better than you are."
Kerberos, Ganglia, Folding@home?
In this day and age, most good software developed in academia tends to get spun into a business venture that makes its academic developers very, very rich. See Google, for example.
There was this company called Google that came out of some PhD students' work. I think it's still around and doing business.
The problem with software in academia is that it is often devoted to a single purpose. It is not a generalized solution; on the contrary, it's often a demonstration of a solution so specific that it has never been done before. Hence the awarding of a title to the creator. On top of that, the teams are usually small and time is usually tight. It's also usually a side effect of the greater thing, the thesis. It will always take a backseat to the theory.
... if it had depended on hardware or on constantly changing file formats like PDF and DOC, I think you can understand how hard it would be for academia, let alone the originating researcher(s), to maintain and support it for the community. An open source effort could pick up that slack, but then who deserves credit for that work?
When software is widely adopted, it is because it has been widely supported and is a more generalized solution to a problem. If it uses hardware, it supports all kinds. If it reads or writes files, it covers all formats. This leads to widespread adoption but also takes a lot of time and a lot of contributions. If you're also working on your thesis, this is a daunting task to take on the side.
Nobody gets their PhD by making a predecessor's implementation support more file formats or hardware. So this is left to the licensing of the originator and the community -- who are often recognized as the real workhorses that go from prototype to actual usable software. That's why you don't find many PhD projects turned instant open source hit.
In bioinformatics, a relatively young field, most consumers of the software work in a lab and the input is fairly simple. But even with simple input they first had to agree on a format (those are just a few of what used to be many). BLAST and FASTA go back to the 1990s and 1980s, respectively.
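To give a sense of how simple that input can be: a FASTA file is just a `>` header line followed by one or more lines of sequence. Below is a minimal Python sketch of a reader, for illustration only. The `read_fasta` name and its behavior are my own assumptions, not the API of any particular bioinformatics package; it skips blank lines and does not validate residue characters.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines (sketch)."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new header ends the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


example = """>seq1 toy example
ACGTACGT
ACGT
>seq2
MKV
"""
records = list(read_fasta(example.splitlines()))
print(records)
```

Wrapped sequence lines are joined, so `seq1` comes back as a single 12-character string. Real-world readers also have to cope with the many dialects that grew up around the format.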
The subject says it: X was mainly developed at MIT. I guess Ingres and Postgres were originally university projects too.
* Kerberos (widely used, part of Active Directory)
* X11
* AFS (Andrew File System)
* Mach (used by GNU Hurd and OS X)
And that's just a starting sample.
And Valgrind.
I think most of the finite element/multiphysics packages started as research projects, either in university or government labs (some military, some conventional), for studying, e.g., electromagnet design or heat deposition by currents or EM radiation (e.g., Microwave Studio). Most of the radioactivation and nuclear shielding simulations used by the nuclear industry for designing radiation shielding are or were academic projects (e.g., MARS, FLUKA, MCNPX).
And LLVM.
Subject of several theses:
http://www.tug.org/docs/liang/
http://www.pragma-ade.com/pdftex/thesis.pdf
https://www.tug.org/docs/plass/plass-thesis.pdf
(John Hobby's thesis on METAPOST: http://ect.bell-labs.com/who/hobby/thesis.pdf)
Probably others. More information at
http://www.tug.org/
and
http://www.latex-project.org/
and
http://wiki.contextgarden.net/Main_Page
William
It started out as someone's graduate research project in the late 80s/early 90s, and today it is the #1 aircraft design software tool in the world. It's installed in universities, aircraft manufacturers, aerospace consulting firms, and government and military institutions across the planet.
Disclaimer: I worked on the software after it went commercial.
* Rocks clusters (http://www.google.ca/search?gcx=w&ix=c1&sourceid=chrome&ie=UTF-8&q=rocks+clusters)
* CHARMM (http://www.charmm.org/)
* Gaussian, as an example of how academic-inspired software should NOT be commercialised (http://www.gaussian.com/)
The backend for quite a few compilers, and a few shader compilers...
Is there any list of successful software created entirely inside universities' labs that became widely used?
That is an odd restriction to make. Students are only at university for a short time. If their work during that time turns into something useful, then they naturally continue it after they leave, either as an open source project or as a business venture. This is how it is meant to work, and there are tons of examples of such software.
MATLAB and Maple were both created at universities and later commercialized. Same for SPICE. On the open source side there is Apache, Sendmail, PostgreSQL, and the original implementations of nearly every RFC protocol on the internet.
From the University of Illinois, it arguably changed the internet from a tool for techies into a new way to do business. One of the problems is that if something is really good, commercial companies may morph it into products that eclipse the original; but the original's contribution, when thought of as basic research, was invaluable. So the definition of success should not be limited to widely used, popular, or well known, but should also include having defined a new industry or a new way of approaching a problem.
Right up front: I know about this only because I work for these guys, but...
there's a whole host of linear-algebra software written for high-performance computing environments that is attributable largely to various teams of academics over the past 30 or so years. It is my understanding that these libraries get used by almost anyone doing high-performance computing.
http://www.netlib.org/lapack/ http://en.wikipedia.org/wiki/LAPACK
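For a feel of what these libraries do under the hood: LAPACK's general dense solver, `dgesv`, solves A x = b by LU factorization with partial pivoting. The Python below is only a sketch of that technique, unblocked and unoptimized. The name `lu_solve` is hypothetical, and the real routine is blocked Fortran that hands the heavy lifting to BLAS.

```python
from typing import List

Matrix = List[List[float]]


def lu_solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (sketch)."""
    n = len(a)
    m = [row[:] for row in a]  # work on copies; leave the caller's data alone
    x = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest |entry| in column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            m[k], m[p] = m[p], m[k]
            x[k], x[p] = x[p], x[k]
        # Eliminate below the pivot, applying the same row operations to b.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            m[i][k] = factor  # keep the L multiplier in place of the zeroed entry
            for j in range(k + 1, n):
                m[i][j] -= factor * m[k][j]
            x[i] -= factor * x[k]
    # Back substitution with the upper-triangular factor U.
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


# The zero in the top-left corner forces a row swap before elimination.
print(lu_solve([[0.0, 1.0], [1.0, 1.0]], [2.0, 3.0]))  # [1.0, 2.0]
```

Pivoting is what keeps the method stable: without the row swap, the example above would divide by zero on the very first step.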
The title of Linus' thesis is: "Linux: a Portable Operating System" - so yes, it counts.
The real question is whether it is enough that a project can trace its roots back to academia, even if more than 90% of it was added later and/or by developers outside academia. I bet many products considered purely commercial started out in the back of students' heads during their studies. Many of those students dropped out to build a company rather than stay and write a thesis about it. If you include those, and even count students majoring in subjects other than CS, you're probably looking at the bulk of all software in existence.
And QNX http://www.qnx.com/
Frequently the software doesn't start in a given academic lab, so much as it starts somewhere in a given research community and propagates to the academic labs as research needs dictate. ImageJ, for example, started at NIH, but now it's available to all and in use all over the place (including my lab).
Other software is developed cooperatively, and then academic contributions are added as they're needed to enable someone's research. If you run R (the statistical program) and start looking through all the extensions available in CRAN, you'll see tons of additions that have been generated in academic labs and released for use by the wider research community.
I work in biomechanics, and I've seen a few programs come out in that field through largely academic development. AnimatLab began (I think) at Georgia Tech, and I think Cofer et al. are still developing it within the university. OpenSim started at Stanford as an open source musculoskeletal simulation program, and is vastly preferable to the godawfully expensive SIMM, which does pretty much the same kinds of things. OpenSim is still alive and well at Stanford, although the developer network spans multiple institutions, academic and otherwise.
Much as I might wish that I could spend more of my time developing programs and playing with software within the academic sandbox, more often it's simply more practical to cast the net for software from someone, somewhere doing somewhat similar research, and then use the software you find if it's useful to your work, rather than reinvent the wheel for the sake of advancing academic software development.
"What's the use in being grown up if you can't be childish sometimes?" --Fourth Doctor, "Robot"
IIRC, rsync was the culmination of its original author's thesis.
FWIW, I'm a PhD student at a reasonably large institution in the US.
Very little of this stuff sees the light of day. The vast majority of software is written simply as a proof of concept for some particular method/system/algorithm in order to get published. Good conferences/journals will typically want not only a well-thought-out idea, but evidence that you can implement it, that you have implemented it to some extent, and that it works. That having been said, most of what gets produced is complete and total garbage -- typically just enough code to prove that something runs correctly and in a given amount of time.
Personally, I have written a bunch of junk code during my time here. I'd like to think I know more or less how to write good code after all these years, but writing good, well documented, well tested code takes time we don't have -- writing code is simply a means to an end (publication) -- and so most of the code I write is hasty and ugly. This even applies to code that people say is for "wide distribution".
Before you go hounding academia, however, I'd warn that writing "good code" isn't really the point of what we're doing -- the point is to produce a reasonable method of solving some particular problem or type of problem. Taking bioinformatics as an example, there are a whole bunch of problems that involve more efficient analysis of certain types of graphs. If a researcher discovers something along these lines, he/she will likely write some junk code to prove that the bare algorithm works, perform some analysis of it, publish it and move on. This may or may not end up being a useful improvement; if it is, however, then an implementer whose actual job is to code whatever medical software might use this algorithm has a basic blueprint of how to proceed.
As for some examples of software from academia that have made it out, let me think...
Coverity - static code analysis tool, started at Stanford then moved into being a startup and is now quite successful
PostgreSQL - Originally from Berkeley
Bro (Intrusion Detection System) -- written by a researcher from Berkeley/ICSI -- is still somewhat "in academia", but I have heard of several production deployments
That's all I feel like coming up with right now, but I think the general pattern here is that if/when some piece of software produced in academia is seen to have value in its own right (e.g., away from the original research/publication that spawned it), it typically gets spun off in a start-up or a more concerted effort is given to its development, at which point one can actually spend the time to write good code.
Blackboard.
*shudders*
Someone tell me their thesis was rejected...
SPICE is a general-purpose circuit simulation program for nonlinear dc, nonlinear transient, and linear ac analyses. Circuits may contain resistors, capacitors, inductors, mutual inductors, independent voltage and current sources, four types of dependent sources, lossless and lossy transmission lines (two separate implementations), switches, uniform distributed RC lines, and the five most common semiconductor devices: diodes, BJTs, JFETs, MESFETs, and MOSFETs. SPICE originates from the EECS Department of the University of California at Berkeley.
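The "nonlinear dc" part means solving Kirchhoff's current law when devices such as diodes have exponential I-V curves, which SPICE does iteratively with Newton-Raphson. Below is a toy Python sketch for a single node: a source `vs` feeding a resistor `r` into a grounded diode. The Shockley model parameters (`i_s = 1e-14` A, `vt = 0.02585` V), the 0.1 V step limit, and the function name are illustrative assumptions; real SPICE assembles a full modified-nodal-analysis matrix and uses more careful junction-voltage limiting.

```python
import math


def diode_operating_point(vs: float, r: float, i_s: float = 1e-14,
                          vt: float = 0.02585, max_step: float = 0.1,
                          tol: float = 1e-9, max_iter: int = 100) -> float:
    """Diode voltage for vs -> r -> diode -> ground, via Newton-Raphson on KCL."""
    v = 0.0  # initial guess for the diode node voltage
    for _ in range(max_iter):
        i_d = i_s * (math.exp(v / vt) - 1.0)           # Shockley diode current
        f = (vs - v) / r - i_d                          # KCL residual at the node
        df = -1.0 / r - (i_s / vt) * math.exp(v / vt)   # d(residual)/dv
        dv = -f / df                                    # Newton update
        # Crude step limiting so the exponential cannot run away.
        if abs(dv) > max_step:
            dv = math.copysign(max_step, dv)
        v += dv
        if abs(dv) < tol:
            return v
    raise RuntimeError("Newton-Raphson did not converge")


r = 1000.0
v = diode_operating_point(vs=5.0, r=r)
print(f"diode voltage = {v:.4f} V, current = {(5.0 - v) / r * 1e3:.3f} mA")
```

Without the step limit, the first Newton step from 0 V would jump to nearly the full 5 V, and the exponential would then crawl back down very slowly. This is a small-scale version of the convergence problem that SPICE's limiting schemes exist to tame.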
Both SAS and R were originally developed inside academic environments. I'd say they both enjoy a rather wide audience (one FOSS, the other rather on the expensive side).
BIND
BIND was written by Douglas Terry, Mark Painter, David Riggle and Songnian Zhou in the early 1980s at the University of California, Berkeley as a result of a DARPA grant. Versions of BIND through 4.8.3 were maintained by the Computer Systems Research Group (CSRG) at UC Berkeley.
The University of Illinois at Urbana-Champaign lays claim to one or two projects that have some popularity:
the Mosaic browser and its offshoots Netscape, Internet Explorer and Oracle Screens began there.
JavaScript (as part of Netscape??)
Apache web server
Project Gutenberg
and, if 'travelling' across the universe fictionally counts as 'widely used outside of the university', then there is HAL in 2001, that (who?) claims to have been activated at the Urbana campus.
And Mach (kernel developed at CMU, used in NeXT and MacOS).
-Dave Haynie
There were two very different versions of SPICE. SPICE2 was a Fortran program and is the basis for the PC version PSPICE (MicroSim > OrCAD > Cadence) and the minicomputer version HSPICE, though many newer simulators are based on the code for SPICE3, rewritten in C by a subsequent Berkeley effort. Its legacy in electronics engineering is such that even independently developed simulators (Eldo, Spectre) rely on the conventions and methods from SPICE, though they incorporate incremental improvements (a new algorithm here or there) and are distinguishable mainly by how they differ from SPICE.
http://moodle.org/
The 'problem' with bioinformatics is that the field is extremely broad. Unless you write BLAST or one of the big sequence assemblers, your software is only going to appeal to a tiny fragment of an already small bioinformatics community.
I wrote software as part of my Ph.D. that is now distributed world wide. I guarantee you've never heard of it - it sets the standard for how to do certain types of phylogenetic analysis, but almost no one does that analysis.
During my time as a postdoc, I wrote a very simple curve fitting routine and put a minimal GUI on top of it. I am now getting requests from multiple countries to modify it to read in files from their instrumentation. Once again, only the tiniest handful of people care, but for those people, this is revolutionary stuff.
The question here is, how do you define success? Like a lot of the responses to this thread, I wrote a small script here or there to solve my own problem. Turns out, it solved a problem for someone else, too. My best known piece of software was a hack, a one-off script, written in an afternoon, that I got yelled at for even bothering to spend time on, and was only ever intended for my own use. It turned out to be the linchpin for our project, got published in a peer reviewed journal, and has since gone global. I found out later that one of my undergrad computer science profs had solved the same problem 20 years before I did, in a more elegant way, and published it in a good, but non-science, journal - no one has ever heard of it.
Neither of us had the expectation that our software would amount to much. I would define the prof's work as 'successful' - he published a paper on an interesting academic topic. I would define my software as 'wildly successful' - I got an unexpected publication and a global (if small) user base, along with a reputation for fixing problems that would later get me a good postdoc position.
This isn't really an academia question. The most common advice in the open source community is 'scratch an itch'. Write something to fix a problem you see. If you write good stuff, maybe your code will become 'successful'. Or, maybe your afternoon worth of hacking will just turn into an afternoon worth of experience you can apply to the next problem.
-V-
Also LAMMPS and DL_POLY, but they are a bit more niche. The ones you mention are used a lot in big pharma these days, for example.
Staying on the chemistry/chemical physics front, quantum chemistry codes like Gaussian all came from academia.
Isn't the first one that comes to mind the World Wide Web? CERN is definitely academia. I'd imagine many other protocols originate in academia. Any idea about SMTP, Usenet, etc.?
BSD, X11, Mach, PostgreSQL, and SSH were all explicitly academic projects.
There is also a question about what qualifies as academia beyond simply universities and government labs. Linus Torvalds started Linux while a university student but later landed in industry. Bjarne Stroustrup worked at AT&T Research when he started C++, but he later landed at Texas A&M.
Virtually all programming languages originate in or near academia: Lisp was MIT. Python was started at CWI. Haskell. OCaml. etc. Among the non-academic languages, most originate within huge organizations whose research departments start to resemble academia: Smalltalk was PARC. Fortran and COBOL were IBM. C was AT&T. Erlang was Sony. etc. Java and Perl were seemingly further from academia, but academia's influences upon them abound.
AFAIK, all computational libraries used for serious numerical programming, like stock trading, computational fluid dynamics, etc., were developed in academia.
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
It started out that way, but by the time Linus graduated in 1997, Linux had become a huge thing, and I bet that if he hadn't made it the topic of his master's thesis, he wouldn't have finished at all.
It started out as a method for Linus to access his work on the school Minix computers (source: Just for Fun). He later did use it as part of a master's project on multi-architecture operating systems, but that's it. It was mostly developed as a personal project (prior to 1995, part-time/full-time without pay while he pursued academic degrees) and then as a commercial one (since 1995, when he has been paid to work full-time on it).
Just to nitpick a bit, Erlang was developed by Ericsson, not Sony.
archie -- McGill
CAP (appletalk for Unix) -- Columbia
cops/tripwire -- Purdue
GNU everything -- MIT
Gopher -- Minnesota
Kerberos -- MIT
Khoros -- New Mexico
Mach -- CMU
NNTP -- UC San Diego
Mosaic -- Illinois
sendmail -- UC Berkeley
BSD -- UC Berkeley
RCS -- Purdue
Usenet -- Duke/UNC
Tcl/Tk -- UC Berkeley
multi-CPU Unix -- Purdue
CU-SeeMe -- Cornell
I'm sure I'm forgetting quite a few. And of course not all of these are STILL successful, but in their day they made their mark, and often paved the way for other projects.
As I note upthread, virtually all important programming languages originated in academic-like environments, even if they are officially corporate.
There are, I think, two revolutionary non-academic programming languages:
- Smalltalk was developed by Xerox PARC, but it ultimately created object-oriented programming, which certainly used academia to gain traction.
- C was developed by AT&T, but completely revolutionized our world. It's almost surely the most important language ever written. There had been structured languages before. I think Fortran and COBOL were developed by IBM. And academia had all its research and teaching languages. Yet it was C that brought structured programming and type safety to system-level programming, previously dominated by assembler. IMHO, const is pure genius. C could not help but succeed with or without academia, but AT&T was still a fairly academic environment at that time.
In other words, your classification of a generalized academic project doesn't include either, AFAIK, but clearly both can fall under some generalized notion of academia. You could not design C, and maybe Smalltalk too, without thinking deeply about languages from a hybrid academic and industrial perspective. If you pursue a blind industry perspective, you create garbage like PHP or VB.