The Effect of Programming Language On Software Quality
HughPickens.com writes: Discussions whether a given programming language is "the right tool for the job" inevitably lead to debate. While some of these debates may appear to be tinged with an almost religious fervor, most people would agree that a programming language can impact not only the coding process, but also the properties of the resulting product. Now computer scientists at the University of California — Davis have published a study of the effect of programming languages on software quality (PDF) using a very large data set from GitHub. They analyzed 729 projects with 80 million SLOC by 29,000 authors and 1.5 million commits in 17 languages. The large sample size allowed them to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static vs. dynamic typing, strong vs. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, they report that language design does have a significant, but modest effect on software quality.
Quoting: "Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages."
Quoting: "Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages."
e.g., the preference of certain personality types for functional, static and strongly typed languages.
My guess is that this has a bigger impact on most projects than actual features of a chosen language. I was thinking it the whole time I read the summary and then, sure enough, it's mentioned as a disclaimer at the end...
Creationist Textbook Stickers Declared Unconstitutional by CowboyNeal
"We aren't saying that functional, static and strongly typed languages are inherently superior. We're just saying that if you don't prefer them, maybe you aren't really cut out for programming."
For a project starting in 1983, and expected to last 8 months, Basic seemed like a good idea. By 1995, it should have been obvious to everyone that a re-write in ANYTHING ELSE was justified (not that I would personally recommend using Perl to write a Basic interpreter to re-interpret the original Basic code or using Snobol4 to translate the Basic into Fortran).
I could have used C instead - the project would probably have taken a couple of weeks longer, but would have saved countless people years of grief. I have C programs from the 80's that compile on *BSD unchanged, and still work as intended*. It was a toss-up at the time.
My point is that the language choice may be influenced by incorrect information about the external world - because the external world is subject to massive change.
* I had to rewrite some C from the 70's cos they were written for Idris and all in capitals :-{ Fortran4 programs from the 70's may compile and run, but you certainly need to re-test them!
Yes its true: my lawn is written in Fortran, but my Mum's has an Ibjob border.
Sent from my ASR33 using ASCII
The problem with these kinds of studies is that there is no actual way to objectively measure software quality. You can correlate all the data you want, but garbage in means garbage out.
For this study they used two thinfactor gs to determine software quality: one is the number of bugfix commits. Ugh. I'm not even clear if the number of bugfix commits means higher quality, or lower quality. That could go either way. It might mean you had better testers, or that you committed things in small batches, or that you had more branches. The other factor was a natural language processor that read the check-in comments. While this is a really cool idea, you would need a lot of research just to prove that this approach actually works before you can start using the technique to draw conclusions about some other data.
So while this was very cool, and very ambitious, the results are completely meaningless until someone can prove that this technique actually measures software quality at all.
I don't care what they say, software written with Emacs is way better than software writen with Vi!
Stop-Prism.org: Opt Out of Surveillance