Which Programming Languages Are Most Prone to Bugs? (i-programmer.info)
An anonymous reader writes:
The i-Programmer site revisits one of its top stories of 2017, about researchers who used data from GitHub for a large-scale empirical investigation into static typing versus dynamic typing. The team investigated 20 programming languages, using GitHub code repositories for the top 50 projects written in each language, examining 18 years of code involving 29,000 different developers, 1.57 million commits, and 564,625 bug fixes.
The results? "The languages with the strongest positive coefficients - meaning associated with a greater number of defect fixes are C++, C, and Objective-C, also PHP and Python. On the other hand, Clojure, Haskell, Ruby and Scala all have significant negative coefficients implying that these languages are less likely than average to result in defect fixing commits."
Or, in the researchers' words, "Language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages static typing is also somewhat better than dynamic typing."
Brainfuck
You already have to be a genius to understand functional languages, so of course those people make fewer mistakes.
I love it when functional fans insist it's more analogous to how the brain really thinks. That's why so few people can figure out how to do things that way.
Rediscovers how great Ada would have been for the consumer.
Domestic spying is now "Benign Information Gathering"
" On the other hand, Clojure, Haskell, Ruby and Scala"
Yeah, those are niche languages. Good, autistic programmers seek out niche languages and write clean code since they're the only ones working on it.
The mainstream languages are the ones you do at work, with a couple shitty coworkers, and an endless amount of scope creep and impossible deadlines that creates a spaghetti nightmare.
Or could it be that the software written in C++ usually tends to be large complex software where performance is important along with various other complicating factors. While the software written in ruby for example tends to be simpler?
Sounds like this 'study' started with a conclusion already in mind.
Despite every example online... it is possible to write clean perl. That said, I've not seen any reasonable dbi example. It's all the same bad garbage over and over. This led all the Python followers to declare there is no good perl and no bad Python.
Several years later, indeed there is now lots of bad Python in existence! As time marches on there are even more examples of this behavior.
I would argue any language of sufficient age will have an equally large sum of poor examples.
Languages can sing and it takes work to make them do so.
Something the linked article didn't seem to address is that the population for each language will differ. The average Haskell programmer is going to be very different from the average C++ programmer, or, god forbid, the average Python programmer.
Also, while they did try to address problem domains, I don't think they addressed systemic issues. For historical reasons, there are many projects which use C or C++ simply because of what they need to interface with to get the job done. For instance, there simply aren't going to be that many browser projects which aren't written in C++.
Personally, I think the interesting take-home is not the difference between languages, it's how small the number of commits for security and memory issues was.
This is an interesting study, but I don't know if the results can be extrapolated to include closed source software.
My problem with this is that I don't see any evidence of:
a) Projects in the study have a published project plan with somebody managing it at a high level (I would think the Linux Kernel could be thought of as having a plan with strong central management). I tend to believe that projects to which multiple individuals contribute (with varying levels of understanding of the software, the app's background and issues experienced during development) would be at a much lower quality level than something managed by a strong, continuous team - this doesn't seem to be a consideration when I RTFA (popularity of projects seems to be a bigger issue).
b) Different development tools used by different developers. In terms of the C/C++ typing issues, Windows software developed and built in Visual Studio, Eclipse Text Editor with MinGW or something like Komodo Edit with Cygwin and user written make files will identify different typing issues and may generate code that works differently, especially in regards to identifying and handling typing issues. I would like to know how many bug fixes are the result of something that isn't flagged and works fine on VS and doesn't work when built in MinGW, leading to a fix.
b.1) I'm not 100% sure of the methodology used in this study, but wouldn't a file that originally had tabs for indentation, which an editor automatically changes into spaces, be misidentified as a "fix" if it's uploaded back into the repository? This is a combination of b) and c).
c) Different coding styles. I know of several Open Source projects in which a developer has re-formatted code simply because they don't think it's in the "correct" style and they have difficulty reading it resulting in them changing it so they can follow it better. To be fair, I'm sure a lot of us have done that because some people have very different and strongly felt ideas about how code should be formatted.
d) Lack of formal testing methodologies. I don't think many Open Source projects have strong, automated regression testing processes and methodologies before allowing a new release.
e) Difference in functional use of different languages. I would think that methods written in C, C++ and Objective C would be providing more low-level functionality than Clojure, Haskell or Scala. Ruby probably fits somewhere between the two groups.
Comments?
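Point b.1 above is the kind of thing that can be checked mechanically. I don't know how the study handled it, so this is only a rough sketch of one way to spot a change that touches nothing but whitespace. The function names and the tab width of 4 are my own assumptions, not anything from the paper.

```python
def normalize(line, tab_width=4):
    """Expand tabs to spaces and drop trailing whitespace.

    tab_width=4 is an assumed editor setting, not a universal rule.
    """
    return line.expandtabs(tab_width).rstrip()


def is_whitespace_only_change(old_text, new_text, tab_width=4):
    """Return True when the two versions differ only in tabs vs. spaces
    or trailing whitespace (hypothetical helper for illustration)."""
    old_lines = [normalize(line, tab_width) for line in old_text.splitlines()]
    new_lines = [normalize(line, tab_width) for line in new_text.splitlines()]
    return old_text != new_text and old_lines == new_lines


before = "if x:\n\treturn 1\n"        # tab-indented original
reindented = "if x:\n    return 1\n"  # same code after an editor converted tabs
real_fix = "if x:\n    return 2\n"    # an actual behavior change

print(is_whitespace_only_change(before, reindented))
print(is_whitespace_only_change(before, real_fix))
```

A filter like this only helps if commits are classified by their diffs. If they are classified by the commit message alone, a reformatting commit would be miscounted only when its message happened to look like a bug fix.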
Mimetics Inc. Twitter
More like difficulty of task.
If you're coding in C, you can pretty much guarantee you're doing something low level, complex threaded and difficult.
C++ and it could be some major app.
If you're using a nice fluffy wrapped language, it will often be used for some office 'form' style application.
The idea that bugs stem from the programming language and not from the complexity of the task being tackled is bogus.
I know I'll get flamed for this, but Python is really error-prone in a particular area, and that's its ridiculously weak name resolution rules. In a language like C, Perl, or even PHP, names are resolved during the compile phase. The compiler knows which definition of a name is going to be used at any point. Python doesn't have this - when it runs across a name, it walks up the scope hierarchy looking for a candidate.
This means that code can run happily for months or even years, until it just crashes with an undefined name error. This could be because of a rarely-used code path with a typo in it, botched refactoring of a rarely-used code path, or a particular set of rare circumstances where a global name isn't set before the code gets to a certain place.
The usual response is that unit tests should catch this. But let's face it, 100% unit test coverage is pretty rare, particularly for the kind of fast turnaround stuff that Python's frequently used for. Also, unit testing isn't necessarily going to simulate a corner case where a global doesn't get set before code that uses it executes. It also makes refactoring more risky because there's no point where the compiler can tell you you're referencing a name that's no longer defined, or no longer has a certain method/field.
This is the kind of area where it's really useful if the compiler can help you, and Python's ridiculously weak name resolution rules make that completely impossible.
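To make the failure mode concrete, here is a minimal sketch. The function `report` and the misspelled name `totl` are made up for illustration. The module imports and runs cleanly, and the typo only surfaces when the rarely-used branch executes.

```python
def report(total, verbose=False):
    """Return a short status string (hypothetical example function)."""
    if verbose:
        # Typo: 'totl' is never defined anywhere. Python looks the name up
        # only when this line actually runs, so nothing complains earlier.
        return "total is " + str(totl)
    return "ok"


# The common path works, possibly for months.
print(report(10))

# The rare path blows up at run time.
try:
    report(10, verbose=True)
except NameError as exc:
    print("NameError:", exc)
```

Static checkers such as pyflakes do flag many undefined names like this one without running the code, but they can't help with the case where a global is only assigned at run time under certain conditions.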
source http://wiki.c2.com/?AplLanguag...
[6] L(L':')L,L drop To:
[7] LLJUST VTOM',',L mat with one entry per row
[8] S1++/\L'(' length of address
[9] X0/S
[10] LS((L)+0,X)L align the (names)
[11] A((1L),X)L address
[12] N0 1DLTB(0,X)L names)
[13] N,'',N
[14] N[(N='_')/N]' ' change _ to blank
[15] N0 1RJUST VTOM N names
[16] S+/\' 'N length of last word in name
As mangled as the above is, if you look at the source link, where it is properly formatted with the correct characters, it isn't any better.
Actually thinking about it, it was always easier to just code a new function than try to read someone else's old stuff.
The base concept is bulls**it on its own.
It's more like spoken or written human languages to me:
You need to study, learn and practice before being proficient.
If you think that you need a fast solution, then the language you know the best is among the right solutions.
Assembly isn't more error prone than English.
It just depends on whether or not you are an idiotic programmer or an easy-going speaker.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
Python program can be very self-diagnostic. Something goes wrong, it presents as an exception traceback from an uncaught exception.
A lot of bug reports I get go like this: Someone sends me a screenshot with a traceback, I look up the line of the error, find that the error is obvious, fix it, commit the fix, and I still have time for a cup of coffee before 5 minutes have passed. The reporter may not be happy because they can't get on with their work until I cut a new version, but other than that this sort of bug is of very little consequence: no data files have been corrupted or anything like that.
Then there's the other kind of bug, the subtle kind where everything seems to be working fine, but someone checked the output and it just isn't right: the totals on the report don't add up or something. These are the hard ones. And then you have to dig in and hypothesise and experiment and bisect and so on. Of course those bugs happen in Python programs as well.
But I bet the kind of bugs that put Python over average are the first kind, and that Python is below average on the second kind. Which is a good tradeoff.
Comparing PHP with Scala is like comparing "Game of Thrones" with "Ulysses".
Any n00b can program something useful in PHP within an hour. That's the whole point of PHP. That's why we have such amazingly feature complete systems like WordPress. Given, the architecture of these PHP systems is so bizarre any reasonably seasoned programmer will not believe his eyes when he looks at the actual code - but it does work (most of the time) and it is useful.
Scala is a programming language that forces you to know what you are doing. Yeah, no shit it has fewer bugs. If I don't know what a JVM is and what bytecode is, there is little chance I'll even get started with Scala. Only an experienced programmer will get the point of Scala in the first place. Thus Scala code has fewer bugs. No surprise here.
My 2 cents.
We suffer more in our imagination than in reality. - Seneca
True, but equally true that older code has been exercised for far longer, and may have been reviewed by many different people.
FTFA:
"Project age is included as older projects will generally have a greater number of defect fixes; the number of developers involved and the raw size of the project are also expected to affect the number of bugs and finally the number of commits is bound to."
What those who want activist courts fear is rule by the people.
2) C++
3) PHP
4) Javascript-based frameworks
5) Anything used to write an Excel or Word macro by the HR department
This is an unfair comparison: PHP specifically targets producing buggy products, and in the unlikely event that an HR compartment gets anything to work, it is even more unlikely to involve a computer.
Sent from my ASR33 using ASCII
Do javascript problems actually get fixed? Based on my experience with our ever deteriorating Internet, bugs in javascript "programs" live on forever.
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
For whatever reason APL always reminds me of Arthur C Clarke's classic story "The Nine Billion Names of God." If anyone ever writes a readable APL program perhaps the stars in the sky will, without any fuss, go out.
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
$subject says all, in that order, Not counting auto* script clusterf*ck ;-)
As a long-time but hobbyist programmer (started with machine code before moving to A86 assembler), I found the discussion thoughtful and illuminating. I will therefore offer a snide comment that may contain a grain of truth, based on something I heard when I first began to code, but updated a bit. "Strong typing is for weak minds. But coders overestimate the strength of their minds."
In general, it will be whichever language is used most by new, barely literate programmers.
That in and of itself makes the results next to useless.
In particular, considering C++ pre 2011 and after (c++11) as the same language from a prone-to-bugs POV is ridiculous. Sure, since it's backwards compatible you can continue to shoot yourself in the foot like it's 2010 (or 2000 - sheesh!) if you really want to, but if you're using C++ nowadays and having problems like memory leaks or dangling pointers then YOU are the problem, not the language.
I'm sure other languages have similar issues - if you don't use the latest features put there to help you, then whose fault is it?
This summary also makes me question the methodology:
It seems a bit suspicious that all the widely used languages, regardless of whether they are low level systems programming languages like C/C++ or high level scripting languages like Python, are, by their way of measuring, more buggy than all the far more obscure ones such as Clojure, Haskell, Scala...
Shit. I think I've been trolled.
This is one of those flamebait topics that is basically pointless to debate. It's too general. There are too many ways to define a bug, and many of them depend heavily on indirect/abstract qualities of the language, such as what sort of people use it or what sort of problems it's most commonly used to solve. It's just impossible to remove enough of the unknowns and side-effects of one sort of bug to give a useful answer on any other.
For example, if you're going to judge a language on the code-to-fix ratio of commits, does that tell you how prone the code is to having bugs in the first commit, or how easy it is to find and commit fixes? Those two alone say very different (and somewhat opposite) things based on the same metric.
I'm only going to scratch the surface here and try to list off a few of the things you could say make a language "more buggy":
- typical user has below-average coding skill in general
- typical user doesn't follow standards well
- poorly defined standards
- hides bugs with coercion, ignoring return values, and other methods of ignoring behaviors which could easily be either a shortcut or a bug
- scope is wide or unbounded (a very relative metric)
- tends to be used to solve more complex problems
- used in new fields where users are less educated in the problem they are trying to solve
- poor debugging feedback
- poor quality crash dumps
- poor quality of available or more commonly used compilers
- compiles directly to target platform rather than using an established post-compiler
- use of optimizers that haven't undergone rigorous testing (edge-bugs in optimizers are a PAIN to debug)
- poor documentation / built-in help
- average reliability of most common target platform
Even with this limited set, I don't see how you could assign weights to be able to get any more than even a rough comparison between two languages. I think I could compare any two languages and make a case for either being better with regard to bugs. (and instead of you suggesting two for me to try, do it yourself... take those two you have already jumped on after reading that last statement and go try for yourself instead of challenging me)
I work for the Department of Redundancy Department.
It would be no different in C++ with end users programming in the modern idiom on top of mature application libraries that support and encourage the modern idiom.
C++ is the GTA III of programming languages.
cout << "Open, world!" << '\n';
Looking at programming languages is good but this report implies there are other factors more important at play. What is the demographic of a good programmer? What is the marker of a good programmer who does not produce bugs? Ivy league vs. "Scheme certificate in 90 days" training programs? Wyoming programmers vs. California programmers? Just graduated 20 somethings vs. 50 year olds? Traditional CS programs vs. explicit software development training programs?
We all have our biases, but let's see what, if anything, pops up.
putting the 'B' in LGBTQ+
Perl is in there on the low side with the strictly typed languages. I guess this is because in perl no one can actually read the code in order to work out the bug later - hence a win for minimizing bug fixes.
Nah .. it's because even if you pound on the keyboard with sheer frustration at Perl, e.g. %&%^@$#&%^&(*, you still produce a valid program that solves a problem.
I am Slashdot. Are you Slashdot as well?
While there are professional Haskell programs, there are lots of amateur projects on github that don't get much love from the authors. There is certainly a real effort to filter out the many small and insignificant projects, but I feel that someone learning Haskell who is a bit obsessed with lots of small git commits can create a large enough commit history to get through the survey's attempts to gather statistics on different languages.
Basically I think the differences between the languages can be explained by the differences in the behavior of the people who are drawn to those languages.
A moderately experienced programmer with a methodical approach should be able to write code with few bugs and detect and address those bugs quickly, no matter which language they use. Of course carefully writing your web blog back-end in Brainfuck is a lot more work than doing it in PHP and I'd expect it to take a lot longer time. In terms of quality it's hard to quantify it based on language alone, except perhaps in very large projects with many contributors.
“Common sense is not so common.” — Voltaire
convert problems I don't know how to solve into problems I *do* know how to solve. That's what programming is.
So using that methodology, I have to ask here: which programming languages are the most popular?
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
And the most buggy software is always C, C++, Objective C or anything else that encourages a human to manipulate pointers and/or memory.
Most buggy framework? WPF, by far. I haven't had to test UWP yet, but it looks like an even bigger, overly elaborate clusterfuck.
Sometimes newer is not better. I despised MFC, loved Winforms, which is inflexible and dull as dishwater, but simple, obvious and easily testable (everything Microsoft appears to hate). It's all been downhill after that.
And don't get me started on ASP. I'm pretty sure the rise of the recent spate of rather nicely designed javascript frameworks happened because the ASP collection ranges from mediocre to Godawful.
Please do not read this sig. Thank you.
I'm stuck using MS PowerShell at work, but PowerShell never appears in these reports about popularity, bug avoidance, etc. Not sure whether it is low usage, type of projects, or that it's for Windose.
This survey can't tell us anything about how hard it was to find those bugs ( time between reporting and fixing ) though. That would be a very interesting statistic if it could be produced.
You live and learn, or you don't learn much.
Linux and gcc could never be written in a high level language. Imagine writing Linux in lisp.
Ehm...not sure about Linux, but wouldn't GCC in Lisp be merely extending the MELT framework and the RTL syntax into the rest of the compiler?
Ezekiel 23:20
I have written a compiler in Prolog, and I assure you it was better than writing it in C.
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
keep crying fagbots
Here's the thing. It's sort of an open secret. Open Source code sucks. It's always sucked. The only time it doesn't suck is when rigid standards are applied, or you have huge groups of professional programmers working on it. IBM and Microsoft both contribute huge amounts of open source code every year. Most of it is pretty solid, well written by pros.
That's the exception though.
I've worked with hundreds of open source projects over the years, even tried to manage a couple of them. And the reality is that the code that comes in is generally learning code. Stuff written by guys and gals who are picking up the language, whatever language it is for the first time. And that's not bad. It gives them valuable experience on a living app, and it gets me the functionality I want to download and use. Hopefully, without having to redevelop it too much. See, at the end of the day, I'm a pragmatist. If it does what I want it to do, and it doesn't incinerate my hardware or cause my vm to stop responding... I just don't care anymore.
Anyway, that's why you can't depend on open source code, most of the time, for something like this. If you are going to use Github to do it, you're going to need a much larger sample, if it's even possible to come to the conclusions the premise of the survey is trying to figure out. I would be curious to know if projects with only one person on them were more likely to be buggy than projects with active teams. Or if there's a difference in the rate of defects with professional teams vs amateur programmers. I don't think looking at the language itself is ever going to yield anything actionable, but there's a lot of meta around this that could be interesting.
This signature has Super Cow Powers
A programmer is a programmer. Some are born to program, others are programming to implement an idea. Some look at the code as an automaton, where others just look at code as a set of frequently travelled links, ignoring the wrong ones (you should not take those traversal states).
You can program badly in any language. If you report C or C++ as having the most errors, it's for a few reasons, viz.:
C and C++ are the most used of all the languages. If you take bugs reported as a percentage of the number of lines of code and add a new measure, the dimension of the programmer's skill, you should find no difference between languages in the frequency of errors.
C and C++ have advantages of portability, and minimal execution overhead. C and C++ are not going away soon. Rust is a new language which, if you do multi-tasking, has built-in safety / type / descendant call checks for variables that are constant and/or mutable. But the number of github entries for Rust is insignificant when compared to the C's.
I measure the quality of code by the skill of the programmer and inversely proportional to the number of lines in a module.
Leslie Satenstein Montreal Quebec Canada
WordPress is an insane mess. The simple task of migration to another domain is nigh impossible with WP. With relatively well built systems such as Joomla it's done in a few minutes.
However, unlike my attempt 7 years ago with Joomla, I've actually managed to make a decent living off WordPress in the last 4 years and continue to do so. Thus I have decided to deal with the mess. It's sort of like bloated ERP Setups. They may be expensive and they may be outdated, but they provide an endless opportunity for stuff that needs to be done and can get billed. ... And at least it is open source.
Bottom line: Count me in on the never ending WP craze.
We suffer more in our imagination than in reality. - Seneca
Yes, well, statistics like that (like most, actually) are usually built on incomplete datasets, so they show (if at all) what 'might' be the common denominator. I could read (without RTFM) that the languages that have been around the longest and are used most have had more faulty code written over the period in which they have been used and, as you so eloquently say, it doesn't really seem to account for the skill / xp / years of service and knowledge of one or multiple levels per specific dev involved per language per project per number of bugs (lawl) etc... It's the new year; I guess the news needs something to write about, but I suppose they read something else.
Free speech was meant to be free for all... how can anyone grow up in a nanny state ?
All the ones used by humans? Seriously, a survey taken in this manner is, more than likely, biased. Just because someone uses Github doesn't mean they put all their bugs in Github (which skews the numbers). Furthermore, it's likely that newer languages may be using techniques which allow a lot more bugs, but they get found because of process and aren't being checked in with bugs because of it. Translation: modern methods have a lot more to do with lower bug counts in newer languages than anything, I'd imagine.
Here we go again - people with their eyes closed saying how dark it is.
It's like saying that French is impossible to read or speak when you've never even tried to learn the language.
I've worked on several large, complex systems written in APL and maintained by one or two people. I've also seen any number of failed attempts to re-write an APL-based system, usually with teams several times the size of the APL one.
What's a "good" abstraction is often in the eye of the beholder. My favorite abstractions have driven others batzo. Generally one has to target something that is likely to be digestible to a typical developer. If you target the elite and/or only others who think like you, it gives the organization fewer hiring choices. It may be great job security for yourself, but it is not pleasant for the organization. Don't be selfish.
Table-ized A.I.
The missing factor here is that they did nothing to factor in the exposure (how many hours the software has actually been used). That has a huge effect on the number of defects discovered. All this could really mean is that C++, C, and Objective-C, also PHP and Python are what most heavily used software is written in.
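Here is a rough sketch of what factoring in exposure could look like. The project names and every number are invented purely to show the arithmetic, and the per-1000-hours rate is my own choice of unit, not something the study computed.

```python
def defects_per_1000_hours(defect_fixes, usage_hours):
    """Normalize a raw defect-fix count by hours of actual use."""
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return 1000 * defect_fixes / usage_hours


# Hypothetical projects: (defect fixes, hours of real-world use).
projects = {
    "heavily_used": (500, 1_000_000),
    "rarely_used": (50, 20_000),
}

for name, (fixes, hours) in projects.items():
    rate = defects_per_1000_hours(fixes, hours)
    print(f"{name}: {fixes} fixes, {rate} per 1000 hours of use")
```

With these made-up figures, the project with ten times as many fixes comes out with the lower rate once usage is taken into account, which is the effect a raw count would hide.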
An engineer who ran for Congress. http://herbrobinson.us