Which Programming Languages Are Most Prone to Bugs? (i-programmer.info)

Posted by EditorDavid on Sunday December 31, 2017 @07:00PM from the analytics-from-GitHub dept.

An anonymous reader writes: The i-Programmer site revisits one of its top stories of 2017, about researchers who used data from GitHub for a large-scale empirical investigation into static typing versus dynamic typing. The team investigated 20 programming languages, using GitHub code repositories for the top 50 projects written in each language, examining 18 years of code involving 29,000 different developers, 1.57 million commits, and 564,625 bug fixes.

The results? "The languages with the strongest positive coefficients - meaning associated with a greater number of defect fixes are C++, C, and Objective-C, also PHP and Python. On the other hand, Clojure, Haskell, Ruby and Scala all have significant negative coefficients implying that these languages are less likely than average to result in defect fixing commits."
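As a rough aid to reading "positive" and "negative" coefficients: assuming the defect counts were modeled with a log-link count regression (negative binomial regression is the usual choice for this kind of data), a coefficient acts on the log of the expected count, so exponentiating it gives a multiplicative effect relative to the average. The coefficient values in this sketch are hypothetical, not numbers from the study.

```python
import math


def rate_ratio(coefficient: float) -> float:
    """Multiplicative change in expected defect-fixing commits relative to the
    average, assuming a log-link count model (e.g. negative binomial)."""
    return math.exp(coefficient)


# Hypothetical coefficients for illustration only; not the study's figures.
hypothetical = {"positive example": 0.15, "zero": 0.0, "negative example": -0.15}

for label, beta in hypothetical.items():
    change_pct = (rate_ratio(beta) - 1.0) * 100
    print(f"{label}: coefficient {beta:+.2f} -> {change_pct:+.1f}% defect-fixing commits")
```

Under that assumption, a coefficient of +0.15 means roughly 16% more defect-fixing commits than average, and -0.15 roughly 14% fewer; the asymmetry comes from the exponential.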

Or, in the researchers' words, "Language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages static typing is also somewhat better than dynamic typing."
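To make "type confusion" concrete: one common reading is implicit conversion between unrelated types, such as silently treating a string as a number. The short sketch below uses Python purely as an example of a language that refuses the mix and forces an explicit conversion; the function names are hypothetical and this is not code from the study.

```python
def add_quantities(a, b):
    # Python will not mix str and int implicitly: "3" + 4 raises TypeError
    # instead of quietly producing 7 or "34".
    return a + b


def add_quantities_explicit(a, b):
    # The programmer has to state the intended conversion.
    return int(a) + int(b)


try:
    add_quantities("3", 4)
except TypeError as err:
    print("rejected:", err)

print(add_quantities_explicit("3", 4))
```

In PHP, by contrast, `"3" + 4` evaluates to 7 because the numeric string is coerced, which is the kind of permissiveness the quote describes as allowing type confusion.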

28 of 247 comments

Honorable Mention by Ukab the Great · 2017-12-31 19:01 · Score: 3, Funny

Brainfuck
1. Re:Honorable Mention by El_Muerte_TDS · 2017-12-31 20:01 · Score: 3, Funny
  
  I have never heard of a large-scale production problem happening in an application written in Brainfuck. So by that metric it is not really error-prone.
In before Fractal of Bad Design by Waccoon · 2017-12-31 19:05 · Score: 4, Interesting

You already have to be a genius to understand functional languages, so of course those people make fewer mistakes.
I love it when functional fans insist it's more analogous to how the brain really thinks. That's why so few people can figure out how to do things that way.
1. Re: In before Fractal of Bad Design by _merlin · 2017-12-31 20:25 · Score: 4, Informative
  
  Pascal and ANSI C are very similar, but pre-ANSI C is a completely different beast, far more similar to BCPL. In fact, ANSI C could almost be described as Pascal with C syntax.
  Pre-ANSI C didn't have prototypes - it assumed any undeclared name was an external function. It didn't automatically convert int to long if the function expected it, etc. - you had to explicitly cast. You had to be careful to cast results of functions correctly, too. All it had was a set of rules for how argument types were stacked, and it was up to you not to pass something a function wasn't expecting. This is closer to assembly language programming than Pascal or ANSI C.
2. Re:In before Fractal of Bad Design by angel'o'sphere · 2017-12-31 21:55 · Score: 2
  
  Functional programming is not more complicated than imperative programming styles.
  But unfortunately, functional languages like Haskell often have strange syntax; that is all.
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
3. Re: In before Fractal of Bad Design by Anne Thwacks · 2018-01-01 00:00 · Score: 4, Informative
  
  Assembly to C was quite easy as well.
  Which is hardly surprising, since C is just PDP11 assembler tidied up a bit.
  Which is interesting considering the PDP11 was "a hardware Fortran machine", and the i386 architecture is a close copy of the PDP11 - the 286 even copied the early (variable page size) PDP11 memory management scheme that was the fashion before people figured out it was not really good for virtual memory, and then the 386 copied the later (fixed page size) PDP11 MM which is needed for virtual page based memory. The PDP11 was designed on the assumption that memory bandwidth was the bottleneck in throughput.
  The fact that C has been successfully ported to almost all more recent processors is largely because the concepts of what a CPU is and does have been developed by people who grew up on the PDP11 architecture. RISC architecture may be different, but, in reality, the RISC architecture is mostly just used to simulate CISC anyway. RISC was designed on the assumption that instruction decode was the bottleneck - which has not really been the case since the 1980s - hence the failure of RISC to displace CISC.
  I am writing this on a Sparc64 (RISC) machine - the story is more complex than I describe here.
  
  --
  Sent from my ASR33 using ASCII
4. Re: In before Fractal of Bad Design by Z00L00K · 2018-01-01 01:50 · Score: 2
  
  And one way around it was to declare the function before it was used - that did help a bit sometimes. Not always, but sometimes - and depending on the compiler.
  But Pascal, C, Java, C++ and Ada all belong to the same programming paradigm, with or without the object orientation twist.
  Then we have Basic and Fortran, which sometimes follow the paradigms of Pascal et al. but also have other ways of doing things not related to them.
  Cobol is in turn an animal unto itself.
  And then you can turn to Erlang, Haskell and Prolog for yet another way to do things.
  
  --
  If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
5. Re: In before Fractal of Bad Design by imnotanumber · 2018-01-01 06:50 · Score: 3, Informative
  
  So you are saying that the years since BWK wrote that article have given us even more reasons to dislike Pascal, such as the fact that the only versions that are useful are either dead for 20+ years (Turbo Pascal) or need vendor-proprietary extensions (Delphi)?
  No. You have fallen prey to the common hype.
  There is the GCC of the Pascal world: Free Pascal https://www.freepascal.org/ , an Open Source implementation with modern syntax and concepts. The complaints in the article sound like someone complaining about Linux by arguing that you have to compile the kernel to add mouse support.
  Also, Lazarus https://www.lazarus-ide.org/ is a modern Open Source IDE that picked up where Delphi stopped, and with it you can develop and compile applications for Windows, Linux and OSX.
6. Re: In before Fractal of Bad Design by imnotanumber · 2018-01-01 10:50 · Score: 2
  
  So there is a hodgepodge of mutually incompatible dialects of Pascal that are available, some of which are useful?
  Practically, Delphi and Free Pascal (with Lazarus being the IDE that uses Free Pascal as a compiler) have the majority of mind share in the Pascal world, and they are mostly compatible. A few years ago Delphi used the Free Pascal Compiler to target iOS. Of course, those who sell Delphi are not very keen to advertise the Open Source competitor...
  
  This is not helping Pascal's case, and makes it look more and more like my C++.NET analogy was accurate.
  Most criticisms come from people who are completely out of touch with modern Pascal compilers and IDEs. Just take a look at http://newpascal.org/assets/mo... to learn its current state.
2018 by AHuxley · 2017-12-31 19:08 · Score: 4, Insightful

Rediscovers how great Ada would have been for the consumer.

--
Domestic spying is now "Benign Information Gathering"
1. Re:2018 by serviscope_minor · 2017-12-31 20:49 · Score: 2
  
  Is that really the case? GNAT is part of GCC, isn't it? That allows you to use the output and the libraries without it counting as a derived work.
  
  --
  SJW n. One who posts facts.
2. Re:2018 by HiThere · 2018-01-01 05:33 · Score: 2
  
  The first problem I had with Ada was that strings of different lengths were of different types. That could be solved with bounded or unbounded strings, but then you couldn't compare against literals. There were other problems, and there were always ways around them, but I had to use unchecked conversion too often for things that were perfectly safe. The entire thing was a mess due to over-concern with simple type conversions. Dynamically allocated storage is also a mess, though, given the original ideas, quite reasonable. Follow an Ada program through a simple addition of a node to a tree and you should see what I mean.
  Another part of the problem was that the optional annexes couldn't be used as they usually weren't implemented. It's great to not have to load them if you don't need them, it's not so great if they aren't available when you need them.
  There are lots of other minor design flaws. In a way you can forgive them, because this was being designed at about the same time as C++ was being written, and, unfortunately, it was designed by a committee. So the kitchen sink got thrown in. But they did design everything in a way that enabled maximal security...even where it didn't make any sense. (Why can't I compare two literal strings of different lengths? They've probably changed that by now, though.)
  
  --
  
  I think we've pushed this "anyone can grow up to be president" thing too far.
Complexity by Anonymous Coward · 2017-12-31 19:24 · Score: 5, Insightful

Or could it be that software written in C++ tends to be large, complex software where performance is important, along with various other complicating factors, while software written in Ruby, for example, tends to be simpler?
Sounds like this 'study' started with a conclusion already in mind.
1. Re:Complexity by Joce640k · 2017-12-31 23:43 · Score: 2
  
  1993 just called and wants its C vs. C++ flamewar back.
  
  --
  No sig today...
2. Re:Complexity by Anne Thwacks · 2018-01-01 00:10 · Score: 2
  
  Truth is I've seen very few projects where using C++ actually resulted in decent code in the end and quite a few where it wasn't at all helpful.
  Probably because a huge amount of C++ code is in massive embedded systems which are closed source and you can't see the code. I can't either in the general case, but in the cases I have seen, the quality of code is roughly what you would expect from the quality of the team. (In all probability, larger teams will need one or more specialists to do the more complex debugging).
  Are there any large Open Source projects in Cobol? No. That is mostly because Cobol is not fashionable with the OSS crowd. My point is fashion is the biggest factor in language selection for OSS. Not so much for closed source (where the prejudice of the CTO is the biggest factor). I doubt actual performance of the language is relevant to language selection in more than 1% of projects - and those are mostly academic experiments, rather than commercial work.
  
  --
  Sent from my ASR33 using ASCII
3. Re:Complexity by HiThere · 2018-01-01 05:47 · Score: 2
  
  Sorry, that hypothesis doesn't fly. It may be harder to debug C or assembler than C++, but most other languages provide more usable debugging facilities than does C++, and most of them are easier to write unit tests for. The unit testing for C++ is basically an add-on. Even assert statements in C++ are crippled, unless you use an extension. (Assert statements should include an optional message that is printed with the error, and which can dump variables of interest in a formatted way.)
  So C++ is more difficult to debug. There *are* more external tools that you can use to do the debugging, but that's because they are needed.
  
  --
  
  I think we've pushed this "anyone can grow up to be president" thing too far.
Haskell and C++ programmers are different. by shess · 2017-12-31 19:31 · Score: 4, Insightful

Something the linked article didn't seem to address is that the population for each language will differ. The average Haskell programmer is going to be very different from the average C++ programmer, or, god forbid, the average Python programmer.
Also, while they did try to address problem domains, I don't think they addressed systemic issues. For historical reasons, there are many projects which use C or C++ simply because of what they need to interface with to get the job done. For instance, there simply aren't going to be that many browser projects which aren't written in C++.
Personally, I think the interesting take-home is not the difference between languages, it's how small the number of commits for security and memory issues was.
1. Re:Haskell and C++ programmers are different. by serviscope_minor · 2017-12-31 20:58 · Score: 4, Interesting
  
  Also, while they did try to address problem domains, I don't think they addressed systemic issues.
  I don't think they do: none of them have things like zero-overhead abstractions, zero-cost memory allocation and so on. And some of them (like Go) lack the kind of abstractions present in many modern languages.
  For instance, there simply aren't going to be that many browser projects which aren't written in C++.
  Of the three remaining extant engines (Firefox, WebKit/Blink, and Edge/Trident), all except Firefox are written in C++. Firefox is partly Rust now.
  Rust, I think, is one of the very, very few languages aimed at the same problem domain as C++ by people who understand enough C++ to know what that problem domain was. Look, for example, at Pike's rants on Go and how it was designed to replace C++ and didn't: many C++ programmers skimmed the features and said something like "oh, that'll make my program slower, more verbose, buggier and harder to write". Rust, on the other hand, has the same machine model as C++ but a very, very different type system.
  It's never going to replace C++ across the board, that's for sure, but it has proven capable of replacing C++ in a niche where formerly there were no contenders.
  
  --
  SJW n. One who posts facts.
Conclusions only valid on Open Source Projects? by mykepredko · 2017-12-31 19:58 · Score: 4, Interesting

This is an interesting study, but I don't know if the results can be extrapolated to include closed source software.
My problem with this is that I don't see any evidence of:
a) Projects in the study have a published project plan with somebody managing it at a high level (I would think the Linux kernel could be thought of as having a plan with strong central management). I tend to believe that projects to which multiple individuals contribute (with varying levels of understanding of the software, the app's background and issues experienced during development) would be at a much lower quality level than something managed by a strong, continuous team; this doesn't seem to be a consideration when I RTFA (popularity of projects seems to be a bigger issue).
b) Different development tools used by different developers. In terms of the C/C++ typing issues, Windows software developed and built in Visual Studio, in the Eclipse text editor with MinGW, or in something like Komodo Edit with Cygwin and user-written makefiles will have different typing issues flagged, and the toolchains may generate code that handles those issues differently. I would like to know how many bug fixes are the result of something that isn't flagged and works fine in VS but doesn't work when built with MinGW, leading to a fix.
b.1) I'm not 100% sure of the methodology used in this study, but wouldn't a file that originally used tabs for indentation, which an editor then automatically converts to spaces, be misidentified as a "fix" if it's committed back to the repository? This is a combination of b) and c).
c) Different coding styles. I know of several Open Source projects in which a developer has reformatted code simply because they don't think it's in the "correct" style and have difficulty reading it, so they change it until they can follow it better. To be fair, I'm sure a lot of us have done that, because some people have very different and strongly felt ideas about how code should be formatted.
d) Lack of formal testing methodologies. I don't think many Open Source projects have strong, automated regression testing processes and methodologies before allowing a new release.
e) Differences in how the languages are used. I would think that methods written in C, C++ and Objective-C would be providing more low-level functionality than those written in Clojure, Haskell or Scala. Ruby probably fits somewhere between the two groups.
Comments?
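Point b.1 can be made concrete with a small sketch. It assumes, as an illustration rather than a description of the study's exact pipeline, that defect-fixing commits are flagged by keywords in the commit message; the keyword list and all function names are hypothetical. A keyword filter alone would count a reindentation commit titled "Fix indentation" as a bug fix, and a simple token comparison can catch diffs that change only whitespace.

```python
import re

# Hypothetical keyword list; the study's actual list may differ.
FIX_KEYWORDS = re.compile(r"\b(fix(es|ed)?|bug|defect|error|fault|flaw)\b", re.IGNORECASE)


def looks_like_fix(message: str) -> bool:
    """Flag a commit as a defect fix if its message contains a fix keyword."""
    return FIX_KEYWORDS.search(message) is not None


def whitespace_only_change(old: str, new: str) -> bool:
    """True if two file versions differ only in whitespace (e.g. tabs -> spaces).

    str.split() with no argument splits on any run of whitespace, so the
    token sequences match exactly when only spacing changed.
    """
    return old != new and old.split() == new.split()


def counts_as_defect_fix(message: str, old: str, new: str) -> bool:
    # Keyword match, minus commits that only reformat whitespace.
    return looks_like_fix(message) and not whitespace_only_change(old, new)


before = "int main(void) {\n\treturn 0;\n}\n"
after = "int main(void) {\n    return 0;\n}\n"
print(looks_like_fix("Fix indentation"))                       # True
print(counts_as_defect_fix("Fix indentation", before, after))  # False
```

The token comparison is deliberately crude: it would still count a commit that only reformats line breaks inside a string literal as a real change, which is the kind of edge case a careful study would need to handle.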

--
Mimetics Inc. Twitter
Python by _merlin · 2017-12-31 20:16 · Score: 5, Insightful

I know I'll get flamed for this, but Python is really error-prone in a particular area, and that's its ridiculously weak name resolution rules. In a language like C, Perl, or even PHP, names are resolved during the compile phase. The compiler knows which definition of a name is going to be used at any point. Python doesn't have this - when it runs across a name, it walks up the scope hierarchy looking for a candidate.
This means that code can run happily for months or even years, until it just crashes with an undefined name error. This could be because of a rarely-used code path with a typo in it, botched refactoring of a rarely-used code path, or a particular set of rare circumstances where a global name isn't set before the code gets to a certain place.
The usual response is that unit tests should catch this. But let's face it, 100% unit test coverage is pretty rare, particularly for the kind of fast turnaround stuff that Python's frequently used for. Also, unit testing isn't necessarily going to simulate a corner case where a global doesn't get set before code that uses it executes. It also makes refactoring more risky because there's no point where the compiler can tell you you're referencing a name that's no longer defined, or no longer has a certain method/field.
This is the kind of area where it's really useful if the compiler can help you, and Python's ridiculously weak name resolution rules make that completely impossible.
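A minimal sketch of the failure mode described above (the function and variable names are hypothetical): the misspelled name in the rarely used branch is not reported when the module loads, and the global is looked up only when the function is called, so both problems surface at run time.

```python
def summarize(values, verbose=False):
    total = sum(values)
    if verbose:
        # Typo: 'totl' is never defined, but Python only looks the name up
        # when this line executes, so the module imports without complaint.
        return f"{len(values)} values, total {totl}"
    return str(total)


def uses_global():
    # 'threshold' is looked up in the module's globals at call time.
    return threshold * 2


print(summarize([1, 2, 3]))  # the common path works
try:
    summarize([1, 2, 3], verbose=True)
except NameError as err:
    print("rare path failed:", err)

try:
    uses_global()  # fails: threshold has not been assigned yet
except NameError as err:
    print("global not set:", err)

threshold = 10
print(uses_global())  # the same call now succeeds
```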
1. Re:Python by Just Some Guy · 2018-01-01 04:09 · Score: 3, Interesting
  
  In Python, static checkers like pylint and flake8 are astoundingly useful for finding the low-hanging fruit you've described. I have both wired into Emacs so that potential errors are highlighted as I edit. If you're writing Python and not doing the same, you're doing yourself a disservice.
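  To show why a checker can catch the typo case without running the code, here is a deliberately crude sketch built on the standard library's ast module. It is not how pylint or flake8 (via pyflakes) are implemented: it ignores scoping entirely and only reports names that are read somewhere but never bound anywhere in the file. The function name and sample source are hypothetical.

```python
import ast
import builtins


def undefined_names(source: str) -> set[str]:
    """Names read somewhere in `source` but never bound anywhere in it.

    Scope-insensitive on purpose: a name bound in any function counts as
    bound everywhere, so this misses errors that real linters report.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                # Store (and, for simplicity, Del) are treated as bindings.
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
    return loaded - bound


SAMPLE = '''
import os

def summarize(values, verbose=False):
    total = sum(values)
    if verbose:
        return f"{len(values)} values, total {totl} in {os.getcwd()}"
    return str(total)
'''

print(sorted(undefined_names(SAMPLE)))  # ['totl']
```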
  
  --
  Dewey, what part of this looks like authorities should be involved?
"Errors should never pass silently" by munch117 · 2017-12-31 22:06 · Score: 2

Python programs can be very self-diagnostic. When something goes wrong, it presents as an exception traceback from an uncaught exception.
A lot of bug reports I get go like this: Someone sends me a screenshot with a traceback, I look up the line of the error, find that the error is obvious, fix it, commit the fix, and I still have time for a cup of coffee before 5 minutes have passed. The reporter may not be happy because they can't get on with their work until I cut a new version, but other than that this sort of bug is of very little consequence: no data files have been corrupted or anything like that.
Then there's the other kind of bug, the subtle kind where everything seems to be working fine, but someone checked the output and it just isn't right: the totals on the report don't add up or something. These are the hard ones. And then you have to dig in and hypothesise and experiment and bisect and so on. Of course those bugs happen in Python programs as well.
But I bet the kind of bugs that put Python over average are the first kind, and that Python is below average on the second kind. Which is a good tradeoff.
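The two kinds of bug can be contrasted in a few lines of Python (the record layout and function names are hypothetical). The first fails loudly with a traceback pointing at the offending line; the second runs without complaint and produces a total that is slightly wrong.

```python
import math


def invoice_total(lines):
    # Loud bug: a record missing the "amount" key raises KeyError with a
    # traceback naming this line, so the cause is obvious.
    return sum(line["amount"] for line in lines)


def naive_total(amounts):
    # Subtle bug: binary floating point cannot represent 0.1 exactly, so
    # repeated addition drifts and the total no longer matches expectations.
    return sum(amounts)


def careful_total(amounts):
    # math.fsum tracks the lost low-order bits and returns a correctly rounded sum.
    return math.fsum(amounts)


try:
    invoice_total([{"amount": 5}, {"amt": 7}])
except KeyError as err:
    print("loud:", repr(err))

ten_cents = [0.1] * 10
print("subtle:", naive_total(ten_cents) == 1.0)   # False (0.9999999999999999)
print("fixed:", careful_total(ten_cents) == 1.0)  # True
```

For money, decimal.Decimal is the more usual fix; fsum is used here only because it keeps the example to one line.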
Stupid comparison. by Qbertino · 2017-12-31 23:06 · Score: 2

Comparing PHP with Scala is like comparing "Game of Thrones" with "Ulysses".
Any n00b can program something useful in PHP within an hour. That's the whole point of PHP. That's why we have such amazingly feature-complete systems like WordPress. Granted, the architecture of these PHP systems is so bizarre that any reasonably seasoned programmer will not believe his eyes when he looks at the actual code, but it does work (most of the time) and it is useful.
Scala is a programming language that forces you to know what you are doing. Yeah, no shit it has fewer bugs. If I don't know what a JVM is and what bytecode is, there is little chance I'll even get started with Scala. Only an experienced programmer will get the point of Scala in the first place. Thus Scala code has fewer bugs. No surprise here.
My 2 cents.

--
We suffer more in our imagination than in reality. - Seneca
1. Re:Stupid comparison. by iggymanz · 2018-01-01 07:56 · Score: 2
  
  You know little about the real world: the majority of web frameworks that use PHP, such as Drupal, are badly written garbage. It is the language of the careless, the language used by builders of sites that get infected, spread malware and cause identity theft.
  PHP developers are like those that join the school band and want to play the triangle, blocks or cymbals.
Re: age of code by Dausha · 2018-01-01 00:05 · Score: 2

FTFA:
"Project age is included as older projects will generally have a greater number of defect fixes; the number of developers involved and the raw size of the project are also expected to affect the number of bugs and finally the number of commits is bound to."

--
What those who want activist courts fear is rule by the people.
Re:Mainstream languages, duh by vtcodger · 2018-01-01 00:11 · Score: 2

Yep, that was my thought when I read the article. Software used by a wide audience will generally require a lot more bug fixes than similar software used by only a few. Users find the damndest ways to use and abuse software. The more users, the more things they want changed and the more actual bugs they identify.
Face it -- most of the world runs on C, C++ and, increasingly, Python. Of course there are lots of bug fixes. And BTW, I loathe C++. IMO C++ code is almost always unreadable except possibly by its author. And C isn't that much better. I'm not defending my favorite languages here.

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
Re:The actual problem languages by Anne Thwacks · 2018-01-01 00:15 · Score: 3, Funny

1) Javascript
2) C++
3) PHP
4) Javascript-based frameworks
5) Anything used to write an Excel or Word macro by the HR department
This is an unfair comparison: PHP specifically targets producing buggy products, and in the unlikely event that an HR department gets anything to work, it is even more unlikely to involve a computer.

--
Sent from my ASR33 using ASCII
Re:typing by Jeremi · 2018-01-01 08:33 · Score: 2

I'm pretty sure the "weak minds" quote was making an implicit argument for strong typing, not against it.

--

I don't care if it's 90,000 hectares. That lake was not my doing.