Slashdot Mirror


The Effect of Programming Language On Software Quality

HughPickens.com writes: Discussions of whether a given programming language is "the right tool for the job" inevitably lead to debate. While some of these debates may appear tinged with an almost religious fervor, most people would agree that a programming language can affect not only the coding process but also the properties of the resulting product. Now computer scientists at the University of California, Davis have published a study of the effect of programming languages on software quality using a very large data set from GitHub. They analyzed 729 projects with 80 million SLOC by 29,000 authors and 1.5 million commits in 17 languages. The large sample size allowed them to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static vs. dynamic typing and strong vs. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, they report that language design does have a significant, but modest effect on software quality.

Quoting: "Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages."

16 of 217 comments

  1. Or, to put it another way... by jeffb (2.718) · · Score: 5, Insightful

    "We aren't saying that functional, static and strongly typed languages are inherently superior. We're just saying that if you don't prefer them, maybe you aren't really cut out for programming."

    1. Re:Or, to put it another way... by Z00L00K · · Score: 5, Insightful

      So far my preference lies with statically and strongly typed languages, and from a code-quality standpoint they're certainly helpful.

      However, the real strength of statically and strongly typed languages shows up in maintenance. The original programmer knows what he's using, but someone inheriting an old code base will need to spend a lot of time figuring out how it's actually built and what a given type does. A dynamically typed solution tends to be elusive and can change its semantics depending on how it's used, which can range from confusing to outright hilarious.
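
To make the point about semantics shifting with usage concrete, here is a minimal sketch in Python, chosen only as a representative dynamically typed language; the function name `combine` is hypothetical. The same untyped function means different things depending on what it receives at run time, and a bad combination is only reported when that call actually executes:

```python
def combine(a, b):
    # Hypothetical helper with no type declarations: the meaning of `+`
    # is resolved at run time from whatever the caller passes in.
    return a + b


print(combine(1, 2))        # integer addition
print(combine("1", "2"))    # string concatenation
print(combine([1], [2]))    # list concatenation

try:
    combine("1", 2)         # the mismatch surfaces only when this call runs
except TypeError as exc:
    print("TypeError:", exc)
```

In a statically typed language, a signature such as `combine(a: int, b: int)` would be checked before the program runs; Python's own type hints are not enforced at run time and need an external checker such as mypy.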

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    2. Re:Or, to put it another way... by Anonymous Coward · · Score: 2, Insightful

      Java is horrible for many reasons.

      When it comes to language choice, you have to decide whether performance matters. Because if it matters, you pick C and don't look back. If it matters enough to ensure there are no warnings and errors, C++, Objective-C/Objective-C++, and "natively compiled" C# become viable. Only those languages can compile into a clean program, but only C is guaranteed to compile cleanly on all OSes, because every OS and compiler has different "standard" bits. Microsoft is a bit worse at adhering to a standard, but the GNU compilers are much worse at consistency, even across versions. Something compiled with GCC 2.95.x won't coexist with anything newer.

      That's just one problem with the C family.

      Java (and similar efforts like C#) takes all the problems of C and adds the overhead of translating it into native or pseudo-native instructions. So if performance was the reason to choose C, "being able to run on any OS without recompiling" is the reason to choose Java and C#. Unfortunately, that's not the case either. Using OpenGL tends to throw this idea out the window, because now you have to add different rendering pipelines INSIDE Java/C# for each OpenGL or OpenGL ES variant, thus getting rid of the concept of "pure Java" or "pure C#".

      So if you're going to develop a game from scratch, you may as well skip trying to do anything in Java and just use C the first time. You're better off writing your own graphics wrappers per OS than trying to fudge it inside Java.

      Now, interpreted languages (Ruby, JavaScript, Perl, PHP, Python, etc.) have one advantage over everything else: you can generally just hit "run" and figure out what the code does. They're much easier to reverse engineer. The tradeoff is that they are 100x slower than native code. So where you use interpreted languages without stupid hacks (e.g. WebGL or asm.js in web browsers), you get slow performance compared to something native, because interpreted languages lack the ability to actually use native threads and instead rely on "single-threaded" behavior, even though parts of the underlying support code in C/C++ may be multithreaded. So: "Nice octa-core processor you have there; too bad my dual core runs circles around it because it has a higher clock speed."

      I really hate this whole "let's make everything a web app" approach lately. There are various things that can, and should, get away with it (e.g. email), but let's quit fooling ourselves that we're going to get "native" console performance in a stupid web browser. Yay, I can run a 15-year-old game in the browser, big whoop. Call me when the latest 3D FPS runs in a browser at the same performance as on a console.

      Language choice should consider the target platform's performance "today" first. If today's consoles and desktops have at least 2 cores, there is no reason NOT to multithread. Software such as x264 and ffmpeg makes use of all the CPU and GPU power it can reasonably expect to use. OpenCL use is still limited, and may never see much practical use outside the sciences, since the GPU is better used for graphics, and for some damn reason everyone is hellbent on putting weak GPUs in tablets, notebooks and entry-level desktops, ensuring OpenCL doesn't get used.

  2. Also, which languages beginners choose. by EmperorOfCanada · · Score: 4, Insightful

    I would say that there are three other critical factors: which languages beginners choose, which languages are rarely used, and, potentially even more important, which languages become a programmer's only language ever.

    If someone is new to programming, then their programming is probably going to be poor. So certain languages tend to be "gateway" languages, such as PHP, Python, VB (in the past), C#, etc. It is doubtful that someone is going to start their programming career with the C in OpenCL, or with Haskell.

    I have seen many people learn PowerBuilder and never learn anything else, and while they might master PowerBuilder, they never really master programming. I have seen the same thing with accountants who master the VB in Excel, resulting in some of the strangest agglomerations of code I have ever seen.

    But certain languages are also a sort of throwaway for many programmers, such as whatever the language inside Makefiles is: most programmers I know have not mastered make and do only what they have to do to get things to compile. The same goes for bash; I have only met a few programmers who truly knew bash. They did what they had to do and ran away after that.

    So it would be very difficult to tease out the quality of a language from these statistics. But regardless of the results, the religious zealots who think their language is the very best and that all others are for children won't be swayed by facts anyway.

  3. KISS by polar red · · Score: 4, Insightful

    The KISS-principle is probably the most important thing to keep software quality up, more so than tools and language.

    --
    Yes, I'm left. You have a problem with that?
  4. Language by ledow · · Score: 4, Insightful

    You can code sloppily in any language.

    All this tells me is that there's so little difference that it shouldn't be a major factor in your choice of language. As such, other more practical considerations (such as hiring programmers, project time, and even speed of the final code) should take far more precedence than the triviality of which language you happen to use.

    As with all things "programming language", apply them to real language. I'm certain that in some languages it's easier to misspeak at a critical moment, to say the wrong thing, or to be misconstrued. I'm also certain that some languages are more prevalent, easier to learn, clearer in their intent and grammar, etc.

    But that doesn't mean you should ever switch what you're doing to the language of the moment, nor that, when choosing a language for a project, you should weigh its structure or grammar over who you have who can speak it and how well everyone will understand each other when they speak it together.

    Also, there are languages and dialects that make specific tasks easier (for instance, IT has a language all its own, with talk of SCSI, buses, cloud, etc.). If everyone is able to "speak the lingo", then that's a good choice, but it's not the be-all and end-all of a good project.

    As such, all the programming language discussion is really like saying "We should all speak and write only in Chinese, because the Chinese for death and dearth sound more different and we won't get confused". Don't. Program in the language that you're comfortable with, that the people you hire can read and write fluently, and that is most common and available.

    Personally, that's always been C / C99 for me. So I always find it hard to justify the use of other languages except when there's a functional difference that gives a distinct advantage (e.g. a scripting language for scripting, or string-handling in Perl, etc.)

    TL;DR version: Who cares what language? Stop arguing about it and start coding.

  5. Development effort not considered by pem · · Score: 4, Insightful

    They discuss prior studies that looked at development effort, but hand-waved away the fact that dynamic languages take less development effort.

    This may well be because their study cannot discern the amount of programmer effort per check-in, but it is a fatal flaw. Open development methods mean that a lot of dirty laundry gets checked into repositories. If dynamic languages have more bugs per check-in, but require significantly less work per check-in, then measuring bugs per check-in without measuring effort per check-in is meaningless, and that's before you even get to the functionality provided by the checked-in code.
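
The commenter's point is that a bug rate is only as meaningful as its denominator. The minimal Python sketch below uses invented numbers (nothing here comes from the study), and `features_delivered` is a hypothetical stand-in for the "functionality provided by the checked-in code" that the comment mentions. The ranking of the two projects flips depending on what you divide by:

```python
from dataclasses import dataclass


@dataclass
class ProjectStats:
    # Hypothetical totals for one project; none of these figures come from the study.
    name: str
    commits: int
    bugfix_commits: int
    features_delivered: int  # hypothetical stand-in for "functionality provided"


def bugs_per_commit(p: ProjectStats) -> float:
    return p.bugfix_commits / p.commits


def bugs_per_feature(p: ProjectStats) -> float:
    return p.bugfix_commits / p.features_delivered


# Made-up numbers: the "dynamic" project makes many small, cheap commits.
dynamic = ProjectStats("dynamic", commits=1000, bugfix_commits=200, features_delivered=50)
static = ProjectStats("static", commits=400, bugfix_commits=60, features_delivered=10)

for p in (dynamic, static):
    print(f"{p.name}: {bugs_per_commit(p):.2f} bugs/commit, "
          f"{bugs_per_feature(p):.1f} bugs/feature")
```

With these invented inputs the "dynamic" project looks worse per commit but better per feature. The sketch shows only that the choice of denominator decides the ranking, not that either number reflects real languages.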


  6. Re:More factors to normalise out. by Anonymous Coward · · Score: 1, Insightful

    Poppycock. It's caused by those languages being tools for shooting yourself in the foot, and the fact that idiots still use those languages in areas where performance isn't an absolute priority is simple idiocy. Meanwhile, the smart people use HLLs (high-level languages) and strike the top five C/C++ issues off the chart entirely just by doing so. Performance issues are pretty much all that's left, so in fact people writing in HLLs tend to care quite a lot about performance, but only insofar as performance actually matters.

  7. Re:You need enough rope to hang yourself by SQLGuru · · Score: 4, Insightful

    This. Languages that enforce their rules at compile time rather than at run time should inherently lead to higher-quality code just by "accident". You can still write bad (or good) code in any language, but a language that lets you do whatever you want requires you to be much more rigorous in your testing strategy to ensure higher quality... and we all know how much developers love to test (and to document).
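
A minimal sketch of this point, with Python standing in for any language that checks types only at run time (the function `format_report` is hypothetical): the type error sits in a rarely taken branch, so everything works until a test, or a user, happens to exercise that branch. A compiler for a statically typed language would reject the same mistake before anything ran.

```python
def format_report(count: int, verbose: bool = False) -> str:
    # Hypothetical function with a latent bug on the rarely used verbose path.
    if verbose:
        # Bug: str + int raises TypeError, but only when this line executes.
        return "Processed " + count + " items"
    return f"Processed {count} items"


# The common path works, so casual testing passes.
print(format_report(3))

# Only a test that exercises the verbose branch exposes the bug.
try:
    format_report(3, verbose=True)
except TypeError as exc:
    print("caught at run time:", exc)
```

Running a static checker such as mypy over this file would report the `str + int` on the verbose path without executing it, which approximates the compile-time enforcement the comment describes.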

  8. Language by TVmisGuided · · Score: 4, Insightful

    You can code sloppily in any language.

    True, but some languages make it more difficult to do so. Ada, for example, won't allow code to compile with (what should be) obvious logic or syntax errors that most C/C++ compilers will happily ignore, and hence allow to go undiscovered until runtime...errors that could be catastrophic in the real world.

    Ada has acquired a reputation as a niche-market language, but that niche market takes heavy advantage of Ada's strengths: strong typing and a requirement that the developer properly design the software before writing code. Unfortunately, deciding to develop commercial software in Ada also comes with a fairly steep price tag, because it's a niche market...thus perpetuating the cycle.

    DISCLAIMER: I am not affiliated with any company which produces or sells Ada compilers.

    --
    All the world's an analog stage, and digital circuits play only bit parts.
  9. How did they measure quality? by MobyDisk · · Score: 5, Insightful

    The problem with these kinds of studies is that there is no actual way to objectively measure software quality. You can correlate all the data you want, but garbage in means garbage out.

    For this study they used two factors to determine software quality. One is the number of bugfix commits. Ugh. It's not even clear to me whether a higher number of bugfix commits means higher or lower quality; that could go either way. It might mean you had better testers, or that you committed things in small batches, or that you had more branches. The other factor was a natural language processor that read the check-in comments. While this is a really cool idea, you would need a lot of research just to prove that this approach actually works before you could start using the technique to draw conclusions about other data.
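
For readers unfamiliar with how commit messages get turned into bug counts, here is a deliberately naive keyword classifier in Python. The keyword list is an assumption for illustration, not necessarily the study's exact list, and the example messages are invented. It also shows how easily such a heuristic misfires, which is the commenter's concern:

```python
import re

# Assumed keyword list for illustration; not necessarily what the study used.
BUG_KEYWORDS = ("error", "bug", "fix", "issue", "mistake",
                "incorrect", "fault", "defect", "flaw")

# Match a keyword at the start of a word, so "fixes" and "bugs" count too,
# while "prefix" or "default" do not.
_PATTERN = re.compile(r"\b(" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)


def is_bugfix_commit(message: str) -> bool:
    """Naively label a commit as a bug fix if its message mentions a keyword."""
    return _PATTERN.search(message) is not None


messages = [
    "Fix off-by-one in pagination",
    "Add dark mode toggle",
    "Improve error messages for config loader",  # feature work, yet matches "error"
    "Refactor parser",
]

for m in messages:
    print(is_bugfix_commit(m), m)
```

The third message is a false positive: it is feature work that happens to mention "error". Errors like this, in either direction, are why the comment asks for validation before the counts are trusted.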

    So while this was very cool, and very ambitious, the results are completely meaningless until someone can prove that this technique actually measures software quality at all.

  10. Concurrency bugs found in highly concurrent langs by Count Fenring · · Score: 4, Insightful

    Also striking - they point out that functional languages, in particular Scala, Erlang, and Clojure have more concurrency bugs, without bringing up that concurrency support is basically the primary feature those languages are selected for. I'd love to see the defect number correlated with the percentage of code dealing with concurrency.
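
The analysis requested here amounts to a correlation between two per-project measurements. A minimal sketch, assuming you already have, for each project, the share of code that deals with concurrency and a count of concurrency-bug commits (both hypothetical inputs, invented below purely to show the call; obtaining them is the hard part):

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-project inputs, not data from the study.
concurrency_share = [0.05, 0.10, 0.30, 0.45]  # fraction of code touching concurrency
concurrency_bugs = [2, 3, 9, 14]              # concurrency-bug commits per project

print(round(pearson(concurrency_share, concurrency_bugs), 3))
```

A raw correlation like this ignores project size; a real analysis would need to control for it, much as the study's regressions control for confounders.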

  11. Re:More factors to normalise out. by K. S. Kyosuke · · Score: 3, Insightful

    The car analogy should include the fact that since the vast majority of people aren't facing the need to solve the Indy 500 problem, that same vast majority is better served by learning to use more practical tools. A computer analogy would be using assembly language for a bootloader vs. using Python for whatever non-bootloader problem you happen to have.

    --
    Ezekiel 23:20
  12. Re:Take away for me by psmears · · Score: 4, Insightful

    the preference of certain personality types for functional, static and strongly typed languages.

    Translation: because only very high-skill programmers attempt to use the very unpopular functional languages (like lisp and erlang) the resulting code is, on average, of better quality.

    There is another possible interpretation: that programmers who are very concerned about quality - and hence are happy when their language gives them lots of information about potential mistakes - like using languages with features (such as a strong type system) designed to allow detection of certain types of mistake. That is, it's specific features of the languages, rather than the fact that the languages are "unpopular", that draws quality-focussed programmers to them.

    Of course, that is just as much conjecture as any other interpretation :-)

  13. C++: the biggest joke ever played on developers by Squidlips · · Score: 1, Insightful

    It has snob appeal for the uber geeks, but the language sucks in so many ways. It killed Netscape 6.0, and that should have been the last of the language, but it still lives, alas...

  14. does have a significant, but modest effect by Anonymous Coward · · Score: 2, Insightful

    I am unfamiliar with how these words make sense together. The fact that no one else pointed it out makes me feel like it's me. But it's driving me nuts. It's like a person giving a lecture has their pants drop to the floor and no one says anything. I have to post because it seriously is driving me nuts. If someone can explain how this phrase makes sense, I would appreciate it.
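
For anyone else puzzled by the phrase: "significant" is meant in the statistical sense (the effect is unlikely to be due to chance), while "modest" describes how large the effect is. With a big enough sample, even a small effect becomes statistically significant. A minimal Python sketch using a two-sided z-test for a difference in means; the effect size of 0.05 standard deviations and the known, equal variances are assumptions for illustration, not figures from the paper:

```python
from math import erfc, sqrt


def two_sided_p(effect: float, n_per_group: int, sd: float = 1.0) -> float:
    """Two-sided p-value of a z-test for a difference in means between two
    equal-sized groups with known, equal standard deviation `sd`."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = effect / standard_error
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return erfc(abs(z) / sqrt(2.0))


small_effect = 0.05  # hypothetical: a difference of one twentieth of a standard deviation

for n in (100, 10_000):
    print(f"n = {n:>6} per group: p = {two_sided_p(small_effect, n):.4f}")
```

The same small effect is nowhere near significant with 100 samples per group, but with 10,000 per group the p-value falls well below 0.001. A data set the size of the one in the study can therefore detect effects that are real yet small.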