Slashdot Mirror


The Effect of Programming Language On Software Quality

HughPickens.com writes: Discussions of whether a given programming language is "the right tool for the job" inevitably lead to debate. While some of these debates may appear to be tinged with an almost religious fervor, most people would agree that a programming language can impact not only the coding process, but also the properties of the resulting product. Now computer scientists at the University of California, Davis have published a study of the effect of programming languages on software quality (PDF) using a very large data set from GitHub. They analyzed 729 projects with 80 million SLOC by 29,000 authors and 1.5 million commits in 17 languages. The large sample size allowed them to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect on software quality of language features such as static vs. dynamic typing and strong vs. weak typing. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, they report that language design does have a significant, but modest effect on software quality.

Quoting: "Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages."

6 of 217 comments

  1. You need enough rope to hang yourself by msobkow · · Score: 4, Informative

    The more flexibility and power a language provides, the more opportunities you have to hang yourself with it.

    Personally what I hate are loosely, dynamically typed languages. They provide no compile-time checking at all that I can detect, which means that in order to even guess whether the code is "correct" you have to run through all the possible use cases. I realize that it's an ideal to test all possible inputs (especially boundary conditions), but that just isn't practical for most project schedules and budgets.
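    To make that concrete, here is a minimal Python sketch of the kind of bug being described. The function name and numbers are made up for illustration; the point is only that the type error sits in a branch the usual test run never takes, so nothing complains until that branch actually executes.

```python
# Hypothetical example: a latent type error hidden in a rarely taken branch.
def shipping_cost(weight_kg, express=False):
    base = 5.0 + 1.5 * weight_kg
    if express:
        # Bug: adds a string to a float. Python only notices when this line runs.
        return base + "10"
    return base


# The common path works, so a quick test run looks fine.
normal = shipping_cost(2.0)

# The bug only surfaces once a test (or a user) takes the express branch.
try:
    shipping_cost(2.0, express=True)
    express_error = None
except TypeError as exc:
    express_error = exc
```

    With type annotations added, a static checker such as mypy would be expected to flag the bad addition before the code ever runs, which is roughly the compile-time safety net the comment is missing.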

    As powerful as functional languages can be, the restrictions imposed by them can lead to difficulty implementing certain behaviours in the code. In fact, on one Erlang project I worked on, one algorithm proved so difficult to implement that we had to cancel the project, even though the rest of it had been completed. (That function was *the* heart of the system: the scheduling algorithm.)

    Much as the researchers discovered, I've never really found the programming language itself to have much of an impact on the code quality or readability of the code if the code was competently written. That said, even the best of languages can be turned into unmaintainable gobbledygook by a dedicated bonehead, especially consultants who know damned well they'll be long gone before the project enters maintenance/enhancement mode.

    What I found really degrades quality is not the language, but an overemphasis on code style at some companies. Instead of code reviews focusing on the functionality of the code being reviewed, they spend all their time nit-picking about variable names and whether to use camel-case or underscores.

    I consider the maintainability and readability of code to be at least as important as any metrics about the number of bugs in a project. If you can't read and understand the code easily, fixing a bug when it is discovered becomes a hellish nightmare.

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:You need enough rope to hang yourself by Anonymous Coward · · Score: 2, Informative

      I have to disagree with you here. Style and consistency are important. I often do third-party code reviews for security, and I can tell you that the code I get where there is a consistent style and convention in use almost always exhibits fewer problems.

      I am sure there is a point in organizational maturity where people first start focusing on style but the group has not mastered it yet, and there it probably is a distraction. Once past that inflection point, however, it makes it easier for peer reviewers to spot the bad decisions and questionable logic in code.

      if user = authenticate(user,password)
        doSomeStuff(user)
      else
        doSomethingDifferent(user)
      end

      This might be perfectly correct code. If you are trying to get code released, though, you might buzz past what it is doing on a fast read, or read it wrong, unless this is a typical convention in use, in which case you would probably understand it immediately. In other shops, though, you'd see this:


      user = authenticate(user,password)
      if user
        doSomeStuff(user)
      else
        doSomethingDifferent(user)
      end

      I don't have a preference, actually, but if you have people doing it both ways in the same code base, it's a recipe for overlooked bugs.

      You pass the user variable to the authenticate function and then immediately assign user to the result of the function. If this new user is falsy, then you pass it to the doSomethingDifferent function. That's just nasty, and at that point I don't really care about the style issue you brought up.
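      One way to sidestep the rebinding problem, sketched in Python under made-up assumptions: authenticate(), the account table, and handle_login() here are hypothetical stand-ins, not anything from the code quoted above. The input and the result get separate names, so the failure branch still has the submitted username to work with.

```python
from typing import Optional

# Hypothetical stand-in for the thread's authenticate(); not a real library call.
_ACCOUNTS = {"alice": "s3cret"}


def authenticate(username: str, password: str) -> Optional[dict]:
    """Return an account record on success, or None on failure."""
    if _ACCOUNTS.get(username) == password:
        return {"name": username}
    return None


def handle_login(username: str, password: str) -> str:
    # The result gets its own name, so the original input is never overwritten.
    account = authenticate(username, password)
    if account is not None:
        return f"welcome {account['name']}"
    # The failure path still has the submitted username, not a falsy result.
    return f"login failed for {username}"
```

      Whether the assignment lives inside the condition or on its own line is then a separate, purely stylistic question.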

  2. Re:More factors to normalise out. by Wootery · · Score: 3, Informative

    C/C++ certainly let you shoot yourself in the foot regarding correctness, but they generally don't make it easy to shoot yourself in the foot regarding performance. (C++ templates, exceptions, and RTTI being exceptions.)

    And the fact idiots still use those languages in areas where performance isn't an absolute priority is simple idiocy.

    Other legitimate reasons include legacy codebase, existence of useful libraries accessible from only these languages, extreme practical portability (yes you can technically run C# on Android, or Java on iPhone, but it takes proprietary external tools), etc.

  3. Re:Take away for me by Mr+Z · · Score: 4, Informative

    BTW, this ACM Queue article was linked from the blog post I linked above. It's another good, somewhat relevant read, IMHO. It makes largely the same point, though: It's more the programmer than the language.

  4. Re:Or, to put it another way... by i+kan+reed · · Score: 3, Informative

    I'm going to put out an alternate conjecture:

    Interfaces are the magic that tends to make strongly typed languages work well. My guess is that, for example, C and C++ don't positively contribute to the trends this article discusses (though I can certainly imagine that people who know how to manage their own memory tend to have a more robust understanding of code).

    The ability to abstractly describe the kinds of behavior that are needed to fulfill a class of task in an application lends itself to a framework that's intuitive to complete. In other languages you expose yourself to a lot of time spent manually lining up the requirements of external pieces to what you're writing now.
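    As a rough illustration of describing behaviour abstractly, here is a small Python sketch using typing.Protocol. The Scheduler interface and both implementations are hypothetical names invented for this example; the idea is only that the caller is written against the described behaviour, and new pieces slot in by matching it.

```python
from typing import List, Protocol


class Scheduler(Protocol):
    """Hypothetical interface: anything with this method can be plugged in."""

    def next_task(self, pending: List[str]) -> str: ...


class FifoScheduler:
    def next_task(self, pending: List[str]) -> str:
        # Oldest entry first.
        return pending[0]


class ShortestNameFirst:
    def next_task(self, pending: List[str]) -> str:
        # Pick the task with the shortest name.
        return min(pending, key=len)


def run_one(scheduler: Scheduler, pending: List[str]) -> str:
    # Callers depend only on the described behaviour, not on a concrete class.
    return scheduler.next_task(pending)


tasks = ["compile", "test", "deploy"]
fifo_choice = run_one(FifoScheduler(), tasks)
short_choice = run_one(ShortestNameFirst(), tasks)
```

    Python itself does not enforce the protocol at runtime; the mismatch checking comes from a static type checker, which is where this sketch differs from a language with interfaces built into the compiler.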

    Again, all conjecture, but it comes from my own observation of when I tend to make mistakes.

  5. Re:Other factors. by naasking · · Score: 3, Informative

    Having closures does not make a functional language; instead, what makes a functional language is referential transparency.

    Scheme, Lisp, OCaml are all functional languages that are not referentially transparent. Pure functional languages require referential transparency, but impure functional languages do exist.
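    For readers unsure of the distinction, a minimal Python sketch (function names are made up for illustration): the first function is referentially transparent, while the closure is not, because it mutates state it has captured.

```python
def add_tax(price: float) -> float:
    # Pure: the result depends only on the argument, so any call can be
    # replaced by its value without changing the program.
    return price * 1.25


def make_counter():
    count = 0

    def next_value() -> int:
        # Impure closure: each call mutates the captured variable.
        nonlocal count
        count += 1
        return count

    return next_value


counter = make_counter()
# Two identical-looking calls give different results, so counter() cannot
# be swapped for a fixed value: it is not referentially transparent.
first, second = counter(), counter()
```

    So a language can have closures, as in the counter above, without the code written in it being referentially transparent.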

    JavaScript is a functional language, but it's also procedural and object-oriented.