Slashdot Mirror


MySQL & Open Source Code Quality

dozek writes "Perhaps another rung for the Open Source model of software development, eWeek reports that an independent study of the MySQL source code found it to be "in fact six times better than that of comparable commercial, proprietary code." You can read the eWeek write-up or the actual research paper (reg. required)."

22 of 446 comments

  1. Hmmmm by Anonymous Coward · · Score: 1, Interesting

    Woot. Pretty amazing, but with so many eyes and not at all confined by the "9-5" grind, it is almost expected.

  2. Lines of Code? by ksa122 · · Score: 3, Interesting
    Reasoning performed its independent analysis using defect density as a prime quality indicator. Defined as the number of defects found per thousand lines of code, MySQL's defect density registered as 0.09 defects per thousand lines of source code.
    Can any measurement that uses lines of code to compare code that could be written in different languages or for different types of applications be very accurate?
  3. All that's missing ... by JSkills · · Score: 4, Interesting
    All that's missing - to go along with the defects per lines of code comparison - is a comparison of features and performance benchmarking to other commercially built database products. Now that would be the complete comparison.

    As a strong proponent of MySQL, I'd be very curious to see how it stacks up in those regards.

  4. Stanford Checker by eddy · · Score: 4, Interesting

    Anyone know how this one is faring? Will it ever be released? It's based on GCC, right? How many students can it pass between until it's "distribution"?

    The reason I'm asking is because I saw that one member of the team has jumped over to a company called Coverity where one can read:

    Originally developed by a team of researchers in the Computer Systems Lab at Stanford University, Coverity's patent-pending source code analysis technology successfully detected over 2000 bugs in Linux including hundreds of security holes.

    I just think it'd be horrible if they used the GPL'ed GCC to develop their methods (having access to a full portable compiler onto which to do research and development is hardly a "small thing"), and then lock these same methods away from the community.

    I'm grateful for their work on checking linux, but really... this smells bad, IMHO.

    (If you don't know what I'm talking about, don't assume it's off-topic, okay? The Stanford Checker is a related topic to the Reasoning analysis of MySQL, and I'm not sure we'll ever have a _better_ fitting topic to discuss this)

    --
    Belief is the currency of delusion.
    1. Re:Stanford Checker by Error27 · · Score: 4, Interesting

      I wrote a similar tool to the Stanford Checker called smatch.

      I post the bugs and stuff that it finds on kbugs.org. The most recent kernel that I've posted is 2.6.0-test11.

      One thing that I was working on a couple weeks ago was invalid uses of spinlocks. Here are my results from that. I found quite a few places that don't unlock their spinlocks on error paths etc.
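
The error-path bug described above can be sketched in user-space C. This is only an illustration, not kernel code and not output from smatch: `toy_spinlock`, `add_buggy` and `add_fixed` are hypothetical names, and a C11 `atomic_flag` stands in for a kernel spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock built on C11 atomic_flag; a stand-in for a kernel spinlock. */
typedef struct { atomic_flag held; } toy_spinlock;

void toy_lock(toy_spinlock *l) {
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set(&l->held)) { }
}

void toy_unlock(toy_spinlock *l) {
    atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free; the caller then holds it. */
bool toy_trylock(toy_spinlock *l) {
    return !atomic_flag_test_and_set(&l->held);
}

toy_spinlock demo_lock = { ATOMIC_FLAG_INIT };
int counter = 0;

/* BUGGY: the early return on the error path leaves the lock held. */
int add_buggy(int delta) {
    toy_lock(&demo_lock);
    if (delta < 0)
        return -1;              /* lock is never released */
    counter += delta;
    toy_unlock(&demo_lock);
    return 0;
}

/* FIXED: every path leaves through the single unlock at "out". */
int add_fixed(int delta) {
    int ret = 0;
    toy_lock(&demo_lock);
    if (delta < 0) {
        ret = -1;
        goto out;
    }
    counter += delta;
out:
    toy_unlock(&demo_lock);
    return ret;
}
```

After `add_fixed(-1)` a call to `toy_trylock(&demo_lock)` succeeds, while after `add_buggy(-1)` it fails because the lock was leaked. A checker of this kind presumably looks for exactly that shape: a return reachable while a lock is still held.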

    2. Re:Stanford Checker by owenomalley · · Score: 2, Interesting

      > I just think it'd be horrible if they used the
      > GPL'ed GCC to develop their methods (having access
      > to a full portable compiler onto which to do
      > research and development is hardly a "small
      > thing"), and then lock these same methods away
      > from the community.

      Yeah, that is the way it is going to go. Dawson and his students and employees use gcc for a parser and have no intention of releasing their tool under any open source license. They claim that they modified gcc to write out Abstract Syntax Trees (ASTs) that are then read in to their tool (the Coverity/Stanford checker), which Coverity is selling commercially. Richard Stallman has long fought to keep gcc from publishing useful ASTs to prevent things like this from happening, but it is obviously impossible to stop in the long run and he should just concede the point.

      We should pressure Dawson and Coverity to at least release the modified gcc parser that will dump the AST. ASTs enable all kinds of program analysis tools, such as doxygen and static analysis tools. Furthermore, we should pressure FSF to roll the changes back into the GCC mainline.

  5. As John Carmack put it... by rafael_es_son · · Score: 5, Interesting

    The main difference between open and *MOST* closed code is the fact that the early release of closed code means mucho mas money to corporate pigs and dogs, thus, proper requirements analysis, design, coding and testing are usually pummeled in the name of happy-go-lucky capitalism. "It will be ready when it is ready." -Carmack "I love America!" -Murphy

    --
    HAD
  6. Re:Duh! by I8TheWorm · · Score: 5, Interesting

    I've used MySQL, Oracle, MS SQL, DB2, and MSDE. I'm not sure I get your comment about MS SQL server. Like any other RDBMS, a little performance tuning goes a long way. As a matter of fact, until Oracle's release of 10g, MS SQL beat all commercial offerings in the TPC benchmarks.

    MS has a buggy OS and an awful model for business practice, but I think MS SQL server is a fairly nice offering. It's too bad it only runs on Windows servers though.

    --
    Saying Android is a family of phones is akin to saying Linux is a family of PCs.
  7. Must have been baaad commercial code then.. by jordan · · Score: 4, Interesting

    Because there are portions of the MySQL code that are just painful to look at.

    Take for instance the part that takes as input the key index size and calculates internal buffer sizes. The option's size is an unsigned long long, but they cast it to an unsigned long all over the place, do in-place bitshifting on the cast (and cause it to wrap -- try specifying 4G for your key index sometime and you'll get 0), and the quality of code in that case is just painfully horrible to look at or even figure out what it's doing.
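
The wraparound described above is easy to reproduce in isolation. The sketch below is not MySQL's code: the function names are hypothetical, and `uint32_t` stands in for an `unsigned long` on a platform where that type is 32 bits wide (an assumption, since the width varies by platform).

```c
#include <stdint.h>

/* Hypothetical helper: narrows a 64-bit size the way a cast to a 32-bit
 * "unsigned long" would, silently dropping the high bits. */
uint32_t narrow_unchecked(uint64_t size) {
    return (uint32_t)size;
}

/* Checked alternative: returns 0 on success, -1 if the value does not fit. */
int narrow_checked(uint64_t size, uint32_t *out) {
    if (size > UINT32_MAX)
        return -1;
    *out = (uint32_t)size;
    return 0;
}
```

With 4G written as `4ULL * 1024 * 1024 * 1024`, `narrow_unchecked` returns 0, which matches the "specify 4G, get 0" symptom, whereas `narrow_checked` reports the overflow instead of hiding it.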

    I could only shudder to think what the quality of the commercial product looked like, in comparison. Hell, I'll have nightmares if I consider the quality of MySQL++ as a comparison..

    --jordan

  8. How about other OSDBs by Anonymous Coward · · Score: 1, Interesting

    How buggy is MySQL compared to, say, PostgreSQL, Firebird, etc.? MySQL tends to crumple under load, while PostgreSQL keeps going.

  9. Re:6 times better? by Urkki · · Score: 2, Interesting
    • And is code that contains no bugs at all infinitely much better than code that contains a single bug which hardly ever occurs?

    Fortunately for the "model", there is no substantial piece of code that contains just one rarely occurring bug, let alone code that contains no bugs at all. Therefore such infinities never need to be considered in real life cases.

    But if you think of it theoretically, if that one rarely occurring bug potentially causes your company to go bankrupt (like being sued for huge damages), then I'd say the bugless version is infinitely better.
  10. Re:Debatable scale by Zathrus · · Score: 4, Interesting

    Faults per 1000 lines of code may seem like a valid scale, but I think it is indicatory at best, not proof.

    It's actually a really miserable scale because of your 3rd point. If they ran the code bases through something like cindent and standardized the code formatting and removed all comments and whitespace then it's a somewhat more valid comparison. I didn't look at the actual research paper -- maybe they did. Odds are, your other two points are valid though.
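
As a rough idea of what "removing comments and whitespace" before counting could look like, here is a minimal sketch. `count_code_lines` is a hypothetical helper, not anything from the study, and it deliberately ignores the case of comment markers inside string literals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts lines holding at least one character of code, skipping blank lines
 * and comments. Simplification: comment markers inside string literals are
 * not recognised. */
size_t count_code_lines(const char *src) {
    size_t lines = 0;
    bool in_block = false;   /* currently inside a block comment */
    bool has_code = false;   /* current line has code on it */

    for (const char *p = src; ; ++p) {
        char c = *p;
        if (c == '\n' || c == '\0') {
            if (has_code)
                ++lines;
            has_code = false;
            if (c == '\0')
                break;
        } else if (in_block) {
            if (c == '*' && p[1] == '/') {
                in_block = false;
                ++p;                        /* consume the closing slash */
            }
        } else if (c == '/' && p[1] == '*') {
            in_block = true;
            ++p;                            /* consume the opening star */
        } else if (c == '/' && p[1] == '/') {
            /* Line comment: skip ahead to just before the end of the line. */
            while (p[1] != '\n' && p[1] != '\0')
                ++p;
        } else if (c != ' ' && c != '\t' && c != '\r') {
            has_code = true;
        }
    }
    return lines;
}
```

A count like this still depends on brace placement and statement splitting, so it would only be meaningful after both code bases had been run through the same formatter.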

    Additionally, they only say that the commercial code is "comparable". What does that mean (again, maybe answered in the paper)? Do they have roughly the same features? Are the query optimizers of roughly the same quality? Do they support the same platforms? I can't think of a major commercial database that doesn't exceed MySQL in all of these areas (ok, excepting SQL Server which fails on the 3rd only). Maybe it was a minor player in commercial databases. Dunno.

    These are the kinds of points that are raised when someone bashes OSS. There's no reason that they shouldn't be raised when the inverse is true as well. MySQL has progressed nicely and is worthy of consideration for light to moderate database loads now, I don't question that. All I'm saying is don't take things at face value.

    So, small victory, but the race goes on.

    The nice thing is that this is small and succinct -- it's suitable for showing to upper level management. That's a big win IMHO -- because normally the text bites they read are biased against free/open software.

  11. Re:Now apply to IE patches.... by thebatlab · · Score: 3, Interesting

    That open source patch was quite shoddily and hastily written. It wasn't even a patch really. Using it as representative of open source is not fair in any way whatsoever to other successful open source products.

    "Now apply the 'Rule of 6 times' to Microsoft's closed source IE patches..."

    There is no 'Rule of 6 times'. An analysis concluded that MySQL had a very limited number of defects in their code base. Kudos to them. This doesn't define a rule to be used in the open source vs. closed source holy war.

  12. Re:Debatable scale by G4from128k · · Score: 2, Interesting

    It is very true that we can measure the "quality" of software with many different dimensions. The parent posts' suggestions of assessing design, error type, and parsimony (lack of dilution of errors with verbose code) are good.

    But the existence of alternative scales does not detract from the original assessment of defects/line unless we have separate knowledge that OSS is unfavorably biased. Do we have reason to believe that OSS is more poorly designed than commercial software, or that OSS has more serious bugs, or that OSS is especially verbose? Without that additional information, it is just as likely that commercial software has a worse design, more serious bugs, and bloated code in addition to a higher defect density (I know I can think of at least one dominant vendor that is guilty of all three sins). In fact, a higher defect density is probably a good indicator for both worse design and the presence of more serious bugs.

    Yes, the race still goes on. It would be nice to benchmark MySQL on these other dimensions of quality and benchmark other OSS projects. But without an a priori reason to suspect that OSS is worse on these other dimensions, I think we can conclude that the report is a victorious validation for MySQL and its team.

    --
    Two wrongs don't make a right, but three lefts do.
  13. Toy DBMS by leandrod · · Score: 2, Interesting

    Seen lots of intelligent comments about length of lines and potential bloat skewing the results, but there is one more issue to consider: design.

    No matter how good the coding itself, if the design is broken, the tool is broken, period.

    And MySQL has a broken design. So broken that the upgrade path isn't MySQL X or something like it, but MaxDB -- in fact, rebranded SAPdb. That SAPdb is at most at Oracle v7.2 levels tells lots about MySQL.

    I could be more specific, but do your own research in Google -- lack of SQL compliance, lack of features to enable declarative coding at the server instead of procedural client code, and so on.

    Now, the interesting part. Suppose MySQL AB would have a sudden insight and repent of their un-SQL, anti-relational ways. Unlikely, you say; yet possible. Now suddenly they have to recode, or change drastically the current code. The resulting tool will be probably much bigger than the current, because SQL is baroque; or even worse than much bigger, because of MySQL backwards compatibility.

    The sheer bloat will make even this faulty measure of bugs/KLoC skyrocket. Now, run the comparison again...

    Not to say SQL compliance shouldn't be attained. In fact, bloat in the SQL DBMS is a more than good enough tradeoff against bloat in the application. The ideal would be an RDBMS, but while there isn't a MyDataphor, a SQL DBMS should do.

    Even today, I don't care about comparing to, say, Oracle or MS SQL Server. IBM DB2 would be a better baseline, but best of all the real competitors: PostgreSQL and Alphora Dataphor.

    --
    Leandro Guimarães Faria Corcete DUTRA
    DA, DBA, SysAdmin, Data Modeller
    GNU Project, Debian GNU/Lin
  14. MySQL and Commercial Licenses by Anonymous Coward · · Score: 3, Interesting

    I'm a little confused. I thought I understood how to make profit with the GPL, but now I'm not sure.

    MySQL GPL'ed all their products. (presumably so they could get developers and bug-fixes to their product for no charge.) However, they offer "commercial" licenses for people who want to integrate MySQL into their software, but don't want to GPL it. How can they do that? Presumably, any improvements/bugfixes/modifications that came from the community would be GPL, and therefore cannot be re-integrated under a more restricted license. I'm a little confused here. How can they take code that has been released under the GPL and turn around and release it under a more restrictive license?

    1. Re:MySQL and Commercial Licenses by Anonymous Coward · · Score: 1, Interesting

      I wonder then why anyone would want to assign over the copyright to them if they're just going to plow it into the commercial side and make money on it. I see it as either:

      1. They really think MySQL is sincere when they say that they are "Quid Pro Quo" and are using the profits to improve both the GPL and non-GPL versions. (Using "versions" for lack of a better word, I think the codebases are identical)

      2. MySQL deserves it since had they not GPL'ed the database to begin with, there would be no patch. Essentially, they already gave the "Quid" part and letting them profit from your patch is the "Quo" part.

      3. Acknowledgement. Being able to say, "Yeah, I created the xyz feature patch that made MySQL self-aware" probably helps the developer land some more private contracts. (This assumes that MySQL allows you to keep attribution rights.)

  15. Re:If you would RTFA... by Dun+Malg · · Score: 5, Interesting
    they quantified it by dividing verified defects by lines of code.

    Problem with that is that it assumes the same "code density". Granted, it's probably not going to differ by a factor of six, but remember the old question about programmer productivity:
    who's more productive: the coder who solves a given problem with 100 lines of code written in one hour, or the coder who solves it with 10 lines in two hours?

    I mean, simple stuff like doing this:

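    #include <stdbool.h>

    bool function(int i);   /* defined elsewhere */
    int main(void)
    {
        int i = 0;          /* initialized so that ++i is well defined */
        if (function(++i))
            { /* blah blah blah */ }
    }
    ...instead of:
    #include <stdbool.h>

    bool function(int i);   /* defined elsewhere */
    int main(void)
    {
        int i = 0;          /* initialized so that i++ is well defined */
        bool foo;
        foo = false;
        i++;
        foo = function(i);
        if (foo)
            { /* blah blah blah */ }
    }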

    ...will give you a threefold difference in line count (specifically counting lines in the main() function). Throw in an identical line using malloc in each, both forgetting to free it later, and you've got a "bug density" of .33 for the former, and .14 for the latter. Heck, you could have two un-freed malloc's in the latter and it'd still only be at .25! I'm not saying the study is wrong-- I'd rather have the code out where I can see it, no matter WHAT the "bug density"-- I'm just saying that I wouldn't take any statistic that is derived using "lines of code" as a variable as a serious, hard number.
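
The arithmetic behind those figures can be checked with a few lines of C. This is just a sketch with hypothetical function names. The line counts assume the terse version has 3 counted lines once the malloc is added, and the verbose version has 7 (or 8 with a second malloc).

```c
/* Defects per line: the ratio used in the example above. */
double defects_per_line(unsigned defects, unsigned lines) {
    return lines == 0 ? 0.0 : (double)defects / (double)lines;
}

/* The same ratio scaled to defects per thousand lines (KLOC). */
double defects_per_kloc(unsigned defects, unsigned lines) {
    return 1000.0 * defects_per_line(defects, lines);
}
```

`defects_per_line(1, 3)` is about 0.33, `defects_per_line(1, 7)` is about 0.14, and `defects_per_line(2, 8)` is exactly 0.25, so the verbose version looks "better" with twice as many leaks.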
    --
    If a job's not worth doing, it's not worth doing right.
  16. Re:On paper it looks better by Tassach · · Score: 2, Interesting
    Access has foreign keys, but unless they added it in the latest version, it does not support real transactions. Add to that the fact that its locking model is fundamentally broken, you have something which is just powerful enough to let you do things with it that you shouldn't. MySQL suffers from the exact same problem.

    I shouldn't complain -- I've made a lot of money over the years cleaning up the messes left by inexperienced people who thought Access or MySQL were real databases.

    --
    Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
  17. Re:Duh! by Anonymous Coward · · Score: 1, Interesting

    Can be easily worked around. Who cares.

    Real DBAs care. The point is you shouldn't have to work around it.

    A pain in the butt, but in order to have stored procedures you need a procedural language

    Ahh the last cry of the desperate - "that's a stupid feature - nobody needs that!" Despite the fact that having to code around it in the app makes the whole thing slower (I thought MySQL was all about speed?) And increases development time.

    In order to have a trigger you have to have a procedural language.

    Another "that's a bogus feature" excuse.

    Any foreign key constraint may be expressed as a join and this is usually considerably faster.

    Do you even know what a foreign key is, or how it's used? It certainly doesn't sound like it. "You don't need the database to ensure the integrity of your data, because you can just check it manually!"

  18. Re:If you would RTFA... by neelm · · Score: 3, Interesting

    So what you are saying is you would rather have your DB crash over not supporting some feature in a way which is only applicable in select situations?

    As a real world programmer (versus someone living in an academic world of theory) I prefer the what-I-have-works-and-I'm-Working-on-the-rest approach. In the real world, stability and performance take priority over feature set. Also, when you consider the domain of creating web driven applications, some features of a DB become less important because of the stateless nature of an HTTP connection. Server-side cursors don't do well in a cookie.

    > MySQL may be well-written, but it's still a piece of crap by the standards of any professional DBA.

    Which is why I give little attention to certifications.

  19. Re:If you would RTFA... by frostman · · Score: 2, Interesting

    A funny thing to add to this...

    I'm doing my first MySQL work (done a lot of Oracle and a little PostgreSQL) and I was *flabbergasted* when I realized that, when you update a table but the data has not actually changed, you get success and zero rows updated.

    Which is exactly what you get (and should get) when you try to update and no rows are found to update.

    I suppose with no triggers anyway, it might be a tiny bit faster to skip the actual update when the data hasn't changed, but to real DB folks this is not only counter-intuitive, it's *scary*.
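
The ambiguity is between rows matched and rows changed. The following is a minimal in-memory sketch of that distinction, not MySQL's implementation: `row`, `update_result` and `update_value` are hypothetical, and the function plays the role of `UPDATE t SET value = ? WHERE id = ?`.

```c
#include <stddef.h>

typedef struct { int id; int value; } row;

/* Two separate counts: rows the WHERE clause found, rows actually modified. */
typedef struct { size_t matched; size_t changed; } update_result;

/* Hypothetical stand-in for: UPDATE t SET value = new_value WHERE id = id */
update_result update_value(row *rows, size_t n, int id, int new_value) {
    update_result r = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (rows[i].id != id)
            continue;
        ++r.matched;
        if (rows[i].value != new_value) {   /* skip writes that change nothing */
            rows[i].value = new_value;
            ++r.changed;
        }
    }
    return r;
}
```

Reporting only `changed` gives 0 both when no row matched and when the row already held the new value, while reporting `matched` keeps the two cases apart. As far as I know, the MySQL C API has a `CLIENT_FOUND_ROWS` connect flag that switches the reported count from changed rows to matched rows, which may be worth checking against the documentation for your version.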

    'Course this is 3.23, maybe they changed that in 4. I read that they added booleans in 4... though just as an alias for ItsyBitsyInt.

    MySQL is fast and free and there is a lot of community support for beginners. And if you have oodles of RAM, the HEAP tables are a sweet thing indeed. As such it's good. But I sure hope nobody ever makes me use it for anything mission-critical... and I fear for people using this as an "enterprise" DB.

    (donning flame-proof suit...)

    --

    This Like That - fun with words!