MySQL & Open Source Code Quality
dozek writes "Perhaps another rung for the Open Source model of software development, eWeek reports that an independent study of the MySQL source code found it to be "in fact six times better than that of comparable commercial, proprietary code." You can read the eWeek write-up or the actual research paper (reg. required)."
Six times better? I didn't know it was possible to quantify code quality in that manner. Interesting.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
...until I release my MySQL source code to the open source community. Then that 6x multiplier will drop down to 2x.
Yeah, it's really that bad. Gets the job done, though. Hell to maintain. Probably would've helped if I documented any of it.
Maybe I should read that Code Complete book I keep meaning to read sometime.
Creator of the popular web game Proximity
Perhaps another rung for the Open Source model of software development
Uhh... no.
It's a glowing report for this particular open source project, but that brush shouldn't be used to paint all open source. That will just lull open source developers into a false sense of euphoric contentment. Code quality didn't get this far by having a fixed target; that target should be a carrot on a stick that will never quite be reached.
Trolling is a art,
Undoubtedly()
{
when();
you = measure(quality);
in.defects();
per->lines_of(code, anyone);
can = write(good, solid, code);
}
...they quantified it by dividing verified defects by lines of code. MySQL had 0.09 bugs/KLOC while the "commercial" defect density was 0.53 bugs/KLOC. (Their use of the term "commercial" confused me since MySQL is, after all, a "commercial" project, just an open-source one.)
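For anyone who wants to check the arithmetic, here is a minimal sketch of the metric as described above. The function name and the sample defect and line counts are made up for illustration; only the two densities come from the comment.

```python
# Defect density as described above: verified defects per thousand
# lines of code (KLOC). The function name is illustrative only.

def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per 1000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)

# Hypothetical counts: 9 defects in 100,000 lines gives 0.09 bugs/KLOC.
example = defect_density(9, 100_000)

# Densities quoted in the comment above (bugs/KLOC).
MYSQL_DENSITY = 0.09
COMMERCIAL_DENSITY = 0.53

ratio = COMMERCIAL_DENSITY / MYSQL_DENSITY
print(f"ratio: {ratio:.1f}x")  # a little under six, hence "six times better"
```

So the headline figure is just the ratio of the two densities, rounded up.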
All's true that is mistrusted
I do believe that Open Source is better than proprietary. Faults per 1000 lines of code may seem like a valid scale, but I think it is indicative at best, not proof.
* It does not take into account the design of the software. This is often as important as the actual quality of the code.
* It does not take into account the kind of errors. This is related to the first, but a buffer overflow that allows root access is worse than a failed instruction.
* It does not even take the length of lines into account. Shortening the lines could lower the number, without actually changing anything.
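The last point is easy to demonstrate. A rough sketch, assuming a naive count of non-blank physical lines (the study's actual counting rules are not given in this thread) and a made-up C function laid out two ways:

```python
# Two layouts of the same hypothetical C function; only line breaks differ.
COMPACT = "int add(int a, int b) { return a + b; }\n"
SPREAD = "int\nadd(int a,\n    int b)\n{\n    return a + b;\n}\n"

def count_lines(source: str) -> int:
    """Count non-blank physical lines, a naive LOC measure."""
    return sum(1 for line in source.splitlines() if line.strip())

def defects_per_kloc(defects: int, source: str) -> float:
    """Defects per 1000 lines, using the naive line count above."""
    return defects / (count_lines(source) / 1000)

# Suppose, hypothetically, that each layout contains exactly one defect.
compact_density = defects_per_kloc(1, COMPACT)
spread_density = defects_per_kloc(1, SPREAD)

print(count_lines(COMPACT), count_lines(SPREAD))
print(compact_density / spread_density)  # about 6: same code, same bug
```

Identical code with an identical bug scores about six times "better" once it is spread over six lines instead of one.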
So, small victory, but the race goes on.
the pun is mightier than the sword
This just looks like some quasi-scientific statement, trying to express things as a number that really don't fit such a representation. For example, as the number of defects decreases, it becomes increasingly difficult to find the ones that are left. And is code that contains no bugs at all infinitely better than code that contains a single bug which hardly ever occurs?
The main difference between open and *MOST* closed code is the fact that the early release of closed code means mucho mas money to corporate pigs and dogs, thus, proper requirements analysis, design, coding and testing are usually pummeled in the name of happy-go-lucky capitalism. "It will be ready when it is ready." -Carmack "I love America!" -Murphy
HAD
Since we're measuring Defects per 1000 lines, perhaps calling them "Gates" or "Ballmers" might be more appropriate.
I've used MySQL, Oracle, MS SQL, DB2, and MSDE. I'm not sure I get your comment about MS SQL Server. Like any other RDBMS, a little performance tuning goes a long way. As a matter of fact, until Oracle's release of 10g, MS SQL beat all commercial offerings in the TPC benchmarks.
MS has a buggy OS and an awful model for business practice, but I think MS SQL Server is a fairly nice offering. It's too bad it only runs on Windows servers though.
Saying Android is a family of phones is akin to saying Linux is a family of PCs.
Need a particular reason? Take your pick. http://sql-info.de/mysql/gotchas.html
Up until recently, MySQL had no transaction or atomic operation support. As such, you need to write application code to trap problems. Whereas with Oracle, when you run an atomic operation, you know with certainty whether the query failed in its entirety. I also believe stored procedure support is somewhat lacking in MySQL (however, there is that new Java function support). The MySQL 3 tree does not enforce constraints, which is essential for data integrity. MySQL does not have subrow locking, whereas enterprise databases do. Once again, MySQL is great. I use it. However, it is not enterprise.
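To illustrate what atomic operations and enforced constraints buy you, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (not MySQL or Oracle), and the table, column names, and transfer function are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in; the schema is
# hypothetical. The CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# Overdrawing alice violates the CHECK constraint on the second UPDATE,
# so the first UPDATE (the credit to bob) is rolled back as well.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The application gets one yes-or-no answer for the whole operation. Without transactions, the credit to bob would have to be detected and undone by hand in application code.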
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
"but the Slashdot database regularly becomes confused, such as posting a comment to the wrong story"
That's not the db... around here, we call them "trolls"...
;)
This is proof positive that the marketing engine has started churning in the Linux / Open Source arena. The quoted statistics are meaningless. Here is a short list of things (in no particular order) that are wrong with this "study" (who paid for it anyway?):
Lines of code is meaningless as a reliable measure of anything. The most this number can be used for is for assessing the high level complexity (i.e. simple, non-trivial, or hard) of an application / code construct. It is absolutely pointless to compare two different applications against each other by lines of code. This means that you can say that one is non-trivial and the other is complex or you can say that both are complex, but there is no valid way of determining (by using this particular metric) that one application is more complex than the other. I believe this is the fundamental flaw in this "study".
The study ignores capabilities. If application A has features a, b, and c, and application B has features a, b, c, d, e, f, g, h, is it even meaningful to compare the number of defects detected between applications A and B? And no - normalizing it by lines of code is not valid (see previous point).
Testing methodology: from the defects quoted in the article, it appears as if the "study" did white box testing on MySQL. This is hardly complete. While null pointer dereferences are certainly terrible, I would also be very concerned about bugs pertaining to SQL capabilities, data integrity, performance, etc. If I go out and do a comparison of RDBMS's for a client, my report wouldn't be complete at all without covering these areas. How come the "study" doesn't mention any of these things?
Let's face it: this is a paid propaganda article by the marketing machinery. Much like Microsoft has done in the past.
There is no such thing as luck. Luck is nothing but an absence of bad luck.
It does indeed sound a bit like that, and with good reason. If you notice, the "independent review" was carried out by Reasoning, Inc., and we've heard of them before in these parts.
For the benefit of those who haven't seen this trollfest^H^H^H^H^H^H^H^H^Hstory in its previous incarnations, Reasoning's services spot what some people call "systematic" errors, things like NULL pointer dereferencing or the use of uninitialised variables. As many people note every time this subject comes up, any smart development team will use a tool like Lint to check their code anyway, as a required step before check-in and/or as a regular, automated check of the entire codebase, and so any smart development team should find all such errors immediately. IOWs, it's grossly unfair to compare open and closed source "code quality" on this basis. Any project that has errors like this in it at all isn't serious about quality, and it shouldn't take an external study to point this out.
Serious code quality is not dictated by how many mechanical errors there are that slip through because of weaknesses in the implementation language. Rather, it is indicated by how many "genuine" logic errors -- cases where the output differs unintentionally from the specifications -- there are. Of course, no automated process can identify those, but to get a meaningful comparison of code quality, you'd need to investigate that aspect, rather than kindergarten mistakes.
There are other objections to their principal metric as well. For starters, source code layout is not normally significant in C, C++ or Java, so any metric based on line count is going to be flawed at best. But the big objection is that they're talking about childish mistakes, and comparing supposedly world class software based on childish mistakes isn't helpful (except to dispel the myth that some big name products have sensible development processes).
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.