MySQL & Open Source Code Quality
dozek writes "Perhaps another rung for the Open Source model of software development, eWeek reports that an independent study of the MySQL source code found it to be "in fact six times better than that of comparable commercial, proprietary code." You can read the eWeek write-up or the actual research paper (reg. required)."
Six times better? I didn't know it was possible to quantify code quality in that manner. Interesting.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
...until I release my MySQL source code to the open source community. Then that 6x multiplier will drop down to 2x.
Yeah, it's really that bad. Gets the job done, though. Hell to maintain. Probably would've helped if I documented any of it.
Maybe I should read that Code Complete book I keep meaning to read sometime.
Creator of the popular web game Proximity
Perhaps another rung for the Open Source model of software development
Uhh... no.
It's a glowing report for this particular open source project, but that brush shouldn't be used to paint all open source. That will just lull open source developers into a false sense of euphoric contentment. Code quality didn't get this far by having a fixed target; that target should be a carrot on a stick that will never quite be reached.
Trolling is a art,
Undoubtedly()
{
when();
you = measure(quality);
in.defects();
per->lines_of(code, anyone);
can = write(good, solid, code);
}
MySQL is not touted as Enterprise because it's not Enterprise. Sure, it's fine for running Slashdot, but I wouldn't want it storing mission critical data. Oracle may be slower, but I'd much rather trust it to make sure my data is properly stored than MySQL.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
...they quantified it by dividing verified defects by lines of code. MySQL had 0.09 bugs/KLOC while the "commercial" defect density was 0.53 bugs/KLOC. (Their use of the term "commercial" confused me since MySQL is, after all, a "commercial" project, just an open-source one.)
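For anyone who wants to check the arithmetic, here is a minimal sketch of the metric as described above: verified defects divided by thousands of lines of code. The two densities are the ones quoted in this comment; the function name and the 9-defects-in-100,000-lines figure are made up purely to illustrate the formula.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of code (KLOC)."""
    return defects / (lines_of_code / 1000)


# Densities as quoted in the comment above (bugs per KLOC).
mysql_density = 0.09
commercial_density = 0.53

# How many times denser the "commercial" defects are.
ratio = commercial_density / mysql_density

# Hypothetical counts, only to show the formula: 9 defects in 100,000 lines.
example_density = defect_density(9, 100_000)

print(f"ratio: {ratio:.2f}")
print(f"example density: {example_density:.2f} bugs/KLOC")
```

With these inputs the ratio comes out a little under six, which is presumably where the "six times" headline comes from.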
All's true that is mistrusted
This article must have been written by supporters of closed software. The ratio of 0.57/0.09 is 6.333~ and the article states it is 6. Clearly FUD. Let the flaming begin!
And line of code for line of code there are fewer known errors in MySQL than there are assumed/predicted/mean errors in their commercial counterparts, but that doesn't answer the question of how MySQL compares performance-wise to Oracle or <flameretardent coating>MS SQL 2003</flameretardent coating>.
Just my 0.03 (adjusted for inflation)
Music is everybody's possession.
It's only publishers who think that people own it.
Fuck Beta
~John Lenno
I agree with you that you can't simply measure quality but...
If you just RTFA, you'll see that it is not "6 times better" but "6 times fewer bugs found than the average for commercial products".
The only thing wrong in the article:
They should replace the term "commercial" with "closed source", because MySQL is also a commercial product and what makes it different is the open source model.
As a strong proponent of MySQL, I'd be very curious to see how it stacks up in those regards.
Anyone know how this one is faring? Will it ever be released? It's based on GCC, right? How many students can it pass between until it's "distribution"?
The reason I'm asking is because I saw that one member of the team has jumped over to a company called Coverity where one can read:
I just think it'd be horrible if they used the GPL'ed GCC to develop their methods (having access to a full portable compiler onto which to do research and development is hardly a "small thing"), and then lock these same methods away from the community.
I'm grateful for their work on checking linux, but really... this smells bad, IMHO.
(If you don't know what I'm talking about, don't assume it's off-topic, okay? The Stanford Checker is a related topic to the Reasoning analysis of MySQL, and I'm not sure we'll ever have a _better_ fitting topic to discuss this)
Belief is the currency of delusion.
I do believe that Open Source is better than proprietary. Faults per 1000 lines of code may seem like a valid scale, but I think it is indicative at best, not proof.
* It does not take into account the design of the software. This is often as important as the actual quality of the code.
* It does not take into account the kind of errors. This is related to the first, but a buffer overflow that allows root access is worse than a failed instruction.
* It does not even take the length of lines into account. Shortening the lines could lower the number, without actually changing anything.
So, small victory, but the race goes on.
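The line-length point in the last bullet is easy to demonstrate. Below is a rough sketch, assuming the metric counts non-blank physical lines (the study's actual counting rules are not given here); the C fragment and the single defect are hypothetical.

```python
def defects_per_kloc(defect_count: int, source: str) -> float:
    """Defects per thousand non-blank physical lines of source."""
    lines = [line for line in source.splitlines() if line.strip()]
    return defect_count / (len(lines) / 1000)


# The same hypothetical C fragment, laid out two ways.
compact = "if (p) { use(p); } else { fail(); }\n"
spread = "if (p)\n{\n    use(p);\n}\nelse\n{\n    fail();\n}\n"

# Repeat each layout 1000 times and assume one defect in the whole file.
compact_density = defects_per_kloc(1, compact * 1000)
spread_density = defects_per_kloc(1, spread * 1000)

print(compact_density, spread_density)
```

Same code, same single defect, but the spread-out layout reports one eighth of the defect density just because it has eight times as many lines.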
the pun is mightier than the sword
This just looks like some quasi-scientific statement, trying to express things as a number that really don't fit such a representation. For example, as the number of defects decreases, it becomes increasingly more difficult to find the ones that are left. And is code that contains no bugs at all infinitely much better than code that contains a single bug which hardly ever occurs?
The main difference between open and *MOST* closed code is the fact that the early release of closed code means mucho mas money to corporate pigs and dogs, thus, proper requirements analysis, design, coding and testing are usually pummeled in the name of happy-go-lucky capitalism. "It will be ready when it is ready." -Carmack "I love America!" -Murphy
HAD
I don't think MySQL is intended to be `comparable' to OracleSQL, but someone else may be able to clarify.
philcrissman.com.
Since we're measuring Defects per 1000 lines, perhaps calling them "Gates" or "Ballmers" might be more appropriate.
I've used MySQL, Oracle, MS SQL, DB2, and MSDE. I'm not sure I get your comment about MS SQL Server. Like any other RDBMS, a little performance tuning goes a long way. As a matter of fact, until Oracle's release of 10g, MS SQL beat all commercial offerings in the TPC benchmarks.
MS has a buggy OS and an awful model for business practice, but I think MS SQL Server is a fairly nice offering. It's too bad it only runs on Windows servers though.
Saying Android is a family of phones is akin to saying Linux is a family of PCs.
Neener neener!
Now, I'm sure we can all be very mature about this...
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
This "proves" that MySQL is better than commercial offerings. Good. A lot of people knew that. Hats off to the developers. But...
1. This cannot be generalized into a property of all open source projects.
2. It's more a tribute to the architecture and original core developers of MySQL than anything else.
3. Realize that even though MySQL is an open source product, MySQL AB is the *company* that organizes and pays for MySQL development. So, again, you can't generalize this into something that covers late night hackers working on personal projects in their basements (the open source geek fantasy).
MySQL is awesome! But let's be careful about this story, okay? It's the over-generalization that gives OSS/Linux advocates a bad name ("The Gimp is equivalent to Photoshop!").
Because there are portions of the MySQL code that are just painful to look at.
Take for instance the part that takes the key index size as input and calculates internal buffer sizes. The option's size is an unsigned long long, but they cast it to an unsigned long all over the place and do in-place bitshifting on the cast, which causes it to wrap (try specifying 4G for your key index sometime and you'll get 0). The code in that case is just painfully horrible to look at, let alone to figure out what it's doing.
I could only shudder to think what the quality of the commercial product looked like, in comparison. Hell, I'll have nightmares if I consider the quality of MySQL++ as a comparison..
--jordan
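The wraparound described in that comment can be reproduced without touching the MySQL source. This is only a sketch of the arithmetic, assuming a platform where unsigned long is 32 bits; the helper name is hypothetical and the mask stands in for the C cast.

```python
# Assumption: unsigned long is 32 bits wide on the platform in question.
UNSIGNED_LONG_BITS = 32


def cast_to_unsigned_long(value: int, bits: int = UNSIGNED_LONG_BITS) -> int:
    """Mimic a C cast to an unsigned type by keeping only the low `bits` bits."""
    return value & ((1 << bits) - 1)


four_gib = 4 * 1024**3  # 4G expressed in bytes, i.e. 2**32

truncated = cast_to_unsigned_long(four_gib)
just_below = cast_to_unsigned_long(four_gib - 1)

print(truncated)   # the 4G setting after the narrowing cast
print(just_below)  # the largest value that still fits
```

Exactly 4G is the first value that no longer fits in 32 bits, so the narrowed result is zero, which matches the behaviour the poster reports.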
Need a particular reason? Take your pick. http://sql-info.de/mysql/gotchas.html
So how many of the eWeek people do you think saw the code to MS SQL Server or Oracle? I highly doubt that they were even able to get to the front door of either company to ask if they could see the code. I mean, this just looks like pure propaganda to anybody that has half a brain and keeps up with the industry.
Don't get me wrong, I love MySQL, but these types of articles are just as bad as the people saying that MacOS X isn't that secure because it has fewer users. Or the guy claiming that MS is way superior in the Internet Server world. These types of articles are just there to cause controversy and separate us as a community, Mac/Windows/Linux combined.
I am not putting any merit in this article and neither should you.
Up until recently, MySQL had no transaction or atomic operation support. As such, you need to write application code to trap problems. Whereas with Oracle, when you run an atomic operation, you know with certainty whether the query failed in its entirety. I also believe stored procedure support is somewhat lacking in MySQL (however, there is that new Java function support). The MySQL 3 tree does not enforce constraints, which is something most essential for data integrity. MySQL does not have subrow locking, whereas enterprise databases do. Once again, MySQL is great. I use it. However, it is not enterprise.
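To make "failed in its entirety" concrete, here is a small sketch of an atomic two-step update. It uses Python's built-in sqlite3 module as a stand-in, not MySQL or Oracle, and the accounts table, names and amounts are invented for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a transactional RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 0)],
)
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


overdraft_ok = transfer("alice", "bob", 500)  # second update violates the CHECK
after_overdraft = balances()
small_ok = transfer("alice", "bob", 40)
after_small = balances()
```

In the failed transfer the credit to bob has already run when the debit is rejected, and the rollback undoes it. Without transactions, that cleanup is the application code the comment says you would have to write yourself.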
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
There's a hell of a difference between 235,667 lines of code and 35 million lines of code. Just like there's a difference between 1000 lines of code and 235,667 lines of code. That is, the more lines of code, the more likely a defect will survive.
"but the Slashdot database regularly becomes confused, such as posting a comment to the wrong story"
That's not the db... around here, we call them "trolls"...
;)
First off, I think MySQL is a fantastic product. It's the perfect mix of speed and ease of use, well suited for small to medium sized datastores where speed and reliability are a must. That being said, I think it's unfair to describe this product alongside others such as Oracle, MSSQL (blow me guys, it's a great product) and even PostgreSQL and SAP DB (which is the best OpenSource option in my opinion). The codebase for MySQL will never achieve the magnitude of the aforementioned products, so it should be used that way. Just my 2 cents.
My mother never saw the irony in calling me a son-of-a-bitch.
I'm in the midst of upgrading a SQL Server 2000 installation. MS issued their latest patch in August - a mere 56 MB patch. Hopefully that will fix some of the flakiness I've been seeing.
[Insert pithy quote here]
That open source patch was quite shoddily and hastily written. It wasn't even a patch really. Using it as representative of open source is not fair in any way whatsoever to other successful open source products.
"Now apply the 'Rule of 6 times' to Microsoft's closed source IE patches..."
There is no 'Rule of 6 times'. An analysis concluded that MySQL had a very limited number of defects in their code base. Kudos to them. This doesn't define a rule to be used in the open source vs. closed source holy war.
those are just bugs! what about lack of features?
at least there's row-level locking now... finally.
2 1337 4 u!
MS SQL is basically a revamped Sybase. So, on UNIX & Linux you could use Sybase ASE.
open sourcers don't necessarily get paid to release code, so they don't have the luxury of releasing shit just so they can keep their jobs by releasing updates for the next 5 years. when a commercial product finally DOES become useable they make a whole new buggy/bloated product that they can release fixes and patches for.
"...if you don't like your job, you don't strike. You just go in every day and do it really half-assed..." -Homer
This is proof positive that the marketing engine has started churning in the Linux / Open Source arena. The quoted statistics are meaningless. Here is a short list of things (in no particular order) that are wrong with this "study" (who paid for it anyway?):
Lines of code is meaningless as a reliable measure of anything. The most this number can be used for is assessing the high level complexity (i.e. simple, non-trivial, or hard) of an application / code construct. It is absolutely pointless to compare two different applications against each other by lines of code. This means that you can say that one is non-trivial and the other is complex, or you can say that both are complex, but there is no valid way of determining (by using this particular metric) that one application is more complex than the other. I believe this is the fundamental flaw in this "study".
The study ignores capabilities. If application A has features a, b, and c, and application B has features a, b, c, d, e, f, g, h, is it even meaningful to compare the number of defects detected between applications A and B? And no - normalizing it by lines of code is not valid (see previous point).
Testing methodology: from the defects quoted in the article, it appears as if the "study" did white box testing on MySQL. This is hardly complete. While null pointer dereferences are certainly terrible, I would also be very, very concerned about bugs pertaining to SQL capabilities, data integrity, performance, etc. If I go out and do a comparison of RDBMSs for a client, my report wouldn't be complete at all without covering these areas. How come the "study" doesn't mention any of these things?
Let's face it: this is a paid propaganda article by the marketing machinery. Much like Microsoft has done in the past.
There is no such thing as luck. Luck is nothing but an absence of bad luck.
It is really embarrassing to have bad code with your name on it, released to the public.
Not only that, but there is a small percentage of coders who, when presented with an ugly solution to a problem, will pretty it up, just "because". And it is a good way to get known in the OSS world.
That's unlike the corporate world, where working but ugly code is hidden deeper and deeper, and people go out of their way to avoid it.
Seen lots of intelligent comments about length of lines and potential bloat skewing the results, but there is one more issue to consider: design.
No matter how good the coding itself, if the design is broken, the tool is broken, period.
And MySQL has a broken design. So broken that the upgrade path isn't MySQL X or something the like, but MaxSQL -- in fact, rebranded SAPdb. That SAPdb is at most at Oracle v7.2 levels tells lots about MySQL.
I could be more specific, but do your own research in Google -- lack of SQL compliance, lack of features to enable declarative coding at the server instead of procedural client code, and so on.
Now, the interesting part. Suppose MySQL AB would have a sudden insight and repent of their un-SQL, anti-relational ways. Unlikely, you say; yet possible. Now suddenly they have to recode, or change drastically the current code. The resulting tool will be probably much bigger than the current, because SQL is baroque; or even worse than much bigger, because of MySQL backwards compatibility.
The sheer bloat will make even this faulty measure of bugs/KLoC skyrocket. Now, run the comparison again...
Not to say SQL compliance shouldn't be attained. In fact, bloat in the SQL DBMS is a more than good enough tradeoff against bloat in the application. The ideal would be a RDBMS, but while there isn't a MyDataphor a SQL DBMS should do.
Even today, I don't care about comparing to, say, Oracle or MS SQL Server. IBM DB2 would be a better baseline, but best of all the real competitors: PostgreSQL and Alphora Dataphor.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
I'm a little confused. I thought I understood how to make profit with the GPL, but now I'm not sure.
MySQL GPL'ed all their products. (presumably so they could get developers and bug-fixes to their product for no charge.) However, they offer "commercial" licenses for people who want to integrate MySQL into their software, but don't want to GPL it. How can they do that? Presumably, any improvements/bugfixes/modifications that came from the community would be GPL, and therefore cannot be re-integrated under a more restricted license. I'm a little confused here. How can they take code that has been released under the GPL and turn around and release it under a more restrictive license?
Yes, sure. Stuff like stored procedures or views are just toy features nobody really needs for database development... Better do all those things in your application code, makes it so much easier ;) Come on, if you really need some ultra-fast, small, reduced-to-the-max SQL database you might look at sqlite. If going for some bigger real life application, you might discover that those bloated features actually do make sense... and one day you might find yourself posting things like "foreign key constraints would be so cool to have in mysql" as some of us did ages ago...
sick of sigs... *sigh*
By including the use of 'stdio.h' to which we (SCO) own the rights to, you have violated the DMCA.
MrHanky, you now must either pay us for the use of said file ($699) or cease and desist.
We hold rights to your future earnings from your use of our file, and we option the rights to your children's earnings.
Thank you
Daryl
soo so sorry... It just popped into my head...
Why worry? Each of us is wearing an unlicensed "nucular" accelerator on his back.
Sig changed for readability by G.W.
Sorry, but my opinion is pretty strong on this. Going from anything Oracle to MySQL is NOT trivial.
there are foreign key constraints, but only on certain table types, and only in certain versions, and only on certain column types.
on mysql 3.x, the table types that support foreign key constraints don't support transactions, and vice versa.
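For readers who have never had foreign key enforcement to miss, this is roughly what the feature buys you. The sketch uses Python's built-in sqlite3 module as a stand-in (it says nothing about how any MySQL table type behaves), and the customers/orders schema is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.commit()


def try_insert_order(order_id: int, customer_id: int) -> bool:
    """Insert an order; return False if the database rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
                (order_id, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid = try_insert_order(1, 1)    # customer 1 exists
orphan = try_insert_order(2, 99)  # customer 99 does not exist
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The orphaned order is refused by the database itself, so no application code has to remember to check that the customer exists.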
That they're filing suit against MYSQL for violating their IP on code quality.
it really depends on how heavily your developers have embraced 8i. as another poster mentioned - if they are really exploiting it then you will have a big migration task. if your applications only perform basic SQL statements - then you could probably get away with it. actually, if all you do is perform basic SQL, then you aren't utilising oracle to its full potential and you'd probably get a better ROI (return on investment) by moving to MySQL.
Porting between dbms products depends primarily on two issues:
1. usage of vendor extensions
2. usage of standard relational functionality
Generally speaking, if you've minimized #1 in your application you can easily port between Oracle, DB2, SQL Server, Sybase, PostgreSQL, etc: sure, you could hit some issues with jdbc drivers, and may need to port a few idioms (partitioning for example), but it shouldn't be a killer. But going from any of the above list to mysql isn't suggested: you'll get hung up on #2 (it doesn't support standard SQL or DDL).
Realistically, if I wanted to go to a less expensive product than oracle I'd look down this list:
- db2 (1/3 to 1/2 oracle cost)
- sybase (cheaper than oracle, but dwindling market share)
- firebird (very low cost)
- postgresql (free)
All of the above are mature relational databases that you could port oracle applications from.
But you mentioned 'mission critical'. At this point I'd be very cautious about either postgresql or mysql in a mission-critical role. How important is it to you that you can recover 100% of your data in the event of a database crash? I'd put my money (and career) on db2 or oracle delivering that kind of quality over mysql...
It does indeed sound a bit like that, and with good reason. If you notice, the "independent review" was carried out by Reasoning, Inc., and we've heard of them before in these parts.
For the benefit of those who haven't seen this trollfest^H^H^H^H^H^H^H^H^Hstory in its previous incarnations, Reasoning's services spot what some people call "systematic" errors, things like NULL pointer dereferencing or the use of uninitialised variables. As many people note every time this subject comes up, any smart development team will use a tool like Lint to check their code anyway, as a required step before check-in and/or as a regular, automated check of the entire codebase, and so any smart development team should find all such errors immediately. IOWs, it's grossly unfair to compare open and closed source "code quality" on this basis. Any project that has errors like this in it at all isn't serious about quality, and it shouldn't take an external study to point this out.
Serious code quality is not dictated by how many mechanical errors there are that slip through because of weaknesses in the implementation language. Rather, it is indicated by how many "genuine" logic errors -- cases where the output differs unintentionally from the specifications -- there are. Of course, no automated process can identify those, but to get a meaningful comparison of code quality, you'd need to investigate that aspect, rather than kindergarten mistakes.
There are other objections to their principal metric as well. For starters, source code layout is not normally significant in C, C++ or Java, so any metric based on line count is going to be flawed at best. But the big objection is that they're talking about childish mistakes, and comparing supposedly world class software based on childish mistakes isn't helpful (except to dispel the myth that some big name products have sensible development processes).
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
In my opinion, there is no substantial difference in code quality between open source software and proprietary software.
I have seen a lot of very buggy commercial software (including nVidia drivers, IBM's LANManager Services for OS/2, lots of Microsoft's services and utilities in Windows 2000 (for example, "TCP/IP Helper Service") and Netscape 4.7).
On the other hand, I have also seen very bad code quality in open source products - for example, GTK+ (actually, the really bad thing about GTK+ is primarily its install scripts, makefiles and such). Compiling and installing GTK+ on anything other than a GNU/Linux machine is some kind of an adventure, while its commercial counterpart, Qt from Trolltech, can be compiled quite easily.
- I set the PKGCONFIG env variable before running 'configure'. It worked quite well until line 27.000 (or so) in 'configure', where the variable's content was suddenly gone (BTW, I really dislike debugging 28.000+ line shellscripts). I tried to 'configure' with bourne shell and with korn shell 93.
- It assumes you have Perl installed (if it's not in your PATH, 'configure' creates funny things like "#! -w" instead of "#!/path/to/perl -w"). The error message produced due to this bug was something like '/usr/bin/env: no such file or directory' - because the perl script was directly started using /usr/bin/env. Kind of confusing %-)
- 'configure' forgot to add '-fPIC' to CFLAGS; for this reason all shared libraries were broken. I had to add this option manually.
- Nothing works with 'make'. I had to install 'gmake' (GNU make) instead.
- The actual source code of the core libraries finally compiled, after I had upgraded to gcc 3.3.2. The source code of the 'demo' programs was totally broken, and gcc refused to compile it - once more I had to change the makefiles manually.
-----
One or two weeks later I compiled Trolltech's Qt library on the same computer. It was as simple as './configure --platform=platformname && make && make install'.
Why do I need to debug 28.000+ lines of shellscript code and a lot of makefiles, why do I need to install gmake, pkgconfig (by the way, pkgconfig and most other things in GTK+ don't work well if you don't install everything to /usr/local, which is the default location) and Perl 5, just to compile some C/C++ code?
Qt does mainly the same as GTK+, but it simply compiles, using only shellscripts, 'make' and a C/C++ compiler.
Another example regarding code maturity (rather maturity than quality, notice the difference :-) is Sun JVM vs. GCJ's libjava. I compiled a very complex multithreaded application using GCJ; it worked fine on uniprocessor machines, but it randomly deadlocked on my multiprocessor server. Finally I found out that libjava is broken on SMP machines. That doesn't mean that libjava's code quality is bad; but it still means that some other Java libraries (those of some virtual machines) are more mature, and possibly better tested.
-----
Some fundamental things about Software:
- The more people read the code, the more people can potentially find and fix bugs (good about open source).
- If a lot of people are allowed to write the code, somebody has to coordinate the work of all these people. Lots of different versions of the same module, written and/or modified by lots of different people, need to be combined or otherwise coordinated (bad about most open source projects, because hardly anybody knows how trustworthy any one of the developers is; good about some closed source projects (e.g. Trusted SunOS kernel, IBM SLIC kernel and other trusted code), because only a small group of really good programmers is allowed to write or modify code).
Conclusion: It's good to have only a small group of 'trusted' developers, who write or modify the code, and then to let everyone else read and verify the code.
regards,
octogen
"OMG! And Windows 98 didn't support fast user switching!"
Your analogy limps. Did most other operating systems support fast user switching in 1998? No, and especially not Windows' biggest competition on the desktop.
On the contrary, PostgreSQL has had decent foreign key and transaction and subquery support since 1999.
MySQL STILL doesn't support subqueries in a production version. Foreign keys are only supported by one table type. It doesn't support views. I could go on, but if you really want to see the differences, look at mysql's crash-me comparison chart. The differences that aren't cosmetic, even taking the last MySQL alpha, are pretty annoying.
Yes I do. And I have revived and made perform to spec god knows how many cretinous foreign key designs by a combination of
The difference between this and a classic foreign key constraint is that this approach always uses multiple CPUs efficiently, while a foreign key is usually a single-CPU-bound task. It also maintains far fewer large scope (global or per table) locks and is generally faster for retrieves by a factor of between 10 and 100 times. Because of the TPC benchmarks, vendors have overoptimized joins at the expense of many other things in order to have nice numbers.
And btw, learn the difference between a "real DBA" and a database designer. I mean the one that is the justification for the 20+% salary difference.
Cheers (lessons start at 500 per hour),
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/