MySQL & Open Source Code Quality
dozek writes "Perhaps another rung for the Open Source model of software development, eWeek reports that an independent study of the MySQL source code found it to be "in fact six times better than that of comparable commercial, proprietary code." You can read the eWeek write-up or the actual research paper (reg. required)."
Six times better? I didn't know it was possible to quantify code quality in that manner. Interesting.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
Perhaps another rung for the Open Source model of software development
Uhh... no.
It's a glowing report for this particular open source project, but that brush shouldn't be used to paint all open source. That will just lull open source developers into a false sense of euphoric contentment. Code quality didn't get this far by having a fixed target; that target should be a carrot on a stick that will never quite be reached.
Trolling is a art,
Undoubtedly()
{
when();
you = measure(quality);
in.defects();
per->lines_of(code, anyone);
can = write(good, solid, code);
}
And line of code for line of code there are fewer known errors in MySQL than there are assumed/predicted/mean errors in its commercial counterparts, but that doesn't answer the question of how MySQL compares performance-wise to Oracle or <flameretardent coating>MS SQL 2003</flameretardent coating>
Just my 0.03 (adjusted for inflation)
Music is everybody's possession.
It's only publishers who think that people own it.
Fuck Beta
~John Lenno
I agree with you that you can't simply measure quality but...
If you just RTFA, you'll see that it is not "6 times better" but "6 times fewer bugs found than the average for commercial products"
The only thing wrong in the article:
They should replace the term "commercial" with "closed source", because MySQL is also a commercial product and what makes it different is the open source model.
"Defect" is also a difficult term to define. Some errors are much worse than others. It's not all about numbers, folks. Don't get me wrong, I'm not saying that MySQL isn't a great product. I just get skeptical when I hear things talked about in terms of "better" and "best."
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
I do believe that Open Source is better than proprietary. Faults per 1000 lines of code may seem like a valid scale, but I think it is indicative at best, not proof.
* It does not take into account the design of the software. This is often as important as the actual quality of the code.
* It does not take into account the kind of errors. This is related to the first, but a buffer overflow that allows root access is worse than a failed instruction.
* It does not even take the length of lines into account. Shortening the lines could lower the number, without actually changing anything.
So, small victory, but the race goes on.
the pun is mightier than the sword
This just looks like some quasi-scientific statement, trying to express things as a number that really don't fit such a representation. For example, as the number of defects decreases, it becomes increasingly difficult to find the ones that are left. And is code that contains no bugs at all infinitely better than code that contains a single bug which hardly ever occurs?
I'm under the impression that most "bugs" in software (certainly most bugs in my code) aren't bugs like those in the article (null dereferences, uninitialized variables, etc), but they're algorithm bugs. As in, there's a subtle interplay between different parts of complicated algorithms that can be easy for programmers to miss. Those types of bugs are going to be much harder to find, and certainly not going to be found in analysis such as this one.
And that's because...of nothing in particular? At least give a reason *why* you have an opinion, if you're gonna do that. Is it some vague feeling of fear? What's the reason?
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
I don't think MySQL is intended to be `comparable' to OracleSQL, but someone else may be able to clarify.
philcrissman.com.
So how many of the eWeek people do you think saw the code to MS SQL Server or Oracle SQL? I highly doubt that they were even able to get to the front door to knock and ask if they could see the code. I mean, this just looks like pure propaganda to anybody that has half a brain and keeps up with the industry.
Don't get me wrong, I love MySQL, but these types of articles are just as bad as the people saying that MacOS X isn't that secure because it has fewer users. Or the guy claiming that MS is way superior in the Internet Server world. These types of articles are just there to cause controversy and separate us as a community, Mac/Windows/Linux combined.
I am not putting any merit in this article and neither should you.
Not only is it hard to define defect (and it is very obvious that some defects are worse than others), but this code review sounds like it only spots "grammatical" or style errors in the code. It doesn't sound like it could find a defect in an algorithm implementation or logic. To me, these are where the true defects are, in the logic/reasoning breakdowns.
Geek used to be a four letter word. Now it's a six-figure one.
There's a hell of a difference between 235,667 lines of code and 35 million lines of code. Just like there's a difference between 1000 lines of code and 235,667 lines of code. That is, the more lines of code, the more likely a defect will survive.
First off, I think MySQL is a fantastic product. It's the perfect mix of speed and ease of use, well suited for small to medium sized datastores where speed and reliability are a must. That being said, I think it's unfair to describe this product alongside others such as Oracle, MSSQL (blow me guys, it's a great product) and even PostgreSQL and SAP DB (which is the best OpenSource option in my opinion). The codebase for MySQL will never achieve the magnitude of the aforementioned products, so it should be used that way. Just my 2 cents.
My mother never saw the irony in calling me a son-of-a-bitch.
This could also mean that the code is bloated.
Which could also be a symptom of atomicity (or rather, lack thereof) of transactions.
0.09 vs 0.53 bugs/KLOC can also mean MySQL has one sixth the amount of code per line, compared to an average "commercial" program. Those numbers should be divided by a code-density-factor.
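For anyone who wants to check where "six times" comes from, here is a minimal sketch of the arithmetic in Python. The two densities and the line count are the figures quoted in this thread; the function and variable names are made up for illustration, and nothing here reproduces the study's actual method.

```python
# Figures as quoted in the thread; all names below are illustrative only.
MYSQL_DENSITY = 0.09       # defects per 1,000 lines (KLOC)
COMMERCIAL_DENSITY = 0.53  # quoted commercial average, defects per KLOC
MYSQL_LINES = 235_667      # MySQL line count quoted in the thread


def implied_defects(density_per_kloc: float, lines: int) -> float:
    """Defect count implied by a per-KLOC density and a line count."""
    return density_per_kloc * lines / 1000


ratio = COMMERCIAL_DENSITY / MYSQL_DENSITY
implied = implied_defects(MYSQL_DENSITY, MYSQL_LINES)

print(f"commercial / MySQL density ratio: {ratio:.1f}")
print(f"defects implied for MySQL: {implied:.0f}")
```

The ratio only says something about defects per physical line. Any difference in how much code each project packs into a line moves it directly, which is the code-density objection.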
I'm not sure what you mean by "grammatical" or style errors. If you're talking about syntax errors, those should prevent the code from compiling. I'm not aware of how coding style can be an error (unless you're programming in Python).
The specific errors in MySQL were dereferencing null pointers, failure to deallocate memory (memory leaks), and use of an uninitialized variable. These aren't the only bugs that such an analysis can find; they're the ones that were found in MySQL. And they're definitely errors in logic.
Certainly, there are bugs that such an analysis can't find. If you define PI as 3.15, your calculations are going to be off. If you create a function to determine the circumference of a circle as 2 * PI * Diameter, you've got a bug. I suspect that those are the types of errors in logic that you were referring to, and you're right that they will not be caught by a code analysis. However, that doesn't mean that comparing the frequency of the errors that CAN be caught between two programs is an invalid act. From my experience, programmers who make fewer of the former errors also make fewer of the latter. Analyzing catchable errors is a good metric for the frequency of errors in a given source tree, even if all errors aren't caught.
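The PI example above can be sketched in a few lines of Python. This is only an illustration of the kind of bug meant: the function names are hypothetical, and the code is clean as far as null dereferences, leaks and uninitialized variables go, yet it still returns the wrong answer.

```python
import math

PI_WRONG = 3.15  # the deliberately wrong constant from the comment


def circumference_buggy(diameter: float) -> float:
    # Logic bug: 2 * PI * r is the formula for a radius, not a diameter,
    # so this roughly doubles the true value (and uses a bad PI as well).
    return 2 * PI_WRONG * diameter


def circumference(diameter: float) -> float:
    # Correct formula for a diameter.
    return math.pi * diameter


print(circumference_buggy(1.0))
print(circumference(1.0))
```

Only a test against a known answer, or a reviewer who knows the formula, would flag the first function.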
"The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.
open sourcers don't necessarily get paid to release code, so they don't have the luxury of releasing shit just so they can keep their jobs by releasing updates for the next 5 years. when a commercial product finally DOES become useable they make a whole new buggy/bloated product that they can release fixes and patches for.
"...if you don't like your job, you don't strike. You just go in every day and do it really half-assed..." -Homer
This is proof positive that the marketing engine has started churning in the Linux / Open Source arena. The quoted statistics are meaningless. Here is a short list of things (in no particular order) that are wrong with this "study" (who paid for it anyway?):
Lines of code is meaningless as a reliable measure of anything. The most this number can be used for is for assessing the high level complexity (i.e. simple, non-trivial, or hard) of an application / code construct. It is absolutely pointless to compare two different applications against each other by lines of code. This means that you can say that one is non-trivial and the other is complex or you can say that both are complex, but there is no valid way of determining (by using this particular metric) that one application is more complex than the other. I believe this is the fundamental flaw in this "study".
The study ignores capabilities. If application A has features a, b, and c, and application B has features a, b, c, d, e, f, g, h, is it even meaningful to compare the number of defects detected between applications A and B? And no - normalizing it by lines of code is not valid (see previous point).
Testing methodology: from the defects quoted in the article, it appears as if the "study" did white box testing on MySQL. This is hardly complete. While null pointer dereferences are certainly terrible, I would also be very, very concerned about bugs pertaining to SQL capabilities, data integrity, performance, etc. If I go out and do a comparison of RDBMS's for a client, my report wouldn't be complete at all without covering these areas. How come the "study" doesn't mention any of these things?
Let's face it: this is a paid propaganda article by the marketing machinery. Much like Microsoft has done in the past.
There is no such thing as luck. Luck is nothing but an absence of bad luck.
It is really embarrassing to have bad code with your name on it, released to the public.
Not only that, but there is a small percentage of coders who, when presented with an ugly solution to a problem, will pretty it up, just "because". And it is a good way to get known in the OSS world.
In the corporate world, by contrast, working but ugly code is hidden deeper and deeper, and people go out of their way to avoid it.
yes, sure. Stuff like stored procedures or views are just toy features nobody really needs for database development... Better do all those things in your application code, makes it so much easier;) Come on, if you really need some ultra-fast small reduced-to-the-max sql database you might look at sqlite, if going for some bigger real life application you might discover that those bloated features actually do make sense... and one day you might find yourself posting things like "foreign key constraints would be so cool to have in mysql" as some of us did ages ago...
sick of sigs... *sigh*
A flawless implementation of a crap algorithm is still crap. I don't care if your bubble-sort routine has no memory leaks or buffer overruns; it still scales O(N^2). Likewise, a so-called "database" which does not implement key features like transactions and stored procedures is fundamentally flawed even if there are zero coding errors.
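To put a number on the bubble-sort remark, here is a small Python sketch that counts comparisons. It is a plain textbook bubble sort written for illustration, not code from any of the products discussed; without an early-exit check it always performs n(n-1)/2 comparisons.

```python
def bubble_sort(items):
    """Return (sorted copy, number of comparisons made)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons


for size in (10, 100, 1000):
    _, count = bubble_sort(range(size, 0, -1))
    print(size, count)
```

Ten times the input costs roughly a hundred times the comparisons, and a defect scanner has nothing to say about that.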
MySQL may be well-written, but it's still a piece of crap by the standards of any professional DBA.
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
Sorry, but my opinion is pretty strong on this. Going from anything Oracle to MySQL is NOT trivial.
Sorry, but until MySQL has a mode where ALL tables are transaction safe, or at least throws an error when you try to create a fk reference to a non-transaction safe table, its transactions are too prone to data loss due to human error.
It's a good data store, but the guys programming it have to "get it" that transactions can't be optional in certain types of databases, and neither can constraints, or fk enforcement.
MySQL has a tendency of failing to do what you thought it did, and failing to report an error so you know. This is a legacy left over from being a SQL interpreter over ISAM files. It makes MySQL a great choice for content management, but a dangerous choice for transactional systems.
--- It is not the things we do which we regret the most, but the things which we don't do.
No it doesn't. It "proves" that on average, by line, MySQL has fewer errors in code. It says nothing of the severity of the errors in either package.
Furthermore- MySQL is not even close to being equal in feature set to almost any commercial DB; replication/backup sucks, it's not ACID compliant, it had no transaction support until recently, no stored procedures, no triggers.
How on earth could you possibly compare it to almost any commercial SQL DB which has all these...and say MySQL is better?
A lot of people knew that.
No, every two bit web designer thinks it's the greatest thing since sliced bread, since they think a select w/group+sort is an advanced query. Every professional DBA I've met refuses to work with MySQL and/or hates it, and they can go on for an hour about why. When are you people going to realize that PostgreSQL is so much better than MySQL, save some incredibly risky performance options?
MySQL is awesome! But let's be careful about this story, okay? It's the over-generalization that gives OSS/Linux advocates a bad name ("The Gimp is equivalent to Photoshop!").
But you just said "This proves that MySQL is better than commercial offerings!"
Please help metamoderate.
it really depends on how heavily your developers have embraced 8i. as another poster mentioned - if they are really exploiting it then you will have a big migration task. if your applications only perform basic SQL statements - then you could probably get away with it. actually, if all you do is perform basic SQL, then you aren't utilising oracle to its full potential and you'd probably get a better ROI (return on investment) by moving to MySQL.
Porting between dbms products depends primarily on two issues:
1. usage of vendor extensions
2. usage of standard relational functionality
Generally speaking, if you've minimized #1 in your application you can easily port between Oracle, DB2, SQL Server, Sybase, PostgreSQL, etc: sure, you could hit some issues with jdbc drivers, and may need to port a few idioms (partitioning for example), but it shouldn't be a killer. But going from any of the above list to mysql isn't suggested: you'll get hung up on #2 (it doesn't support standard SQL or DDL)
Realistically, if I wanted to go to a less expensive product than oracle I'd look down this list:
- db2 (1/3 to 1/2 oracle cost)
- sybase (cheaper than oracle, but dwindling market share)
- firebird (very low cost)
- postgresql (free)
All of the above are mature relational databases that you could port oracle applications from.
But you mentioned 'mission critical'. At this point I'd be very cautious about either postgresql or mysql in a mission-critical role. How important is it to you that you can recover 100% of your data in the event of a database crash? I'd put my money (and career) on db2 or oracle delivering that kind of quality over mysql...
It does indeed sound a bit like that, and with good reason. If you notice, the "independent review" was carried out by Reasoning, Inc., and we've heard of them before in these parts.
For the benefit of those who haven't seen this trollfest^H^H^H^H^H^H^H^H^Hstory in its previous incarnations, Reasoning's services spot what some people call "systematic" errors, things like NULL pointer dereferencing or the use of uninitialised variables. As many people note every time this subject comes up, any smart development team will use a tool like Lint to check their code anyway, as a required step before check-in and/or as a regular, automated check of the entire codebase, and so any smart development team should find all such errors immediately. IOW, it's grossly unfair to compare open and closed source "code quality" on this basis. Any project that has errors like this in it at all isn't serious about quality, and it shouldn't take an external study to point this out.
Serious code quality is not dictated by how many mechanical errors there are that slip through because of weaknesses in the implementation language. Rather, it is indicated by how many "genuine" logic errors -- cases where the output differs unintentionally from the specifications -- there are. Of course, no automated process can identify those, but to get a meaningful comparison of code quality, you'd need to investigate that aspect, rather than kindergarten mistakes.
There are other objections to their principal metric as well. For starters, source code layout is not normally significant in C, C++ or Java, so any metric based on line count is going to be flawed at best. But the big objection is that they're talking about childish mistakes, and comparing supposedly world class software based on childish mistakes isn't helpful (except to dispel the myth that some big name products have sensible development processes).
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
...they quantified it by dividing verified defects by lines of code.
If I write a script to go through my C and perl code, and make sure that there's a newline before and after every brace, that will approximately double the lines of code, and will thus cut my error rate in half.
This isn't a joke; I've done this on a couple of projects where they measured output by lines of code, just to illustrate the real impact of such measures.
OTOH, if I deleted the comments from my code, that would approximately double my error rate, so I guess I won't do that.
I'm also reminded of a project that I worked on a while back in which nearly every routine had some sort of error, sometimes several, and I didn't fix any of them. This would look really bad, I know. But you can probably guess what my task was. I was writing a test suite for a compiler. Most of the tests were to verify that the compiler would catch a particular kind of error. So of course my code contained that error, and the test script verified that the result was the proper error message.
This is one of the fundamental problems with nearly every definition I've ever seen of "quality code". They usually don't measure the suitability of the code for the task. If your task is to measure a system's response to failures, your code will of course intentionally produce those errors in order to determine the system's responses. So what is an error in other situations is exactly correct code. Counting errors detected without asking what the task was gives you exactly the wrong results in such a case.
I'm not sure I'd want my name associated with a project that didn't include this sort of test code in the basic distribution. If there are problems with an installation, I want to know about them before the users start using the stuff. And I want to know in a manner that will pinpoint the problems, not from the usual bug report that typically describes some symptom that is only remotely related to the actual problem. So nearly everything that I work on has a component with a high error rate, run under the control of a script that verifies the correctness of the error messages. If the installation doesn't handle the errors correctly, the users are given output that will tell me what the problem is.
I'd only be impressed by a study that handles such a test suite correctly. One that counts such "errors" is worse than useless; it actively discourages useful test suites.
(Actually, just before reading this /. article, the task I was working on was adding some more tests to a test suite for a package that I'm porting to a number of different systems.)
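The brace trick described in this comment is easy to reproduce. Below is a rough Python sketch, assuming a naive regex that puts every brace on its own line (it ignores braces inside strings and comments and throws away indentation); the sample C function and the helper names are invented for the illustration.

```python
import re


def count_lines(source: str) -> int:
    """Count non-blank lines, the usual basis for a 'lines of code' figure."""
    return sum(1 for line in source.splitlines() if line.strip())


def braces_on_own_lines(source: str) -> str:
    # Naive reformatting: force a newline before and after every brace.
    return re.sub(r"\s*([{}])\s*", r"\n\1\n", source)


SAMPLE = """int sign(int x) {
    if (x < 0) { return -1; }
    if (x > 0) { return 1; }
    return 0;
}
"""

before = count_lines(SAMPLE)
after = count_lines(braces_on_own_lines(SAMPLE))
print(before, after)
```

The same function with the same (hypothetical) single defect goes from 1 defect in 5 lines to 1 defect in 12 lines, so the per-line "error rate" more than halves without a single character of logic changing.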
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
The code scanners I've looked at will flag potential errors even if it's impossible to reach the error condition in code, so it's possible that some or all of that stuff may never have actually happened, but it's generally better to program defensively anyway. All it takes is for some bozo to change your if condition and all of a sudden you're segving all over your customer's important data. 15 null pointer dereferences in nearly a quarter million lines of code is a pretty low number though. I've seen more than that in a single thousand line file written by "professionals."
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I've worked on several projects interacting with SQL databases and I've only seen one really take advantage of the power of the database. Most of them are using Oracle as a glorified DBASE III, and as a glorified DBASE III, MySQL is much less expensive. And I've seen entire companies built around DBASE III applications.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I find mathematical notation to be clearer and more succinct than the longhand equivalent. "O(n)" is, IMHO, a superior way of saying "scales linearly". All of the really good engineers I've worked with over the years have held the same opinion.
As to not having used MySQL in a long time, that's true. I don't use MySQL because I see no purpose for it. If I need a fast non-relational, non-transactional data store I'll use an ISAM solution. If I need a real relational database I'll use Sybase or Oracle (or MS-SQL if I have no choice, or even PostgreSQL if I have to make an open-source zealot happy). The only time I'd use MySQL was if I needed a semi-relational database with half-assed transactions, no stored procedures or triggers, broken referential integrity, a plethora of non-standard behaviors, and rampant data integrity issues.
If MySQL had stuck to its original vision of being a SQL frontend to an ISAM database, it might actually be worthwhile. Instead it's become a bastard hybrid that's too bloated to be a good ISAM db and too broken to be a good relational db. I'll admit that there are jobs that MySQL can do well -- however, my professional opinion is that there are better tools for that class of tasks.
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
Most of your points on MySQL are out of date. Its featureset has progressed a great deal since, apparently, you last looked into it. Even way back when this was a hot topic (when PostgreSQL, another excellent open source DB, was an up-and-comer), MySQL developers were already saying that most of people's concerns were being addressed in upcoming releases... Those releases have since come and gone (mostly in the form of 4.0, though IMHO, 4.1 is MySQL's finest moment, and its current release status as alpha is kind of funny given that it's been rock stable for a year).
Just off the top of my head, you mention ACID. MySQL now offers a choice of back-end table managers that range from the original fast, but strictly non-ACID version to Berkeley DB (which is fairly fast and supports transactions, but I think falls short of ACID in terms of rollback) and the fully ACID InnoDB, which is the (now open source) back end from the Progress database.
So take your pick, depending on your app. Do you want speed? Transactions? Full ACID? Better yet, you can make that choice on a table-by-table basis!
MySQL also has the best full-text-searching features I've seen in any DB, open or closed.
There are limitations, and I might choose another DB for certain specific tasks (e.g. Oracle for statistics in the DB) but MySQL is a great first choice for most projects.
A flawless implementation of a crap algorithm is still crap.
No.. a flawless implementation of a crap algorithm just doesn't scale well. Of course bug rate is not the only criterion used when evaluating software, but people spend hundreds of man-hours fixing bugs.
It demonstrates that the quality of open source code is not automatically worse than professional proprietary code (which some people believe is the case). The important thing is that it's at least an attempt at formal study (and not simply personal collating of anecdotal reports).
Understand the limitations of the tool you choose to work with and live with it or use a different tool. Nobody's forcing you to use any specific tool.
No, but somebody is trumpeting the glory of that tool as end-all-be-all, while simply ignoring the points of people who dare to break rank, spurn the kool-aid, and point out flaws. I swear every damn Slashdot article about open source tools has some thread like this. And every time we get some version of "love it or leave it." What happened to actually trying to improve on the basis of valid criticism?
The article and the posts following it seem to promote MySQL as a production database to compete with Oracle. It is clear that while it is a nice database with good features and is useful for many projects, it lacks many things which DBAs like about RDBMS systems like Oracle. It is also clear that if any of the posts here and linked articles accurately describe MySQL behaviour it violates some very basic rules of software design.