Slashdot Mirror


MySQL & Open Source Code Quality

dozek writes "Perhaps another rung for the Open Source model of software development, eWeek reports that an independent study of the MySQL source code found it to be "in fact six times better than that of comparable commercial, proprietary code." You can read the eWeek write-up or the actual research paper (reg. required)."

25 of 446 comments (clear)

  1. Re:Duh! by pyite · · Score: 3, Informative

    MySQL is not touted as Enterprise because its not Enterprise. Sure, it's fine for running Slashdot, but I wouldn't want it storing mission critical data. Oracle may be slower, but I'd much rather trust it to make sure my data is properly stored than MySQL.

    --

    "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman

  2. If you would RTFA... by Theatetus · · Score: 5, Informative

    ...they quantified it by dividing verified defects by lines of code. MySQL had 0.09 bugs/KLOC while the "commercial" defect density was 0.53 bugs/KLOC. (Their use of the term "commercial" confused me since MySQL is, after all, a "commercial" project, just an open-source one.)

    --
    All's true that is mistrusted
    1. Re:If you would RTFA... by leviramsey · · Score: 2, Informative

      MySQL does have transactions, and has had them for quite some time. Stored procedures are due in a future version.

    2. Re:If you would RTFA... by Tassach · · Score: 2, Informative

      Everyone does not know this, and everyone does not understand it, or I wouldn't have spent a substantial percentage of my carreer cleaning up other people's messes.

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    3. Re:If you would RTFA... by scrytch · · Score: 4, Informative

      If only it were MySQL just lacking features that would, after much mudslinging at the ideas themselves, be grudgingly retrofitted into a new table type. MySQL's brokenness goes deeper than that.

      MySQL's attitude toward data integrity can be summed up as "if the constraint can't be satisfied, do it half-assed anyway". I find myself having to write application code to manage data integrity with MySQL, something I can take for granted with a real database.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
    4. Re:If you would RTFA... by jc42 · · Score: 2, Informative

      Yeah, and we might also add that in some circumstances, a bubble sort is a very good way to sort data. There's a lot of data around that is normally sorted into just one (usually time) order, and which is also produced in an order very close to that. For such data, "efficient" sorts are usually very inefficient, and a bubble sort can beat them easily.

      Most theoretical work on sorting has assumed randomly-sorted input data. That's an important case, sure. But there are many situations where it's not a valid assumption. And a sort that's good on random data is not necessarily very good on non-random data.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    5. Re:If you would RTFA... by a_ghostwheel · · Score: 3, Informative

      Wrong. Insertion sort (linear or binary) will be efficient way to sort "almost-sorted" data. Plus, commenting phrase "efficient sorts are usually very inefficient" - you have to realize (if you dont know this) - sort algorithms are classified into stable ones (e.g. merge sort) and non-stable (e.g. quick sort).

      Stable algorithms have identical efficiency no matter what kind of order input data had. Non-stable algorithms have predefined best and worst cases.

      But, overall - you will not be able to come up with the data where bubble sort will be best way to sort - usually you will end up using merge or quick sort for large data sets and insertion sort for small data set (some quick-sort implementations use insertion sort during last stages of sorting - when data has been "almost" sorted).

    6. Re:If you would RTFA... by Tassach · · Score: 3, Informative
      Most theoretical work on sorting has assumed randomly-sorted input data
      Bullshit. Every textbook comparison of sort algorithm I've ever seen assumes three cases: nearly-sorted data, random data, and inverse-sorted data. Even if bubblesort were the fastest for nearly-sorted data (Working from memory, I'm pretty sure it would run in O(n) as it's best case), it's still O(n^2) for the other two cases. Quicksort, heapsort, and insertion sort all scale differently; but even assuming their best-case performance is worse than bubblesort's best-case, their worst-case performance is FAR better - typically O(n log(n)) or thereabouts. IIRC, The AWK Programming Language has some excellent sample code which graphs the performance of the major sorting algorithms for different kinds of input.

      I seem to recall that insertion sort is also O(n) on nearly-sorted input, so it would be a much safer choice than bubblesort for the situation you describe. You have to consider best- and worse- case scenerios as well as the nominal path. IMHO, bubble sort has no place outside of an instructional setting.

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    7. Re:If you would RTFA... by B'Trey · · Score: 2, Informative

      Well, actually, I'm not speaking only from my experience. I'm also speaking from a number of other published articles and reports. Rational has a few out concerning Purify. Reasoning (who did the MySQL study) has a couple out as well. There have also been independent studies which confirm the (possibly biased) findings of the companies who write the software analysis suites. If you're really interested in the subject, Google is your friend.

      There are no guarantees, in this business or any other. But in general, people who are meticulous in their coding tend to be meticuous in all areas. If they bother to run software analysis tools and correct the bugs there, they usually bother to spend time evaluating the design and looking for bugs there as well. They also tend to test their code once written, which helps to identify those errors in coding logic that a software analysis suite can't find.

      And regardless of whether or not an unitialized pointer is an easy bug to find or not, if it exists in the code it's still likely to cause an application crash. Would you rather run code which has one of those type errors every seventeen hundred lines, or code which has one every eleven thousand one hundred lines?

      --

      "The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.

    8. Re:If you would RTFA... by Anonymous Coward · · Score: 1, Informative

      > MySQL's brokenness goes deeper than that.

      Nice troll. Those morons claim, among other things, that MySQL is broken because if you define an integer as "NOT NULL" that because MySQL doesn't automatically assume the integer should be NULL, it is a bug. WTF? It's doing exactly what you ask. If you define a column as "NOT NULL," it should never contain a NULL. According to their claim, columns that are defined as NOT NULL should contain a NULL by default! Just because their favorite database contains that bug doesn't mean MySQL should also be broken.

      It also claims that it's a bug that you can refer to enums using the integer index just as you can with associative arrays in many programming languages. Well, duh. That's a feature. They make some unbelievable wild-claims with regard to this feature.

      Those idiots also claim that LENGTH() in MySQL is broken because it ... wait for it ... returns the length of a string! Apparently the database they're used to doesn't have a working LENGTH() function so they think MySQL should return the length of the table's field rather than the length of the data in the field.

      The only valid points they had was the lack of an error when you use invalid integers in an insert or update and the lack of warning when you try to use transactions with tables that don't support them.

      Better luck next time with your troll.

    9. Re:If you would RTFA... by Sxooter · · Score: 2, Informative

      Firebird is closer to Postgresql in capabilities, and closer to MySQL in terms of size (Postgresql is friggin huge, and sucks up disk space quickly, so it's a bad choice for embedded db applications with limited space unless you're willing to do a lot of hacking to make it "lose weight").

      It felt a lot like postgresql to me. I didn't do anything fancy like writing a stored proc or a trigger or something like I've done in Postgresql.

      --

      --- It is not the things we do which we regret the most, but the things which we don't do.
  3. Don't generalize! by Junks+Jerzey · · Score: 3, Informative

    This "proves" that MySQL is better than commercial offerings. Good. A lot of people knew that. Hats off to the developers. But...

    1. This cannot be generalized into a property of all open source projects.
    2. It's more a tribute to the architecture and original core developers of MySQL than anything else.
    3. Realize that even though MySQL is an open source product, MySQL AB is the *company* that organizes and pays for MySQL development. So, again, you can't generalize this into something that covers late night hackers working on personal projects in their basements (the open source geek fantasy).

    MySQL is awesome! But let's be careful about this story, okay? It's the over-generalization that gives OSS/Linux advocates a bad name ("The Gimp is equivalent to Photoshop!").

  4. Re:Six times better? by man_of_mr_e · · Score: 3, Informative

    Sadly, this isn't what most people assume it means. Reasoning's software only finds "obvious" defects, such as null pointer assignments. It doesn't (and can't) determine if a bit of code does what it's supposed to do, only that it does whatever it does without any danger of crashing.

    Basically, it's no different from running your code through BoundsChecker or CodeWizard, or any number of other such tools that check for obvious errors (Null pointers, obvious buffer overflows, dangling references, etc..)

    While I have no doubt that MySQL's code is perhaps "cleaner" than your typical unpublished code, I have plenty of doubt that MySQL's code is "better" than unpublished code in terms of efficiency, logic errors, etc..

  5. Re:Duh! by James+Thompson · · Score: 5, Informative

    Need a particular reason? Take your pick. http://sql-info.de/mysql/gotchas.html

  6. Re:Duh! by pyite · · Score: 5, Informative

    Up until recently, MySQL had no transaction or atomic operation support. As such, you need to write application code to trap problems. Whereas with Oracle, when you run an atomic operation, you know without certainty whether the query failed in its entirety. I also believe stored procedure support is somewhat lacking in MySQL (however, there is that new Java function support). The MySQL 3 tree does not enforce constraints which is something most essential for data integrity. MySQL does not have subrow locking, whereas enterprise databases do. Once again, MySQL is great. I use it. However, it is not enterprise.

    --

    "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman

  7. Re:Duh! by Frymaster · · Score: 3, Informative
    Take your pick. http://sql-info.de/mysql/gotchas.html

    those are just bugs! what about lack of features?

    • no subqueries
    • no stored procedures
    • no triggers
    • no foreign key constraints
    • no updates on joins

    at least there's row-level locking now... finally.

  8. MS SQL Sybase ASE by Anonymous Coward · · Score: 3, Informative

    MS SQL is basically a revamped Sybase. So, on UNIX & Linux you could use Sybase ASE.

  9. Re:Just wait... by I8TheWorm · · Score: 2, Informative

    Good practices nonetheless, and not really win32 specific. Another fairly good one is The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt, David Thomas.

    --
    Saying Android is a family of phones is akin to saying Linux is a family of PCs.
  10. Re:Duh! by proj_2501 · · Score: 2, Informative

    there are foreign key constraints, but only on certain table types, and only in certain versions, and only on certain column types.

    on mysql 3.x, the table types that support foreign key constraints don't support transactions, and vice versa.

  11. Re:Toy DBMS by kpharmer · · Score: 2, Informative

    > Even today, I don't care about comparing to, say, Oracle or MS SQL Server. IBM DB2 would be a better
    > baseline, but best of all the real competitors: PostgreSQL and Alphora Dataphor.

    I think you've got your dbms' mixed-up:
    Oracle, Informix, and DB2 are all of comparable complexity and power: Oracle's partitioning is the simplest and its clustering the most complex. DB2 & Informix have more complex partitioning - but can scale beowulf-style to hundreds (if not thousands of separate servers).

    SQL Server is less functional than the above servers, though obviously similar to Sybase (due to its heritage).

    Postgresql is less functional than SQL Server - though it's a fine product anyway.

    MySQL is less functional than Postgresql.

    Not aware of any other database that occupies the limited transaction support / limited ANSI support niche that mysql does. MSQL perhaps?

  12. Re:MySQL and Commercial Licenses by Anonymous Coward · · Score: 1, Informative

    They can do it because in order to get a fix/patch/feature into MySQL proper you have to sign your rights over to them...

    MySQL is a very unopensource, open source project.

  13. Code Quality by octogen · · Score: 2, Informative

    In my opinion, there is no substantial difference in code quality between open source software and proprietary software.

    I have seen a lot of very buggy commercial software (including nVidia drivers, IBM's LANManager Services for OS/2, lots of Microsoft's services and utilities in Windows 2000 (for example, "TCP/IP Helper Service") and Netscape 4.7).

    On the other hand, I have also seen very bad code quality in open source products - for example, GTK+ (actually, the really bad thing about GTK+ is primarily its install scripts, makefiles and such). Compiling and installing GTK+ on anything else than on a GNU/Linux-machine is some kind of an adventure, while its commercial counterpart, Qt from trolltech, can be compiled quite easily.

    - I set the PKGCONFIG env variable before running 'configure'. It worked quite well until line 27.000 (or so) in 'configure', where the variable's content was suddenly gone (BTW, I really dislike debugging 28.000+ line shellscripts). I tried to 'configure' with bourne shell and with korn shell 93.

    - It assumes, you have Perl installed (if it's not in your PATH, 'configure' creates funny things like "#! -w" instead of "#!/path/to/perl -w"). The error message produced due to this bug was something like '/usr/bin/env: no such file or directory' - because the perl script was directly started using /usr/bin/env. Kind of confusing %-)

    - 'configure' forgot to add '-fPIC' to CFLAGS, for this reason all shared libraries where broken. I had to add this option manually.

    - Nothing works with 'make'. I had to install 'gmake' (GNU make) instead.

    - The actual source code of the core libraries finally compiled, after I had upgraded to gcc 3.3.2. The source code of the 'demo' programs was totally broken, and gcc refused to compile it - once more I had to change the makefiles manually.

    -----

    One or two weeks later I compiled trolltech's Qt library on the same computer. It was as simple as './configure --platform=platformname && make && make install'.

    Why do I need to debug 28.000+ lines of shellscript-code and a lot of makefiles, why do I need to install gmake, pkgconfig (by the way, pkgconfig and most other things in GTK+ don't work well if you don't install everything to /usr/local, which is the default location) and Perl 5, just to compile some C/C++-Code?

    Qt does mainly the same as GTK+, but it simply compiles, using only shellscripts, 'make' and a C/C++ compiler.

    Another example regarding code maturity (rather maturity than quality, notice the difference :-) is Sun JVM vs. GCJ's libjava. I compiled a very complex multithreaded application using GCJ; it worked fine on uniprocessor machines, but it randomly deadlocked on my multiprocessor server. Finally I found out, that libjava is broken on SMP machines. That doesn't mean, that libjava's code quality is bad; but it still means, that some other Java-Libraries (those of some virtual machines) are more mature, and possibly better tested.

    -----

    Some fundamental things about Software:
    - The more people read the code, the more people can potentially find and fix bugs (good about open source).
    - If a lot of people are allowed to write the code, somebody has to coordinate the work of all these people. Lots of different versions of the same module, written and/or modified by lots of different people need to be combined or coordinated otherwise (bad about most open source projects, because hardly somebody knows, how trustworthy anyone of the developers is; good abous some closed source projects (e.g. Trusted SunOS kernel, IBM SLIC kernel and other trusted code), because only a small group of really good programmers is allowed to write or modify code).

    Conclusion: It's good to have only a small group of 'trusted' developers, who write or modify the code, and then to let everyone else read and verify the code.

    regards,
    octogen

  14. Re:Duh! by proj_2501 · · Score: 2, Informative

    "OMG! And Windows 98 didn't support fast user switching!"

    Your analogy limps. Did most other operating systems support fast user switching in 1998? No, and especially not Windows' biggest competition on the desktop.

    On the contrary, PostgreSQL has had decent foreign key and transaction and subquery support since 1999.

    MySQL STILL doesn't support subqueries in a production version. Foreign keys are only supported by one table type. It doesn't support views. I could go on, but if you really want to see the differences, look at mysql's crash-me comparison chart. The differences that aren't cosmetic, even talking the last MySQL alpha, are pretty annoying.

  15. Re:Duh! by arivanov · · Score: 2, Informative
    Do you even know what a foreign key is, or how it's used?

    Yes I do. And I have revived and made perform to to spec god knows how many cretinous foreign key designs by a combination of

    • removing the foreign key
    • using join for selects to guarantee that only records with valid referential criteria are retrieved. This is equivalent to having a foreign key constraint in the sense that apps do not see any records that do not obey the foreign key contsraint.
    • garbage collector running in a different thread or often different machine that goes around and kills zombies whose referential integrity has been violated.

    The difference between this and a classic foreign key constraint is that this approach always uses efficiently multiple CPUS while a foreignkey is usually a single CPU bound task, it also maintains much less large scope (global or per table) locks and is generally faster for retrieves by a factor of between 10 and 100 times. Due to the TPC vendors have overoptimized join at the expense of many other different things in order to have nice benchmarks..

    And in btw, learn the difference between a "real DBA" and a database designer. I mean the one that is the justification for the 20+% salary difference.

    Cheers (lessons start at 500 per hour),

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  16. Re:MySQL and Commercial Licenses by mcc · · Score: 2, Informative

    Well, let's see.

    The FSF demands you to sign your righs over because they want to be able to effectively and easily defend the copyrights of all GNU software in court. For example, if GNU software is having its copyright infringed, they want to be able to go right ahead and act with immediate legal authority on that software, rather than having to track down every single contributor to that project-- some of whom may no longer be contactable-- and get permission to proceed with a legal action. They are open about this. They tell you this up front.

    MySQL AB demands that you sign your rights over because they want to be able to take the code you contribute, repackage it as a commercial product, and sell it for their own profit. They are open about this. They tell you this up front.

    While I don't think there's anything necessarily bad about what Mysql is doing, it seems pretty easy to me to state that there's a fundamental difference in "openness" between these two situations.