Slashdot Mirror


MySQL to Counter Oracle's Purchase of InnoDB

Miff writes "Computerworld is reporting that MySQL is hoping to counter Oracle's acquisition of InnoDB by providing its customers with an alternative." From the article: "Axmark said the storage engine is 'pluggable,' meaning other storage engines can be substituted instead. He said the code for InnoDB is under the GPL (General Public License), so 'the code is always out there. It will always be out there.'"

215 comments

  1. It looks bad, but... by lifterx · · Score: 2, Interesting

    I think there's hope for MySQL. With Oracle's products becoming more affordable and the recent purchase of InnoDB it looks bad for MySQL but I think this could be there chance to become more independant of other companies.

    I think this could be the big push that they needed to seriously consider teaming up with the Postgres guys, which could be good news for everyone interested.

    --
    SonicNonsense.com - Random stuff from a bunch of random people.
    1. Re:It looks bad, but... by jadavis · · Score: 1

      I think this could be the big push that they needed to seriously consider teaming up with the Postgres guys, which could be good news for everyone interested.

      What is your plan for that, and what do you expect the result to be?

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    2. Re:It looks bad, but... by Anonymous Coward · · Score: 0

      Just to throw the cat amongst the pigeons, maybe it's getting near time to re-visit Ingres again.......

  2. InnoDB by damiceious · · Score: 4, Interesting

    So, I'm not an expert on the GPL, but it sounds like we're looking at a fork? Where the OSS community gets to continue to use a public GPL`ed version, and Oracle will develop what they bought? I wouldn't think that Oracle would undermine MySql OS'ness, but the article said without a continuance of their license Mysql Development could slow down, or be setback. Maybe someone can explaint the highlights of the GPL that apply here?

    1. Re:InnoDB by fbg111 · · Score: 5, Interesting

      A few questions come to mind:

      1. Does Oracle need InnoDB? Would Oracle gain features or capabilities they don't already have by incorporating it into their database? If so, then perhaps we're looking at a fork.

      2. If InnoDB is forked, does MySQL have the developer talent to continue advancing InnoDB, or will the OSS community do it for them, or will it stagnate?

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    2. Re:InnoDB by linuxhansl · · Score: 5, Interesting
      The issue is this:
      MySQL makes some of its revenue by selling non-open-source licenses to customers who, for whatever reason, do not widh to publish their contribution.

      Now, you can only release code under any license of your choice if you own all the copyrights.

      Once Oracle owns the copyrights to InnoDB (and if Oracle does not extend the same relicensing rights to MySQL that InnoDB did), the only option MySQL has is redistributing a derived work under the GPL, they are legally no longer allowed to release under any other license. This in turn cuts off one of their revenue streams.

    3. Re:InnoDB by Anonymous Coward · · Score: 2, Insightful

      2. PostgreSQL has a huge lead here over MySQL, that is, as an community-driven open source project. They already have a large distributed network of developers, whereas I understand MySQL has been developed almost entirely by the AB.

      It will take a long time to reach the robustness of the PostgreSQL development project, and I bet a lot of people might switch at this juncture.

    4. Re:InnoDB by Krehbiel · · Score: 1
      1. Does Oracle need InnoDB? Would Oracle gain features or capabilities they don't already have by incorporating it into their database?

      Oracle has all the database technology they could want. I don't think they need any share of a market worth approximately $0. I can only think of a single reason for Oracle to buy InnoDB: MySQL has been encroaching, and they need to slow it down.

    5. Re:InnoDB by Anonymous Coward · · Score: 0

      MySQL makes some of its revenue by selling non-open-source licenses to customers who, for whatever reason, do not widh to publish their contribution.

      It's not just for customers who "don't wish to publish their contribution [to the MySQL software]". The client libraries are GPL also, so even a normal app that accesses MySQL through the MySQL AB client libraries has to be either GPL or buy the commercial license.

    6. Re:InnoDB by RosenSama · · Score: 1

      It could just be I'm dense on this topic, but I thought changing the license of something only affects future releases and doesn't, retroactively, affect prior releases. Couldn't MySQL still fork InnoDB and continue to release their branch dually under open-source and non-open-source licenses?

  3. Silly by jimmyhat3939 · · Score: 5, Interesting
    I think this battle between Oracle and MySql is kind of silly. The two databases serve different purposes:

    • MySql is excellent for anything ranging from the casual user (a few tables, 1000 rows in each) up to fairly complex transactional work (a small or medium-sized company).
    • Oracle has a bunch of extra features, like an excellent fuzzy text search engine and certain optimizers for complex queries that MySql doesn't (and IMHO shouldn't) have. Oracle is the DB of choice for non-M$ medium-to-large databases.

    There are other differences. Setup and configuration of MySql is much simpler, and you don't have to go as crazy creating complex partition schemes on your hard disks to get decent performance. But again, that's as it should be -- for simpler projects you want the free alternative.

    --
    Free 411! 1-800-411-SAVE

    --
    Free Conference Call -- No Spam, High Quality
    1. Re:Silly by Overly+Critical+Guy · · Score: 1

      Why is it silly for projects to compete? Each project tries to cater to as many people as it can. That's how it works.

      --
      "Sufferin' succotash."
    2. Re:Silly by wenzi · · Score: 2, Insightful

      We won't know until we know what Oracle plans to do with InnoDB. Oracle is having trouble competing with MySQL. Look at the last realease that was free to use and deploy. They ( As does MSSQL ) have to compete withh MySQL.

      Oracle could repackage InnDB as a gateway db, with Oracle compatibility to get users hooked on an Oracle product that would later lead to sales.

      IMHO, Oracle does not need the technology, as Oracle has tons of capabilities. There is probably another reason why they bought InnoDB. We won't know what they will do until they actually plan to do.

      --
      -- I doubt, therefore I might be.
    3. Re:Silly by Dual_View · · Score: 2, Informative

      That's the whole point. They're databases, and that's where the common ground stops. "Apples and oranges", is the phrase that comes to mind. They cater to different subsets of the public. Right now, this "competition" of theirs is just a self-imposed illusion.

      Then again, you bring up another good point in and of itself. It's very likely that both MySQL and Oracle are looking at each other and considering how to market to the other's primary segment of clients.

      But also, both databases are open-source. This should make things even more fascinating, unless the open-source community itself starts taking sides.

    4. Re:Silly by einhverfr · · Score: 3, Informative

      FWIW, Oracle has had their competitive sights on MySQL for some time. In 2000 iirc, they started releasing migration path tools to help people move from MySQL to Oracle. It is within this context that I see the InnoDB acquisition taking place.

      (BTW, I don't like Oracle. Any RDBMS that treats an empty string and a NULL as equivalent should be avoided.)

      Oracle and MySQL compete in a number of interesting and important markets, and as we all know, application mandates have a tendency to grow as do datasets. So people see MySQL as a growing RDBMS and use it and sometimes get trapped into it by weird MySQLism's. This keeps people with the RDBMS when it is not even close to meeting real requirements and having to code around the deficiencies.

      Whether the markets overlap and to what extent, Oracle sees MySQL as a strategic threat.

      --

      LedgerSMB: Open source Accounting/ERP
    5. Re:Silly by gtoomey · · Score: 1
      Oracle's revenues are >1000 times Mysql AB.

      When Larry counts his $18 Billion I'm sure he'll call you for advice.

    6. Re:Silly by otisg · · Score: 1

      Oracle has a good fuzzy text search engine? I'm not aware of it, but please point me to it.
      I typically use Lucene when I need to perform full-text and other IR-style searches on a data corpus.

      --
      Simpy
    7. Re:Silly by the_womble · · Score: 1

      Oracle may not be being that silly. It is not that unusual to find Oracle used where its extra features are not needed, these are clients Oracle could lose to MySQL.

    8. Re:Silly by scott_karana · · Score: 2

      As a database newb, may I ask what the problem with not distinguishing empty strings with NULLs is?

    9. Re:Silly by Master+of+Transhuman · · Score: 4, Informative


      An empty string is a value; a NULL is the absence of a value.

      In fact, in relational theory, according to Chris Date (although Codd himself supported the concept to some degree), NULLs shouldn't exist. This is because a table expresses facts - logical expressions - about an entity or a relationship, and a NULL is not a fact, it is the absence of a fact. An entity or relation about which you do not know the relevant facts should not be in a table which expresses facts about that entity or relation.

      NULLS also lead to screwed-up SELECT results sometimes and worse, sometimes you can't detect that the results are screwed up.

      This usually produces a religious war discussion, and I don't know enough to argue the case either way, so I won't say anything more about it. I'll just say that with Codd dead, Chris Date is the main man when it comes to relational theory, as far as I can tell, and he makes a good argument against NULLS.

      Pick up his book "Database in Depth" published by O'Reilly, which is not really a book for newbies, but does have some fairly clear explanations of the issues. It's smaller and cheaper (by about three-plus times - $30 vs $105) than his college textbook on the subject.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    10. Re:Silly by Anonymous Coward · · Score: 0

      mysql shill - mysql isn't excellent for anything, in comparison to other excellent databases. And mysql isn't really free - go read up on their license, dumbass.

    11. Re:Silly by Baricom · · Score: 3, Informative

      The two value types can have different semantic meanings. Typically, empty strings signify that no value applies for the column, while NULLs signify that the value is unknown.

      Say you have a table with a MiddleName column. Using the rule above, if you store NULL, that means you don't know what the person's middle name is. in contrast, if you store the empty string, it means the person doesn't have a middle name.

      The distinction is controversial, and some database administrators feel that distinguishing between NULL and the empty string adds unnecessary complexity to an application.

      Disclaimer: While I'm confident that I know a little more about databases than the average developer, IANADBA.

    12. Re:Silly by fbg111 · · Score: 5, Insightful

      It's a bit simpler - Oracle is for anyone who knows what data integrity is and requires it, MySQL is for anyone else. PostgreSQL is the free, acceptable alternative to Oracle.

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    13. Re:Silly by jadavis · · Score: 4, Insightful

      MySQL and Oracle compete like public transportation and the automotive industry[1]. They're different concepts used in very different ways, so they are not direct competitors like Ford and Chevy. However, one is still an alternative to the other, so they are indirect competitors.

      In some ways indirect competition is more destructive because it's a positive feedback system. The auto industry takes off in the U.S., so we build lots of infrastructure that make cars more desirable, which means more cars are purchased. However, if one person purchases a Ford that doesn't reduce the desirability of a Chevy to someone else.

      That extends to databases easily. For the sake of argument, let's say MySQL users tend to put more code in the applications, and less logic in the database than Oracle users. Then their application no longer has much need for Oracle, reducing the desirability of Oracle, leading to (perhaps) another application built upon the same database system.

      [1] I am making a very loose analogy. If you're trying to figure out which database I'm calling public transportation, and why, you've already put more thought into this than I have.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    14. Re:Silly by Anonymous Coward · · Score: 0

      Hmmm.

      create table t ( c1 varchar(5) );

      Table created.

      insert into t values ( NULL );

      1 row created.

      select count(*) from t where c1 = '';

      COUNT(*)
      ----------
      0

      delete t;

      1 row deleted.

      insert into t values ('');

      1 row created.

      select count(*) from t where c1 is NULL;

      COUNT(*)
      ----------
      1

    15. Re:Silly by Anonymous Coward · · Score: 0

      1000 rows in each



      That's just plain wrong. While MySQL might not work well with complicated queries or large joins and is missing many key features, one feature it does have is the ability to work with huge datasets. That's what the thing was created to handle. In 1998 we had a database that hit Linux's 2Gbyte limit on a Pentium 90. (Aside: we had grants for software but not for hardware which is why we were using a server that at the time was four years-old) It worked just fine until the 2Gbyte file limitation of Linux caused a problem. Several of our nearly 4,000 tables were continuously changing. We were saving real-time telemetry data from a telescope, and we tried Oracle, DB2, and Informix before we finally found MySQL. I had read that Tcx was using it to store over 500 tables each with more than 7 million rows, so we gave it a shot. Admittedly we aren't DB experts and probably didn't squeeze every bit of performance we could out of the other databases, but they just couldn't keep-up. I left that place in 2002, and MySQL was still happily humming along without problem. Your 1k row limit is just wrong.


    16. Re:Silly by Nurgled · · Score: 1

      I tend to use NULL to mean that I know that there is no value. Setting a field so that it allows NULL is like saying "not all things of this type have this property". Setting a field NOT NULL says that all things of this type DO or MUST have this property.

      This doesn't allow for "I don't know", but some would argue that you shouldn't be storing things you don't know about in a database in the first place. :)

    17. Re:Silly by QuietLagoon · · Score: 2, Insightful
      MySQL has been used very successfully on some projects that have Oracle very concerned. With MySQL 5.0's features, Oracle is seeing a competitor that is eating up the market from underneath Oracle's position.

      The key phrase to google is "disruptive technology". Clayton Christensen is the expert on the topic.

    18. Re:Silly by DarkSarin · · Score: 0

      Disclaimer: I am not a dba.

      That said, I have one serious thought about why NULLS should exist.

      In research involving surveys it is very important to code data appropriately. If you give a person a 137 item quesionnaire (which I have done), and they do not respond to an item that is categorically coded, a NULL is the correct input. A zero or even a blank space is not the correct response, since you actually DO NOT have a response. When computing a mean value based on a set of items it important to clarify a NULL, since it means (when handled properly) that the mean should be computed on a different number of items. You cannot do this easily if you have a zero or whitespace.

      The truth is that in some data sets you NEED to represent that absence of fact. NULL is the best way to do this in my opinion.

      Granted, I am not an expert on data storage, but I know a fair bit about research and logically it seems that there are definitely times when you absolutely need to distinguish between a non-response and some response. When you are doing correlational work, this is an important thing to do.

      As an example: I did some research where I was examining the relationship between two variables (call them x and y). X had 119 responses, by y only had 110. The responses were on a scale from 1 to 5. If I were to code a non-response as a 0, this would provide very different results than if I were to code a non-response as a NULL. Whitespace, depending on the tool used to examine the relationship, could either ruin the analysis (yielding a NaN response) or a miscount of the number of responses (thus providing the wrong correlation, since you need the number of responses to calculate a correlation. Thus, in my mind, the proper way to store this data is have a non-response as a NULL.

      Perhaps someone with a better theoretical understanding can provide some reasons why this is not necessary, but I think that sometimes you honestly have an absence of fact that needs to be stored as such.

      --
      "We don't know what we are doing, but we are doing it very carefully,..." Wherry, R.J. Personnel Psychology (1995)
    19. Re:Silly by Anonymous Coward · · Score: 0

      Use -1 for "no response" and add "WHERE Response>=0" to your queries. Removes the NULLs. Some may consider this more "correct" and it doesn't really cause any problems in the data.

    20. Re:Silly by Anonymous Coward · · Score: 1, Interesting

      Having been at one end of the 'launch' it was marketed more as a reaction to the similarly licensed free edition of MS SQL Server (Express Edition) rather than mySQL.
      Probably correctly, they think that people use mySQL because it's open source, rather than just because it's free.

      Specifically, they think they're losing out in 2 ways - firstly to projects that are starting small using SQLServer, and then MS getting the server sale when the project expands (rather than locking them into Oracle early on - and yes, they did compare themselves to drug dealers!). Secondly, they're losing developer mindshare to SQLServer in MS codeshops and education. I'm not sure it would really make much difference there - those guys just want the tight integration of the VisualStudio/.NET tools and probably are not even going to be aware what changing D/B might offer them - you need to know about those cool analytic functions to use them rather than write client side code to do the same, but they're more something you'll use if you're already on Oracle than a deal breaker.

      It's a moot point anyway, everyone knows a database is only a persistence layer and frameworks like Hibernate are going to do away with developers ever needing to know how or what database is used. Knowing nothing about the systems you connect to always results in better software.

    21. Re:Silly by barryvoeten · · Score: 2, Insightful

      Us as technicians may perform this discussion. Parent has a point.

      Now, reality comes in. Get your end users to fill in the forms. Is that done properly ? As far as programs are concerned, they can do that, but, it is not their integrity we need a database for.

      Real-life integrity can only be obtained by estimating the integrity of the weakest part of the chain user-application-db. I suppose it's not the database that is critical in most cases. It leaves you to choose any db you think is reasonable for the job. But remember: garbage in, garbage out.

    22. Re:Silly by Anonymous Coward · · Score: 0

      I'd rather have NULLs as currently offered by available SQL DBMSs, because defining relations in a way that do not need NULLs is above my not-too-bright head. But if I'm given the choice between Codd (advocating different types of NULL - all together at least 14 of them!) and Date ("Thou shalt not need NULL in your database"), I choose Date. It just doesn't look good to have unknown number of types for "No value".
      And of course, an empty string is not NULL, and even Oracle will change the behavour (not an easy task, given that about gazillion aplications might stop working as a result). But the word has been out for eight years not, starting with introduction of version 8.0, that NULL and empty string will be different in the future.

    23. Re:Silly by Eivind+Eklund · · Score: 2, Interesting
      The problem with NULL is that it doesn't fit into binary logic. Both
      0 < NULL
      AND
      0 > NULL
      is false.

      This mess up a bunch of logic.

      For your case above, I'd code the answers something like this:

      CREATE TABLE answers (
      question_id INT UNSIGNED NOT NULL REFERENCES questions(question_id),
      answer INT UNSIGNED NOT NULL,
      user_id INT UNSIGNED REFERENCES users(user_id),
      PRIMARY KEY(question_id, user_id)
      );
      An answer is represented by a row (question_id, user_id) existing, lacking information is represented by the row not existing.

      Eivind.

      --
      Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
    24. Re:Silly by rand.srand() · · Score: 1

      When representing the outside world, it seems clear to me that you'd need a way to represent something being logically consistent as missing, unknown, or unknowable. My height is 6 foot, my gender is male, my last gynecological visit is NULL.

      The argument is, perhaps that field should be normalized into a separate table of visit dates which does not contain an entry for me. That is the true NULL, that it isn't even present in the database.

      I think it's more design philosophy than an issue of right/wrong. There are many complex problems that NULL solves quite handily by being able to represent that a relationship is impossible.

    25. Re:Silly by arkanes · · Score: 1

      It's a seperate product from Oracle (the company), not an integral part of Oracle (the database). I've used only very shallowly and other products even less so I can't speak to how well it compares with other stuff out there, but it's certainly more convenient than needing to interoperate with outside applications.

    26. Re:Silly by arkanes · · Score: 2, Insightful

      It may be accurate to say that while it is theoretically possible to design a database such that all data can be represented, and no nulls are present or required, the availability of NULL has real practical value (but Oracle making empty string be the same is still stupid).

    27. Re:Silly by Proteus · · Score: 1

      This concept exists in Perl as well, as undef (an undefined value). This is treated as false, but is different from '0' or the empty string.

      The concept of no data is an important one, and should not be easily dismissed. I have read many arguments about why it shouldn't exist in relational models (needs to be normalized away, or whatnot), but most of the arguments fall down in the face of implementation considerations.

      Let me try an example. Say I have an application that needs to process data in CSV format and store it in a database; imagine these are scientific measurements of time and temperature for an experiment. There is a vast difference between an integral '0' as a temp measurement and a NULL value. One means the temperature was zero degrees, and the other means that no measurement could be made.

      Preserving the fact that no data is supplied for a column is essential, and cannot reasonably be normalized away in such a simple instance without creating unneccesary complexity (e.g. creating two tables that share a key, one with all the times and one with all the correlated measurements). Such a normalization approach also makes it difficult to tell if a measurement is absent because of an error in process (row not inserted, accidentally deleted) as opposed to a measurement that was absent in the source data.

      NULLs are also useful in relationships as an indicator that no relationship exists yet. In general, I think the elimination of NULLs might sound great in theory, but in practice would be unreasonable - it's typically bad to sacrifice simplicity for the sake of theoretical correctness.

      --
      We may not imagine how our lives could be more frustrating and complex—but Congress can. – Cullen Hightower
    28. Re:Silly by Anonymous Coward · · Score: 0

      Having recently seen Hugh Darwen on this subject, the current relational theory stance is that tables should consist of Keys and no more than one other value.
      This eliminates the need for NULLs (instead you have an absent row).

      As with the original relational theory up to 3NF and SQL, making this actually fly is an implementation issue (the underlying implementation could use SQL and NULLs so long as they're hidden from the user).

      In addition, an absence of fact should be stored as a positive statement.
      i.e. if job can be NULL we need to store a separate relation for 'job is unknown' and 'has no job' (and on - i.e. 'refused to give information').

      The resultant queries to join this all back up into a logical entity made blocks of SQL look compact - it's one of those works of theoretical beauty that reminds you of pure OO. (Another area where performance problems are all the fault of the vendors).

      So I think SQL and NULL will be around for a while.

    29. Re:Silly by Anonymous Coward · · Score: 0

      Idiot. Maybe you should educate yourself and read what other smart(er) people has written in the subject. Pick up any book by C.J. Date.

      Read real books! Educate yourself!

    30. Re:Silly by Anonymous Coward · · Score: 0

      It's called Oracle Text. It's fairly powerful, but a bit of a pain to work with.

    31. Re:Silly by dfetter · · Score: 1
      It's a moot point anyway, everyone knows a database is only a persistence layer and frameworks like Hibernate are going to do away with developers ever needing to know how or what database is used. Knowing nothing about the systems you connect to always results in better software.
      I really hope that's sarcasm. While it's possible to do as you describe above, it doesn't scale in any direction: size, performance, or maintainability.
      --
      What part of "A well regulated militia" do you not understand?
    32. Re:Silly by einhverfr · · Score: 1

      My height is 6 foot, my gender is male, my last gynecological visit is NULL.


      There might be nothing wrong with this approach to your data model. After all, you can always search for gender to rule this out. The information is therefore present.

      However, it might make more sense here to break off this column to a separate table so you can track *all* gynecological exams of female patients (or for that matter any primary care visit of any patient). Generally if you have NULL's meaning "not applicable" this is meaning you have optional fields that are best left of the main table. But this is mostly my opinion. Practical considerations may lead to different results (indeed that NULL could be in an outer join view).

      Personally I think that Date makes a fairly unpersuasive argument against ever using NULLs. NULL, if it has exactly one semantic meaning (meaning, "we don't know") is a fairly powerful tool. The larger problem however is that some ambiguity occurs when NULL can mean either "we don't know" or "this doesn't apply." Ambiguity in a database structure means that the information entered may not be entirely retrievable, or at least may show artifacts when retrieved.

      Now, in an outer join, means "we don't know." It may also mean "not applicable" but the RDBMS doesn't know that. It just doesn't know of a row on one side that joins with the row on the other. Nothing more or less. RDBMS's assume limited knowledge in outer joins and assume that you are providing queries that produce meaningful data. Similarly NULL > NULL always returns NULL because if you ask me if a > b but don't tell me what a and b are, I won't know the answer :-)

      I am thinking about writing a whitepaper on NULL's and posting it on my site. It will address Date's arguements (his valid point is that of ambiguity), and my counter arguments.

      --

      LedgerSMB: Open source Accounting/ERP
    33. Re:Silly by einhverfr · · Score: 1

      Having recently seen Hugh Darwen on this subject, the current relational theory stance is that tables should consist of Keys and no more than one other value.

      The problem here is that if you go down this road, data integrity enforcement becomes a real nightmare. Tables are logical units which consist of columns which all should belong together. If you have columns that don't always belong with the table, break them off.

      In other words, treating each column as a relational set creates many more problems than it solves.

      The second issue is that you have storage overhead that becomes problematic, as well as search performance.

      So I think the keys plus a column is a little overboard. Plus even if you go down this road, don't you still have NULL's, at least when you have to do your 43-line outer join to retrieve the data you want? And indeed don't you run into the exact same issues you would with NULL's and current implentations (ambiguous meanings: is that value meaningless in this query or is it merely unknown)?

      This being said... There is a valid point here. Multivalued Dependencies somewhat break the relational model in SQL but not badly. Often times, it can be helpful to break these off into separate tables for various reasons including performance and data management. For example... Lets say I have a CRM database and I want to be able to store more demographics info on cities and states. So I might break the cities and states out of the customer table, set up additional tables for these, and then add demographics info tables so that I can do more complex joins and look for purchasing patterns that might provide insight into appropraite marketing efforts (yes, it is possible to do this without breaking these off, but it is more elegant to do this, IMO).

      --

      LedgerSMB: Open source Accounting/ERP
    34. Re:Silly by fbg111 · · Score: 2, Informative

      But remember: garbage in, garbage out.

      True, but the purpose of the relational database is to prevent the 'garbage in' part. It relies upon the Data Admin or Data Architect (not the DBA, which is different) knowing the data in the problem domain thoroughly enough to design the database and its constraints so that garbage is not accepted, and the only data that goes in does so according to well-considered business rules and to the relational algebra of the model. Putting data integrity constraints in the db means that every app that connects with that db is forced to adhere to the same constraints, so that even if the app developers on one particular app screw up and forget to validate a particular input field correctly, the bad data is still prevented by the DB's constraints. The DB will throw an error and the app will fail on the insert, and good developers will build into their apps try/catch logic and other methods of gracefully handling such errors. The DA and DBA's job responsibility is maintaining and ensuring data integrity, and building the constraints into the DB is the method of assuring that. In a nutshell, there should never be 'garbage in', and hence no 'garbage out' either.

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    35. Re:Silly by Master+of+Transhuman · · Score: 1


      I think you'll find that the issue of correctness is preferred over concerns over storage space and performance.

      The point of the TRM system was to address the latter issues while maintaining correctness of the relational model.

      I'm not clear enough myself on NULLs to determine whether or not they should exist. I suspect not, but I don't yet have the theoretical background to make that determination.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    36. Re:Silly by Master+of+Transhuman · · Score: 1

      "most of the arguments fall down in the face of implementation considerations"

      Well, correctness is indeed more important than implementation considerations. The point of the TRM system was to deal with implementation issues while preserving the correctness of relational models.

      "typically bad to sacrifice simplicity for the sake of theoretical correctness"

      Wrong. Theoretical correctness frequently leads to simplicity because it deals with all the issues and doesn't let things fall through the cracks - where they then lead to complexities in dealing with them.

      As for NULLs, I don't the relational theory background yet to make a decision. I suspect they are not necessary in any situation, provided that the database design is correct, but I can't prove that yet as I haven't read all the arguments.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    37. Re:Silly by einhverfr · · Score: 1


      I think you'll find that the issue of correctness is preferred over concerns over storage space and performance.


      Well storage and performance in the keys plus a column model is only the most minor of the issues. The larger one has to do with data constraints, and in particular required v. optional information. Maintaining such a system would be difficult or worse.


      I'm not clear enough myself on NULLs to determine whether or not they should exist. I suspect not, but I don't yet have the theoretical background to make that determination.


      In Cartesian operations, a NULL set is a set with no members. This term was brought into the RDBMS world to indicate a set with no members in an outer join, or a set with no members in a column.

      So from a purely cartesian perspective, I see the keys plus a column in that all you have is a set of information and information for intersecting this with other sets of information. I just don't think it could be used to create workable data models. Indeed higher normal forms (4NF, 5ND BCNF, etc) seek to ensure that you have discrete sets of information with appropriate keys without going to the extreme of breaking every non-key into a separate table.

      In the end the real question is atomicity of the set of information. You can argue mathematically that the keys plus a column approach is "correct" but I would argue that it is not correct in most cases because it does not allow you to build atomic sets of information and easily enforce required information. So while it might be useful as a *learning tool* I don't see it as a reasonable solution for a production database.

      --

      LedgerSMB: Open source Accounting/ERP
    38. Re:Silly by JabberWokky · · Score: 1
      That's a silly comparison. Every upstart that has taken down an established company has started out with far less revenue that the big company. Sometimes they even become big companies themselves (or create enough room for somebody else to sneak in and become the new big company). The trick is knowing which ones to watch.

      --
      Evan

      --
      "$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien
    39. Re:Silly by Daedala · · Score: 1

      Your problem is easily solved. Store temperatures in Kelvins. No 0....

      --
      What I say does not represent the views of my employers, my friends, my cats, or myself.
  4. Patent Threat by G3ckoG33k · · Score: 1

    This may not be about the GPL, but rather about patents. Reading the article:

    "if Oracle holds patents or licenses for the underlying technology such as algorithms or file structures, "then that could get quite interesting,"

    1. Re:Patent Threat by John_Booty · · Score: 1

      "if Oracle holds patents or licenses for the underlying technology such as algorithms or file structures, "then that could get quite interesting,"

      I don't understand how these theoretical patents come into play.

      If Oracle already had patents on technology contained within InnoDB prior to acquiring InnoDB, they didn't need to acquire InnoDB. They could have challenged InnoDB's GPL licensing.

      If InnoDB contains patented technology that Oracle now owns as a result of their acquisition of InnoDB, isn't that a moot point since InnoDB already released that stuff under the GPL?

      Obviously I'm not a lawyer of any sort, but it seems like these theoretical patents are a non-issue. Please elucidate if I'm missing something because I probably am.

      --

      OtakuBooty.com: Smart, funny, sexy nerds.
    2. Re:Patent Threat by joto · · Score: 5, Informative
      If InnoDB contains patented technology that Oracle now owns as a result of their acquisition of InnoDB, isn't that a moot point since InnoDB already released that stuff under the GPL?

      No. Repeat after me: "patents have nothing to do with copyright!". Write it on the chalkboard 100 times...

      There could be patents covering the GPL-licensed code, which InnoDB might not have enforced. Of course, thinking in this way is almost paranoid, but it has happened before, remember GIF?

      No matter how GPL'd the code is, if it violates patents, it is illegal to distribute in countries where that patent is valid. If you doubt me, the text of the GPL license itself spells this out for you. And even if you already have a copy, unless it comes with a patent license, it's illegal to run as well.

    3. Re:Patent Threat by patcito · · Score: 1

      "And even if you already have a copy, unless it comes with a patent license, it's illegal to run as well."

      This is not true, if the code was written by the patent holder under the GPL than it gives YOU, the user, the right to use the software as you wish (according to the GPL), otherwise the patent holder would be violating the GPL.

  5. MySQL needs to build their own storage engine by Java_Good_COBOL_Bad · · Score: 2, Interesting

    It always seemed lame to me that MySQL users had to concern themselves with with "storage engine" gets used under the covers. The obvious answer from MySQL's perspective is to build their own storage engine as an integrated part of MySQL.

    1. Re:MySQL needs to build their own storage engine by jadavis · · Score: 2, Insightful

      The problem is *not* that the storage engines are modular. That's an innovative implementation decision, and it is not inherently bad.

      What causes the problems for some people is the MySQL exposes a different behavior to the user for each storage engine. In other words, the storage engine is not abstracted away from the user, but just the opposite.

      Examples include foreign key and transaction behavior on MyISAM tables (that is, the commands are ignored by a MyISAM table) versus the behavior on an InnoDB table (where the commands function as you might expect).

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    2. Re:MySQL needs to build their own storage engine by dennisbj · · Score: 1

      If all engines behaved the same you wouldn't need more then one engine.

    3. Re:MySQL needs to build their own storage engine by Master+of+Transhuman · · Score: 3, Insightful


      The problem is that if you are developing a database with multiple tables related to each other (MySQL DOES want to be considered a "relational" database rather than a mere "file handler"), then you have to have referential integrity (or waste time coding referential integrity yourself - I used to do that with FoxPro back in the day, and it's not fun.)

      Most of the MySQL engines don't do referential integrity - which makes them worthless for most "real" database efforts. Only InnoDB enforces foreign key constraints.

      If the current version of InnoDB in MySQL 5.x is under GPL, and MySQL AB can continue to develop their own fork, it may not be that big a problem. But if they can't, due to patent or other IP issues, MySQL is in big trouble. In fact, for any serious uses, they're history, and everybody would be advised to turn to PostgreSQL or Firebird - or even Ingres.

      Firebird in particular is fast and small and would suit the sort of Web applications that MySQL is known for - except that there isn't much support in Firebird for that sort of thing - but it could be added on if people see the need due to MySQL becoming a liability.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    4. Re:MySQL needs to build their own storage engine by killjoe · · Score: 1

      Why not different storage engines? If you have a table that is rarely updated (like a lookup table) why not use a table type is optimized for reading? If you have a small table of disposable data why not use a in memory table type?

      It's a feature, maybe it's not useful for you but it might be useful for somebody else.

      --
      evil is as evil does
    5. Re:MySQL needs to build their own storage engine by jadavis · · Score: 1

      There are many valid reasons to have a different storage engine that behaves the same way.

      For instance, let's say PostgreSQL offered more than one storage engine. Possible new engines might be:
      (1) A table-stored-in-a-btree storage engine. The whole table would be stored in btree with the primary key as the btree key. Range queries would be much faster in some situations due to sequential access. Of course this method has serious costs and is not the best in most situations, but does have performance benefits in the right situation.
      (2) A storage engine that circumvents OS disk cache, perhaps by operating on a raw partition.

      Both of those are of limited use, but are perfectly valid choices and available in other database systems. The important thing is that they don't change the bahvior at all, only performance.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    6. Re:MySQL needs to build their own storage engine by Anonymous Coward · · Score: 0

      Why?! They could just shrink-wrap PostgreSQL, and the world would be a better place for all of us. (Assuming they keep their hands off the code. No bastardizing to be backwards-compatible, please..)

  6. Other storage engines compared by NickDoulas · · Score: 3, Interesting

    Does anyone know the practical difference in using other storage engines? For example, how does using Berkeley DB (http://dev.mysql.com/doc/refman/5.0/en/bdb-storag e-engine.html) compare?

    Also, how typical are non-InnoDB configurations of MySQL?

    1. Re:Other storage engines compared by photon317 · · Score: 1, Flamebait


      Just use PostgreSQL and be happy. As soon as you start having to ask yourself questions about what "storage engine" you're using, or whether your transactions are really coherent, or start thinking maybe you really did need proper modern (OO)RDBMS features, it's time to drop the toy and start using the real option.

      --
      11*43+456^2
    2. Re:Other storage engines compared by einhverfr · · Score: 3, Informative

      The BDB handler has been barely maintained. All the benchmarks I have seen show BDB losing out to pretty much every other storage engine. You get transactions and advanced capability, but at a huge performance cost.

      Use PostgreSQL and you will be happier :-)

      --

      LedgerSMB: Open source Accounting/ERP
    3. Re:Other storage engines compared by Anonymous Coward · · Score: 0

      myISAM is the fastest of the lot but doesn't pass the ACID test ... no transactions so it's a non starter for anything I work with.

      INNODB is quite a bit slower, but actually supports transactions and is fully ACID compliant.

      The other's really don't rate unless you've got some really specialized needs.

    4. Re:Other storage engines compared by Anonymous Coward · · Score: 0

      Although BDB does support transactions after a fashion, it locks all reads (or reads on records effeced by the transaction, I can't remember) while the transaction is in progress. InnoDB implements a transaction isolation system that continues to allow reads.

    5. Re:Other storage engines compared by TheRaven64 · · Score: 4, Interesting

      A lot of things where MySQL is used really don't need a database. What they need is something like VMS's structured files - something slightly more abstract than an arbitrary stream of bytes, but not much. In these cases, SQLite might be a better choice than PostgreSQL, although I'd still recommend PostgreSQL to anyone who actually needs a database.

      --
      I am TheRaven on Soylent News
    6. Re:Other storage engines compared by hughk · · Score: 1
      What VMS had was RMS, which gave you everything from unstructured files all the way through to b-tree based ISAM. The assumption on file types is massive OS dependency, but it means that applications on VMS tend to interoperate well. Unfortunately, no such thing exists on Unix. Sure I can use Berkeley/Sleepycat DB, but which version?

      MySQL is already overkill, but at least it is some kind of standard.

      --
      See my journal, I write things there
    7. Re:Other storage engines compared by hritcu · · Score: 2, Insightful

      The BDB handler could be updated, and I don't think that would be that much of a problem. As for Berkeley-DB itself it is (very) actively developed (http://www.sleepycat.com/) and it is surely more widely deployed and more stable than InnoDB (Berkeley-DB has 200 million deployments compared to only 5 million MySQL deployments). As for the "huge performance cost", I really doubt there is such a thing.

      --
      If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
    8. Re:Other storage engines compared by einhverfr · · Score: 2, Interesting

      As for the "huge performance cost", I really doubt there is such a thing.

      Sure there is. BDB uses page-level locking while Innodb uses snapshot technology.

      This means two things:

      1: BDB has blocking issues in high-concurrency environments and is thus not suitable for a backend for a higher-traffic RDBMS.

      2: You can only support Read Committed transaction level. You cannot support Read Uncommitted, Serializable, and Read Repeatible because you lack the snapshot capability required to make this happen.

      So no, BDB is not a viable alternative due to backend issues. This is why the table handler is not actively maintained-- because BDB, as great as it is, was not designed to be a backend storage mechanism for a high concurrency RDBMS.

      BTW, InnoDB has a table bloat problem (in that it has trouble removing dead tuples) which can be quite serious if you have a lot of updates or deletes. So for some sorts of installations, InnoDB might eventually have more serious performance issues than BDB (imagine PostgreSQL without vacuuming the DB).

      PostgreSQL 8.1 avoids all these issues. The RDBMS is integrated with an autovacuuming capability, avoiding the bloat issues with InnoDB, and it avoids the page locks you have with BDB.

      For those who are wondering, in a page lock, a page of data containing a number of rows of data is locked until the transaction times out. This essentially locks all rows stored near the row that is updated.

      --

      LedgerSMB: Open source Accounting/ERP
    9. Re:Other storage engines compared by hritcu · · Score: 1

      Thanks for the explanations. So, from know on, just PostgreSQL for me, right? :)

      --
      If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
    10. Re:Other storage engines compared by einhverfr · · Score: 1

      Thanks for the explanations. So, from know on, just PostgreSQL for me, right? :)

      I don't know your requirements, but I would *certainly* recommend evaluating it.
        It has become quite easy to use, maintain, etc (which it was not 5 years ago) and has become even more powerful. As with all RDBMS's there is a transitional learning curve, but I have found it to be easier with PostgreSQL than with Firebird due to distributed documentation and high quality help on the email lists.

      --

      LedgerSMB: Open source Accounting/ERP
  7. What is it with technology and cutlery? by jd · · Score: 1, Interesting

    Microsoft wants to knife the baby, OSS projects do fork (though not as often as Microsoft would have you believe), and Slashdot postings invariably have spoon(erism)s. In that case, to produce good, clean technology, all we have to do is build a dishwasher the size of California.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:What is it with technology and cutlery? by bersl2 · · Score: 2, Funny

      -1, Sporkless

    2. Re:What is it with technology and cutlery? by Master+of+Transhuman · · Score: 1


      Knives, eh?

      Don't forget this is SLASHdot!

      Oh, man, I can't believe I said that...

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    3. Re:What is it with technology and cutlery? by mikefe · · Score: 1

      Too bad they're not named Microdot...

      --
      There: Something at a specific location.
      Their: Owned by someone.
      Please make sure your english compiles.
    4. Re:What is it with technology and cutlery? by CapnGrunge · · Score: 1

      You just couldn't cut it, huh?

      --
      I see 57005 people
  8. The real problem by einhverfr · · Score: 5, Informative

    MySQL does not provide a transaction-safe store free of licensing overhead. Commercial licensing of BDB, SAP-DB, and InnoDB all require relicensing agreements.

    These being availble for use under the GPL and similar licenses helps out everyone who uses MySQL under the GPL. But it doesn't help anyone else out, including MySQL. What MySQL needs is the ability to add something like MVCC to a table type that they own. Oh wait that will never work because MyISAM should be pretty much at odds with the whole concept. I guess it is time to build one from scratch.

    So the inevitable outcome is that MySQL will probably have to write a storage engine from scratch that meets all the needs that InnoDB filled.

    --

    LedgerSMB: Open source Accounting/ERP
    1. Re:The real problem by einhverfr · · Score: 2, Insightful

      Exactly. What do you think *I* use? Hint: It starts with a P.....

      This being said, I think that this discussion is important because it helps clear up *why* this acquisition is so bad for MySQL.

      There are some other things that bother me about this response:

      1) It took so long to occur.
      2) No specific details. So this is entirely vaporware as far as I can see.
      3) There are *no* alternatives to InnoDB on MySQL (BDB has performance issues, etc).

      --

      LedgerSMB: Open source Accounting/ERP
    2. Re:The real problem by ComputerizedYoga · · Score: 2, Interesting

      technically superior, sure.

      But ... umm .... what's the word I'm looking for?

      Oh yeah. Slow.

      MySQL's got three big great things going for it: raw disgusting speed, relative simplicity to set up and administer, and the whole dual-license thing. Oh, and with innodb and with the 5.0 release, very nearly everything in the "technically superior" category (as far as most people are concerned) is covered. A lot of the things people bash mysql for are really complaints about the shortcomings of MyISAM instead. Oh, and don't forget that SO many websites and open source tools are built on top of the LAMP stack... and people wonder why it's talked about so much? Really, what's not to like?

      Postgresql has relative simplicity to set up and maintain, but I don't think is quite as straightforward as mysql. And, as far as I've been able to tell, the speed just isn't there. We all love BSD licensing, and it still has some technical strengths over mysql, but the biggest issue is still speed, and all the benches I've seen still put mysql ahead on that one. (Besides, who's ever heard of the "LAPP" stack, and does it have something to do with strippers?)

      Oracle, on the other hand, is already pretty fast and scalable, but non-free, non-f/oss, and generally an order of magnitude or two more complex to set up and maintain. In an organization large enough to have a dedicated database administrator, it's great, but it's not really suitable to be the database behind something like my weblog, or behind a small business's basic website.

    3. Re:The real problem by jsonn · · Score: 3, Interesting

      Update your facts. There are exactly two areas where MySQL is noticable faster than PostgreSQL. Those are connection time and non-transactional queries. If you depend on the former -- rewrite your programs, they are bogus. If you depend on the second, pray that the load will not increase enough to kill your database either by inconsistencies or locking. As soon as it comes to complex queries, PostgreSQL runs circles around MySQL.

      MySQL 5 is still lacking a lot, most importantly in the I of ACID. It does not ensure integrety, it does things behind the scene which do not match the SQL commands. Silently trunkating data is evil, just like creating invalid dates. No excuses, MySQL just sucks in that department. Besides, stuff like the query cache don't belong into the database server. They belong into the middleware, since that is exactly where the necessary context is.

      Last but not least, I do consider PostgreSQL to be much easier to administrate. I have central authentication support, I don't need magic commands to update the internal state of the server etc. It just works.

      The LAMP argument is useless. Neither is PHP often a good choice for good web programming, nor is Apache necessarily the best web server. Just because a lot of people ask for it, doesn't make it good. You know the most popular Operating System out there, don't you?

    4. Re:The real problem by TheRaven64 · · Score: 1

      I'm using an OLPP stack, (FreeBSD, Lighttpd, PostgreSQL, PHP), but I have to admit that it doesn't have quite the same ring as LAMP...

      --
      I am TheRaven on Soylent News
    5. Re:The real problem by rbanffy · · Score: 3, Interesting

      The single biggest problem with the MP part of the LAMP stack is that both PHP and MySQL attract the wrong kind of developers.

      The developers who are willing to live with the shortcomings of PHP and MySQL should consider if they really want to develop software. They are just like the people who are willing to live with the shortcomings of VB6 and Jet databases - they live with them because they know nothing else. While the query language of MySQL has improved with the latest releases, it is still not quite on par with, say, PostgreSQL. Do a somewhat complex join and you will see MySQL's speed go down the drain. See what happens when you have lots of concurrent long-running transactions.

      MySQL screams "cheap" since its beginning and no amount of engineering will make it look well built. It may look "overbuilt", at best.

    6. Re:The real problem by Anonymous Coward · · Score: 0

      mySQL is fast because it doesn't do much. Thats nothing to gloat about.

      It's not even close to Oracle, MS SQL Server or PostgresSQL in terms of functionality.

    7. Re:The real problem by Anonymous Coward · · Score: 0

      How does FreeBSD translate to an "O"? Do you mean OpenBSD?

      How would FAPP sound (FreeBSD, Apache, PostgreSQL, Python)?

    8. Re:The real problem by laffer1 · · Score: 1

      I agree with you aside from the administration end. mysql is much more consolidated on disk and much easier to administer using their tools and adding phpMyAdmin. I think postgreSQL is very attractive to oracle devlopers or people who have prior experience with oracle. mysql was easy for me to learn coming from a sql server 7/2000 background. As for performance, mysql 5.0.16 seems much slower than 4.1.x did. I upgraded to mysql 5.0 a few weeks ago and noticed that it killed performance on my php sites. I use a product called e107 for cms duty and while the software is written poorly, it also is noticably slower. Then again, most sane people wouldn't write code that generates 117 queries on one page using 2 database connections. It also seemed to toast file uploads. It does seem more consistent on my blogging site though. (java)

      mysql may yet have life in it though. Many operating systems still can't run oracle. mysql and postgresql are the two most likely alternatives. Sadly there are days i miss sql server.

    9. Re:The real problem by jsonn · · Score: 1

      Try PgAccess. IMO much nicer than phpMyAdmin. It also speaks for itself, that even for easy stuff like adding a user, almost noone uses the mysql internal tools. Depending on the needs, it can be as simple as createdb + createuser in PostgreSQL, e.g. when using PAM or ident for authentication and one database per user.

    10. Re:The real problem by Anonymous Coward · · Score: 1, Informative

      Actually, in mysql 5.x you can run the DB in strict mode, which will eliminate the majority of quirks regarding truncating / altering your data. This (to me) was the biggest shortcoming mysql had, and its finally fixed - the 'I' in ACID is thus fulfilled.

    11. Re:The real problem by LDoggg_ · · Score: 2, Insightful

      That's an interesting take, but a little unfair.

      Many people use LAMP because you can get a website hosted for next to nothing using it.
      Most of the really cheap places I've seen offer either win32/asp/IIS/sqlserver or LAMP.

      There are plenty of things I don't like about mysql 4.x (no sequences, no subselects, etc.) and plenty of things I don't like about php 4.x (lack of type hinting on primitives, no standard DB connectivity layer, etc.). But the workarounds aren't really that bad.

      I could be much more productive with J2EE and oracle, but I'm not gonna get someone to host my sites for just a few bucks a month.

      --

      "If they have both, tell them we use Linux. And if they have that, tell them the computers are down." -Dave Chapelle
    12. Re:The real problem by llefler · · Score: 2, Informative

      MySQL's got three big great things going for it: raw disgusting speed, relative simplicity to set up and administer, and the whole dual-license thing.

      From my perspective, MySQL has two things over Postgres. MySQLAdministrator and they don't do silly things with case sensitivity.

      OTOH, they only have one advantage over Firebird, MySQLAdministrator. I keep hoping that the guys over at "Firebird - Relational Database for the New Millenium" will wake up and get some good tools out there. If FlameRobin ever matures, I'll wipe MySQL from my systems in a heartbeat.

      What's not to like? The job of the database it to maintain data integrity. MySQL falls down in a number of ways. Who cares what license it has if your data isn't safe.

      Speed has never been an issue for me, but then, I'm not trying to use my workstation from two generations ago as a database server.

      --
      It is amazing what you can accomplish if you do not care who gets the credit. -- Harry Truman
    13. Re:The real problem by rbanffy · · Score: 1

      Sorry. I was unfair ;-)

      The cheap hosting options escaped me completely. There are people who are willing to live with PHP and MySQL as far as they don't have to pay too much for it.

      All things equal, I prefer either JSF/JSTL/JSP or Zope/Plone. I know what you mean.

      Sorry for the collateral damage

    14. Re:The real problem by jadavis · · Score: 1

      From my perspective, MySQL has two things over Postgres. MySQLAdministrator and they don't do silly things with case sensitivity.

      Can you elaborate on the case sensitivity issue? As far as I can tell the only way PostgreSQL deviates from the standard is by folding to lower case rather than upper case because, well, lower case looks more readable.

      MySQL, from what I can tell, has an entirely new backtick ("`") operator. I don't even really know what it does, but to me, that seems silly.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    15. Re:The real problem by hritcu · · Score: 1

      BDB has performance issues

      Could you explain what these issues are and why are they not fixed/fixable?

      --
      If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
    16. Re:The real problem by einhverfr · · Score: 1

      Innodb and BDB use different mechanisms to ensure that transaction data is updated atomically. Innodb uses MVCC (though it has a bloat problem here which can become very severe), while BDB uses page-level locking. In page level locking, all the rows in a data page to be updated are locked while the transaction is in process.

      More info at http://www.developer.com/db/article.php/2235521

      --

      LedgerSMB: Open source Accounting/ERP
    17. Re:The real problem by tntguy · · Score: 1

      How would FAPP sound

      Like the sound of one hand clapping!

    18. Re:The real problem by Anonymous Coward · · Score: 0

      > I'm using an OLPP stack, (FreeBSD, Lighttpd, PostgreSQL, PHP), but I have to admit that it doesn't have quite the same ring as LAMP...

      Just rearrange the letters a bit... PLOP!

    19. Re:The real problem by Anonymous Coward · · Score: 0

      > ...they don't do silly things with case sensitivity.

      Really?

      mysql> CREATE TABLE foo (a VARCHAR(20));

      mysql> INSERT INTO foo(a) VALUES ('foobar');

      mysql> SELECT COUNT(*) FROM foo WHERE a = 'FooBar';
      +----------+
      | COUNT(*) |
      +----------+
      |        1 |
      +----------+

    20. Re:The real problem by einhverfr · · Score: 0

      Actually, in mysql 5.x you can run the DB in strict mode, which will eliminate the majority of quirks regarding truncating / altering your data. This (to me) was the biggest shortcoming mysql had, and its finally fixed - the 'I' in ACID is thus fulfilled.

      A strict mode that can be turned off by the application does not guarantee integrity, therefore the I is as yet unfulfilled. If there was an option to guarantee strict mode and prevent it from being turned off by the client I would agree with you.

      --

      LedgerSMB: Open source Accounting/ERP
    21. Re:The real problem by Anonymous Coward · · Score: 0

      FLPR knocks LAMP/FAPP out of the water.

    22. Re:The real problem by dr.+greenthumb · · Score: 1

      Read the docs. You can configure how MySQL deals with case in my.ini. Case insensitivity is default on Win32 platforms though.

    23. Re:The real problem by dr.+greenthumb · · Score: 1


      MySQL, from what I can tell, has an entirely new backtick ("`") operator. I don't even really know what it does, but to me, that seems silly.


      If you don't know what it means, please research it before you rush to the conclusion that "it's silly". It's not an operator, it's merely the delimiter MySQL uses for identifiers - and it's not even mandatory in most situations.

    24. Re:The real problem by einhverfr · · Score: 1

      *shrugs*

      PHP is great for some things. It is a really great preprocessor. Too bad too many PHP dev's don't get the "it's a preprocessor" thing.

      For example, I can create a config template and use PHP to preprocess it based on certain inputs. I have used this to create configuration management scripts before. But in general, PHP is pretty piss poor for about anything other than preprocessing text files (including postscript,

      Speaking as a PHP/Perl/Python developer who can code in C when he has to.

      --

      LedgerSMB: Open source Accounting/ERP
    25. Re:The real problem by einhverfr · · Score: 1

      From my perspective, MySQL has two things over Postgres. MySQLAdministrator and they don't do silly things with case sensitivity.

      PgAdmin III is a really nice tool if you want a GUI-based tool.

      As for case sensitivity.... What on earth do you mean it doesn't do silly things with case sensitivity? PostgreSQL is almost standards-compliant here by default (except that case is folded to lower rather than upper for unquoted identifiers) while MySQL is anything but that.

      As for this comment:

      What's not to like? The job of the database it to maintain data integrity. MySQL falls down in a number of ways. Who cares what license it has if your data isn't safe.


      You hit the nail on the head. Even with strict mode, if an app can turn it off, your data is not safe...

      --

      LedgerSMB: Open source Accounting/ERP
    26. Re:The real problem by einhverfr · · Score: 1

      You mean like the ISO use of the double quote (")? Why use backticks if there is a standards-based alternative?

      --

      LedgerSMB: Open source Accounting/ERP
    27. Re:The real problem by jadavis · · Score: 2, Interesting

      I meant that I didn't know how it differs from the standard double-quote. Backtick seems to do the same thing as what the standard says a double-quote should do, so why don't they use the standard?

      That's what seemed silly to me. I'd be interested to hear the reason for the use of backticks versus double-quote.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    28. Re:The real problem by llefler · · Score: 1

      As for case sensitivity.... What on earth do you mean it doesn't do silly things with case sensitivity? PostgreSQL is almost standards-compliant here by default (except that case is folded to lower rather than upper for unquoted identifiers) while MySQL is anything but that.

      Postgres converts the SQL to lower case, but not the schema or data, and then does a case sensitive compare. For example, I have an application that accesses a table called CustPackingSlipJour. I suppose we could change it to CUSTPACKINGSLIPJOUR, but that's just abusive to the people who have to read it.

      MSSQL, Firebird, MySQL will all do a Select * from CustPackingSlipJour. Postgres assumes that what I gave it is wrong, and lower cases everything UNLESS I put it in quotes. I would prefer that it assumed I knew what I wanted. I knew I wanted it to be case INsensitive when I installed it.

      --
      It is amazing what you can accomplish if you do not care who gets the credit. -- Harry Truman
    29. Re:The real problem by llefler · · Score: 1

      Can you elaborate on the case sensitivity issue? As far as I can tell the only way PostgreSQL deviates from the standard is by folding to lower case rather than upper case because, well, lower case looks more readable.


      Sure. Postgres (at least in v7) configured to not be case sensitive will lower case the SQL, but not the schema or data. It does the most annoying thing any software can do, it assumes it has to correct the users imput, and chooses wrong.

      Suppose you want your table named CustPackingSlipJour instead of CUSTPACKINGSLIPJOUR (or custpackingslipjour in Postgres' case) because it's easier for the developers to read. If you do a Select * from CustPackingSlipJour on MSSQL, Firebird, or MySQL you will get a result set containing all the rows of the table. Because when you tell them to be case insensitive they are. With Postgres you will not unless you change your query to Select * from "CustPackingSlipJour".

      I tried it on a couple different versions on Windows and Linux, was told that was just the way it worked, and moved on.

      --
      It is amazing what you can accomplish if you do not care who gets the credit. -- Harry Truman
    30. Re:The real problem by einhverfr · · Score: 1


      Postgres converts the SQL to lower case, but not the schema or data, and then does a case sensitive compare. For example, I have an application that accesses a table called CustPackingSlipJour. I suppose we could change it to CUSTPACKINGSLIPJOUR, but that's just abusive to the people who have to read it.


      The ISO standards specify behavior here. Firebird is actually correct in that it will convert CustPackingSlipJour to CUSTPACKINGSLIPJOUR by folding unquoted identifiers to upper case as the standard requires.

      PostgreSQL breaks the standard somewhat in that it folds to lower case as you point out.

      MySQL's case sensitive comparison of identifiers is however anything but standard.

      On other words, the standard says that the following identifiers are equivalent:

      "FOO", Foo, FOO, foo

      PostgreSQL treats the following identifiers as equivalent:
      "foo", Foo, FOO, foo

      MySQL treats the following identifiers as equivalent:
      "Foo", Foo

      In most cases PostgreSQL's behavior won't break ANSI compliant code while MySQL's is more likely to do so.

      MySQL's case handling of identifiers is far more screwed up than PostgreSQL's but you are correct in that Firebird (last time I used it ~ 1.0) obeyed the standard. I am one of those who wants to see PostgreSQL obey the standard too and am working on ways to help make that possible as a runtime option (many people find upper case identifiers to be annoying).

      If you want case sensitive identifiers, double quote them as the standard states.

      --

      LedgerSMB: Open source Accounting/ERP
    31. Re:The real problem by llefler · · Score: 1

      PostgreSQL treats the following identifiers as equivalent:
      "foo", Foo, FOO, foo


      This wasn't my experience.

      If you want case sensitive identifiers, double quote them as the standard states.

      This is the problem, I did NOT want case sensitive operations. According to the documentation I had, the systems that I was working with were not configured to be case sensitive. If I configure a system to be case INsensitive when I install it, I want it to be case INsensitive. That doesn't mean lower case the SQL and then to a case sensitive compare with the schema and data. Which is the behavior I experienced.

      --
      It is amazing what you can accomplish if you do not care who gets the credit. -- Harry Truman
    32. Re:The real problem by jadavis · · Score: 1

      Isn't PostgreSQL just following the standard?

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    33. Re:The real problem by einhverfr · · Score: 1

      I think you are missing the behavior specified by the standard:

      If I say:

      SELECT * FROM MyTable;

      the standard says that this should be equivalent to:
      SELECT * FROM "MYTABLE";

      In other words, if you have a table called "MyTable" it won't select from it. Last I checked Firebird did this.

      In PostgreSQL you have the (slightly non-standard) behavior of making
      SELECT * FROM MyTable;

      equivalent to:

      SELECT * FROM "mytable";

      In most cases, if you *never* quote, you will be fine in all cases. If you quote sometimes, then you can have problems.

      In MySQL you have the (more broken) behavior of:

      SELECT * FROM MyTable;
      being equivalent to
      SELECT * FROM "MyTable";

      Do you see what I am saying? The standard says to fold unquoted identifiers to *upper case* and then make a case sensitive comparison. PostgreSQL folds to *lower case* which produces the same results in 99% of all comparisons (I have been a fan of getting PostgreSQL to support folding to upper case, but this has not been popular). The only one that will be broken wrt to unquoted identifiers will be MySQL.

      Does this make more sense?

      --

      LedgerSMB: Open source Accounting/ERP
    34. Re:The real problem by llefler · · Score: 1

      No, what you are missing is that when other databases have an option to install in a non case sensitive manner, they are NOT CASE SENSITIVE. It doesn't matter what the spec says when you select an option that overrides the standard. If I remember correctly, Firebird is not case sensitive at all. At the very least, it is not that way by default because Interbase was not. MSSQL and MySQL both have options to turn off case sensitivity, and when you do they are NOT CASE SENSITIVE. That means 'table' is the same as 'TABLE' is the same as 'TaBlE'. The Postgres system I was using was represented as being configured to not be case sensitive, but in fact either the option does not exist or does not work. It has been a year or two, so I no longer remember the details of which it turned out to be. It was not possible to turn off this 'feature'. For compatibility reasons, I do NOT want case sensitivity. And Postgres' behavior is unsuitable for ANY project I have worked on in the last 10 years. Thus, I no longer attempt to use it. For the present, I have grudgingly been using MySQL where license cost is any issue, but will probably take another look at FlameRobin to see if it's mature enough to make Firebird an option.

      Actually, from doing a quick search, it appears that Postgres does not have a case INsensitive mode of install. Which is probably what I ended up learning before, and that the installs that I had were mis-represented. For the projects that I work on, case sensitivity causes considerable compatibility issues or makes using Postgres entirely impossible. Whether that is per standard or not, it is a fact of business.

      --
      It is amazing what you can accomplish if you do not care who gets the credit. -- Harry Truman
    35. Re:The real problem by Ryan+Amos · · Score: 1

      There's nothing wrong with PHP; in recent versions it has become a much more mature as a language. It works very well for what it does (web apps; yes they can be done in perl or C, but PHP was designed with web app designer's needs in mind.)

      I agree with you about MySQL though. Postgres has been superior for a long, long time. MySQL is fine for small projects, but for anything you want to eventually get "big" Postgres is the way to go. MySQL doesn't scream "cheap" to me so much as it screams "hobbyist."

  9. Not silly at all by systems · · Score: 1

    From my readings in the relational data model and RDBMS theory, I learned that SQL is supposed to be a declarative lanuguage, which means, you tell the DBMS engine what you want done, and not how?
    Then the DBMS will use the path which is most optimized for the request.
    Saying that a certain engine is optimized for some types of task, goes against the theory of relational databases, a DBMS can and should be everything for everyone.

    So, the fact that MySql is better for some tasks than Oracle, is a flaw in both engines, and supposedly one that both are working to fix.

    Just to give an example, since the user, doesn't see where and how his data is stored, Oracle for example, can use MySql's engine internally, when it detect situations for which MySql's engine is optimized, and switch to Oracle's engine, when the situation change, all hidden from the user, of course this is all in theory. It can be factually impossible to implement such a thing.

    1. Re:Not silly at all by jimmyhat3939 · · Score: 3, Interesting
      It turns out that configuration is a difficult task and so there's no way to obfuscate that from the DBA.

      Also, there exist plenty of situations where there are absolute tradeoffs. Making something fast in one case makes it slow in another. While it would be nice for the DB to be able to figure all that out beforehand, in practice it's impossible.

      Take a case where a bunch of precomputation is required to make an operation fast (a particular kind of indexing, for example). You have to instruct the DB to do that precomputation. It can't know in advance that you'll be doing a lot of queries that require it.

      Bottom line is that, though the goal of SQL is to make the "how" hidden from the user, in practice this is impossible and not even desirable.

      --
      Free Conference Call -- No Spam, High Quality
    2. Re:Not silly at all by Master+of+Transhuman · · Score: 4, Insightful


      Actually, neither one of them implicates relational theory properly. Aside from the InnoDB engine, most of MySQL is pathetically incomplete. Without InnoDB, MySQL is worthless with regard to referential integrity, which is a showstopper for any database that requires multiple tables related to each other.

      If the company building the Trans-Relational database ever gets off the ground (or failing that, goes open source), perhaps both of them (along with Sybase, SQL Server, Informix, and the OOP DBMSs) will be put out to pasture. The claimed capabilities of that system, implementing a very relationally complete system, would bury even Oracle eventually, if not immediately.

      Anybody have any real background info on why the company developing the Transrelational system is having legal and/or financial trouble? Nothing concrete appears to be available on the DbDebunk site or via Google. The whole thing appears to have been hanging fire for a long time.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    3. Re:Not silly at all by StrawberryFrog · · Score: 2, Interesting

      database that requires multiple tables related to each other.

      That means, "any database" pretty much.

      The claimed capabilities of that system, implementing a very relationally complete system, would bury even Oracle eventually, if not immediately

      I don't think anything short of global nuclear war could "bury Oracle immediately". The installed base of very big DBs on Oracle is just too large.

      If the company building the Trans-Relational database ever gets off the ground...

      Does anyone have links or insight on this "Trans-Relational database" that I've nevre heared of, why it's so much better at "relational theory" than Oracle, why this is such a winner?

      --

      My Karma: ran over your Dogma
      StrawberryFrog

    4. Re:Not silly at all by Imsdal · · Score: 1
      Bottom line is that, though the goal of SQL is to make the "how" hidden from the user, in practice this is impossible and not even desirable.

      If the user is the end user, ie the guy who accesses the DB via, say, a wizard generated report in Crystal Reports, then SQL is actually very good at hiding the "how" from the user. On the other hand, most users shouldn't think about or use SQL at all. Instead they would be more helped by using a Business Objects type solution, where they are given "objects" and a "universe" (an intelligent converter) translates the usage of objects into actual SQL.

      If the user is the DBA or a developer of a custom application, you are completely correct.

    5. Re:Not silly at all by Anonymous Coward · · Score: 1, Informative

      I'm not sure that anywhere in the theory of relational databases (?? not sure how that relates to relational theory) it says anything about optimisation or performance of database engines. Much as the Java standard doesn't mean that all JVM perform in the same way, or compilers all produce identical machine code from the same high level language.

      The whole point of relational theory is more based around data modelling and normalization - providing a mathematically proven set of guidelines to remove duplication and ensure integrity.

      SQL was the first attempt to build a working language based on relational theory - i.e. that would be useful for accessing such a model - think of it as a proof of concept. Unfortunately, by becoming the defacto standard, it's become a bit like Pascal (and BASIC) was in the 80s/early 90s (a language popular despite it's flaws rather than because of it's strengths).

      It also took a long time for anyone to make a performant RDBMS that used SQL - Oracle got big for a reason. Even then it became rapidly evident that procedural extensions were required on the server, rather than the 'declarative on DBMS, procedural control on the client' model that was touted in theory. (Even if SQL worked perfectly this would still be inefficient).

      As for whether RDBMS should optimise the same SQL in a consistent way - it's worth considering the different ways a single statement gets optimized on Oracle. The same SQL will create a different optimisation path if you tell the optimiser you want first rows (optimize for response time) or all rows (optimize for throughput). In addition specifying that you want 'first 20 rows' rather than 'optimise for first rows' and then only querying 20 from your client will ALSO give a slightly different optimisation. And that's without going into streaming queries, etc.

      The point is that this is completely outside the scope of the SQL language definition (which IS declarative - what do you want, not how it should be done) but to work effectively the optimiser needs some pragmatic information about what you want to do.

      One of the key differences between mySQL and Oracle is that Oracle is optimised as a highly concurrent transactional database - which is overkill for 95% of the situations mySQL is used for. However this is again the sort of outside contextual information that keeps us humans employed. Much like knowing when to write something in perl and when to use C.

      All that said, having worked with navigational databases, SQL was/is a definite step forward, and removes a LOT of the work for most programmers in terms of needing to know how data is stored.

    6. Re:Not silly at all by Anonymous Coward · · Score: 0

      "Saying that a certain engine is optimized for some types of task, goes against the theory of relational databases, a DBMS can and should be everything for everyone.

      So, the fact that MySql is better for some tasks than Oracle, is a flaw in both engines, and supposedly one that both are working to fix.

      Things aren't as easy as that. A database that chooses the 'optimal' way to execute a query must, by nature, study the queries done on it in order to be able to optimize the way it does its queries before actually embarking on doing the work to do the query. A database that just goes ahead and does 'the natural thing to do' will beat such an optimizing database in all cases where 'the natural thing' is the optimal thing to do (probably by a small margin)

    7. Re:Not silly at all by Anonymous Coward · · Score: 0

      Does anyone have links or insight on this "Trans-Relational database" that I've nevre heared of, why it's so much better at "relational theory" than Oracle, why this is such a winner?

      If only the Internet provided some way to look things like that up!

    8. Re:Not silly at all by joto · · Score: 2, Insightful
      From my readings in the relational data model and RDBMS theory, I learned that SQL is supposed to be a declarative lanuguage, which means, you tell the DBMS engine what you want done, and not how?

      Yes, that's RDBMS theory.

      Then the DBMS will use the path which is most optimized for the request.

      In the real world, there is no such thing as "most optimized". Optimized for what? Reliability? Portability? Correctness? Features? Code maintenance? Speed? Memory usage? Disk usage? Throughput? Interactivity? Uptime? Least sysadmin work? I could probably go on, but I think you get the picture...

      So, the fact that MySql is better for some tasks than Oracle, is a flaw in both engines, and supposedly one that both are working to fix.

      Yes, but that doesn't mean that it's ever going to be fixed in either of the engines. The fact that buses have more space, and bicycles are easier to park, doesn't mean that one day we will have something that combines the advantages of both.

      Just to give an example, since the user, doesn't see where and how his data is stored, Oracle for example, can use MySql's engine internally, when it detect situations for which MySql's engine is optimized, and switch to Oracle's engine, when the situation change, all hidden from the user, of course this is all in theory. It can be factually impossible to implement such a thing.

      Just like it's factually impossible for the bus to use the bicycles internal engine when you decide it's time to find a parking spot.

    9. Re:Not silly at all by MattW · · Score: 3, Insightful


      Actually, neither one of them implicates relational theory properly. Aside from the InnoDB engine, most of MySQL is pathetically incomplete. Without InnoDB, MySQL is worthless with regard to referential integrity, which is a showstopper for any database that requires multiple tables related to each other.


      As the companies using this pathetic database have noticed, 99.9999% works just fine; especially when your application is aware that you don't have a foreign key constraint and yes, the data may be munged 1 time in 1 billion and need to be cleaned.

    10. Re:Not silly at all by Anonymous Coward · · Score: 0


      You can start here.



      It is not a new relational theory, but it is (claimed) to be a proper implementation of relational theory. Oracale (and all other DBMS vendors) implements a bastardization of the relational theory.

    11. Re:Not silly at all by Anonymous Coward · · Score: 0

      the data may be munged 1 time in 1 billion and need to be cleaned.

      I absolutely guarantee that virtually any database in production use without database-enforced consistency constraints has consistency errors. That would be more like 1 time in 1, not 1 billion. In MySQL's case, it's not 99.99999%, it's more like 10% is considered good enough by too many people who just don't know better.

    12. Re:Not silly at all by fbg111 · · Score: 1

      Anybody have any real background info on why the company developing the Transrelational

      No, but Date sure tortur, er teases his readers about it quite a bit in his writings. What's the company's name, and are you sure it is even a company and not just an ad-hoc group of computer scientists? I've just assumed the delay was due to figuring out the finer details, as I suspect it's not as simple to design and implement as Date may have alluded. Would certainly be cool if they open-sourced the TR concepts, though...

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    13. Re:Not silly at all by Master+of+Transhuman · · Score: 1

      Date and Fabian Pascal have been fairly clear that it's an actual company - they've discussed the owner's name, which I don't recall, I'd have to look it up on my hard drive somewhere [Steve Tarin, apparently, is the inventor and owner, I've just looked it up].

      According to Pascal, Date has seen a working implementation of the TRM, and is writing a book about it tentatively entitled "Go Faster! The Transrelational Approach to DBMS Implementation."

      The company name is Required Technologies Inc.,
      39141 Civic Center Dr. Ste. 250, Fremont, CA 94538 Their Web site remain "under construction".

      The patent for TRM is United States Patent 6,009,432. There is also the patent application: United States Patent Application 0010000536.

      There is also a resume of one Vincent Poydenot who described his employment with Required Technologies as Vice-President of Software Development. He describes a 15-man development team which developed a full implementation for Windows NT 4.0 in Visual C++, with a port to Solaris and Linux.

      Links for the above here http://dmoz.org/Computers/Software/Databases/Relat ional/Implementations/Required_Technologies//

      An ad for programmers to work for the company when they were apparently in New York is here:
      http://www.codeguru.com/forum/archive/index.php/t- 188697.html/

      According to Pascal's DBdebunk Web site:

      "Not only had Date been exposed to a working TRM implementation - a prototype built by Required Technologies that included update and disk operations - but so have other highly respected database researchers and implementers. Moreover, several potential customers ran their own benchmarks against this prototype using their own real-world data and their own live complex queries. The results were extraordinary. In every case, TRM delivered orders-of-magnitude performance improvements over existing RDBMSs, in a large dynamic disk-based environment. These results can be demonstrated to anyone seriously interested in TRM....

      Not only does the prototype implementation of TRM (referenced above) still exist, but also a full-blown commercial disk-based updatable RDBMS based on TRM (with standard SQL, ODBC, JDBC, and third-party tool interfaces, plus all standard subsystems) is nearly complete."

      The above was as of January 2005. Back in late 2004 Pascal was describing "large transactional databases with subsecond response." Note that these were not in-memory databases but disk-based.

      Apparently there is some legal or financial issue involved that is threatening the owner with "having his company taken away from him", according to one reference. They claim the guy has been fighting tooth and nail to resolve the issues, but there hasn't been any recent info.

      I would have assumed the whole thing would have been resolved by now in most cases, unless the people involved are waiting for some court case.

      I have now found a post on Curt Monash's blog whereupon he apparently - I say apparently because I have no idea whether his information is correct - debunks the entire project and the company:
      http://www.dbms2.com/category/memory-centric-data- management//

      Fabian Pascal's response on Curt's blog:

      Monash knows zilch about TRM. But then he knows zilch about RM too,and lack of knowldge has not stopped him ever before from generating crappola. In fact, he is not even aware of how ignorant he is.

      Nonsense indeed, but the only one is from Monash.

      Unskilled and unaware of it. Typical american.
      Comment by fabian pascal -- November 14, 2005 @ 12:41 pm

      There follows a ton of incredibly acrimonious comments between Monash and Pascal in which both accuse the other of various incompeten

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  10. Lets hope whatever the alternaterntive...is.... by SimplyBen · · Score: 1

    Isn't half implemented for the better part of a half decade like rest of the MySQL stack. I give the swedes kudos for finally becoming a real database with version 5 (they only became a half assed database in 2003 with InnoDb). Finally we have stored procedures, and XA transactions, and the migration to 5 from 4.1.xx doesn't seem to be as hellacious as 3.xx to 4.xx (we're fighting fedora core's default build of 3.xx right now) But I guess who really wants transactions any way?

    --
    if sign.nil? Sig.new
    1. Re:Lets hope whatever the alternaterntive...is.... by Lacit · · Score: 1

      Me.

      Transactions increased my MySQL queries by a factor of greater than 10.

    2. Re:Lets hope whatever the alternaterntive...is.... by Anonymous Coward · · Score: 0

      Well, to be fair - TcX, the company that originally wrote MySQL, didn't set out to write an ACID compliant general purpose SQL base RDBMS.

      They wanted something that could handle a very simple database, with a lot of records very quickly, and rather than create their own query language, used SQL.
      They were then good enough to release the code.
      They were always up-front about the limitations, and explained how it had been written to suit their purposes and they didn't need transactions, subqueries, etc (or at least they didn't think they did).

      The rest of the world is responsible for a.) Mistaking it for a full RDBMS, and b.) making it popular, causing more people to fall for a.)

    3. Re:Lets hope whatever the alternaterntive...is.... by Master+of+Transhuman · · Score: 1


      Interesting. I wasn't aware of the back story. So MySQL was really intended to be a mere "file handler" from the start.

      Explains a lot.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  11. With Oracle's products becoming more affordable .. by yintercept · · Score: 1
    I think there's hope for MySQL. With Oracle's products becoming more affordable

    Oracle has never been in the business of providing affordable technology. I doubt that this "coming affordable" that you mention is something they are doing by choice. However, if Oracle can find a sneaky way to stop the competition ... perhaps they will be able to stop this process of becoming more affordable. Then, Larry Ellison can get back to having so much extra cash that he feels like a God!

  12. Re:With Oracle's products becoming more affordable by masdog · · Score: 1

    Didn't Oracle just release a free version of their database technology?

  13. Re:With Oracle's products becoming more affordable by Anonymous Coward · · Score: 1, Interesting

    Yes. Do you really believe that they did this out of the goodness of their hearts?

    No commercial vendor will do anything unless they believe it will make them money in the long run. Period, end of story. Oracle wants you to use their products for everything. Easiest way to achieve that, with MySQL and PostgreSQL out there for free? Give away a low end product. Let people tinker at no charge. Start charging them when they need the more powerful product.

    Hook 'em early, and slug 'em when they're hooked. It works at pretty much every level. Oracle is moving along a very well worn path. How well it will work, remains to be seen.

  14. Does mysql support non-literals for default values by Anonymous Coward · · Score: 0, Interesting

    I ran into an annoying situation just yesterday trying to modify some code that 'depended' on mysql. I wanted to know WHEN a record was made in a certain table..

    Quick and dirty solution was to add a column with the default value of CURRENT_DATE().. Too bad that didn't work. There is even an entry in the mysql bug database about it that date back years..

    http://bugs.mysql.com/bug.php?id=1696
    http://bugs.mysql.com/bug.php?id=2040

    More retarded is that the lovely people on the mysql dev team have those classified as 'feature requests'.. Its not a feature request, its just another fundamental aspect of mysql that is missing and the type of attitude the mysql team has for missing features is just laughable(anyone remember reading the old mysql docs that talk about how useless transactions are and that you would never need them?)

    I had to spend 3 hours(no joke) dissecting the code to have it insert the current date into the record I wanted, 3 hours I will never get back.

    Thats just typical of every encounter I have had with mysql.. some annoying stupid feature I take for granted just isn't there.

  15. Which Database? by masdog · · Score: 1

    With Oracle's purchase of Inno-DB and their recent release of a free version of their database software, it looks like a war will be shaping up over the low end of the database market.

    Besides for being open-source, what advantages do PostgreSQL and MySQL have over Oracles' 10g Express, Microsoft's SQL Server 2005 Express, and IBM's proposed DB2 Express?

    1. Re:Which Database? by Anonymous Coward · · Score: 0

      If you have to ask the question, there is no way you can understand the answer.. Oracle is the cisco of the database world... no one competes or offers features oracle doesn't have.. mysql and mssql is a joke. DB2 has applications that are unique to it when you're talking about IBM iron and its kind of apples and oranges really.

      I will say this.. postgresql->oracle is the best combo in the book. Every project my company has touched in the last 7 years started with postgresql and if we ran into one of the limitations in postgresql, switching to oracle was painless.

      Converting mysql 'code' to work on oracle is damn near impossible.

    2. Re:Which Database? by Anonymous Coward · · Score: 1, Interesting

      Look at the licensing for Oracle 10g Express and you'll see that it is limited in terms of CPU and memory utilization and database size. The limitations still allow it to be useful for development and a number of small systems, but it would preclude 10g Express from replacing MySQL for situations like we have where MySQL is a back-end db for a variety of web apps (forum software, webCMS systems, weblog systems, etc).

      That being the case, Oracle has done a nice job with 10g Express. The web admin interface is quite nice and the install is a breeze. We run Oracle for our ERP systems and 10g Express provides a good database for our developers who work offsite to install on their home systems.

      If we weren't an Oracle shop, I don't see any real compelling reason why we'd go with 10g Express over Postgresql even if the admin tools are nicer.

      --

    3. Re:Which Database? by dkf · · Score: 4, Informative
      postgresql->oracle is the best combo in the book. Every project my company has touched in the last 7 years started with postgresql and if we ran into one of the limitations in postgresql, switching to oracle was painless.
      I'd agree with that, and point out that SQLite is there for people who come in at the small end of PostgreSQL. I'm told (by the SQLite author no less) that upgrading to postgresql is easy when you want to take that step. And it is public domain.

      Between them, sqlite->postgresql->oracle offer a full database solution for everything from "I want a better config file for my personal scripts" to "I have to run a mission-critical database for a Fortune 500 corporation", and you can't say much fairer than that.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    4. Re:Which Database? by ajs · · Score: 2, Interesting

      I've watched dozens of projects in a handful of companies move from other databases to Oracle. These other databases have included Fame, Sybase, BDB, MSSQL, MS Access, MySQL, Postgress, PostgreSQL, Ingres, and a variety of lesser known databases.

      NEVER, have I seen such a project which I would refer to as "painless". Oracle is a monolithic beast which requires constant care and feeding by experts who have been so steeped in its ways that they are prohibitively expensive. Oracle perpetuates this situation and, as best I can tell, deliberately obfuscates their product in order to continue to rake in huge fees for training and services. Mind you, I've done Oracle DBA work in the past, so I'm not talking through my hat, here.

      I would never, ever rely on Oracle. I use it as a bargaining tool in project management ("if we hit a wall with PostgreSQL/MySQL/etc. we can always switch to Oracle"), but unless there is a practical reason such as customer requirements which cannot be altered by me, I dodge it, and I would suggest that others do the same.

      This has nothing to do with petty in-fighting about what feature-set is better. This is about a company that makes its money by abusing its customers. Why would you stand for that, even if it means giving up some feature, or working harder to implement some feature with a different tool? I would rather hire 5 MySQL or PostgreSQL engineers that work on expanding their feature-sets full-time, than pay Oracle a licensing fee.

    5. Re:Which Database? by kpharmer · · Score: 3, Informative

      > With Oracle's purchase of Inno-DB and their recent release of a free version of their database software,
      > it looks like a war will be shaping up over the low end of the database market.

      I think you're completely right there - the big vendors know that the little databases generate cash too - and mindshare. They can't afford to lose it. This is a market-protection plan for them.

      > Besides for being open-source, what advantages do PostgreSQL and MySQL have over Oracles' 10g Express,
      > Microsoft's SQL Server 2005 Express, and IBM's proposed DB2 Express?

      Well, MySQL is tarnished for a few reasons now:
      - future is uncertain due to innodb buy-out
      - history of inexplicable data quality and exception handling issues
      - dual-licensing complexity

      Postgresql is looking much better now:
      - they had some performance problems 3-4 years ago, but are now well-beyond that
      - is completely free
      - starting to get picked up within very large commercial applications

      In comparison SQL Server express and Oracle express offer:
      - a free database for very small applications
      - the opportunity to deploy a tiny database, then replace it with a larger one without any application code changes
      - opportunity for vendors and shops to reduce the number of supported databases

      DB2 Express offers:
      - a low-cost database (last I looked it was around $750/server)
      - with much more scalability than sqlserver/oracle express versions:
          - no storage limitations
          - partitioning is included (via mdc)
          - just two cpus (don't know if they can be multi-core or not)
          - I think 64-bit memory is supported, but is still limited to 4GB

      So, Oracle & SQL Server have one strategy (offer an extremely limited product for free), while DB2 has another (offer a slightly-limited product for less than or about cost of MySQL). IBM might change the DB2 strategy, but I hope they just add a extremely-limited free version, and keep the existing express version.

      And this strategy works: I've got oracle, sql server, db2, postgresql, and mysql in our organization, and am standardizing on db2. When we get a small database it uses a cheap db2 license. This keeps my labor costs down (which are far more than the software costs). If it wasn't for the cheaper licensed versions I'd probably be putting all of the small databases on postgresql - and growing that skillbase within the organization.

    6. Re:Which Database? by bluGill · · Score: 1

      The key is Fortune-500. When your database gets very large you will need full-time administrators watching it. You will need something that scales to big iron (million dollar servers), with good roll-over plans. Oracle does this well. PostgreSQL is great, but it doesn't scale up that far (yet).

      For smaller systems (even within Fortun-500) you don't need Oracle, but odds are you have it somewhere, complete with all those admins, so using Oracle isn't a big deal. Sure PostgreSQL could do the job, but why hire an admin for that when you already have Oracle admins?

    7. Re:Which Database? by briansmith · · Score: 1

      Oracle is a monolithic beast which requires constant care and feeding by experts who have been so steeped in its ways that they are prohibitively expensive.

      Oracle is no more difficult to maintain than PostreSQL. If you feel otherwise, please cite an example.

      Oracle doesn't require "constant care and feeding." In fact, I know of departmental Oracle databases that receive basically no maintenence.

    8. Re:Which Database? by ajs · · Score: 1

      First off, I've provided server arrays to such organizations before, and honestly I've never seen a "big iron" server that was purchased with any more justification than the phrase "big iron" turned on some executive. I've seen Oracle installations on single (or few) large systems work as well and as poorly as their open source competition from MySQL and PostgreSQL in VERY large environments. Some also hybridize (I think Sabre announced on a Webcast at one point that they used MySQL front-ends with DB2 back-ends), and that works well too.

      The question is: what are your costs, and what do you get for those costs? In my experience the costs are far higher in terms of systems and administration overhead with Oracle. If your experience differs, then we'll have to agree to disagree until one of us is exposed to a sample of the others' experience.

    9. Re:Which Database? by bluGill · · Score: 1

      I do not have direct experience in big databases. I work with backups (over the years for several different companies). I cannot say if those big servers are justified in any given case. I can tell you the justification they gave to me. They seem smart, but I'm easily fooled.

      Even then, I stand by my comment that there is a level the PostgreSQL/mySQL just doesn't scale to. If oracle will meet your needs is a different matter, though generally it is accepted that it does scale better. At a great cost, but if you really need a big database you pay the price whatever it is.

  16. The name of the new storage engine is... by adnonsense · · Score: 4, Funny

    MyFireGreSQL.

  17. Re:Does mysql support non-literals for default val by wimvds · · Score: 1

    Maybe you should have spent the 3 hours on reading the docs, because one of the features of MySQL (prior to 4.1) is that the FIRST timestamp column of any record is automatically updated to the current date/time whenever you change (insert/update/replace) a record. This default behaviour has changed since MySQL 4.1, now you can specify if you still wish to use this behaviour or not, or if you want another TIMESTAMP to change : http://dev.mysql.com/doc/refman/5.0/en/timestamp-4 -1.html

  18. Value of the GPL if another buys the brains. by David+Hume · · Score: 1
    PostgreSQL has a huge lead here over MySQL, that is, as an community-driven open source project. They already have a large distributed network of developers, whereas I understand MySQL has been developed almost entirely by the AB.

    I've often wondered about this as a general matter. That is the value of the GPL, as a practical matter, if somebody else buys the brains? I'm not asking to be antagonistic; I honestly don't know.

    I've read that various GPL, LGPL and open source projects are heavily supported by corporations who employ the hackers who do the majority of the work, and the vast majority of the hard work. Stuff that other people aren't capable of doing and/or don't have the time to do. How vulnerable are various projects to somebody buying the brains?
     
    1. Re:Value of the GPL if another buys the brains. by joto · · Score: 1
      How vulnerable are various projects to somebody buying the brains?

      Probably not that much. Most of the brains aren't "for sale". They have their own interests that they work on, which they do, as long as it's still fun, and they have food on the table. A corporate sponsor might provide more time for their pet projects, thus making life easier for them, but it's the love of what they do, and their expertise of what they do, that's getting anything done at all. The corporate sponsing only helps.

      Some notable projects might never have happened without corporate sponsors (e.g. openoffice or mono). But then we would have other alternatives that people worked on. Remember that before openoffice, there were dozen of wordprocessor/spreadsheet/presentation/office suite projects around. Some of them would eventually succeed. (even today, koffice seems pretty healthy). And before mono, there was lot's of talk of free java implementations (it still is), and lot's of other free languages and runtimes to choose from (it still is).

      It's better to look at it as a healthy symbiotism, than a treath. It sure is good that we have corporate sponsors, but it's not like it would be the end of free software/open source, even if they all suddenly disappeared overnight.

  19. MySQL will and by TarrySingh · · Score: 1

    sure will be absorbed by Oracle. Mark my words!

    --
    Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
  20. Oracle are now offering a free version of their DB by philj · · Score: 2, Informative

    In case you didn't know, Oracle are now offering a free version of 10g: Oracle Database 10g Express Edition

  21. Indexing component is key for performance by aharth · · Score: 2, Informative

    From experiments with my database-like system in Java (http://sw.deri.org/2004/06/yars/yars.html), I learned the hard way that the indexing/storage component is the key piece for any system dealing with large amounts of data. Performance depends to a great extent to the underlying storage mechanisms used (and different implementations of e.g. B+-Trees vary greatly in performance and functionality). Implementing a fast B+-Tree with transactions is a non-trivial task.

    Let's hope MySQL AB. find a good replacement for InnoDB.

  22. Original reasons for using MySQL replaced by... by Anonymous Coward · · Score: 4, Insightful

    When many people started using MySQL over other choices, the landscape was quite different.

    * PostgreSQL wasn't available in a native version on Windows and their developers were still mastering the codebase they inherited (they mastered it around 7.3, before that they tiptoed around making massive changes).

    * Firebird wasn't open sourced yet (before summer 2000).

    * and SQLite wasn't released either (before summer 2000).

    The above 3 conditions are no longer valid.

    Many of us use MySQL because so many other software packages are already designed to work with it. Like Windows, it doesn't matter even if better alternatives exist because this one reason alone is most compelling if the software is "good enough".

    In other words, the original technical reasons for choosing MySQL over others has been replaced by a compelling new reason: the same reason many people use Microsoft Windows today.

    In a nutshell:

    * If I want a super fastest lookup table without SQL, I'll use something like cdb or tinycdb.

    * If I want a fast and simple database requiring only a tiny subset of SQL, I'll use something like SQLite.

    * If I want a modern, full-featured, and free rdbms/ordbms, then I'll use PostgreSQL.

    * If I want compatibility with most 3rd-party software, then I'll use MySQL.

    1. Re:Original reasons for using MySQL replaced by... by Anonymous Coward · · Score: 0

      Also 6-8 years ago even if you did spring for oracle (oracle for linux 1.0) etc it was slower than dog shit for web use. Mysql inserts were much faster. Interbase blew them all way on inserts...very slow querying. So you just moved your integrity stuff to your code.

      anyone bashing mysql wasn't around when p2's were the norm...choices were hashing berkleydb, mysql, interbase before firebird, or 9 grand for buggy oracle linux....hell and nothing ran on bsdi.

    2. Re:Original reasons for using MySQL replaced by... by hritcu · · Score: 1

      May I add this? * If you don't need a relational database but need to store data use a text file * If you don't need a relational database but need transactions use Berkeley-DB

      --
      If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
  23. (OT) looking it up by StrawberryFrog · · Score: 1

    I did a brief google, but nothing definitive came up... Someone who knows more about it, or has more time might have more luck.

    --

    My Karma: ran over your Dogma
    StrawberryFrog

    1. Re:(OT) looking it up by Anonymous Coward · · Score: 0
  24. Re:Oracle are now offering a free version of their by IANAAC · · Score: 2, Insightful
    Free? I can't seem to find the GPL source anywhere.

    Different kind of free. You do know there's more than one kind of free, don't you?

    Anyway, Oracle XE is actually pretty nice, particularly if you want to throw together a quick web-based app. I wish I could put something together with PostgreSQL/PHP (or whatever) as quickly as I could with Oracle's web forms designer.

  25. Don't think so by A+nonymous+Coward · · Score: 2, Informative

    GPL prevents redistribution, not use. Companies only need to buy the commercial MySQL license if they want to distribute closed source programs. Internal usage doesn't care a whit.

    1. Re:Don't think so by thePowerOfGrayskull · · Score: 1

      Actually, make that "closed source modified versions of mysql". They can redistribute closed source programs that connect to mysql, yes?

    2. Re:Don't think so by arkanes · · Score: 1

      Not to my knowledge. The MySQL client libraries are GPL.

    3. Re:Don't think so by yahyamf · · Score: 1

      what if the code is used in a web based application? The code is on one server, but used by many distributed clients. What does the gpl say about this?

    4. Re:Don't think so by linuxhansl · · Score: 1
      Companies only need to buy the commercial MySQL license if they want to distribute closed source programs. Internal usage doesn't care a whit.

      Correct! I think MySQL makes some chunk money of companies who embed the MySQL DB or parts of it and redistribute the result. Under the GPL that would be a derived work and hence the source code of the whole thing would have to distributed with the executable code; hence if these companies do not want to distribute their source they'd have to acquire a different from MySQL (you can actually argue whether MySQL is really an Open-Source company - I would argue it is not, the source distribution is just used for wide testing).

      Remember, MySQL can only possibly make money from two sources:

      1. Selling licenses (in this case close source licenses)
      2. Providing service

      Traditionally 1) scales better. Without owning the copyright or a redistribution agreement with the copyright owner, MySQL can only distribute the InnoDB under the license granted to them. In case of InnoDB that is the GPL (unless they find another agreement with Oracle). That basically cuts of big chunks of revenue stream 1).

  26. Whistling in the dark by ppanon · · Score: 2, Interesting

    He said the code for InnoDB is under the GPL (General Public License), so 'the code is always out there. It will always be out there.'

    That's true enough, and yet MySQL uses GPL for its free Linux version but a different licence for the Windows version, don't they? They can't just pick up InnoDB and roll it into their Windows release because they don't hold the copyright to be able to release Windows InnoDB under their Windows licence. So, if Oracle kills InnoDB (or starts increasing the price for non-GPL releases) MySQL might have to revisit its business model.

    I always liked PostgreSQL better anyways.

    --
    Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
  27. Oracle has a FREE database by Anonymous Coward · · Score: 0

    Oracle has a *free* database now (in beta last I heard, but you can DL it). Oracle 10g Express (XE) edition. Still not GPL free, but at least $$$ free :)

    http://www.oracle.com/technology/products/database /xe/index.html

    Same codebase as the big daddy but with a few limitations (like you can't store 10TB of data in it, etc..)

  28. Re:Oracle are now offering a free version of their by Anonymous Coward · · Score: 0

    It's not even freeware, it's crippleware.

  29. NULLs by einhverfr · · Score: 1

    I tend to use NULL to mean that I know that there is no value. Setting a field so that it allows NULL is like saying "not all things of this type have this property". Setting a field NOT NULL says that all things of this type DO or MUST have this property.

    But you have a problem doing this because your comparison sematics don't work for the meaning you are attributing to NULL. NULL tends to be used this way though and unfortunately it is an accepted practice :-(

    The problem comes from the following issue: What do the following expressions evaluate to?

    NULL NULL
    NULL = NULL
    NULL IS TRUE
    NULL IS FALSE
    first_name || ' ' || NULL (|| being the ISO SQL concat operator)

    In all the above areas, the expressions evaluate to NULL. After all when you compare UNKNOWN to anything, the answer is always UNKNOWN ;-) Yes, that means you cannot concat a value you know not to exist with a value that does exist because you don't know what the result will be.unknown as well.

    --

    LedgerSMB: Open source Accounting/ERP
    1. Re:NULLs by Nurgled · · Score: 1

      I've never had much problem with the idea that one "nothing" is not the same as another "nothing". If I know that two people both don't have a middle name, do those people both have the same middle name? If two people don't have a car, do they have the same model of car?

      Your case of displaying a full name optionally including a middle name is one particular case where the result is sub-optimal, but all it requires is a little more care when dealing with optional fields. Most applications don't concatenate strings as part of database queries anyway; they fetch the data and then figure out how to present it at the presentation level.

      Finally, I wonder how you would represent "non-applicable" in a case where you have a foreign key constraint referring to another table. That field must either have NULL or a valid key to satisfy the constraint. If we've used NULL to mean "I don't know", then how to we represent "I know that this is not relevant"?

  30. Why NULL's exist by einhverfr · · Score: 1

    In a purely mathematical sense, Date is correct in that NULLs don't fit into the math model of relational algebra terribly well. They are also overused which creates a tendency to have ambiguities in the interpretation of data (example: have a wage column for hourly earners and a salary column for salary earners. Does a NULL mean that you don't know the hourly wage or that the person doesn't have one?). But this is largely a db design issue and not an inherent issue with NULLs per se.

    The problem exists when you are gathering real-world information and you know how this information is supposed to fit together. You may be missing pieces of that information at any given point. So what do you do? You use a special, reserved value to indicate "unknown at present time" which is exactly what a NULL should be used for.

    For example, someone calls me for technical support. They are in a hurry so I gather telephone number, email address, and billing address. Lets say there is more information I need about this contact for my relational model but I forget to gather it at this time (lets say, zip code). So I store it as a NULL, and then I go to Google Maps, look up the zip code, and fill it in later :-)

    --

    LedgerSMB: Open source Accounting/ERP
    1. Re:Why NULL's exist by Master+of+Transhuman · · Score: 1

      "Does a NULL mean that you don't know the hourly wage or that the person doesn't have one?). But this is largely a db design issue and not an inherent issue with NULLs per se."

      If you don't know what a NULL represents, I'd say that's a problem with the theory, not a design issue.

      "Lets say there is more information I need about this contact for my relational model but I forget to gather it at this time (lets say, zip code)."

      This is reasoning backward. You're assuming your relational model (or more importantly in this case, the IMPLEMENTATION of that model) is correct, and you need nulls to represent missing data. Maybe you don't. Maybe you need an implementation that allows for missing data without using NULLS. That's the issue.

      I myself don't have enough relational theory knowledge yet to decide how that might be done, but apparently Date and Darwen and the people who invented the TRM model have figured out ways that work. How well they work is an open question since they're not being used on a wide scale because none of the existing DBMS systems implement them. A chicken-and-egg situation.

      This is why Date, Pascal, and others are frustrated - without an implementation, they can't prove their correctness is implementable, and none of the major DBMS vendors give a shit about correctness so they won't implement what needs to be implemented to prove it works. If Required Technologies had taken off, it might have been able to do that, but it doesn't look like that's going to happen at the moment.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    2. Re:Why NULL's exist by einhverfr · · Score: 1

      This is reasoning backward. You're assuming your relational model (or more importantly in this case, the IMPLEMENTATION of that model) is correct, and you need nulls to represent missing data. Maybe you don't. Maybe you need an implementation that allows for missing data without using NULLS. That's the issue.


      The wage example and the zip code example are designed to illustrate different issues with NULL's or not having them.

      In the wage issue, NULL could mean more than one thing because the table is not atomically designed. I.e. an employee might have a wage and might have a salery. A null in one field could mean more than one thing. So you could either break off the wage and salary fields into separate tables as Date does in his example or (preferably) find a better way to represent pay so that it can be generally done such as: expected_annual_pay, pay_per_unit, and pay_unit. So this makes it easier to enforce data constraints.

      In the second issue, the zip or postal code is an atomic part of the address or at least for the purpose of this example I am assuming it is (yes, there are a few countries that don't have them but bear with me). Because the zip code is an atomic part of the address record, it makes no sense to break it off to another table, as doing so would make it difficult to maintain sanity checks within the table without writing much more complex triggers.

      Also, having read a number of Date's essays outside his books, my reading of his ideas are that these recommendations can be thrown out but just be aware of the cost of doing so. If you want to denormalize your database, that is fine. Just do so because you are informed of what the costs of doing this are.

      This is important because there is a balance often between avoiding NULL's altogether and maintaining a simple system. I personally draw the line with two issues:

      1) Keep columns that need to be jointly maintained (i.e. with data enforcement constraints) together in the same table.

      2) Avoid using NULL's both as "not applicable" and as "unknown". Generally minimize their use, but I don't say don't ever use them...

      --

      LedgerSMB: Open source Accounting/ERP
    3. Re:Why NULL's exist by Master+of+Transhuman · · Score: 1

      I agree with you on the wage example. An employee really shouldn't have both a pay rate and a salary, so such a model is not correct on the face of it (weird business rules for some weird industry not considered here.) NULLs don't help when the model is wrong, they make things worse. This is what I mean by reasoning backward - until the model represents reality perfectly, other issues - such as performance or even feasibility - shouldn't be considered, as they will introduce errors in the design. The reason TRM is considered so valuable by Date and Pascal is because it supposedly allows dealing with performance and feasibility issues without compromising (to some better degree at least than current DBMS systems) the relational correctness of a model.

      As I mention in my other response, I think the zip code issue is not so much "atomicity" as it is whether an attribute is a REQUIRED attribute or not. If it is required, it should NEVER be "unknown" - therefore NULLs are not necessary. (If a fact is not "available", that information should not be in the database at all - it should be on somebody's "To Do" list to find that fact.)

      If it is NOT required, it shouldn't be in the model at all, since by definition you're not doing anything with that attribute but collecting it for vague and unclear reasons - which leads to the sort of data corruption we see. This is a business issue as much as it is a relational model issue.

      Databases are supposed to be collections of consistent verifiably logical propositions about entities and their relationships. Anything that weakens that weakens the value of the database. This is why I suspect NULLs can be dispensed with. I just need to nail down the "why" so it can deal with ANY situation that might arise where it SEEMS that NULLs are a solution. Otherwise people will just keep coming up with "Well, what about THIS situation?" - requiring re-analyzing the thing from scratch to demonstrate that their situation really isn't special.

      This is where Date and Pascal get frustrated - they can explain the issues until they're blue in the face, but the rest of the industry just says "Well, it won't work BECAUSE...it DOESN'T work with existing DBMS systems." To which Date and Pascal can only respond, "Duh! What did you expect from DBMS systems that DON'T implement the model correctly?"

      Might be interesting if the open source Rel language project ever gets anywhere in properly implementing the model. Or if the TRM company ever gets off the ground.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    4. Re:Why NULL's exist by einhverfr · · Score: 1


      As I mention in my other response, I think the zip code issue is not so much "atomicity" as it is whether an attribute is a REQUIRED attribute or not. If it is required, it should NEVER be "unknown" - therefore NULLs are not necessary. (If a fact is not "available", that information should not be in the database at all - it should be on somebody's "To Do" list to find that fact.)

      If it is NOT required, it shouldn't be in the model at all, since by definition you're not doing anything with that attribute but collecting it for vague and unclear reasons - which leads to the sort of data corruption we see. This is a business issue as much as it is a relational model issue.


      This is a good point. From a purely cartesian perspective, I think it is reasonable to say that NULL's in data sets are inconsistant with the mathematical base if you assert that the row is the atomic unit of information. However, there is another way to look at it-- that is that tables are nothing more than logically correlated sets (i.e. columns) of values. These exist in a 1:1. A NULL literally means here "no data" but I would argue that you are breaking the above definition of a table if you use it for "not applicable" instead of just "unknown" except in foreign keys.

      Back to the zip code scenario. We can assume for the purposes of the example that I expect every address to have a zip or postal code. Therefore it makes sense to put it in with the address table. If the zip code is not going to be immediately collected on every record but will need to be filled eventually (maybe via an automated zip code lookup) allowing for NULL's in this field makes sense (and is certainly easier than worrying about the data management headaches of breaking it into its own column).

      What I am trying to get at is that there is an ambiguity in the way relational models are implimented in RDBMS's. This might be what you might think of as an entity-relational vs value-relational ambiguity. The entity-relational model might allow for a single customer record to have multiple addresses, while the value-relational model is based on independant but correlated columns (logically the keys plus a column idea even if you organize these into tables for maintenance purposes). Hugh Darwin is clearly sees things are value-relational while Date and Codd seem to see things as entity-relational. However, these are not mutually exclusive and can be different ways of looking at the same DB schema even.

      --

      LedgerSMB: Open source Accounting/ERP
    5. Re:Why NULL's exist by Master+of+Transhuman · · Score: 1


      I think it's acceptable to have multiple addresses in a record, at least if they're not all intended to be the same address TYPE (such as "home address").

      I'm not sure I'd go for the idea that tables are just logically correlated attributes. In the entity concept, of course they are. If they aren't, I don't see the "logical correlation".

      The zip code issue can be easily solved by an auto-lookup BEFORE having to put a NULL in the field, I'd say. There may be other attributes harder to come by, of course, so the principle might stand, but I'd still prefer to see this sort of thing handled by a properly designed data collection and vetting system that front-ends the database.

      I'd agree that breaking out a column into its own table just because the data is missing is an extreme way to handle it - but my solution is not to let the data into the database at all if required data is missing.

      Not every piece of data in a system needs to be in a database per se, however it is stored. You might store tuples with missing data in Oracle or wherever, as long as you don't treat it as part of the "real" database. Then NULLs don't matter. The issue is whether NULLs should be allowed in tables that are actually intended to be used for querying. If you have a customer with a missing zip code, leave that customer tuple in a "vetting database", rather than the "query database" - until you've retrieved the missing data. Makes for a nice clean solution - now you can query your heart out for any "missing" and "unknown" stats you want without worrying about mixing it up with "real" data...

      Also, I suppose there's nothing preventing missing attributes from being in a "virtual table" while still residing in whatever passes for a physical table. The queries just ignore them entirely, just not at the NULL level, but on the table level.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    6. Re:Why NULL's exist by einhverfr · · Score: 1

      I think it's acceptable to have multiple addresses in a record, at least if they're not all intended to be the same address TYPE (such as "home address").

      This is a business rule that might or might not apply.

      For example, branch offices of a business.....

      The zip code issue can be easily solved by an auto-lookup BEFORE having to put a NULL in the field, I'd say.

      At the price of responsiveness.... Depends on where you want to be spending your money and what it really costs you.

      One of the things I have occasionally been hired to do is come in and reconstruct missing/incomplete data from poorly designed databases (or at least databases that don't enforce sane data integrity rules) In general misuse of NULL's is a big part of the problem, as is insane design like having foreign keys that might reference any of a number of other tables' primary keys.... Most of the time if your data model is complete enough, missing information can often be reconstructed from known sources and exceptions flagged. The zip code is an example of something that could easily be handled in this way and so I am not too concerned about where/when it is resolved (though this might be more critical depending on required reports).


      I'd agree that breaking out a column into its own table just because the data is missing is an extreme way to handle it - but my solution is not to let the data into the database at all if required data is missing.


      That is all good and well, but sometimes the data model might have one or more fields that might be optional.

      One example might be a second line in the street address. Something like
      100 4th Ave
      Ste 510
      New York, NY 10001
      (fictitious example)

      Usually I store the second line if it is not entered as an empty string because I assume that the second line is empty if not entered. Some people prefer to use NULL's which I think is potentially more problematic on a semantic side.

      Aside from the entity v. value relational ambiguities, you have a bigger conflict with RDBMS's and SQL, and I think that Date et al hit the nail on the head when they say that SQL is not an exact implimentation of a relational mathematical model (it is certainly based on the field of relational mathematics but it is not a computer implimentation of this theory). The reason is that relational mathematics and cartesian operations provide a very limited semantic framework for analyzing information, and certain assumptions of the cartesian world don't work for the RDBMS world (such as the assumption of complete sets vs the assumption of limited knowledge). So you have a conflict between the theoretical underpinnings regarding cybernetics/semantics and mathematics, and going with a pure mathematical model does absolutely nothing to erase the semantic ambiguities that exist with NULL's.

      I.e. if one breaks off every data column into a separate table and add required keys, then NULL's simply exist in the form of an absense of a join value, and these NULL's present the same semantic ambiguities that exist in current implimentations. I.e. is the value not relevant? Unknown? Did someone just forget to enter it? What does it mean that there is no value there? This problem exists even though now you can essentially translate your SQL queries into cartesian operations involving union and intersection operators (I would argue that you can do this anyway by logically assuming columns to be separate tables).

      I.e. the real problem with NULL's is not a mathematical one. Mathematicians don't like to see them because they are reminded that RDBMS's are not software versions of relational theory in a strict sense (though relational theory provides a foundation for the software). People with some linguistic training don't like to see them because we understand that the meaning of the data is what is important, and NULL's often introduce ambiguity into that meaning. I am in the semantic camp because I don't see how NULL's introduce problems from a mathematical perspective.

      --

      LedgerSMB: Open source Accounting/ERP
    7. Re:Why NULL's exist by Master+of+Transhuman · · Score: 1

      Usually, in the case of something like branch offices, you break that off into a separate table. For instance, at City College, we have several possible addresses for staff and students, and several possible email addresses. Both of these get their own table linked by the foreign key to the primary person table. While putting all the addresses in one tuple might be just as unambiguous and provide better performance, I think it's cleaner to separate them out, even though relational theory actually allows for such.

      As for zip code autolookup impacting responsiveness, I'd say that's irrelevant (or may be) because if you don't have the zip code, you're not going to be querying that record anyway (if a query depends on the zip code), and responsiveness is strictly an implementation issue anyway. Depending on how you do it, the impact could be really minimal.

      It also argues for the practice of separating out attributes based on their probability of being known in advance as well. In other words, never put address in the primary tuple (physically) if it is subject to not being known. Only put attributes in the primary tuple that have to be known and are likely to be always known (name, for instance - if you don't have somebody's name, you really don't have anything at all).

      I would agree that using NULLs for "optional" fields is not a good idea, either. I think Date discusses all these issues in his "Database in Depth" by referring to "empty tuples" - every subset of a tuple is a tuple, and it may be empty. That is, it may have a heading, but no values. And the best way to represent this is to break it out into its own relation which has no tuples. He shows this in the case of a supplier with a status and one without - the one without simply doesn't appear in the database.

      In fact, a relation can even have an empty heading - he shows that this is equivalent to the concept zero in math and apparently is very important in relational algebra. He refers to two relations as Table_Dum and Table_Dee - relations of degree zero, of which only two exist. I'm still not sure I follow all this, but it indicates there are depths to this that I haven't plumbed yet.

      I think Date's view of the absence of join values is summed up in the quote he uses when discussing 6NF, from Wittgenstein: "Whereof one cannot speak, thereon one must be silent." As Date puts it, "If you use nulls, you're effectively making the database state explicitly that there's something you don't know. But if you don't know something, it's much better to say nothing." So instead of showing a value with null, you don't show the value at all. As you say, it's effectively the same as using a NULL - but Date argues it's better since it doesn't introduce the logical ambiguities of NULLs in other areas.

      Whereas you might not know WHY the value is not there, putting a NULL there doesn't help you solve that particular question. In my view, that goes back to the overall system design - if you have the right design as to capturing and vetting the data before it gets to the database, then you can find out from THAT why you don't have a value there. (And of course all THAT information can be captured in a database - just not the same database.)

      Of course, all this depends on having a database that can handle the semantics of all this - and since most of them can't, we're back to using NULLs as a poor substitute.

      I need to find time to finish reading "Database in Depth" - it's an excellent overview of these issues, although it's a bit terse due to the small size of the book (and I can't afford the $105 they charge for Date's huge college textbook on databases right now.)

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    8. Re:Why NULL's exist by einhverfr · · Score: 1

      Usually, in the case of something like branch offices, you break that off into a separate table. For instance, at City College, we have several possible addresses for staff and students, and several possible email addresses. Both of these get their own table linked by the foreign key to the primary person table. While putting all the addresses in one tuple might be just as unambiguous and provide better performance, I think it's cleaner to separate them out, even though relational theory actually allows for such.

      If you might notice, my example only mentioned zip codes as part of an address. The address was assumed to be such a table. Whether it is in the same table as other contact information will depend on business rules.

      --

      LedgerSMB: Open source Accounting/ERP
    9. Re:Why NULL's exist by Master+of+Transhuman · · Score: 1


      Right, understood. Zip codes would always be in the address table, however that table is designed. Unless there is some possibility that a zip code would not be available because it didn't exist for that location (I'm talking in the US only here, not internationall) - I suppose that's possible.

      Sort of like the problem with international addresses - there's such variation that designing an address database that cleanly handles any kind of address is an interesting exercise.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  31. But what do you field optionnal fields with? by renoX · · Score: 1

    If you do not use NULL, what do you put in say the maidden name of a man?

    I think that it makes more sense to use a null value than an empty string.

    1. Re:But what do you field optionnal fields with? by einhverfr · · Score: 1

      I think that it makes more sense to use a null value than an empty string.

      Why?

      How do you differentiate between a woman with an unknown maiden name and a man with no maiden name? If you don't do this then later when you need to do so, you will have ambiguous data in your db.

      You have two options here.

      You can split the maiden name column into another table where a NULL value means "Unknown" and no value means "No value" or you can use an empty space to indicate that the Maiden Name has no value in the table.

      --

      LedgerSMB: Open source Accounting/ERP
    2. Re:But what do you field optionnal fields with? by Master+of+Transhuman · · Score: 1


      If your relational design is correct, you don't HAVE a maiden name for a male. Such a field makes no sense.

      You're reasoning backwards from existing DBMS systems, not from relational theory.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  32. Re:With Oracle's products becoming more affordable by yintercept · · Score: 1

    Thanks Mr. Coward. Oracle has a tradition of giving product away to students and developers as well. The goal of this goodness was to corral companies into buying Oracle. They are extremely aggressive at getting money out of your company once you bite the hook.

  33. I use LAPP by einhverfr · · Score: 1

    Linux, Apache, PostgreSQL, Perl or PHP.

    As rugged as a Lapp. All the better because Linus is a Finn and Lapps and Finns are closely related :-)

    --

    LedgerSMB: Open source Accounting/ERP
  34. Ha! Ha! Hee! Hee! by Anonymous Coward · · Score: 0

    The slashdot effect !

    1. Slashdot supports Netscape corporation against Microsoft in the Broswer Market
    2. Bang!
    3. Slashdot suppports MySQL, and runs it's site on it.
    4. Bang!

    Ha! ha! Ha!

  35. Wrong by einhverfr · · Score: 1


    Both
            0 NULL

    is false.



    Wrong. These both evaluate to NULL, not false. After all, you can't tell me that you know if an unknown integer or float is greater, less, or equal to 0, right?

    With NULL's all your logic is trinary:

    TRUE, FALSE, or NULL.

    TRUE OR NULL is TRUE (there is no value in place of NULL to make this false)
    FALSE OR NULL is NULL (depending on what you put in place of NULL this could be true or false)
    FALSE OR FALSE is FALSE (obviously)

    NULL is used to create an assumption of the limitation of our knowledge.

    --

    LedgerSMB: Open source Accounting/ERP
  36. Try a real 50+ MB MySQL DB Test by fprog26 · · Score: 1

    Connection speed and non-transactional ?

    Maybe for a small site.

    Try MyISAM or InnoDB.

    Now create a table with 30 fields:
    userid : int, dataid : int, subdataid: int, someparam : int, a bunch of varchar(30), varchar(255), with 3 of them as FULL TEXT with index.
    Create an index for any where clause field.

    Now take this table, insert 1...50000 rows with incremental data and some field being identical.

    Now create 10000+ rows with the same userid, and a mixture of dataid/subdataid, being equals, like this, (1,1,1), (1,1,2)...(1,1,1000),(1,2,1)...(1,2,1000)... you get the idea.

    Make sure, the RAW DB file is at least 50 MB.

    Now do 6 queries that looks like this:

    select * from test where userid=1
    and somevarchar likes 'test%'
    and somevarchar2 likes 'foo%'
    and someid > 10;

    and let it crawl.

    HTML page load time 30 seconds and up.

    MySQL is great for small sites, but it doesn't scale up.
    We encountered this problem on our LAMP server, when we added a *busy* site to our framework and the busy site was producing 10,000 hits per day... We had to move the site to a new server... with PostgreSQL.

    BTW, I did some similar test with Oracle10g and PostgreSQL 8.1, I couldn't see much difference apart that Oracle seems to create indexes by itself and it cost 45K$ per processor, unless you buy a Dell/Oracle server blade deal.

    You can try Oracle10g from http://otn.oracle.com/

    The only fact that I will agree with you is:
    MySQL Cluster seems to rocks [never test it myself],
    but our friend Google seems to be using it a lot.

    Of course, if your website has less than 1000 hits per month, like almost everyone here, there's no point between MySQL/PostgreSQL, use whatever you want.

    1. Re:Try a real 50+ MB MySQL DB Test by Jamesday · · Score: 1

      Then, some of us use MySQL for LAMP web sites doing 5,000 page views per second, not just 10,000 per day, and are approaching a billion queries per day with hundreds of gigabytes of data. Saying it's unsuitable for LAMP applications isn't very consistent with those practical results.

      Did you analyze table to update the optimiser statistics after adding the rows? Did you have a suitable index? What did EXPLAIN SELECT.. say? Did you try a covering index? Did you try InnoDB, given the lack of any use of a fulltext index in that query? Were you using the program defaults (which are for tiny applications) or did you use one of the sample my.ini files?

  37. eating up the market?! disruptive?! by Stu+Charlton · · Score: 1

    MySQL is hardly a disruptive technology. Disruption implies "radically different". I question whether another RDBMS that happens to cost less and can be tinkered with is truly disruptive.

    And let's be clear. MySQL's database revenues were 0.01% of Oracle's in 2005 based on the Gartner and IDC surveys. Oracle makes in 12 hours what MySQL takes a year to make. And Oracle's growth is not shrinking at the expense of MySQL, not according to any statistics YET anyway. Perhaps it will happen, but a number of OSS advocates are claiming advancements that are 3 to 5 years away.

    --
    -Stu
  38. eh? by Stu+Charlton · · Score: 2

    Oracle is a monolithic beast which requires constant care and feeding by experts who have been so steeped in its ways that they are prohibitively expensive.

    I know multi-terabyte Oracle DBMS' that run the entire billing and customer systems for some major telecommunication companies are basically run by 2 or 3 people, and they also have a couple dozen minor multi-gigabyte databases to support.

    Oracle requires very little care and feeding. It does have a lot of knobs available for twiddling if you need to tune it, but if you're constantly twiddling them, there's usually a larger problem (the application has design problems, your hardware is inadequate, your staff is twiddle-happy, etc.)

    Oracle perpetuates this situation and, as best I can tell, deliberately obfuscates their product in order to continue to rake in huge fees for training and services.

    Complete bullshit. Oracle is one of the most well documented databases out there. If ppeople would just take the time to read the Concepts manual and browse Asktom.oracle.com for a day, that would skewer a lot of the "mystique" surrounding Oracle.

    You don't need training or P.S. to manage a large Oracle database, you just need someone with good technical chops, the willingness and time to experiement, and the ability to learn. But that seems like the criteria for any profession.

    --
    -Stu
    1. Re:eh? by ajs · · Score: 1

      Stu, you've clearly had different experiences from me. I've fought with Oracle and watched others do the same many, many times. If you've come away unscathed, then more power to you (drop me a line if you're interested in interviewing), but I've just seen so many horror stories now that I'm tired of trying. Every time I deploy any other database it's a 5 minute install and little or no tweeking. Every time I or a DBA installs Oracle it's a 1-2 day install that requires care and feeding for days or weeks before it's in a state that can be left unattended.

      Mind you, claims that more extensive reading of documentation would help seem a tad off the mark, since other databases get the same cursory treatment that 99% of system tools get. Your database simply shouldn't be a major learning experience, any more than your filesystem is. You should be able to set it up with a few simple commands that are packaged appropriately for the operating system and applications that need it should then be able to use it.

  39. Your grammar sucks can't even get the tenses right by gd23ka · · Score: 1

    You said:

    What VMS had was RMS, which gave you everything from unstructured files all the way through to b-tree based ISAM.

    When you should have said:

    What VMS has is RMS, which gives you everything from unstructured files all the way through to the b-tree based ISAM.

    Of course, OpenVMS is dying. When it goes, we will bury it right next to BSD.

  40. Re:Your grammar sucks can't even get the tenses ri by hughk · · Score: 1
    OpenVMS may not be totally dead but it definitely is dying. This is because of the vast effort by first Compaq, then later HP and Intel to bury it. I have to refer to it in the past tense even there are still a precious few installations around.

    It isn't a BSD that can linger for ever. OpenVMS isn't open source so in the end it depends on the OS vendor's interest for development. The DoD contracts force a kind of lingering support but it is cursed with running on the EOLed Alpha and the turkey referred to as Itanium.

    --
    See my journal, I write things there
  41. Not wrong, somewhat imprecise (was Re:Wrong) by Eivind+Eklund · · Score: 1
    That NULL represent "unknown" is correct in theory; alas, NULL is FALSE in most cases when you end up in an boolean logic context in practice.

    Example: With items containing a bunch of items, both SELECT * FROM items WHERE NULL > 0; and SELECT * FROM items WHERE NULL The alternative would be to make the query fail the moment a NULL was encountered in such contexts, which makes for a lot of other problems.

    This tristate problem also shows up on the level of query optimizing/handling. Without NULLs, a statement can be represented by a logic tree (a NAND tree), in a canonical/normalized form, allowing comparisons of different queries, a bucnh more optimization of the code due to this, and easier understanding of what's going on.

    Eivind.

    --
    Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
    1. Re:Not wrong, somewhat imprecise (was Re:Wrong) by einhverfr · · Score: 1

      Example: With items containing a bunch of items, both SELECT * FROM items WHERE NULL > 0; and SELECT * FROM items WHERE NULL

      I think you are missing it. NULL > 0 evaluates to NULL. Null returns on where conditions are not included in the record set because the where condition requires a definite true and NULL is not a definite true (it is not a definite false either).

      So if I have a boolean column and I say:

      select * from mytable where myboolcol IS FALSE, null entries in myboolcol are not included in the result set either.

      So NULL is not true. It is not false. We don;t know what it is, so in where conditions we don;t include it.

      Another example.

      SELECT first_name || ' ' || last_name as name from customers;

      will return a NULL name where either the first or last name is NULL.

      Hope this helps.

      --

      LedgerSMB: Open source Accounting/ERP
    2. Re:Not wrong, somewhat imprecise (was Re:Wrong) by Eivind+Eklund · · Score: 1
      I know exactly how NULL works. I just consider the introduction of tristate logic to be a bad idea. In my opinion, the absence of knowledge can better be represented by the absence of a row than by a field containing NULL, as both computers and people are better at handling binary logic.

      Eivind.

      --
      Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
  42. Hugh Darwin's idea by jbolden · · Score: 1

    Can you explain why you should do this? I think its frequently the case that M keys have N distinct types of data which they define. Lets pick an example:
    Key = Social Security Number
    Values = First Name, Middle Name, Last Name, Date of Birth, Fathers SS#, Mother's SS#

    What's the advantage in splitting this up?

    I could understand if you are talking associative databases not relational but then that's a different data model.

  43. Hugh Darwin's idea by jbolden · · Score: 1

    The argument is, perhaps that field should be normalized into a separate table of visit dates which does not contain an entry for me.

    I wouldn't go that far. "Last gynocologist visit" has a 1->{0,1} relationship with you (say your social). As such it should be a value in a table with that key. "All gynocologist visits" should be a seperate table.

  44. I might add by einhverfr · · Score: 1

    I was going to add that if you are finding that you are using NULL's to mean "not applicable" rather than "unknown" then it is a good sign that your data model is not atomic, and that certain optional information could be broken down into other relational entities.

    NULL's have uses, but they should be sparingly used. I am convinced that they always introduce ambiguity into certain query types, such as outer joins.

    --

    LedgerSMB: Open source Accounting/ERP
    1. Re:I might add by Master+of+Transhuman · · Score: 1


      I agree with both points of your addendum. "Not applicable" is the wrong way to think about a NULL by definition. And they do introduce ambiguity.

      I'm not sure about your other points, but that's because I'm not yet familiar enough with the relational model. I don't argue for any particular implemention for that reason, yet.

      I think the classic case of a missing zip code is fairly easy to examine. Obviously the issue is context - when the attribute is empty because the data has not been entered, it is both "unavailable" AND "unknown". Obviously any query against that attribute must ignore that tuple because it is not known whether it applies to the query proposition. This is the key point: a query is an attempt to state a logical proposition about the data set.

      Marking it only as "unavailable" doesn't solve the problem; marking it as "unknown" does but still renders the tuple useless for queries related to that attribute. Putting an empty string in there may or not be interpretable as either "unavailable" or "unknown" - that depends on the DBMS implementation and the business rules. The only real solution is either to fill in the missing attribute - or ignore that attribute - which is a business rule decision that is not relevant to the relational model per se.

      What the relational model says is that you simply can't use a tuple marked as "unavailable" or "unknown" in a query that is stating a proposition about the set - unless that proposition is the simple one "is this attribute unknown?" NULLS are only useful to tell you what tuples belong to the set of unknowns - they can't tell you anything useful about any other query proposition.

      The question I guess devolves to whether we need to mark a tuple with an unknown attribute as "unknown" and keep it in the set - which in some sense violates the intention of the database to be a collection of facts about entities and relationships - or move that tuple out of the database altogether until it DOES represent a fact about the entities and relationships.

      In the old days of flat files, you had batch updates. Each batch update was preceded by a vetting run that dumped out records with missing or incorrect data. In that sense, the only thing that was (supposed) to make it into the end result files was good data. No "unknowns" were allowed (in a good system design) because it was too hard to get rid of them later. If something was "unknown", somebody was assigned to MAKE it "known." Now, with databases and direct data acquisition, data gets dumped into the database without adequate vetting and an overreliance on NULLS to solve the issue. It may be time to rethink that approach, particularly since it goes against the relational model concept that a database is a CONSISTENT set of verifiable logical propositions about entities and relationships (not necessarily true propositions in reality - but preferably so, otherwise why have the database at all?) NULLs violate that consistency requirement big time.

      The business rule answer should be: if you don't know a fact about a required attribute, that tuple should not be in the set at all until you do know. And if the attribute is NOT required, it shouldn't be in the model at all. A zip code, for instance, should be REQUIRED - if you're bothering with an address at all. If you don't know it, you'd best find it out rather than dumping it into the database and trying to resolve it later by trying to select between tuples about which you know this attribute and ones where you don't.

      This approach completely avoids the issue. Granted, it means strict application of business rules - and this is the rub, since nobody wants to do that. People want to have their cake and eat it, too - dump everything into the database, including unknown things, and then have the ability to extract correct and useful logical propositions about the entities and relationships involved.

      Doesn't work. Typical human behavior - do the wrong thing, and hope for the right results.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  45. One more point by einhverfr · · Score: 1

    I think Date's view of the absence of join values is summed up in the quote he uses when discussing 6NF, from Wittgenstein: "Whereof one cannot speak, thereon one must be silent." As Date puts it, "If you use nulls, you're effectively making the database state explicitly that there's something you don't know. But if you don't know something, it's much better to say nothing." So instead of showing a value with null, you don't show the value at all. As you say, it's effectively the same as using a NULL - but Date argues it's better since it doesn't introduce the logical ambiguities of NULLs in other areas.

    Maybe.... But I don't really see the *semantic* difference between the sloppy use of a NULL in DB design and the semantic ambiguities that occur from the lack of enforcement in the event where optional fields are split off without proper enforcement of business rules. And I don't see a semantic difference between the inclusion of a NULL in a field and a NULL that comes from a missing record in an outer join. The end result is the same semantically, and I would argue that, provided that you have split off MVD's properly, that the difference has very little to do with anything other than preference regarding semantic structure.

    I.e. from a semantic viewpoint, whether the NULL is in the table or a product of a join, the result is semantically identical and I would argue that one can see them as mathematically equivalent. So I see the problem as one of semantic clarity rather than anything else.

    --

    LedgerSMB: Open source Accounting/ERP
    1. Re:One more point by Master+of+Transhuman · · Score: 1


      There may be no semantic difference between a missing value in a join and a NULL in the general case we're discussing, but it's possible that Date is referring to the overall issues of missing values AND the effect of NULLs in producing three-valued logic - which he demonstrates, in one case in the book, must necessarily produce an erroneous result - which presumably an entirely missing tuple wouldn't produce in the same situation.

      In other words, if a thing isn't there, it isn't there. Whereas if "something" (a NULL) IS there, what exactly does it mean? And even if you know it means "unknown" (or "unavailable" which is basically the same thing in this case), in SOME queries, this might produce an erroneous result, according to Date's specific (if, as he admits, contrived) example.

      So if you allow NULLs, you end up with a situation where the semantic result in MOST cases is identical to a missing tuple, but in SOME types of queries, you get erroneous results that a missing tuple method of handling missing data wouldn't provide. IF that's true, then it's a good argument for not using NULLs.

      I haven't gone through that chapter in detail yet, so I can't say if that's the main issue or if that is what he's saying. Date left out a lot of stuff in discussing this due to space constraints in this small book.

      He said that ONE argument against nulls was the erroneous result problem, and THEN in relation to using NULLs to handle missing data, he refers the reader to chapter seven, where he discusses the empty tuple and refers to another example illustrating its use. So I assume the two are related, but perhaps not directly. He also indicates there are other reasons he doesn't go into about both NULLs and missing data. Soo I'll need to look elsewhere for a more definitive discussion.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    2. Re:One more point by einhverfr · · Score: 1

      This is a very fascinating discussion. Thank you very much for the though-provoking element of it. As a result I have certainly changed my viewpoint at least subtly in the course of it. But that is what the best conversations are about.

      In other words, if a thing isn't there, it isn't there. Whereas if "something" (a NULL) IS there, what exactly does it mean? And even if you know it means "unknown" (or "unavailable" which is basically the same thing in this case), in SOME queries, this might produce an erroneous result, according to Date's specific (if, as he admits, contrived) example.

      I am not sure I agree with this logic. Lets take an example that shows a way of looking at this where they are identical. :-)

      Lets say I create the following sets of tables and a view:

      CREATE TABLE address_street (
      address_id INT,
      street_addr VARCHAR()
      );

      CREATE TABLE address_city (
      address_id INT,
      city VARCHAR()
      );

      CREATE TABLE address_state (
      address_id INT,
      state VARCHAR(2)
      );

      CREATE TABLE address_zip (
      address_id INT,
      zip VARCHAR(10)
      );

      Ok. Now I have a relational model that doesn't need NULL's for storage :-)

      So what if I need to manage addresses for another application. Why not create a view?

      CREATE VIEW address AS
      SELECT coalesce(s.address_id, c.address_id, st.address_id, z.address_id), s.address_street, c.city, st.state, z.zip
      FROM address_street s FULL OUTER JOIN address_city c USING (address_id)
      FULL OUTER JOIN address_state ON (COALESCE(c.address_id, a.address_id) = st.address_id)
      FULL OUTER JOIN address_zip ON (COALESCE(a.address_id, c.address_id, st.address_id = z.address_id);

      Now, suppose in PostgreSQL I make rules for adding/updating/deleting entries.

      Now I have my table stored such that NULL's are not stored if someone doesn't add the zip code, but the outer join still presents them as if they were in the same table. What I am unclear about is what the difference is between simply having a NULL stored in a table and a NULL being the result of an outer join between single columns. Semantically they are the same, even if they are stored differently. The only difference here is that a NULL here is used in the strict mathematics sense (i.e. empty set). But on a higher level, I cannot think of any ambiguities that can occur in one place that cannot occur in the other, especially if you think of multicolumn tables as being a shorthand for the above model.

      The only examples I have seen Date mention regarding missing data w/o using nulls have suffered from a very subtle problem: the example and the solutions are semantically different in how the information is retrieved despite the fact that the storage is semantically equivalent (you can convert one representation into the other and vice versa using views).

      Here is my point and my definition of semantic equivalence:

      Two schemas can be considered to be semantically equivalent if, for any set of data, either schema can be presented as the other using views with no loss of information or ambiguity regarding inserts, updates, or deletes.

      Anyway, many thanks for the food for thought.

      --

      LedgerSMB: Open source Accounting/ERP
    3. Re:One more point by Master+of+Transhuman · · Score: 1


      It has been a good discussion.

      I think the only place I can refer you to with regard to Date's examples is the one in his "Database in Depth", Chapter Three, where he provides the following example of an erroneous query as a result of NULLs:

      You have a database consisting of two relations:

      Relation S: Supplier Number: S1 City: London
      Relation P: Part Number: P1 City: NULL

      Now you do the query as follows:

      SELECT S.SNO, P.PNO
      FROM S, P
      WHERE (S.CITY P.CITY) OR
                  (P.CITY 'Paris')

      For the only data in the database, the SQL result is: UNKNOWN OR UNKNOWN, which reduces to UNKNOWN, so nothing is retrieved.

      But part P1 DOES have a corresponding city - xyz - in the REAL world, and it is either Paris or it isn't. If it is, the WHERE clause evaluates to 'London' 'Paris' OR 'Paris' 'Paris' - since the first term is TRUE, the expression would evaluate as TRUE.

      If the P1 city is not Paris, then the expression would evaluate to 'London' xyz OR xyz 'Paris' - the second term is true, so the expression would evaluate as TRUE.

      Thus, the boolean expression is ALWAYS TRUE in the REAL world, and the query should return the S1-P1 pair, REGARDLESS of what REAL value the null stands for (or is missing). The result - nothing returned - that is correct according to 3VL is different from the result that's correct - everything returned - in the real world.

      He follows this up with the (more contrived but more obvious) example:

      SELECT P.NO
      FROM P
      WHERE P.CITY = P.CITY.

      The REAL world answer is the set of part numbers in the database. Using NULLs, however, you get nothing again.

      His point apparently is that it doesn't matter what NULL is being used for - the query should respond as it is in the real world.

      His conclusion is that you can't trust a database with NULLs because you can't know what answers to what queries are correct in the real world.

      Using nonexistent tuples to represent the above missing data, you end up with the same result - nothing is returned, because Part P1 has no city and thus isn't in the database at all. I don't know the answer to that except the one that Darwen proposes - Part P1 is in a Relation U with the header "City Unknown", not in Relation P at all. In that case, the only way to get the real world result Date wants is to have a join with Relation U which always returns TRUE and returns a value of "Unknown" - which means the city DOES exist but is unknown (if a city didn't exist, there would no tuple in Relation P OR Relation U). And the value returned for that city is precisely the string "Unknown". That at least is a real value that can be represented in output.

      It's a mess, all right. However, the important point is that Date believes that it is important to get the relational model correct, and THEN worry about the implementation. And he believes that the implementation could be much more sophisticated than present DBMSs are capable of - the prime example being the TransRelational Model implementation by Required Technologies. So the issues of performance required by having all these extra relations for "unknown" and the like would be minimized by a better physical model.

      I don't know. Until a better DBMS is built, it's all irrelevant, I guess, since we're stuck with NULLs. All we can do is minimize their use and hope we don't have to use them too much to minimize performance issues with strict normalization.

      As usual, the IT situation is depressing at best.

      Thanks for the conversation, it has helped me at least consider the issues more clearly.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    4. Re:One more point by einhverfr · · Score: 1

      Well, the fundamental problem is that NULL's whether the result of outer joins or stored in tables present some semantic ambiguity. The RDBMS is not in a position to determine what the real meaning is in either query, and is certainly not in a position to go so far as to say that "in the real world every part has a city." So because of this ambiguity, the results of Date's examples are correct and follow the rule of not speaking to that one cannot speak authoritatively of.

      The RDBMS can't make these assumptions about the data model unless we come up with at least two different types of NULL's (one to represent no value and one to represent unknown). So it is up to the DBA and the programmer to use NULL's both sparingly and consistantly so that one *can* extract the data one wants.

      --

      LedgerSMB: Open source Accounting/ERP
  46. Footnote by einhverfr · · Score: 1


    The RDBMS can't make these assumptions about the data model unless we come up with at least two different types of NULL's (one to represent no value and one to represent unknown). So it is up to the DBA and the programmer to use NULL's both sparingly and consistantly so that one *can* extract the data one wants.


    Note that the same ambiguity appears when using outer joins even if NULL's in data models clear cut enough. The basic problem here is that you don't know if a lack of a record means that there isn't one in the real world or that there is and you don't know about it.

    --

    LedgerSMB: Open source Accounting/ERP