Slashdot Mirror


SQL, XML, and the Relational Database Model

Kardamon writes "In an article on DBAzine, Fabian Pascal writes that SQL is not a good representation of the relational model, and is afraid the situation will get worse with XML and XQUERY. An overview of some of the reactions on the positions Pascal and also C.J. Date take on this issue is given in this article over at SearchDatabase.com by Sara Cushman."

453 comments

  1. Can you by cbrocious · · Score: 1

    convert SQL to XQuery and others, though? If you could, it'd make porting your applications easier... Wonder how long it'll be until something out there does it.

    --
    Disconnect and self-destruct, one bullet at a time.
  2. Isn't XML semi-object oriented? by Neil+Blender · · Score: 2, Insightful

    Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.

    1. Re:Isn't XML semi-object oriented? by Bitsy+Boffin · · Score: 4, Informative

      No.

      XML is a file format, it has nothing to do with objects, no more than HTML does, which is not at all.

      However to counter your claim that XML is hard to represent in a relational database. Uhm. No, it's not.

      XML consists (simplifying) of elements and attributes, elements may be nested.

      A generic mapping to a relational database is that elements correspond to the entity tables, attributes correspond to columns in those tables, and the nesting of elements is modelled as a foreign key in the child entity records.

      Whats so hard?

      --
      NZ Electronics Enthusiasts: Check out my Trade Me Listings
    2. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 3, Informative

      Whats so hard?

      The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer.

      XML databases such as Xindice OTOH, allow you to create a table and insert XML in whatever format you chose. XPATH queries take a bit of getting used to, but you can query on tags, attributes, CDATA, or whatever else you chose at whatever level in the XML hierarchy you choose. Thus I can query for the list of addresses for all records that have a firstname attribute that is LIKE "Bob". Or I can dive down to the individual address level and query for all records that have an address of "Drury Lane" (important for tracking down the Muffin Man when you need a giant gingerbread cookie).

      It's not like you can't do this stuff with SQL databases, it's simply a different method of accomplishing the task. Depending on the data you're working with, an XML database may very well be a more efficient method of storage and queries.

    3. Re:Isn't XML semi-object oriented? by Black+Perl · · Score: 2, Informative

      Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.

      This is completely false. If you had RTFA, it is mentioned that the relational model can represent hierarchies (and thus XML) just fine. It is SQL that is deficient for this purpose.

      Also, it makes no sense to call XML "object-oriented," which is a programming language term[semi-OO? LOL]. XML is a syntactic hierarchy that can be used to represent "objects", just as it can be used to represent other types of data.

      Last, just about every major database now supports XML as a native datatype, meaning you don't have to pull out XML documents and parse them.

      --
      bp
    4. Re:Isn't XML semi-object oriented? by Doctor+Faustus · · Score: 4, Interesting

      Of course XML is going to be hard to represent in a relational database.

      Generic XML, sure, but you can always layout your XML in a relational style, like this:

      <root>
      <Table1>
      <Table1Row Table1RowID="1"/>
      <Table1Row Table1RowID="2"/>
      </Table1>
      <Table2>
      <Table2Row Table2RowID="1" Table1RowID="1"/>
      <Table2Row Table2RowID="2" Table1RowID="1"/>
      </Table1>
      </root>

      Join support would be nice for simple jobs, but this works really well for more complicated jobs in XSLT. You can use a for-each on "/root/Table2/Table2Row", calling a template and drill down to "/root/Table1/Table1Row[@Table1RowID='$Table1ID'] within the template. This lets you use whatever hierarchy you want, rather than being stuck with the one hierarchy the original designer chose. Just like real relational databases.

    5. Re:Isn't XML semi-object oriented? by TheRealMindChild · · Score: 1

      Im not going to go too deep into why your statement is completely sillyness... but I WILL point out, in regards to objects, us in the modern day have a field type cald BLOB (Binary large object)

      --

      "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    6. Re:Isn't XML semi-object oriented? by MojoRilla · · Score: 4, Insightful

      Try to model anything moderately complex with XML, where things have many to many relationships with each-other. Nesting becomes impossible.

    7. Re:Isn't XML semi-object oriented? by Tassach · · Score: 2, Informative
      but I WILL point out, in regards to objects, us in the modern day have a field type cald BLOB (Binary large object)/blockquote And I will point out that BLOBs are, for all practical purposes, unusable from within the database. While you can store and retrieve BLOBs from SQL, that's about all you can do with them -- anything more elaborate has to be done outside of the database.
      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    8. Re:Isn't XML semi-object oriented? by tcopeland · · Score: 2, Interesting
      > XPATH queries take a bit of getting
      > used to, but you can query on

      Right on. Another nice thing about XPath is that it can be mapped onto other hierarchical structures. For example, the Java static analysis utility PMD uses XPath to query Java source code for problems. This XPath query checks for empty if statements:
      //IfStatement/Statement/Block[count(*) = 0]
      Good stuff; more XPath rules are here. Props to the Jaxen and SaxPath guys for their fine work!
    9. Re:Isn't XML semi-object oriented? by yintercept · · Score: 5, Insightful
      One does not know whether to laugh or cry. It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular

      This quote needs to be placed toward the beginning of the Grand Encyclopedia of Intellectual Arrogance. Let's see, you have flat tables with a defined primary key and you form relations between these flat tables.

      I do agree that SQL is not the best possible query language, but it succeeds where the other languages fail, it is easy for people to grasp and manipulate. Likewise, HTML has many faults. Plain HTML is still the preferred choice of most web designers because it is easy to learn and write.

      Personally, I think the primary intellectual impulse is to add convolution to simple processes. There will never be an end to the stream of blither about how nulls cannot exist, and anyone who simply uses an sequence counter as a primary key is the devil incarnate. HTML and SQL have two things that almost all the stuff coming from arrogant snits like this author lack. They were designed by people who were actually doing stuff.

      This quote needs a position in the library of intellectual arrogance as well:

      Indeed, data/information management requires "some organizing principle"; that is, structure; anything "unstructured" -- and many in the industry promote XML for that purpose -- is not data, but meaningless random noise that carries no information.

      A snit crassly dismisses several millenia of literature because it is unstructured.

      Quite frankly, meaning and structure are independent of each other. It is possible to find meaning in things with radically different structures. It is true that there is a correlation between structure and the ability to communicate meaning, but a healthy mind can find meanings in things that have not been normalized.

      Likewise, you can have meaningless garbage in relational databases. A case in point is the large number of fake web sites that do things like join the FIPS database to product names so that they can have millions of pages that show up in search engines. Likewise, we see academician filling volume after volume of publications with meaningless tripe.

    10. Re:Isn't XML semi-object oriented? by pacc · · Score: 2, Interesting

      Sure, this is a nice syntax to describe one-off problem. Though for a database you'd hope that it is done once to transform it to something useful instead of 100 times per second.



      //*[@color = "Red"] is taken as a good example in your sense but is actually the worst case in the article. A database would hopefully aleady have track of "red things" and shouldn't have to traverse a tree full of other useless data. Multiple trees spanning the same data is only a contradiction in XML.



    11. Re:Isn't XML semi-object oriented? by RatRagout · · Score: 1

      Also I'm having trouble understanding how XML can ever compete with SQL when doing large queries. I mean, parsing a file...hello... In what in this thread is referred to as SQL DBMS, the records have fixed length fields to make indexing possible. I guess you could index an XML, but all index addresses would be wrong the minute you change a value in the first record. -- Please write your signature using italic so my brain can separate it from the rest

    12. Re:Isn't XML semi-object oriented? by mcc · · Score: 2, Interesting

      The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer

      Um... I haven't read this article yet, but to respond to your comment.. um, no, not really. Here's a trivial possible setup to describe an arbitrary xml tree:

      Table tagname: int tagname_id, varchar name
      Table attrname: int attrname_id, varchar name
      Table tag: int tag_id, tagname_id tagname, tag_id parent
      Table attr: attrname_id attrname, varchar value, tag_id parent

      What's so hard about that? I mean, given, you'd have to do a little bit more work to describe non-tag data or attributes with types other than string. You'd need an "order" column on the tag table if order of xml elements is significant. And it would kind of suck to get too much data out of this with SQL, you'd need either a rather large join or a big flurry of small sql selects (I assume that's what the article has to do with). But the relational model holds up just fine here. I mean, really, XML is just tagged heirarchal data, nothing that special...

    13. Re:Isn't XML semi-object oriented? by halaloszto · · Score: 2, Insightful

      Would you please enlighten me on how an XSLT job operating on a structure like above and showing all the employees who have a salary over 100k, having more than 20 directs and have travelled in the past 12 months look like? Also please include their manages names and phone numbers int he result. And what performance it would have? me

    14. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 2, Interesting

      A database would hopefully aleady have track of "red things" and shouldn't have to traverse a tree full of other useless data.

      In fact, that's exactly what a good XML database will do. By indexing all the accessible fields, it's able to provide fast access to database wide queries like the one you suggested. Of course, these indexes do come at a cost of disk space and memory, so not everything is quite rosy in Denmark. :-( The upshot is that with processing and storage power at all time highs many (most?) databases have no need to concern themselves with running out of storage resources. More than enough resources for a multi-gigabyte database exist on even the smallest modern PC.

    15. Re:Isn't XML semi-object oriented? by kpharmer · · Score: 1

      or just write a recursive query.

      For example the following query will recursively drill-down through parent-child relationships within a single table (col1 is pkey & col2 is ckey where either column can have a value linked to the other column in another row).

      WITH parent (pkey, ckey) AS
      (SELECT pkey, ckey
      FROM hierarchy
      WHERE pkey = 'AAA'
      UNION ALL
      SELECT c.pkey, c.ckey
      FROM hierarchy c,
      parent p
      WHERE p.ckey = c.pkey )
      SELECT pkey,
      ckey
      FROM parent ;

      This isn't the simplest syntax in the world - but isn't too horrid either. It's basically defining a parent row then continually issuing union statements to get the attached children. And it can be fast.

      Anyhow, the above along with tons of examples & discussion was taken from the free db2 cookbook page 256 by Graeme Birchall:
      http://ourworld.compuserve.com/homepage s/graeme_bi rchall/HTM_COOK.HTM

    16. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      In what in this thread is referred to as SQL DBMS, the records have fixed length fields to make indexing possible.

      This seems like an odd statement. As far as I'm aware, there are absolutely no requirements on the length of indexed fields. If there was, fields like VARCHARs couldn't be indexed. Many databases may optimize indexes based on size, but the real reason for maximum field length comes from the days of ISAM files. (Indexed flat files with fixed length records.)

      I guess you could index an XML, but all index addresses would be wrong the minute you change a value in the first record.

      XML databases don't actually work that way. It may seem "obvious" to create a database out of a massive XML file, but most implementations such as Apache Xindice actually deal with small XML fragments that hang off a larger table structure. Each "table" holds multiple fragments as records. If a record changes, the database reindexes the record in a similar fashion to SQL databases.

      The primary differences between XML DBs and SQL DBs are that tables can be hierarchical, and the data can be hierarchical. A lot of details (such as the lack of NULL) also differ, but those are the main points.

    17. Re:Isn't XML semi-object oriented? by RatRagout · · Score: 1

      The above post was meant as a general wondering, not a bashing of XML-databases. Thank you for your clearification. As far as I'm aware, there are absolutely no requirements on the length of indexed fields
      I guess using the term indexing instead of access was kinda bad here. The reasion why i say fixed length is a good thing is because jumping from record to record is a lot easier when every record has the same length. (To jump to record 100, jump to address if start + 100 times length of a record. (This is overly simplified. When records are deleted and moved things get more complicated)). For fields of varying length an SQL DBMS can divide the record into say two blocks and put a pointer at the end of the first part. Still, changing one record won't affect the position of the other records, like it would do in an XML file. But since "Large XML-file" does not apply, this is obviously not an issue.

    18. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 3, Insightful

      Ok, so you break down the XML by its structure. Now how do you do a valid query on it? Since you've completely dereferenced its structure and stuck it in a relational model, you've cut yourself off from the prospect of doing XPATH type queries. Instead, you'll need to make multiple (perhaps hundreds?) of passes at the table to reconstruct its data structure. XMLDBs don't have this problem. They deal with the XML in its natural form and are thus able to index, order, and query in that fashion.

      As I said before, you can do many of the same things with an SQL database as you can with an XML database. That's not the point. The point is working with the data in a form that is natural to it and will provide the best results.

    19. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      I think it's important to understand that flat data models have gone the way of the dinosaur. While they were very fast because of their easy record indexing, they have been replaced by more advanced (albeit more memory hungry) data structures. Had mainframe designers been given a choice, I can almost guarantee that they would not have used the fixed record model. Sadly, they had only a few kilobytes of memory to work with and had to stretch it as far as possible.

    20. Re:Isn't XML semi-object oriented? by Anonymous Coward · · Score: 0

      No, you'd just need to use a few joins, a CONNECT BY, and maybe a stored procedure. Why would you take hundreds of passes?

    21. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      Why would you take hundreds of passes?

      Because it's hierarchical data that is of unknown complexity. At the top of the hierarchy you have the root tag, which splits out into its attributes, sub-tags and CDATA. Each sub-tag again has the same split. Thus you have an exponential increase in complexity for every level the XML wraps.

      No, you'd just need to use a few joins, a CONNECT BY

      CONNECT BY is a new Oracle 10g feature to help support this sort of SQL DB nonsense. While it helps, the database really still has to make all the of hierarchical passes that you previously would have made manually.

      and maybe a stored procedure.

      If it involves a stored procedure, then you've already suggested that the query is too complex to be handled in a single SQL pass.

      Things can get particularly dicey if you need all the data returned. For example, if I have a customer, I may wish to display his addresses and phone numbers on a page. In an XML database, this is one query, one transaction, and (when the data is changed) one update. For an SQL database, it's several queries, multiple result sets, and tons of updates. If you're serializing from XML, it even means that you'll need to delete the entire dataset out before performing the update. That kind of stings.

    22. Re:Isn't XML semi-object oriented? by DnsZero · · Score: 0

      Hell yeah, baby! Just what he said!

      Who's your Daddy?
      Who's your Daddy!

      Note: This is not flamebait, I just get very excited about SQL. MySQL in particular. I mostly like to read the manual. really. BTW, did anyone else notice the new format and location for the MySQL Docs? I really liked the old format better, although, the completeness of the new edition is cause for much rejoicing.

      Pity that they took the Documentation link off their frontpage. I was going to save a spot on my Grand World Council for the MySQL guys. Perhaps I'll give them another chance... That is, IF they get stored procedures done before I take over the world. Muhahaha!!!

    23. Re:Isn't XML semi-object oriented? by shelterit · · Score: 2, Interesting

      Actually, many such things exist, RDF and Topic Maps XTM (http://shelter.nu/art-007.html) being two popular ones.

      --
      -- Home, James - it doesn't matter where that thing has b
    24. Re:Isn't XML semi-object oriented? by Floofnargle · · Score: 1
      What is more arrogant? To make statements based on long and deep familiarity with a given subject, or to dismiss those statements because one doesn't understand them? The latter implies that, you are SO smart, that if you can't immediately grasp the basis of their complaint, it is groundless.
      Let's see, you have flat tables with a defined primary key and you form relations between these flat tables.
      No, you don't! This is not a statement that would be made by a person who understands the relational model. Tables aren't flat. Relations aren't between tables. You probably do have some familiarity with products that call themselves "relational" but they aren't true to the model, and that is probably the source of your misunderstanding, and that is Date's gripe.
      A snit crassly dismisses several millenia of literature because it is unstructured.
      No he didn't. Literature has structure. Chapters, paragraphs, sentences, words. Themes, story arcs, scenes, settings. Etc, etc. Can you honestly claim that XML is "unstructured" information? Does it not have organizing priciples? Date is saying that to claim that XML is useful for "unstructured" information is nonsense, because there is no such thing.
    25. Re:Isn't XML semi-object oriented? by Doctor+Faustus · · Score: 1

      That's not much help if you're supposed to be working with XML.

      I looked up the WITH keyword in my MS SQL docs, and it appears to be part of MDX (for OLAP cubes, I think), rather than SQL.

    26. Re:Isn't XML semi-object oriented? by ckaminski · · Score: 1

      I have to disagree here. If you've done your data model properly, and have your relationships set up right, retrieving nearly everything about your customer can be done in one query.

      The fact that a SQL resultset might end up consuming MORE memory by far by redundant data in many columns than an XML resultset, that I'll give you.

    27. Re:Isn't XML semi-object oriented? by ckaminski · · Score: 1

      Not really. You could, in theory, implement an XML parser in nothing by SQL-92 stored procedures. It'd be nasty, evil-looking code, and probably slower than a 3-toed-sloth with broken toes, but it could be done.

      <obl. Microsoft bash>
      Does anyone know if SQL Server's XML support is written this way?
      </bash>

    28. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1


      The fact that a SQL resultset might end up consuming MORE memory by far by redundant data in many columns than an XML resultset, that I'll give you.


      As I said, sometimes an XML database is the type of thing that really makes sense, sometimes it isn't. :-)

      I myself got into Xindice and XPath when I started working on a new kind of "unhackable" OS Shell. This shell would run all the servers and handle all SSH user interaction. Where it got really cool was that I had this XML configuration file for each server. The server could store its own style of configuration data in one section, and the security settings would be stored in another section. (I actually made it impossible for a server to access folders and network connections it wasn't configured for.) Anyway, I needed a secure way to store this XML data, so I looked into Xindice. I fell in love with it immediately and decided to use the database system wide. (Including tracking of users, groups and permissions.) Really a great piece of technology there. *sigh* I should really get back to that project one of these days. :-)

    29. Re:Isn't XML semi-object oriented? by dcam · · Score: 0, Troll

      I'm confused. I heard the words "XML" and "database" in the one sentence. Then I read "good" and realised you don't know what you are talking about.

      --
      meh
    30. Re:Isn't XML semi-object oriented? by Doctor+Faustus · · Score: 4, Informative

      Would you please enlighten me on how an XSLT job operating on a structure like above and showing all the employees who have a salary over 100k, having more than 20 directs and have travelled in the past 12 months look like? Also please include their manages names and phone numbers int he result. And what performance it would have?

      It sounds like you're kidding (or being sarcastic), and I'm not going to debug it (I'd need data, anyway.), but probably something like this:

      <xsl:template match="/root/Employees/Employee[@Salary&gt;100000] " >
      <xsl:variable name="EmployeeID" select="@EmployeeID" />
      <xsl:variable name="NumReports" select="count(/root/Employees/Employee[@ManagerID = $EmployeeID])" />

      <!--
      XSLT sucks with dates, no matter what your data is arranged like. Storing dates as YYYYMMDD does
      at least allow for comparisons. Also, there is no facility that I know of to retrieve the current
      date, so I'm going to hard-code that.
      -->
      <xsl:variable name="RecentTravel" select="boolean(/root/Trips/Trip[@EmployeeID = $EmployeeID and @Date&gt;=20030628 ])" />

      <xsl:if test="$RecentTravel and ($NumReports&gt;20)" >
      <xsl:variable name="ManagerID" select="@ManagerID" />
      <xsl:text>
      Name: <xsl:value-of select="@Name" />, Phone #<xsl:value-of select="@PhoneNumber" />,
      Manager Name: <xsl:value-of select="/root/Employees/Employee[@EmployeeID = $ManagerID]/@Name" />
      </xsl:text>
      </xsl:if>

      </xsl:template>

      And this approach seems to perform just fine in MSXSL, which I believe is DOM-based. It might give a SAX-based engine problems, because it jumps around so much.

      Lest I give a false impression, I'm not suggesting that this sort of XML replace relational databases. The point is that a roughly relational layout is still a good approach, even when you need to be working in XML.

    31. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      I'm confused. I heard the words "XML" and "database" in the one sentence. Then I read "good" and realised you don't know what you are talking about.

      Trust me, sir. I'm well aware of what I speak. Besides the fact that "good" never appeared in my post (though it was implied, so we'll let it slide), I've spent quite a bit of time with some of the most popular SQL databases (PostGreSQL - good, MySQL - sucks, Oracle - good for BIG stuff, MSSQL - often sucks for anything but "small" dbs, DB2 - just shoot me now, Pervasive sQL - BTrieve sweetness, etc.) and with Apache Xindice. After working with Xindice on a real project, I feel like XPATH is a breath of fresh air. I certainly wouldn't recommend it for everything, but it's certainly nice for some small databases.

      I gave a brief explanation of how I used Xindice here. Just keep in mind that there doesn't always have to be "ONE TRUE SOLUTION" to everything and you'll do just fine. :-)

    32. Re:Isn't XML semi-object oriented? by Tailhook · · Score: 1

      ...CONNECT BY is a new Oracle 10g...

      CONNECT BY goes back to at least 7.1.x. Oracle has handled hierarchical data for a long, long time.

      ...the database really still has to make all the of hierarchical passes...

      Thus index organized tables. The engine picks up all or most of the document in a small number (1?) of physical reads and the rest happens in RAM.

      ...In an XML database, this is one query, one transaction...

      XQuery/XPath/XWhatever is going to end up just another Oracle "value add" to the enterprise edition server. Much of it is already there. Look into it. You may find that you can build your app on the defacto standard DBMS and before Oracle or Microsoft buys whoever you're dealing with now. :)

      --
      Maw! Fire up the karma burner!
    33. Re:Isn't XML semi-object oriented? by dcam · · Score: 1

      The post I replied to said:
      In fact, that's exactly what a good XML database will do.

      I think XML has some benfits when you are talking about data transfer, not in other cases. The load on parsing is too high, you are hibbling the DB before you even begin.

      My beef with XML databases is twofold. One is that I'm not sold on XML as a way of representing data. My second beef is that the way that database stores the data is independant of how that information is returned.

      Getting back to small solutions, the problem with small databases is that they grow. And grow. You know how it is with software. After a while you are in a world of pain with a skyscraper built on a foundation made for a two bedroom house.

      --
      meh
    34. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      CONNECT BY goes back to at least 7.1.x. Oracle has handled hierarchical data for a long, long time.

      Odd. Oracle claimed otherwise. Sounds like another one of their marketing tricks.

      You may find that you can build your app on the defacto standard DBMS and before Oracle or Microsoft buys whoever you're dealing with now. :)

      Thankfully, there's no chance of that. ;-)

    35. Re:Isn't XML semi-object oriented? by rycamor · · Score: 2, Interesting

      Thank you... yes. While Fabian Pascal may be arrogant at times, that is just a personal thing. What he says should be evaluated for whether it is true, not whether it is arrogant.

      And yes, the relational model is indeed *much* more than just flat tables with keyed relationships. I thought this upon my first encouters with the relational model, but after actually studying the concepts, I realized that there is in fact something much deeper going on, and that it has a conceptual integrity that I only wish could exist in the most of the rest of computing.

    36. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      Believe what you like. I personally have used Xindice and find it very good for some applications. It really is just a different way of storing data, and can actually improve performance in the right situation.

      Most people have been indoctrinated to accept SQL as the only was things can be done. But the truth is that there are dozens of effective methods for storing and retrieving data. Some may even be better than SQL in the right sitation.

    37. Re:Isn't XML semi-object oriented? by dcam · · Score: 1

      I certainly believe that there are other effective methods of storing data, however in most cases XML is not an effective method for storing data.

      I have no problem with storing known small amounts of data in XML. For example Microsoft's use of XML to store metadata in IIS, but IMO this is a rare case.

      I note you have not addressed my point about scalibility.

      --
      meh
    38. Re:Isn't XML semi-object oriented? by bucknuggets · · Score: 1

      > That's not much help if you're supposed to be working with XML.

      It's a simple example of the types of recursion that you can use to traverse hierarchies within sql. So, depending on how XML is implemented within the database, this capability might be used to query xml data.

      > I looked up the WITH keyword in my MS SQL docs, and it appears to be part of MDX (for OLAP cubes,
      > I think), rather than SQL.

      I played with recursion within MS SQL back around 2000-2001. At that time (using SQL Server 2000) their recursion capability was provided via stored procedure source code they provided. Which unfortunately had a few glaring errors in it and basically didn't work. With modifications it could, but was still primitive compared to DB2 & Oracle.

    39. Re:Isn't XML semi-object oriented? by LoztInSpace · · Score: 1

      Point of order - CONNECT BY has been around for ages, possibly since 6 or even 5.
      In the real world it can be very useful, although a bit non-relational in the true sense.

    40. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      I note you have not addressed my point about scalibility.

      Theoretically, there's no reason why an XML database shouldn't scale, as long as hierarchical data is what you need. The core theories behind database indexing and queries are the same for XML databases as they are for SQL, ISAM, and Hiearchical. The reality of current XML database solutions is that they're excellent for smaller database solutions (again, if XML is what you need).

      As with myself, I needed a very advanced way of storing configuration and security information. XML provided the solution, and Xindice gave me a database solution. While that sort of use is somewhat left-field, there are a few more traditional uses I could see it being useful for. In a few other posts I've mentioned Customer information. The ability to attach all his addresses, phone numbers, and other personal information in one record is a very nice bonus. Similarly, you might store all the relavent information about a product in its XML tree. Need a new type of product? No need to change the schema, just embed a different information set for that type!

      BTW, I realize that you consider parsing to be a big problem for XML, but consider the amount of CPU and memory that goes into many SQL JOIN operations. Databases often have to literrally perform thousands (sometimes millions) of comparisons just to provide a small record set. The only way they manage this effectively is by keeping all the necessary indexes (and sometimes data) loaded into memory. XML DBs merely embed the information and have the slight overhead of parsing a kilobyte or two. In the end it tends to even out for a simple dataset, and the scales can tip significantly in XML's favor with a large dataset. As a bonus, the parsing can be offloaded to the application server. Depending on your configuration (especially since application servers are easier to cluster than databases), this can improve database performance significantly.

    41. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1
    42. Re:Isn't XML semi-object oriented? by dcam · · Score: 2, Interesting

      Hmmm. You make some very good points. I may have to revise my position on XML databases. Indexing may certainly solve some of parsing problems inherent in XML, although this is likely to have other side effects. The database would need to maintain an internal index on each record/row/whatever, which might well add to the size of the database. I'm just thinking aloud here.

      I also see the benefit of the flexibility of XML. Black boxes objects and all that. For example when you reach a user object, just pass in the contents of the tag to some sort of class factory (or an equivalent mechanism). As more information is added only the user object needs to change.

      Arguably performance problems with XML may drop as more and more people implement systems using XML, something similar happened for RDBMSes, so long as there isn't some feature of XML that is inherent in its design that makes it perform poorly. Your comment on indexing counters what I believed was an XML performance killer.

      Thanks for the comments, this has given me quite a bit to chew over. I think I need to do some further research on this and work it through.

      --
      meh
    43. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      Thanks for the comments, this has given me quite a bit to chew over.

      You're welcome. And thank you for the conversation. It was quite stimulating. :-)

    44. Re:Isn't XML semi-object oriented? by LoztInSpace · · Score: 2, Informative

      I think that document refers to new (connect by features) rather than (new connect by) features.
      I've been happily using it on version 8. The connect_by_root & sys_connect or whatever it is may well be introduced in 10g. Certainly I could have killed for the ORDER SIBLINGS BY clause in 8 but it's only available in 9i+.

    45. Re:Isn't XML semi-object oriented? by leomekenkamp · · Score: 2, Insightful
      (...)is not data, but meaningless random noise that carries no information.
      A snit crassly dismisses several millenia of literature because it is unstructured.

      He was not talking about 'data for human consumption', but rather 'data for machine consumption'. As far as a machine is concerned, all literature carries no information, because a machine cannot extrapolate meaning from it like humans can.

      --
      Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
    46. Re:Isn't XML semi-object oriented? by Hognoxious · · Score: 1
      In fact, that's exactly what a good XML database will do. By indexing all the accessible fields
      In other words, it pretends to be a ralational database.

      Of course, these indexes do come at a cost of disk space and memory
      I take that back, it doesn't pretend to be a relational database; it creates one in addition to itself. Maybe that's why it takes so much disk space?
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    47. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 3, Insightful

      "In fact, that's exactly what a good XML database will do. By indexing all the accessible fields"

      In other words, it pretends to be a ralational database.


      What is it with people lately? Everyone seems to want a piece of the trolling action?

      For your information, a set of indexes no more makes a relational database than being in a garage makes you a car. Index data structures are an inherent way to improve the performance of all databases, whether they be ISAM, Hiearchical, Object, or Relational.

      What makes a relational database relational is the way that the data is loosely intertwined to develop logical connections. In theory, these connections can complete various thoughts about a piece of data. ISAM files are merely indexed data, while Hierarchical (including XML) databases contain a complete "data packet" covering a particular concept. (e.g. Customer contains name, a set of addresses, and a set of phone numbers.)

    48. Re:Isn't XML semi-object oriented? by Hognoxious · · Score: 1
      What is it with people lately? Everyone seems to want a piece of the trolling action?
      -1 ad hominem.
      For your information, a set of indexes no more makes a relational database than being in a garage makes you a car.
      Poor analogy. Anyway, suppose your XML database consist of sales orders. In each order you have header information e.g. customer and n items each with a product code, quantity and so on. I don't see how your precious freakin' index is any different to a table[1] (order header) with a 1:N child table (items), just as it would be in a relational one.
      Hierarchical (including XML) databases contain a complete "data packet" covering a particular concept. (e.g. Customer contains name, a set of addresses, and a set of phone numbers.)
      So does a relational one, thought it might take several SQL commands to get it all. Your point was what, again?

      [1] or a view.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    49. Re:Isn't XML semi-object oriented? by AKAImBatman · · Score: 1

      -1 ad hominem.

      Actually, that should be -1 "Complaining about all the trolling".

      Anyway, suppose your XML database consist of sales orders. In each order you have header information e.g. customer and n items each with a product code, quantity and so on. I don't see how your precious freakin' index is any different to a table[1] (order header) with a 1:N child table (items), just as it would be in a relational one.

      The indexing isn't different. Only the data query and retrieval methods. I don't know why that's so difficult to understand. Indexes != Relational Database. Or were you expecting some revolution in database storage and retrieval?

      So does a relational one, thought it might take several SQL commands to get it all. Your point was what, again?

      My point was that they are different query and retrieval methods. Nothing more, nothing less. I don't know why you feel you have to attack me on that point. It's just another type of database that has its own performance and query characteristics. It's not about "I can do something you 1uz3rz can't!"

      Please. If you'd like to have an intelligent discussion, then let's do so. If you just wish to attack and troll, then please go bother someone else.

    50. Re:Isn't XML semi-object oriented? by anomalous+cohort · · Score: 1

      You posted...

      The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer.

      ...in reply to...

      A generic mapping to a relational database is that elements correspond to the entity tables, attributes correspond to columns in those tables, and the nesting of elements is modelled as a foreign key in the child entity records.

      Is basically that this approach is not generic enough. I have two thoughts on that. Real world XML usage tends to rely on a predefined schema so the original poster's suggestion would probably be good enough for most applications. The second thought is that if you really needed a more generic XML to relational mapping, then just set your database schema to what is truly invariant in XML.

      create table XMLTag (
      XmlTagID int primary key not null,
      Name varchar(50) not null
      )

      create table XMLAttribute (
      XmlAttributeID int primary key not null,
      Name varchar(50) not null,
      Value varchar(250) not null,
      XmlTagID int not null foreign key references XmlTag
      )

      create table XmlHierarchy (
      XmlHierarchyID int primary key not null,
      ParentTagID int foreign key references XmlTag(XmlTagID),
      ChildTagID int foreign key references XmlTag(XmlTagID)
      )

      The advantage of the original poster's XML to SQL mapping is better performance. You could write a single SQL select for any given XPATH. This second approach is more generic but more procedural in the mapping of SQL to XPATH.

    51. Re:Isn't XML semi-object oriented? by Anonymous Coward · · Score: 0
      Please learn how to make links.
      <a href="http://ourworld.compuserve.com/homepages/gra eme_birchall/HTM_COOK.HTM">free db2 cookbook</a>
      (without any spaces put there by Slashdot) yields: free db2 cookbook
    52. Re:Isn't XML semi-object oriented? by Hognoxious · · Score: 1
      Or were you expecting some revolution in database storage and retrieval?
      Given all the hype that would be perfectly natural, would it not?
      I don't know why you feel you have to attack me on that point.
      Where did I attack you? you != you.opinions.
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    53. Re:Isn't XML semi-object oriented? by Hognoxious · · Score: 1
      in fact something much deeper going on, and that it has a conceptual integrity that I only wish could exist in the most of the rest of computing.
      I have clearly missed this much deeper Tao of SQL, although I've been able to create perfectly functional programs based on the "flat tables with keyed relationships" theme. Perhaps you'd care to enlighten me?
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    54. Re:Isn't XML semi-object oriented? by rycamor · · Score: 1

      Well, the point is not SQL, but the relational model. In fact, I think SQL itself often distracts from the relational model, because of it's complexity and inconsistency (See presentations by Hugh Darwen at www.thethirdmanifesto.com).

      I'm not saying that "flat tables with keyed relationships" is wrong, but that it is only one part of the relational model (for example, my applications almost never talk to the base tables, but to views and set-returning functions). Really, there is no way to confer this 'Tao' on anyone with a quick sentence or two (there is no one 'aha' moment). It is all about set-oriented and predicate-oriented thinking, rather than object-oriented or procedural thinking. Used properly, this allows you to handle an incredible amount of business logic declaratively rather than procedurally. To grok it any further, I would heartily recommend you spend some time reading C.J. Date's book Foundation for Future Database Systems, as well as the material at www.thethirdmanifesto.com, and www.dbdebunk.com (yes, Fabian Pascal can be abrasive, but the discussions on the site are real eye-openers, especially some of the older material).

  3. "NULLS are bad." quote by MattRog · · Score: 5, Informative

    Celko is misquoting Darwin in saying that "The idea that you will always know everything is arrogant".

    Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
    http://www.hughdarwen.freeola.com/TheThirdManifest o.web/Missing-info-without-nulls.pdf

    And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products. Anyone who has ever used a hierarchical DBMS will tell you that there are some pretty non-trivial problems that you cannot work around due to their hierarchical data model, yet XML DBMS proponents propose we go back to that old, inflexible system!

    --

    Thanks,
    --
    Matt
    1. Re:"NULLS are bad." quote by MattRog · · Score: 5, Interesting
      To those not "in the know" here's some further clarification:
      "The use of the terms "flat tables" or "2D tables" to describe data stored in a relational database is wrong, he added."

      Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure. The problem is that current SQL DBMS products do NOT do this and so we have the associated performance problems with normalized schemas. If the DBMS was truly a RDBMS then it could optimize the physical storage to improve performance.
      When asked if the relational model was implemented soundly in today's systems, Craig Mullins' instant reply was "no," but he doesn't think the situation is as bad as Date says it is.
      "We're doing production work and delivering value," Mullins said. "Isn't that what it is all about?"

      The question is not "Are current SQL systems providing value" because certainly they are. They overthrew the hierarchic DBMS products for good reason - they were better. The real question is "Are the current SQL systems providing all the value they can". One can simply look at the wide array of DBMS offshoot products like XML DBMS, so-called "Multivalued DBMS" etc. to know that there are significant limitations of SQL products - ones which Date/Pascal/Darwin stress are not limitations of the Relational Model but merely these SQL products. To put words in their mouth, but I don't think they'd disagree at my summation, they'd suggest that if someone were to implement a Truly Relational Database Management System that these other products would quickly become obsolete.

      --

      Thanks,
      --
      Matt
    2. Re:"NULLS are bad." quote by Yosi · · Score: 2, Informative

      At least some XML people recognize the same, thus the existence of RDF.
      RDF is equivilent to a relational database. There is a working group right now looking into a query language for RDF.

      That said, some RDF people here at the w3c don't care that much for serializing RDF as XML, prefering the much more readable n3

    3. Re:"NULLS are bad." quote by molarmass192 · · Score: 1, Insightful

      I skimmed that article and I totally agree. The author fails to mention is that the S in SQL is for STRUCTURED, as in structured (aka. relational) data. Hierachical databases are by definition unstructured (LDAP, SleepyCat, etc). So no shit SQL doesn't mesh well with XML dbs, it was never meant to.

      --

      Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws-Plato
    4. Re:"NULLS are bad." quote by __aagmrb7289 · · Score: 1

      Thank you. This was a bit of what I was looking for - a little sense out of a long article on non-sense.

    5. Re:"NULLS are bad." quote by AKAImBatman · · Score: 1

      And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products.

      Ah yes, IMS tables. I hated those things with a passion. Every time I saw one of those I sat and wondered "Who the hell came up with THIS idea?!" That being said, XML databases are not quite the same thing, IMS databases pretty much required you to navigate the parent record before descending to the sub record. Not to mention that the flat storage meant that both the parent record and child record must take up the same number of bytes. Sure, separating parent and child into two tables makes sense, but IMS was FASTER. (Makes you want to throttle someone, doesn't it?)

      XML databases work on a slightly different principle. For one, there are very few restrictions on the parent/child relationships. You can have as much or as little data at each level as long as it's well formed XML. This is useful for getting rid of many arbitrary SQLDB decisions such as separating customer information (Address, phone numbers, etc.) from the customer record, and allows for things like arbitrary attributes to be attached without changing the schema. In addition, XML databases allow you to query at any level, not just at the parent tag. Thus I can find all customer records where the guy's name starts with "Bob" and he has at least one address on "Drury Lane". You can do the same thing in SQL, but it has various performance considerations, and can result in duplicate records being returned. (Since SQL results do not necessarily conform to an underlying record.)

    6. Re:"NULLS are bad." quote by Chops · · Score: 4, Insightful
      While you're at it, check out this tripe (from the article):

      The relational model is predicate logic applied to databases. Predicate logic is the real-world's two-valued logic (true/false) ... logic guarantees correctness -- defined as consistency -- of query results. It is to preserve logical correctness, therefore, that Codd's Information Principle requires that all information in relational databases be represented as values in relations. The term "NULL values" suggests that Chamberlin does not realize that part of the problem with NULLs is that they are not values -- indeed, they are supposed to be markers for the absence of values. Whatever a database table with NULLs is, neither is it a relation, nor do NULLs represent anything in the real world and, consequently, correctness and the rest of the relational benefits are lost.

      Incidentally, "inapplicable values" are a red herring. They are an artifact of bad database design. There is only one kind of missing value -- unknown -- and as I demonstrate in the above-mentioned chapter, it can be handled relationally, without the huge problems of SQL's NULLs.


      I read this and pretty much gave up getting anything of value out of this article -- I hadn't understood much that went before it, though my distrust of all things XML had led me to believe this guy might know what he's talking about.

      If you removed NULLs from relational database design, people would reinvent them (poorly) -- probably by using IDs of -1 or 0, or IDs to a special magic "null" row, which I suspect is what he's talking about by "it can be handled relationally." To suggest that missing or inapplicable values are not part of "the real world" is so wrong it's... well... wrong. Anyone who's actually done database work (or programming work, for that matter) knows this.
    7. Re:"NULLS are bad." quote by RatRagout · · Score: 3, Informative

      The S in SQL points to the fact that the Query Language is structured...not the data...

    8. Re:"NULLS are bad." quote by ceswiedler · · Score: 2, Insightful

      SQL NULLs are the worst thing since unslicable bread. They break boolean logic. You would think that if (X = Y) is false, then (X != Y) would be true. With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.

      I understand the argument (NULL indicates no data--so you can't claim it's equal to anything). Academic bullshit. Anyone who's maintained code using SQL NULL semantics will agree. If you really want to claim that NULL is so much 'not a value' that you can't compare it to anything, then do it the man's way and throw a goddamn exception. Of course, anyone can see that doing so would make code which MIGHT encounter a null value even MORE difficult to maintain, so they came up with this 'any comparison to NULL is false' crap.

      The easiest way to define NULL is that it's equal to another NULL value, but not equal to anything else. Then I don't need any special 'is null' clause either.

      The very definition of b0rken.

    9. Re:"NULLS are bad." quote by MattRog · · Score: 2, Interesting

      I'm not quite sure what you're getting at but if you've read his work he's never saying that the concept of a missing attribute is "bad" just that the "NULL" method of representing it as such is. He proposes several different methods in his book which I think I mentioned somewhere else.

      If you're interested, it's Chapter 10 - "What You Don't Know Can Hurt You: Missing Information". Take a look at it for a more precise definition.

      --

      Thanks,
      --
      Matt
    10. Re:"NULLS are bad." quote by Maditude · · Score: 1
      If you removed NULLs from relational database design, people would reinvent them (poorly) -- probably by using IDs of -1 or 0, or IDs to a special magic "null" row, which I suspect is what he's talking about by "it can be handled relationally." To suggest that missing or inapplicable values are not part of "the real world" is so wrong it's... well... wrong. Anyone who's actually done database work (or programming work, for that matter) knows this.


      MattRog posted this just a few posts above, about how one could do away with nulls, and gain a (nominally, imho) greater approximation of reality.

      Interesting read, and reminded me of a time or two when I've taken normalization to extremes. Personal experience tells me that: nobody's ever correctly modeled reality, and simple models are much easier to change and maintain than complex ones.
    11. Re:"NULLS are bad." quote by Chops · · Score: 1

      Yes, that was in the post I replied to. I finally took a look at it, and I was right: it advocates something which is effectively special "null" rows, which it justifies by referring to a situation where multiple NULL values are meaningful. I'd handle the "salary" example with a combination of an enum and an integer (something like salary_type enum ('salaried', 'unemployed', 'unknown'), and salary_amount integer) -- that's informationally equivalent to what he proposes and a hell of a lot simpler. His solution is overengineered.

      Most uses of NULL aren't even that complicated -- you'll just have something like an "end date" row, with NULL meaning that the thingy isn't ended yet. You might also have one general "what state is this in" enum (e.g. 'unstarted', 'started', 'ended', 'aborted') per row, and that defines which data is valid and which is NULL. In that case, what he proposes is way overengineered, with one state per variable and extra syntax and wierdness.

      I agree that his criticisms are valid for the situation of using NULL as an actual value (like in a userid column to mean "guest user"). That'll get you in trouble sooner or later, but that's bad design anyway; teach the designer, don't fuck up the system.

    12. Re:"NULLS are bad." quote by AKAImBatman · · Score: 1

      I'm not quite sure what you're getting at but if you've read his work he's never saying that the concept of a missing attribute is "bad" just that the "NULL" method of representing it as such is.

      In fact, if I understood him correctly, his problem was that NULL situations represent a lack of information, but they've been implemented as a type of information in of themselves. While the author may despise XML databases, they handle the situation correctly. i.e. The field does not exist.

      Part of the problem is that simply not having a field would break First Normal Form. (First Normal Form states that all columns must exist in all records. Period.) Yet there the situation arises where an individual record may not have data for that column. Thus the dilemma. You can't get rid of NULLs and maintain Normal Form. The two are simply at odds with one another.

    13. Re:"NULLS are bad." quote by gfody · · Score: 2, Insightful

      nulls shouldn't be used in the case where this behavior would be unwanted. a null isn't a blank value, it's a missing value. implementing a missing value relationaly would mean absence of a reference (the null is the result of a left outer join or simply isn't returned).

      --

      bite my glorious golden ass.
    14. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 1, Insightful
      ...That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure. The problem is that current SQL DBMS products do NOT do this and so we have the associated performance problems ...

      I take it you've never used

      CREATE INDEX fooindex ON foo (fookey);
      .Saying that the data is not indexed is a red herring. Lack of indexing is a limitation of your understanding of your RDBMS, not SQL or the relational model.

      The concern of OODBMS proponents is not performance, it's data modeling. OODBMS proponents assert that by useing on object model they can easier represent complex real-world systems. RDBMS proponents counter that the lack of structure of an OODBMS leads to lack of representation of the semantics and to problems with searching. RDBMS' and the relational model excel at representing highly structured data and efficiently searching it. OODBMS' and the object model excel at representing complex structures without a high degree of structure, or where the structure needs to be flexible.



      Which should you use? well that depends... I would not consider an RDBMS for a MMORPG. On the other hand, I would not consider an OODBMS for an accounting system.

    15. Re:"NULLS are bad." quote by Chops · · Score: 1

      Hmm -- I've maintained SQL code for a few years at least, and my head's never exploded from the giant boolean contradiction that is NULL comparison semantics. Do you also hate IEEE NaN?

      SQL isn't really code; this thing you say about throwing exceptions has no meaning within a "SELECT WHERE" statement. Returning FALSE is about the closest you get the throwing an exception in SQL; it's just sort of a "these aren't the droids you're looking for" answer, which is generally precisely what you want when you're dealing with a null row.

      For example, I really wouldn't want "SELECT customer_pictures.imageblob FROM customer_pictures, transactions WHERE customer_pictures.eventid = transactions.eventid AND transactions.value > 10000 to return every picture that wasn't associated with an event for whatever reason (if there was a >$10000 transaction that wasn't associated with an event). The kind of problem you're talking about is only true if you're using NULL to mean something (in another post I mentioned using NULL in a userid column to mean "guest user", which is totally wrong and will cause problems, yes). If you use it to mean nothing, though, then the semantics you describe would be the ones that would cause problems, whereas the ones that actually exist are trouble-free.

      (Yes, the example is wierd, but it gets my point across I think.)

    16. Re:"NULLS are bad." quote by CoughDropAddict · · Score: 2, Insightful

      I cannot think of a situation where I would want NULL=NULL to be true. I'm no SQL guru and I'm a bit rusty, but:

      1. the only situation I can think of where you are comparing two values, where neither is a literal, is when you are joining on a pair of columns

      2. if one of the values *is* a literal, there is no reason not to write "IS NULL" instead of "=NULL"

      3. if you *are* joining on a pair of columns, letting "NULL=NULL" would not make any sense (you would get the cartesian product of all rows from both tables that have NULL in those columns)

      4. also, if you are joining on a pair of columns, one of those should be a primary key, and you shouldn't have NULL values in a primary key column.

      Can you illustrate a situation where you want NULL=NULL to be true? Perhaps the situation arises in procedural SQL (stored procedures and such) that I am not as familar with.

    17. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      Ahh, spoken like a MySQL developer.

    18. Re:"NULLS are bad." quote by ceswiedler · · Score: 1

      Yes, it's more of a procedural thing. If you have a variable which may or may not be null, you can't do 'where column = :variable'.

    19. Re:"NULLS are bad." quote by johnnyb · · Score: 1

      From what I've read from these guys in the past, it's not quite what you're making it out to be. They believe that NULL's should be removed, but then replaced with the concept of "domains" (domain being a possible range of valid values). For example, if I have a field "salary", normally in SQL you would give it a number or null. However, does NULL mean that they aren't employed? That we don't know what their salary is? What? So, instead, what you need is to set the column up to be, instead of just a regular integer, a domain consisting of (a) all the integers, (b) the value NO_SALARY, and (c) the value UNKNOWN_SALARY. This way, you don't have the vague concept of "null", but specific concepts which are flexible, since you get to define the domain and include any "special" values that you need to have a proper data model.

      They simply believe that "nulls" are a hack around the real need of supporting full value domains.

    20. Re:"NULLS are bad." quote by ceswiedler · · Score: 1

      No...if you're joining two tables, and a column is NULL, then it doesn't match the join. NULL only would equal NULL. So in your example, only if a customer_pictures row had a NULL eventid and a transactions row had a NULL eventid would these be joined. They join because neither one is associated with any event. It's easy in this case to append 'and eventid != NULL' if you also want to filter these out.

    21. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      : SQL NULLs are the worst thing since unslicable bread. They break boolean logic.

      You say that as if it's the NULL's fault, and not SQL's. Perl has NULLs (undef) that don't break logic, so it's not a problem inherent to NULLs, but a problem in how the language defines the operators.

    22. Re:"NULLS are bad." quote by Chops · · Score: 1
      No...if you're joining two tables, and a column is NULL, then it doesn't match the join. NULL only would equal NULL. So in your example, only if a customer_pictures row had a NULL eventid and a transactions row had a NULL eventid would these be joined. They join because neither one is associated with any event.

      Correct -- the given query will return every NULL-event picture if any selected transaction has a NULL event. Saying "only" and explaining why it happens doesn't make it right :-).
      It's easy in this case to append 'and eventid != NULL' if you also want to filter these out.

      If you're using NULL correctly, then you always want to filter these out, which is why the SQL semantics are what they are.
    23. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      : So no shit SQL doesn't mesh well with XML dbs, it was never meant to.

      Maybe you should do more than skim the article. That's not even close to the topic discussed.

    24. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      Which isn't all bad. For example "WHERE Price < 100.00" -- you most likely would not want rows that did not have a price value assigned.

    25. Re:"NULLS are bad." quote by Tony-A · · Score: 1

      SQL NULLs are the worst thing since unslicable bread. They break boolean logic.

      It's boolean logic that's broken.
      More exactly, inapplicable.
      If the presence of any unknown makes it impossible to represent everything that is known, you've got some serious problems.
      The real world is rather more complicated than a trite boolean logic world.

    26. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      One point alone rebukes what you are pushing.

      In oracle, indexes are not populated for rows that have columns of only NULLs. It is possible to have very small (read: selective) indexes on otherwise large tables, even if a function needs to be applied to generate the NULL value.

      NULLs are there and are not going away anytime soon. They server a very good purpose, they just need to be understood and used properly.

      besides, that, fuck you.

    27. Re:"NULLS are bad." quote by tsarin · · Score: 2, Insightful
      With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.
      Actually, if if any members of the set (X, Y) are NULL, any comparison involving those members evaluates to NULL, not false.

      The easiest way to define NULL is that it's equal to another NULL value, but not equal to anything else. Then I don't need any special 'is null' clause either.
      No, the easiest way to define NULL is that you don't know what it is; it's unknown. It can't be equal, not equal, whatever to anything, even another NULL, for that very reason.

      An example: I'm thinking of a number. Tell me, is it greater than 13? Unknown. Less? Unknown. &c. Try

      select (NULL = NULL)::boolean;
      (PostgreSQL syntax; substitute however it works for your favorite [O]RDBMS). If it's not brain-damaged, you'll get NULL. (Even MS SQL gets this one right, folks.)

      The very definition of "b0rken" is people railing -- incorrectly, even -- against something they don't understand. The very definition of irony is when the thing in question is SQL NULL.

    28. Re:"NULLS are bad." quote by kcbrown · · Score: 2, Interesting
      SQL NULLs are the worst thing since unslicable bread. They break boolean logic. You would think that if (X = Y) is false, then (X != Y) would be true. With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.

      No wonder you think SQL NULLs are the worst thing -- you can't even get the semantics right.

      When either X or Y (or both) are NULL, then any expression evaluating it is NULL. Not false. Not true. NULL. Any SQL implementation that behaves differently is broken.

      It's not the SQL standard's fault if your code's logic can't handle that case. Nor is it the standard's fault that you can't see fit to NOT USE the NULL feature when you don't want to (and any reasonable database even goes so far as to give you the option of making *sure* you don't use it -- that's what the NOT NULL declaration when defining a column is for).

      Getting rid of NULL isn't going to help you when you suddenly discover that you really DO need to be able to represent "missing data" somehow.

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    29. Re:"NULLS are bad." quote by Tony-A · · Score: 1

      If you really want to claim that NULL is so much 'not a value' that you can't compare it to anything, then do it the man's way and throw a goddamn exception.

      You have a select statement that returns 55000 NULLs.
      How would you field each one of those exceptions?
      How would you do it correctly?

      You can compare numbers.
      Either A < B or A > B or A = B.
      If A < B and B &;t C then A &lt C.
      Fine, but did you ever hear of Complex Numbers?

    30. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      but "unknown" == "unknown" yes ?

      The fact that for both nulls, both are unknown is true right ?
      so null == null == true could make sense if you are talking
      about a meta statement that: it's true that both are unkown.

    31. Re:"NULLS are bad." quote by Tony-A · · Score: 2, Insightful

      They simply believe that "nulls" are a hack around the real need of supporting full value domains.

      Thanks, makes sense.
      However, the required complexity required to handle "the real need of supporting full value domains" has to be sufficient to handle all possible reasons and scenarios where you do not have a simple value. You might get close with a Lisp back-end, but this is heading too much like needing a supercomputer to hand one number in an el-cheapo caculator.

      Nulls are a hack. A nice simple hack. It is possible to build bigger and more complicated hacks and take things a wee bit further, but the further the screwier I'd expect.

    32. Re:"NULLS are bad." quote by johnnyb · · Score: 1

      You know, what's interesting is that Codd (the one who wrote the original paper) started out hating NULLs, but I believe now he believes in their practical value.

    33. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      I'd hardly call it "academic bullshit"; nor is it broken.

      The semantics of NULL don't stop you from using comparisons like =, ; however, it does make you think about them.

      What you end up doing is one of two things:

      1. You use IS NULL or IS NOT NULL, because you're interested in the when the value is, specifically, null; or

      2. You use a substition function (usually NVL()) to replace the null with an appropriate value.

      Either way, you supply a meaning to the null value, which is exactly what you should be doing.

      Personally, I think your attitude is just plain lazy. Either that, or you haven't worked with many real-world commercial databases, where nulls have a specific meaning, in context.

    34. Re:"NULLS are bad." quote by Tony-A · · Score: 1

      What you don't know has a way of messing up any and all grandiose plans.

      He was right about one thing. Anything that is capable of being represented will most likely be represented. A good choice of representation can deprive an awful lot of nasty problems of any means of expressing themselves. Getting rid of "almost all" still leaves plenty ;-(

    35. Re:"NULLS are bad." quote by doom · · Score: 2, Insightful
      Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
      Well, I read it and essentially what they're saying is that instead of having a single NULL that means "we don't have the info and we don't know why", they recommend having lots of single column tables pointing at the missing values, one table for each reason/excuse. So instead of having NULLs in the salary field, you've got tables that explain "No salary because the guy is on commision", "No salary because the guy is unemployed", "No salary because it is regarded as confidential", and so on.

      What strikes me about this notion (outside of it's general clunkiness) is that it seems to assume that you will always know why you don't know. In reality, I think systems like this would sprout lots of tables called "unknown_value_for_unknown_reasons".

    36. Re:"NULLS are bad." quote by MattRog · · Score: 1

      "(outside of it's general clunkiness)"

      Well, their solution is for DBMS vendors to implement, not SQL DBMS users (because SQL products are pretty clunky as you pointed out).

      "...that it seems to assume that you will always know why you don't know. In reality, I think systems like this would sprout lots of tables called "unknown_value_for_unknown_reasons"."

      Well, consider what it means to say "I don't know why I don't know this" -- why would you store that in the DBMS in the first place? The DB is a collection of facts, and the absence of a fact is essentially "I don't have this fact" which I think is logically equivalent to your supposition.

      However that said I'm not sure of too many instances (I can't think of any off of the top of my head) of why you'd have something that is truly unknown in your example. Of course, you have to wonder how valuable or reliable the rest of your data is if you start having many entries of "I have no idea" :)

      --

      Thanks,
      --
      Matt
    37. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      Sure. And if someone were to implement a Truly Communist Society, capitalism would be obsolete, too.

      The fact that no one has yet to pull off such a feat doesn't seem to bother them nearly as much as it bothers me.

    38. Re:"NULLS are bad." quote by HaggiZ · · Score: 1

      You obviously don't need to be more familiar as you've got it pretty well summed up.

      If you don't want nulls, put a constraint on the field to prevent them being accepted as a type.

      How difficult is that for anybody who knows how to use SQL effeciently? Don't blame on poor design what can easily be explained by ignorance.

    39. Re:"NULLS are bad." quote by Wastl · · Score: 2, Insightful
      Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure.

      True. However, this encoding is usually very inconvenient (consider representing an HTML document or a structured piece of literature in this manner).

      Besides this, nested structures are at least as logical as flat structures (I continue to call them flat because they *are*). Relational database logic is merely a fragment of first order predicate logic, one that is restricted to - guess what - flat relations, whereas first order predicate logic usually works with *nested* structures (called terms) and relations. XML and other nested data structures fit very well into logic, and in fact we (a research group in Munich and some other places) are working on a logic-based query language that exploits this similarity.

      I agree with many of the statements that the author of the article makes, in particular regarding XQuery. However, some are so arrogant and *unproven* that it leaves the article in a bad light. Also, while he claims to have a good insight into database theory, I don't think he really has. SQLs big advantages are (1) it is easy to use and (2) it has a very limited expressive power which makes it easy to implement and efficient to evaluate. Other approaches have been considered, e.g. in deductive databases or knowledge base systems. However, those needed languages that were basically Turing complete or at least supported basic recursion (to implement transitive closure) and thus could lead to very inefficient queries.

      Sebastian

    40. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      you dense fool

      I don't know how old you are
      you don't know how old I am

      does that make us the same age?
      (no it does not)

    41. Re:"NULLS are bad." quote by Tony-A · · Score: 1

      but "unknown" == "unknown" yes ?
      No.

      "unknown" == "unknown" no.
      "unknown" != "unknown" no.

      I think the semantics for NULL is that the row for that column value does not exist. NULL is not a value, it is a mechanism to indicate the lack of a place that would hold that value.

    42. Re:"NULLS are bad." quote by ultranova · · Score: 1
      Which should you use? well that depends... I would not consider an RDBMS for a MMORPG. On the other hand, I would not consider an OODBMS for an accounting system.

      Why not ? CREATE TABLE objects (id SERIAL PRIMARY KEY, x INT NOT NULL, y INT NOT NULL, z INT NOT NULL, pointer INT, spectators INT, objectdata BYTEA NOT NULL);

      Whenever someone comes near the object, if the pointer is NULL it's taken from the storage (deserialized ?) and the pointer is set to point at the object in memory. If pointer is not null, well, thats the pointer to the object in memory. In both cases spectators is increased by one. When the last person leaves (spectators goes to zero), object is stored (serialized ?) and pointer is set to NULL.

      Of course all this assumes that the MMORPG server shares a single memoryspace, and all the pointers and spectators need to be reset at each startup. And we need semaphores around the retrieval/restore processes and spectator increase/decrease...

      Anyway, I've never used a OODBMS database, so how would you store the object in one, and how would it be more usefull / perform better ?

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    43. Re:"NULLS are bad." quote by JamieF · · Score: 1

      >Hierachical databases are by definition unstructured (LDAP,

      This nonsense about calling anything that isn't strictly tabular "unstructured" has got to stop. "Structured" is NOT a synonym for "relational".

      LDAP has objectclasses, and an objectclass's attributes are typed. It's hardly "unstructured".

      If you want to think of it in these terms, an objectclass is like a table; an attribute is like a column. Maybe add a table with columns for dn, table name, and an FK to the PK of each table, and voila, you've got a SQL DBMS back-end for an LDAP directory. (No, it's not optimized for ad hoc querying, but that's not what LDAP is for either, and it'd be blazing fast for lookups by DN.)

    44. Re:"NULLS are bad." quote by the_olo · · Score: 1

      They break boolean logic. You would think that if (X = Y) is false, then (X != Y) would be true. With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.

      1. I don't agree. See below to see why NULL is useful in this SQL logic.
      2. What you've stated is untrue. If either X or Y or both are NULL, then any expression evaluating it is NULL, not false! This way, you get the indication that some unknown value doesn't let you make the decision (true or false).
    45. Re:"NULLS are bad." quote by lucianx · · Score: 1

      Depends on the implementation. For example, in PostgreSQL, if either X or Y is NULL, any expression evaluating it isn't false: it is, in fact, NULL.

      This makes sense, because there is no meaningful comparison for truth-value against a NULL, because there is no value for comparison. If there is a reasonable default value, use coalesce(). Otherwise, what's the problem, in practice?

      --
      John C. Worsley - Artist, Musician, Coder
      Portfolio
    46. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      Who modded this a +5.

      By definition a relation involving null can't be a relation. Your table design is wrong.

      If a value is not part of relation then the tuple doesn't need to exist.

      If some one has a death date of null then what are they doing in a table that represents date of death. There should simply be no row that matched.

    47. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      NULL op NULL always produces NULL; any system that gets this wrong is not a SQL implementation.

    48. Re:"NULLS are bad." quote by Anonymous Coward · · Score: 0

      Preview is your friend.

    49. Re:"NULLS are bad." quote by epine · · Score: 1


      I also had problems with the tone of the author. The killer point for me was how he lambasted SQL without pointing to *any* relational query language which had, in his opinion, been done right. If no one has ever invented such a thing, maybe the Unicorn has no clothes.

  4. 'scuse my ignorance but... by fiannaFailMan · · Score: 4, Interesting
    It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular; and SQL was hardly developed in accordance with good language design principles.
    What exactly is the problem with SQL?
    --
    Drill baby drill - on Mars
    1. Re:'scuse my ignorance but... by sphealey · · Score: 1
      What exactly is the problem with SQL?
      I have read 20 or so of Mr. Pascal's rants in various forums over the last two years, and darned if I yet know the answer to that question. Wonder if he is related the Eli Goldratt? Their style of argument is similar.

      sPh

    2. Re:'scuse my ignorance but... by XMyth · · Score: 5, Funny

      Well, all the apps which are backed by SQL databases are crashing all over the place. After its several years in the field now SQL has been proven to be unstable, unreliable, and completely incapable of doing the job.

      Evidence of this is in the hundreds of companies who are completely unable to maintain a database of any significant size despite vendor claims to the contrary. Also, note the thousands of websites which routinely fail due to random database problems. It appears that all SQL products are sad implementations of a horrible standard which simply does not cut it .

      (the above is intended entire as sarcasm)

    3. Re:'scuse my ignorance but... by barks · · Score: 1

      Prehaps those that are unable to grasp the basics or understanding of it?

      It's funny this is from an advocate of relational databses. I once remember a wise person telling me once that unless you're making an "Access Realm database" that relational databases would not be the way to go.

      E.g. when a police officer is punching up your licence info to see if you're a wanted criminal that's considered armed and dangerous, chances are it's not running through a relational database to give Johnny Law a quick response.

    4. Re:'scuse my ignorance but... by afidel · · Score: 5, Insightful

      The problem with standards is their's so many to choose from. Or in the case of SQL every vendor seems to think that the standardized language is inadequate and yet they make no roads towards improving the standard. This leads to every vendor having their own superset of the language which makes maintainability in cross database projects exceedingly difficult and migration in applications that aren't designed for it incredibly difficult. As to fundamental flaws in the concepts around SQL I have yet to hear a concrete argument against it, mostly vague rantings from people who's ideas weren't chosen by the marketplace to serve real world needs.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:'scuse my ignorance but... by Doesn't_Comment_Code · · Score: 2, Funny

      Oh... I was all ready to thrash you, with my mouse over the reply button and everything. Then I got to that last line about how your entire post was sarcasm.

      Whew...

      --

      Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    6. Re:'scuse my ignorance but... by Tarantolato · · Score: 4, Insightful

      One problem is that as a language it lacks elegance and is awkward to build large queries in. More deeply smug relational weenies insist that it does not properly model the relational algebra model pioneered by Ted Codd.

      I'm not sufficiently versed in database theory to understand the technical side, but SQL certainly does feel to me like a non-optimal solution.

    7. Re:'scuse my ignorance but... by Short+Circuit · · Score: 1

      It sounds silly to blame the stability problems of a database on the language used to talk to said database...That language shouldn't have any bearing on the quality of the code used to implement it.

      Or maybe it's just me...

    8. Re:'scuse my ignorance but... by tanguyr · · Score: 5, Insightful

      What exactly is the problem with SQL?

      Wouldn't call it a problem, but there just seems to be something about it that drives all the Oo fanboys up the wall - maybe it's the fact that they can't make nice UML diagrams of a query or something.

      There nothing wrong with SQL and RDBMS - they've been around for years and they'll be around for years to come. I have this argument each and every day at work with people who seem to think that the solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code and invest a king's ransom in application server licenses to run it on. Don't get me wrong: i've seen some real horrorshow coding with SQL mixed into code, but a bad coder will produce bad code in any language. Until then, SQL works. What more can you say?

      --
      #!/usr/bin/english
    9. Re:'scuse my ignorance but... by MattRog · · Score: 5, Informative

      There are a couple of "problems" that they have identified:
      1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
      2) Little/No support for relational domains (e.g. complex data types)
      3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
      4) Weak support for complex integrity constraints (e.g. business rules)
      5) No support for entity sub/supertype relationships
      6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)

      Try searching www.dbdebunk.com for SQL. Or pick up the great book "Practical Issues in Database Management" by Fabian Pascal.

      --

      Thanks,
      --
      Matt
    10. Re:'scuse my ignorance but... by LostCluster · · Score: 1

      The main complant I'm seeing is that an SQL-based database doesn't exactly support a tree-based structure "out of the box". However, isn't using linked tables to make a one-to-many relationships one of the first things anybody's taught in a basic how-to-make-a-database class?

      To me, these people drank too much XML Kool-aid...

    11. Re:'scuse my ignorance but... by Xentax · · Score: 4, Insightful

      No kidding. This sounds like Andy griping that Linus' "school project" is an inferior kernel.

      Sounds like a semantic argument to me; where the Rubber Meets The Road, Linux is the kernel of a variety of widely used, production-quality OS's, while Minix is an academic *model* (on purpose, to be sure, but a *model* rather than a useable-on-a-daily-basis kernel, nevertheless). Similary, claiming SQL is crappy for various academic/theoretical reasons doesn't change the fact that it's in wide use today, as a concrete solution to any number of million- and billion-dollar abstract problems.

      So, if SQL is so bad, maybe they should stop cursing the darkness, and show us the light. In the meantime, people will use (and incrementally) improve the tools at hand to solve the problems at hand.

      Right now, SQL-based database solutions are generally the best solutions for *real* data problems that we have to solve, from mySQL-driven personal webpages, to enterprise-grade databases powering major websites, business-to-business e-commerce, and everything in between.

      Invent a better mousetrap, and the world will beat a path to your door. Criticizing the mousetrap as an inferior pest control device doesn't do much to keep the mice out...

      Xentax

      --
      You shouldn't verb words.
    12. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      Also, no support for composite foreign keys - where a foreign key is required to be from table1 OR table2 (alternatively, no support for foreign key relationships to views rather than base tables).

    13. Re:'scuse my ignorance but... by kpharmer · · Score: 1

      > The main complant I'm seeing is that an SQL-based database doesn't exactly support a tree-based
      > structure "out of the box". However, isn't using linked tables to make a one-to-many relationships
      > one of the first things anybody's taught in a basic how-to-make-a-database class?

      Well, you don't normally want to have to implement recursion by manually linking tables together - way too cumbersome.

      You're better off using the 'out of the box' support for recursion from the database vendors. I use it in db2 all the time, and have used it in oracle as well successfully.

    14. Re:'scuse my ignorance but... by sql*kitten · · Score: 5, Insightful
      there just seems to be something about it that drives all the Oo fanboys up the wall

      Yeah, I know what you mean. These kids can't wrap their tiny minds around the following concepts:
      • A table is not a class
      • A row is not an object
      • A column is not a property

      Whenever I see a project gone horribly wrong, and the language is C++ or Java, the problem usually is the system architect didn't grok the above statements. They should be tattooed onto the forehead of every OO programmer, so when they're "pair programming" they can read it off each other.

      solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code

      I get that too - then I show 'em the logs that show the database processor is mostly idle as it waits for their application to either request more data or finish working on what it's got!
    15. Re:'scuse my ignorance but... by tanguyr · · Score: 4, Funny

      I get that too - then I show 'em the logs that show the database processor is mostly idle as it waits for their application to either request more data or finish working on what it's got!

      Here's one that'll make you howl: "sorting is a presentation-tier concern"

      --
      #!/usr/bin/english
    16. Re:'scuse my ignorance but... by alex_tibbles · · Score: 1

      read Date's book (Introduction to Database Systems) but basically, SQL is meant be be a relational query language, but it isn't relational!

    17. Re:'scuse my ignorance but... by metamatic · · Score: 1

      I would summarize your summary as: The problem with SQL, and with relational models as implemented by SQL databases, is that it is often incredibly difficult to represent real-world data.

      This is particularly true if you limit yourself to standard SQL and brain-dead fixed-length fields. The result is endless RISKS Digest stories about inflexible database systems which can't represent unexpected real-world data because it's too wide, too narrow, uses unexpected characters, ...

      Even a fairly simple problem domain, like a customer relationship management system, results in a complex tangle of tables.

      Furthermore, the representation which allows you to query the data the way users want (star schema) is utterly different to the representation which allows you to update the data efficiently and keep it consistent (normalized form).

      This latter defect has given rise to the entire OLAP / data warehousing industry, where you basically take all your data out of a database with one structure, and dump it in its entirety into a separate database with a different structure, just so that you can browse it and report on it without your SQL server falling over.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    18. Re:'scuse my ignorance but... by LostCluster · · Score: 4, Insightful

      1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
      Just because two queries return the same results today do not mean that they will continue to do so in the future. If a value that used to be bounded from 1 to 10 suddenly is declared to be allowed to be cranked to 11, then suddenly "equal or greater than 9" and "equal to 9 or equal to 10" will have gone from always returning the same results to now specifying different sets. Clearly, the more specific code will execute faster, but if an assumed boundry no longer holds in the future, the program will become obsoleted and require revision to the less specific version. This isn't a language-specific issue, it's just a problem that crops up whenever a computer program encounters a situation its designer wasn't expecting.

      2) Little/No support for relational domains (e.g. complex data types)
      Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back.

      3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
      Totals will always be a non-updatable view. You can't change the number of objects you have without creating some new objects or chosing to get rid of some existing objects. Fields in a one-to-many relationship cannot be changed because to do so would be ambigious... do you want to create a new entry in the other table, or do you want to rename an existing entry in the other table. Go do what you meant to do, then refresh your view.

      4) Weak support for complex integrity constraints (e.g. business rules)
      That's more an issue for applications rather than databases. The program or user that's creating the query should know what's allowed by business rules, because if the database is going to refuse a query due to business rule violations, that query shouldn't have been offered to the database in the first place. Those errors should be trapped upstream before they get that far. SQL triggers for business rules should be a last line of defense, not something that should be regularly asked to function.

      5) No support for entity sub/supertype relationships
      Plenty of support, just not intrinsically. Just use a one-to-many relationship in your DB structure and go along your way.

      6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)
      That's like trying to do math without a concept of zero. Sometimes, things just don't apply and we put "N/A" on the form and "NULL" in the database.

    19. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      Here's one that'll make you howl: "sorting is a presentation-tier concern"

      I've met people who think that joins are best done in the middle tier, on tables with hundreds of thousands of rows!

    20. Re:'scuse my ignorance but... by Osty · · Score: 1

      You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.

      Same for Perl, or C, or pretty much any other language out there. This just means you need to learn more about your chosen RDBMS platform, and how to investigate what it'll do for a particular query.


      Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)

      I'd suggest that's a limitation of the engine you're using, and not of relational databases in general.


      Weak support for complex integrity constraints (e.g. business rules)

      Yes, there's only so much you can do with constraints, but CHECK constraints can be very powerful. Limit your database access to stored procedures, and implement your necessary business logic there. Then, if someone needs to insert data that you can't constrain through foreign keys, unique constraints, and check constraints, you can do more complex handling in the stored procedure.


      Supports NULLs (Date/Pascal/Darwin do not like NULLs)

      Waa. NULLs are not inherently evil. This argument is along the same lines as GOTO arguments. GOTOs aren't bad when used properly. The misuse of GOTOs is what's bad. The same goes for NULLs. For what it's worth, if I have a table where I expect to have some unknown values, I want to be able to get a NULL. I don't want to know that -1, or "", or some other magic value is equivalent to "unknown" when there's already a defined standard for "unknown" -- NULL.


      There are plenty of criticisms for SQL databases, and even more so for individual implementations (why does MySQL suck so much? Why does every vendor have to add their own extensions to the SQL standards? Why is there no cross-platform way to mark a table as AUTOINCREMENT/IDENTITY/whatever other RDBMS engines use? etc). The arguments above are pretty weak, however.

    21. Re:'scuse my ignorance but... by Brandybuck · · Score: 2, Interesting

      As a OO fanboy, I must protest your mischaracterizatoin of us. Unlike the DB fanboys, we never claimed that OO is suitable for every problem domain.

      What drives me nuts aobut the DB fanboys is that they have to use DB for everything. For example, I maintain an embedded system where some nitwit ex-web-developer decided to implement the process table with MySQL. Huh?

      DB is suitable when you have massive amounts of uniform data. OO is suitable when you have heterogenous structured data. Often these two areas overlap, so you can choose whichever you prefer. You can even mix the two if you like. But where they don't overlap, stick with the appropriate paradigm.

      --
      Don't blame me, I didn't vote for either of them!
    22. Re:'scuse my ignorance but... by iSwitched · · Score: 2, Insightful

      You seem to have been exposed to some relatively clueless, or at least inexperienced J2EE coders.

      No java guy worth his or her salt thinks that all that EJB code is there to 'work around' database bottlenecks. Anyone with any experience knows that if you want pure perfromance, you let the database do its thing with nice optimized stored procedures in the dbs sql dialect du jour. Of the many reasons for choosing EJB, one is to abstract the database away enough that your application will run against any database, with no re-coding.

      You see, the opposite end of the extreme example you proposed is all those monolithic DB apps I've seem where the data-access AND business logic are tied up in multitudes of stored procedures, creating a porting and upgrading nightmares, should you ever want to change platforms.

      Sure, if you're creating proprietary solutions for internal corporate problem-solving, and you're pretty sure you'll be running on Oracle forever, then go for it.

      If you need multiplatform, databse-agnostic apps, then I've been happy with Java for a while now. ...And as for that Kings Ransom? Might I suggest JBoss or Orion.

      --
      "That naive cube! How long must I suffer this!" --Sheldon J. Plankton
    23. Re:'scuse my ignorance but... by The_REAL_DZA · · Score: 1

      (the above is entended entire as sarcasm)

      Dang it! In the future PLEASE , instead of appending such a statement to the bottom of your post, please use this statement at the top of your post:

      "The following is intended entire as sarcasm"

      I had my asbestos typing gloves all donned and was just cracking my knuckles when I got to that line...

      --


      This space intentionally left (almost) blank.
    24. Re:'scuse my ignorance but... by MattRog · · Score: 5, Informative

      "Just because two queries return the same results today do not mean that they will continue to do so in the future."

      Total misunderstanding of what I wrote. To put it another way:
      SQL allows you to write queries which are mathematically equivalent but result in vastly different query plans and performance. Again, not a particularly stinging-indictment of SQL as such but had it been designed differently it could have avoided such ambiguity in the language.

      "Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back."

      So you're saying a tree has no "structure"? That a domain has no structure? If it had no structure, it would be a little difficult for computers to process.

      "View stuff"
      Pascal (or Date, can't remember) provides an iron-clad (mathematical definition) method of creating views which will always be updatable. There are structural deficiencies in SQL which prohibit this. I will not waste time/typing here illustrating them, they are all identified at their web site.

      "SQL triggers" etc.
      It is precisely the reason that applications were enforcing business rules that DBMS were invented all those years ago! There are plenty of reasons that application-enforcement of business rules is a bad thing. Again these are illustrated on their web site. Also, your quote about "SQL triggers" is basically re-stating what I mentioned: that SQL is poor at implementing business rules!

      "Plenty of support, just not intrinsically."
      Which is exactly the same as saying "no support for entity sub/supertypes". Plus, one-to-many tables are not the same, you're thinking of something else. Chapter 6 of Fabian Pascal's book "Practical Issues in Database Management" covers this in some depth.

      "That's like trying to do math without a concept of zero."
      Not quite the same. Remember that the relational model is based upon predicate logic and set theory. Set theory has the empty set, which is not the same as NULL. SQL products currently handle null in a ridiculous manner (some sort NULL oddly, comparison is difficult, summation is odd). Pascal/Date do not suggest that the concept of "unknown" is bad, just that the SQL representation as NULL is.

      --

      Thanks,
      --
      Matt
    25. Re:'scuse my ignorance but... by Doctor+Faustus · · Score: 1
      What exactly is the problem with SQL?
      I've got a couple, off the top of my head:

      You can use "Select Max(EntryDate) From Customer" to find when the last customer was entered easily enough, but when you want to find out who that customer was, it gets nastier
      Select *
      From Customer
      Where EntryDate = (Select Max(EntryDate) From Customer)
      Also, you can store hierarchies perfectly well (what you can store really isn't dependant on SQL), but you can't retrieve a hierarchy without involving procedural code (either extensions to SQL, like MS/Sybase T-SQL or Oracle PL-SQL, or client code).
    26. Re:'scuse my ignorance but... by MattRog · · Score: 1

      SQL DBMS != RDBMS.

      You're basically re-iterating my points.

      Check my other posts for a complete rebuttal/clarification of all of your points.

      --

      Thanks,
      --
      Matt
    27. Re:'scuse my ignorance but... by shirai · · Score: 2, Insightful

      The one bone I have to pick about SQL is there is no standard way of retrieving the value of an Identity, Counter, AutoIncrement (or whatever your database calls it) field after inserting a record. This is brain dead.

      Every SQL dialect has a special, unique way of getting the value in this field. Many (not all) things in SQL can be written to be compatible with virtually every database (if you are very very careful) except this one important thing. I know you can requery the database with all the fields or insert a unique ID manually but this seems like a horrible hack to me.

      --
      Sunny

      Be my Friend

    28. Re:'scuse my ignorance but... by TheProcrastinatorTM · · Score: 1

      Part of Fabian Pascal's point is that SQL is NOT RDBMS - which is most (all?) of what he thinks is wrong with it. I skimmed the article, but I have read some of his stuff in the past, and that is what I recall.

    29. Re:'scuse my ignorance but... by kiwi_james · · Score: 1

      You can be pretty certain that when "Johnny Law" does enter your info the underlying data store is going to be a Relational Database (unless the system was built over 20 years ago).

      RDBMS's are fast - their use by large organisations all over the world for fast data retrieval is a testament to this. It certainly is possible to create a relational database that has incredibly poor response, but with good design and some straight forward tuning you can get great results.

      Of course, when I talk about relational databases I'm talking about implementations such as Oracle. The article talks about how these aren't really RDBMS's, but I'm sure that your "Wise Man" was talking about the real world implementations such as Oracle, Informix etc.

    30. Re:'scuse my ignorance but... by zBoD · · Score: 0

      Too bad...
      It was funny, until this line. Rahh :(

      --
      BoD
    31. Re:'scuse my ignorance but... by EvictedHellCitizen · · Score: 1

      May not be a problem with SQL, but a problem with RDMBS' in general. I'm talking about optimzers. Should the burden be on SQL's back to determine optimal execution paths? SQL was never intended to be robust--this is evident in SQL's lack of variables, and lack of procedure (loops, etc). It cannot be viewed as other languages such as C++. I think the key problem here is that RDBMS' are not stepping up to the plate and offering optimzer engines that will intelligently execute generic SQL efficiently. Say you have an application that is running slow. You load the SQL with Oracle hints, analyze your tables on a weekly basis to ensure the cost-based optimizer executes the query in a similar fashion each day. Well, now you are no longer dealing with 'SQL', but instead 'Oracle SQL'. This is a problem next week when you want to switch to DB2. Maybe Oracle should spend more time on optimizers, and attempt to develop a system that will execute SQL consistently and efficiently, according to whatever physical storage characteristics.

    32. Re:'scuse my ignorance but... by tanguyr · · Score: 1

      As a OO fanboy, I must protest your mischaracterizatoin of us. Unlike the DB fanboys, we never claimed that OO is suitable for every problem domain.

      Well said, and very true.

      What drives me nuts aobut the DB fanboys is that they have to use DB for everything. For example, I maintain an embedded system where some nitwit ex-web-developer decided to implement the process table with MySQL. Huh?

      You have my permission to hit him hard with the clue bat. Seriousely. While you're at it, give the manager who hired a web developer to do an embedded system a couple of whacks as well.

      --
      #!/usr/bin/english
    33. Re:'scuse my ignorance but... by CondeZer0 · · Score: 1

      "Practical Issues in Database Management" is an excellent book that every database user and developer should read.

      After reading that book I realized that what I hate is _not_ relational databases, what I hate is SQL. I wish there were a popular alternative to SQL, but the future only looks worse with XML databases and other abominations...

      The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.
      -- Phil Wadler, POPL 2003

      Now, when you combine XML and SQL... my worst nightmares become true.

      uriel

      --
      "When in doubt, use brute force." Ken Thompson
    34. Re:'scuse my ignorance but... by LurkerXXX · · Score: 1
      Which is just one more stupid rant on his part.

      Yes, SQL is not really relational.

      Before pissing and moaning about that fact over, and over, and over and over ad nauseum, please suggest a working system which *IS* a true relational system that we can use instead.

      He can't. There isn't one that works. If there was, we would all transition to it in short order.

      But there's not. He should stop the constant pissing and moaning about it, and going on and on how he is superior because he understands what 'true' relational stuff is, until he can damn well propose a WORKING alternative.

    35. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      Unfortunately, such wisdom has made it into a 100 EJB books out there.

    36. Re:'scuse my ignorance but... by JeffWhitledge · · Score: 0

      I'd like to add a couple of things to those last two points.

      When you say "no support for entity sub/supertypes" it is a specific indictment of SQL not RDBMSs in general. I think this one issue illustrates the weaknesses of SQL for those who aren't familiar with it. SQL cannot be used to implement an arbitrary-depth tree traversal using the scheme advocated by the previous poster. The SQL apologists should really try writing an arbitrary-depth tree traversal in SQL and see it for themselves. It can't be done without mucking around with temporary tables or some other nonsense. I have run into this problem a couple of times. Experienced SQL programmers hardly notice it, because they just learn not to think in terms of tree structures. But the problem is still very real. Relational databases are great at storing trees. SQL is terrible at retrieving them.

      The NULL handling is similar. SQL programmers learn how to avoid the problems with NULL (usually by making non-nullable columns), but the specific way in which SQL handles NULLs is very problematic, and it leads to buggy systems.

      --
      These comments do express the opinions of my employers, and, personally, I think they're complete rubbish.
    37. Re:'scuse my ignorance but... by Fuzzums · · Score: 1

      Yeah, I know what you mean. These kids can't wrap their tiny minds around the following concepts:

      * A table is not a class
      * A row is not an object
      * A column is not a property


      Ofcourse it's just 'data'. But I can instantiate an object with the data from the db...

      So then why does this cause the project to go wrong?

      --
      Privacy is terrorism.
    38. Re:'scuse my ignorance but... by Buzz_Litebeer · · Score: 1

      I know I know, same old thing.

      I am a DBS and do this work every day, and I was about to thrash you to the earth with what you typed.

      Then I read it was all a joke! Man, thats better than an april fools joke.

      --
      If you don't vote, you don't matter, so don't waste your time telling me your opinion
    39. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      If you want to use vendor-independent SQL, just use LDBC: http://ldbc.sourceforge.net

      LDBC traslates standard SQL to the various SQL dialect.

    40. Re:'scuse my ignorance but... by FoFi · · Score: 1

      I have not seen yet in this thread the link between domains and NULL.

      In a relationnal approach, any value that can be NULL (in an SQL application) can either be put in a separate relation/table with a foreign key (it is an additionnal property) or included in a specific domain (as an expected value, ie : domain surname is string or the special value N/A). If your properly define a domain that has a specific, meaningfull "NULL" value, you should also provide operators to deal with it, in particular '=' (comparison). Consider the NULL=NULL comparison : it can be answered by either True, or False, or Unknown. Having a special value for a domain make the result of the comparison predictable and non-dependend of what the DBMS "thinks" of NULL !

      Hence the problem with SQL is that you have no facility to build domains and the NULL value, and not the opposite !

      Concerning updatable view, the point I've seen raised by relationnal proponents is that updating total is not "impossible", but that should be rejected by the DBMS iff it breaks view definition. So operation-defined attribute/column can be updated as long as it still has the expected value.

      Sure it's a theorical approach but it allows to make every view updatable and to go to the real question : how to make every view updatable ? At that point, you are hit by other SQL deficiencies, In particular the fact that SQL rows are not unique in a table (definition of a key is not mandatory)...

      On integrity constaints, I do agree that application should not make any bad query ;) . But the point of a DBMS is to ensure that every data stored is sound and valid. I've read proposals to build a constraint engine in DBMS, that just pass to applications the constraints attached to insertion (in a complex insertable view...). That could make integrity checks easier to implement in multiple applications and multiple languages. Think also that you always can replace a badly designed application, but that is is nearly impossible to re-insert wrong data, especially 2 or 3 years later.

      Also a more thorough implementation of constraints will make query optimisation more aggressive in a few cases (as in any system : there is more optimisation opportunities if the system "understand" what you are talking about)

      Also the missing "support for complex integrity constraints" should be read as a no-no for integrity triggers. Triggers are a way to specify "how" to check integrity and not "what" to check. Most people on that point would like pure declarative integrity constaints. It also goes in the same direction of telling the DBMS what you are trying the store.

      Thomas.

    41. Re:'scuse my ignorance but... by TiggertheMad · · Score: 0

      I would just like to throw in my unwanted $.02, and say that of all the programming/expression/whatever languages I have played with over the years, SQL is about the most clunk and inelegant I have seen. Is this just me?

      --

      HA! I just wasted some of your bandwidth with a frivolous sig!
    42. Re:'scuse my ignorance but... by gfody · · Score: 1

      autonumber fields are also one of those abominations that supposedly break the relational model (hence no standard for what to call them or implement them)

      --

      bite my glorious golden ass.
    43. Re:'scuse my ignorance but... by fupeg · · Score: 2, Interesting
      Ch-ching! If I had a nickle for every DBA I've had to deal with who loved to say crap like :
      • If you would let The Database make that calculation, it would be faster.
      • The Database has a feature that can do that for you
      • If you're going to use that programming language, use this extension of it that The Database supports
      • Re-write that query as a stored procedure and it will be so much faster
      DBAs love to lock you into the database. Put more logic there. Use extensions so you're tied to a vendor. Squeeze out the extra millisecond of performance so that you'll have to continue to use Oracle or DB2 forever. Blah blah blah.

      The truth is that these guys want you locked in their product, because they have training on that produce. It's all about job security. These guys hate object-relational techniques because they turn databases into storage/retrieval devices. They REALLY hate simplified databases (especially MySQL) that don't have built-in XQuery support or Java stored procedures or binary content indexing or --INSERT USELESS FEATURE HERE--.

      It's all about job security with these guys. They've always got FUD prepared like
      • You can't rely on that other vendor's database because it will not scale
      • You can't rely on that other vendor's database because it will corrupt your data
      • Don't let developers write DDLs or they will crash the database
      • Moving business logic out of the database will cause huge performance problems
      It never ends...
    44. Re:'scuse my ignorance but... by amorsen · · Score: 2, Interesting
      Recursing through a classic SQL tree with parent-"pointers" is ghastly slow. It gets slightly less slow if you use a stored procedure, but if you do that you have confined yourself to one database vendor. (So much for SQL being a standard.)

      With "modified preorder tree traversal" you can do the whole recursive query, whether up or down, in one non-recursive SQL statement. Very neat, but it is a pain to implement and you better be careful about the atomicity of updates. So far I have not found a way to avoid locking the whole table for writes for some operations, even with databases which do proper transactions. The same algorithm could be implemented cheaply and with more fine-grained locking in the database itself.

      --
      Finally! A year of moderation! Ready for 2019?
    45. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      Well, I don't quite agree. A database table can (and in most cases should) be class. A row can be represented as a object. And a column is a property. Ever heard of POJO / POJI? JDO? Object Persistence?

      Just be carefull with references / foreign keys: don't load all referenced objects when reading a database row. Because if you do, you end up loading the whole database in most cases.

      And don't forget you shouldn't load all the objects if all you need is count them (or, if you need only the identifiers).

      The problem with EJBs is, object loading is really slow. But that's a problem of EJBs. The solution: don't use EJBs for simple object persistence. For a nice example on simple Java persistence, see for example JOR (Java Object Relation), http://jor.sourceforge.net

    46. Re:'scuse my ignorance but... by Unordained · · Score: 2, Interesting

      Technically, it's not a relational-database concern. Relation variables (tables) aren't sorted, order doesn't matter. Any time your query asks the output to be sorted, it's asking for something extra that isn't appropriate for relations, but -is- appropriate for list/vector/array/what-have-you.

      Yes, it's more efficiently done server-side. But a relational database server could, academically, be a full-blown RDBMS and not have any sorting abilities whatsoever.

    47. Re:'scuse my ignorance but... by amorsen · · Score: 1
      The big problem is that SQL is much like COBOL. COBOL can do everything that other imperative langauges can do, and some things are even easy (printing fixed-length fields, for one). However, that does not mean that COBOL is a particularly good language. Fortunately, any decent programmer can make a good attempt at a new imperative language, including making a compiler or interpreter for it. Therefore we got further than COBOL.

      The same decent programmer would be hard pressed to implement a database system with a new query language. This is simply not something you whip up in your spare time. So we are stuck with a million dialects of SQL.

      --
      Finally! A year of moderation! Ready for 2019?
    48. Re:'scuse my ignorance but... by nosferatu-man · · Score: 1

      It would be hard to mischaracterize Fabian Pascal more thoroughly than to call him an "Oo fanboy". He's a screaming kook, no dispute, but I would wager that he hates the OODB crowd as much, if not more, than he hates SQL.

      'jfb

      --
      To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
    49. Re:'scuse my ignorance but... by Unordained · · Score: 1

      You're right. I should tell my users, starting today, that they can't tell me what's wrong with my app until they have a working patch for it. That's progress. (And it makes my life easier, ne?)

      He's an academic (of sorts -- Pascal is the more ... irritating of three, the other two being Date and Darwen): his purpose is to sit there and tell us how things -should- be, and let us do what we're good at, which is making things happen. Yes, it's annoying, but we -could- try being humble for a minute and realizing this is a cry for help. He sees stuff as sucking, he wants it to be better, and he's not a coder. We're coders, and too lazy to go looking for problems to solve. Seems like the two kinda go together, eh?

      The Third Manifesto book (Date / Darwen) actually proposes a new language, with reasoning behind each feature (and choices of the form "having either A or B would be sufficient, so the implementation can select whichever it likes best".) It's a boring book, and I'm not fond of their interest in trying to define an OO basis of their own (I don't think it's flexible enough, nor better-defined than current OO languages), but the rest is there.

    50. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      What's wrong the theorem that relational calculus and relational algebra are equally powerful? To hell with that!

    51. Re:'scuse my ignorance but... by D-Cypell · · Score: 3, Insightful

      Here's one that'll make you howl: "sorting is a presentation-tier concern"

      Largely it is. If the user hits a table header to sort on the selected row, are we are supposed go back to the database and do a different 'order by'?... I dont think so!

      If the sort is required purely to provide the user with a list in 'alphabetical' order then sorting in the presentation tier tends to be smarter because it reduces the 'bug-space'. That data generally passes through several levels of indirection, at any time someone may decide to replace an ordered collection with an unordered one and by the time it reaches the screen... it all out of sequence. Also, its very possible that while the data needs to be sorted to give users that 'fuzzy feeling' the same API can be used to provide a SOAP/XML-RPC/CORBA interface that doesnt require sorting (or rather, let the consumer decide). Why do this, fairly expensive, operation on a tier that doesnt always require that it is done?

      If sorting is required in the middle-tier it is usually due to some search algorithm or something. In this case, I prefer to put the sort with the search so that it is clear. Where I have worked with pure SQL (tend to use ORM tools now) I like to put the SQL in a seperate repository that the DBA's can tweak without recompliation. That 'order by' gets dropped pretty quick when the PHB is complaining about DB performance... and WHOOPS... really weird bug in the search code.

      So it seems to me, that when you factor in the real world issues around using the database as a sorting tool.... its not quite so 'howl-worthy'.

    52. Re:'scuse my ignorance but... by Chazmyrr · · Score: 1
      I *have* written an arbitrary depth tree traversal in SQL. The use of temporary tables is limited to caching results from row level security logic that determines which nodes a given user can access. It's not as difficult you make it out to be.
      /* Sample Script */
      CREATE TABLE t_object(
      i_object INT IDENTITY(1,1) NOT NULL,
      i_left INT NOT NULL,
      i_right INT NOT NULL,
      i_parent INT NOT NULL,
      vc_name VARCHAR(50) NOT NULL
      )
      GO
      INSERT INTO t_object (i_object, i_left, i_right, i_parent, vc_name) VALUES( 1, 2, 0, 'ROOT')
      GO
      CREATE PROCEDURE add_child @Parent INT, @Name VARCHAR(50) AS

      DECLARE @Right INT

      SET @RIGHT = i_right FROM t_object WHERE i_object = @Parent
      UPDATE t_object SET i_right = i_right + 2 WHERE i_right >= @Right
      UPDATE t_object SET i_left = i_left + 2 WHERE i_left > @Right
      INSERT INTO t_object ( i_left, i_right, i_parent, vc_name ) VALUES( @Right, @Right + 1, @Parent, @Name )
      GO
      /* End Sample Script */
      The above example is a basic implementation of a tree. I left out indexes, transactions, error handling, and so forth for the sake of brevity. The important thing to note is that a descendant of a node will always have left and right values between those of its parent. Move, Copy, and Delete are left as exercises for the reader.

      The real problem with SQL NULLs is SQL programmers that don't understand SQL NULLs. An ANSI standard SQL NULL has an unknown value. Since the value is unknown, any logical operation involving a SQL NULL will always return FALSE. NULL = NULL returns FALSE. NULL != NULL also returns FALSE. It can be inconvenient in WHERE clauses, but that is the standard.

      Chaz
    53. Re:'scuse my ignorance but... by tanguyr · · Score: 1

      So it seems to me, that when you factor in the real world issues around using the database as a sorting tool.... its not quite so 'howl-worthy'.

      The real world issue is that SQL has built in support for sorting and sorting algorithms in most big databases are highly optimized - furthurmore, the cost of ordering a recordset will almost always be minimal compared to the cost of generating that recordset. BTW, i noticed that the "order by" clause made it into EJBQL 2.0 as well, despite the howls of protests from object purists.

      --
      #!/usr/bin/english
    54. Re:'scuse my ignorance but... by LurkerXXX · · Score: 1
      I didn't say no one should point it out. Pointing it out is fine and good. Beating a dead horse is just annoying and useless. I'm an academic myself (not of sorts, a real academic), not a coder. I'll point out problems in discussions in papers I write, but I don't rant on and on endlessly about things like this.

      Point out the problem. Then suggest feasible ways to move forward if you have one. Ranting on and on in a forum like this (which is going to consist of almost entirely end-users) that someone is incorrectly using the nuances of the word 'relational' (without properly explaining it in the fist place in his rants!) is not just useless, but annoying. Just bitching over and over doesn't progress science. He likes to hear himself talk. That's about it.

      Sorry, but the Third Manifesto is almost a decade old. Nothing material has come of it. Academicians all over the world are working on the problem of finding a working relational system, and companies like Oracle and MS are throwing lots of money at it. Finding a working truly relational replacement for SQL is non trivial. A lot of very bright minds have taken on the task. One day I'm sure someone will come up with one. In the meantime, folks who are in the field already understand that SQL isn't really relational. Beating the dead horse isn't making anyone's work progress any faster.

    55. Re:'scuse my ignorance but... by Bozdune · · Score: 1

      Beautiful post, I loved it.

      A few years ago I moved a guy like that into a position where he could do no more harm. He had written about 6 inches of Oracle SQL which we ultimately replaced with a couple of streamlined programs and MySQL. To the end of his tenure with the organization, he never admitted that his approach was flawed, and he never owned up to the fact that his "solution" would have put us out of business.

      Along with DBA-ness comes "processor vendor-ness." Just as a DBA is deeply motivated to continue to flog his particular experience niche, so the "processor vendor maven" is deeply motivated to continue to flog his favorite hardware. In Our Hero's case, it was Sun. Sun could do no wrong. Even after we ripped out the Sun boxes and replaced them with Linux Intel boxes that were demonstrably faster and cheaper, this guy continued to preach Sun to whomever would listen. People finally stopped listening.

    56. Re:'scuse my ignorance but... by tanguyr · · Score: 1

      Firt off: a disclaimer - i'm a developer, not a dba. I don't have any certification in any database product, and i don't do any of that dba stuff like back up and restore, log management, etc etc. I do, however, know my way around a couple of rdbms softwares but that's because i've built on top of them and i'm curious by nature. Many developers think of the db as a mysterious black box and never seem to get past "SELECT * FROM [TABLENAME] WHERE [PRIMARYKEY] = X" type queries. They're idiots.

      Anyways, that's not the point - i want to talk about this whole "avoid vendor specific features" - "database independent" thing. I develop custom solutions that will run against a known database, not shrink wrapped software for distribution. My customers (i'm a contractor) spend big money on their database licenses - shouldn't i get them their money's worth by using every trick to a) code the app faster and b) make it more performant? Should i use an Oracle or DB2 like a very expensive version of MySQL (and, btw, i do use MySQL a lot, especially for smaller apps and prototyping). I've ported apps from oracle to MSSQL Server. It's stupid and tedious but it's not as bad as you'd think. I'm willing to bet that it's no harder (or easier) than porting apps from Websphere to Weblogic. Sure the big db vendors stuff a lot of "useless features" into their products - but i've seen middle tier developers spend man years(!) rewriting one of these useless features the db already implemented - who's being stupid there?

      --
      #!/usr/bin/english
    57. Re:'scuse my ignorance but... by zsau · · Score: 1

      What exactly is the problem with SQL?

      My lecturers and tutors at uni can't explain it so that I still don't understand it, even though I think I was able to pretend I had a pretty good working knowledge of it on my DB exam last week.

      --
      Look out!
    58. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 1, Insightful

      Argh. No, a row is a truth predicate. The correct mapping is object-per-table, each row representing a possible state of the object. As an _optimisation_, objects with identical attributes ("classes of objects") can be stored in the same table - but if you find yourself doing one-row-per-object, I _guarantee_ your data model will suck for later ad-hoc relational analysis. It's a self-fulfiling prophecy in that case that "RDBMS Sux" - the bit that's missing is "when used by an OO weenie who doesn't understand that RDBMS is a persistent predicate logic rulebase".

    59. Re:'scuse my ignorance but... by RedWizzard · · Score: 1
      Just because two queries return the same results today do not mean that they will continue to do so in the future.
      You missed the parent's point here. The parent was meaning things like:
      select * from tab1 where col1 in (select col1 from tab2)
      versus
      select * from tab1 where exists (select 1 from tab2 where tab2.col1 = tab1.col1)
      versus
      select tab1.* from tab1, tab2 where tab1.col1 = tab2.col1
      These queries all return the same set of results in all cases, but may be excuted differently and perform differently.
      That's like trying to do math without a concept of zero. Sometimes, things just don't apply and we put "N/A" on the form and "NULL" in the database.
      The difference is that zero can be treated just like any other number in almost all cases. But in SQL, null cannot. It makes many queries much more complex for little gain. I've been dealing with RDBMSs and SQL for many years and I've come to the conclusion that null is evil, and in most cases we'd be better off without it.
    60. Re:'scuse my ignorance but... by D-Cypell · · Score: 2, Interesting

      sorting algorithms in most big databases are highly optimized

      Efficient sorting algorithms are hardly a closely guarded secret, it is possible to implement the exact same 'optimized' algoritm in any of the tiers. With this in mind, the performance costs of the two sorts cancels and you are left with the additional cost of the SQL parser. So assuming the programmer is capable of either A) writing an efficient sort or B) using the efficient sorts provided in the libraries that are supplied with most mainstream languages this option will always be more efficient.

      the cost of ordering a recordset will almost always be minimal compared to the cost of generating that recordset

      Exactly! So request the record set once, load it into memory and do whatever sorts are required. If I need to sort the data on different columns for different reasons the costs of generating the record sets begin to multiply. Maybe the data is in the database cache after the first query, but maybe its not and needs to be regenerated.

      'Order by' has been useful to me in situations where the query is combined with another clause such as one that limits the size of the dataset. Something like 'load the top 10, highest priority bug reports'. If I did this in code I would need to load them all then sort and discard, which is clearly not very bright, but using 'order by' just to sort data has always proved problematic in my experience. YMMV I guess.

      i noticed that the "order by" clause made it into EJBQL 2.0 as well, despite the howls of protests from object purists.

      The howls of the object purists tend to go beyond this and recommend against EJB altogether. Even so, I dont see any reason for not including it in the EJBQL spec, I am sure there are cases where it can be very useful. Its a feature that is provided by the database and not including it 'on principal' would be pretty daft, but using it 'because its there' doesnt constitute a strong argument in my book.

    61. Re:'scuse my ignorance but... by cybergrue · · Score: 2, Interesting
      I've met people who think that joins are best done in the middle tier, on tables with hundreds of thousands of rows!

      Have you run test to confirm that this is false? As part of a project to improve the performance of an app that I developed at work, a colleague and I rewrote several key select statements in different manners, and found the select statements using joins were the fastest by far (in a very large table). We think that the DBA team has the database tuned in such a way to cause this, but I have also heard that there is a query optimizer built into the engine (sybase) that works much beter with joins then nested selects.


      The only advice I can give is to run tests comparing the performance of different SQL statements that do the same thing to determine which is the fastest. The results may surprise you.

    62. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      returning :.curval

    63. Re:'scuse my ignorance but... by rodgerd · · Score: 0, Flamebait
      It gets slightly less slow if you use a stored procedure, but if you do that you have confined yourself to one database vendor. (So much for SQL being a standard.)

      Or use the SQL standard commands for creating hierarchies of data. Might help to have a clue what you're on about.
    64. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 1, Informative

      multi user read consistency.
      online backup.
      roll-forward recovery.
      stored procedures, views, triggers, analytic functions, partitioning, function-based indexes ...
      Enterprise class support from the vendor.

      mysql just does not cut it yet.

      I don't care what the RDBMS is provided that we don't have to recode every function provided in database kernel-space in user application space (huge waste of time).

      the license costs are not what they used to be.

    65. Re:'scuse my ignorance but... by rodgerd · · Score: 1
      If sorting is required in the middle-tier it is usually due to some search algorithm or something

      Your experience is happier than mine. Mine is that sorting is done in the middle layer because some fucknut Java programmer finds joins and subselects too hard, so reimplements them poorly.
    66. Re:'scuse my ignorance but... by mpcooke3 · · Score: 1

      Hi, I am an OO fanboy.

      Remember OO fanboys like myself are usually using OO in an attempt to develop and maintain a large complex system amongst a team of developers.

      Particularly in an Agile team (we do XP) managing a relational database and SQL is a right pain. The model is completely orthogonal to OO and often you end up having to use the database for some reason or other when all you really want to do is persist a few objects the data is already described in the object model and now you have to describe it all over again for a relational model and then produce some kind of mapping system or mapping file.

      XP and many agile processes are about developing with change in mind. Eg You will get this wrong, or the customer will change his mind. Either way you should be able to change the system with low cost. Once you decide to use a RDBMS regardless of how you achieve the object to relational mappings maintenance becomes a monumental pain in the arse.

      You need to set up processes/scripts for automating database related stuff so anyone can create a database and modify preferably locally in their workspace to ensure the tests still run before releasing.
      Schema migration will inevitably need to take place at some point on a live database that can't be taken down, meaning your object to relational mappings may need to support two schemas mappings for a while. evem without the OO mapping part schema migration can be pain.
      Automated tests could previously be fully unit based but now become semi functional and possibly have to run on a copy of the database to stop them from messing up data.
      Speed can easily become an issue, objects are invariable in memory and doing operations on them is fast. The database may need to be optimized or the queries optimised an in-memory caching layer might need to be introduced.
      But worse there are problems related to shared knowledge in a small agile team where everyone needs to suddenly understands relations and SQL in addition to OO. This is essential otherwise you will have pairs/individuals blocking waiting for a DBA or really screwed up Database OO integration. Shared knowledge of the entire system is key to good XP whereas in other non-agile projects you just give the job to a good Database administrator.

      I have not seen many good alternatives to OO -> relational mappings.
      I am of the opinion that relational databases are a neccessary evil for OO data peristance because
      a) They are the standard for persistance.
      b) They can scale well.
      c) They have a reliable track record of not breaking.
      d) You can query them for more generic stuff or producing reports.

      Remember OO developers often don't want to "query" the db in a traditional sense. They just want to be able to pull out and put in objects maybe based on some property. They like to keep logic and data and presentation separate where possible (MVC) -sacrafices have to be made for performance sometimes

      Still I hope and pray that someone comes up with something better suited to OO development. Something like Prevayler maybe: http://www.prevayler.org/wiki.jsp

      Matt.

    67. Re:'scuse my ignorance but... by rodgerd · · Score: 1

      Yes, you can. Go read the standards. Or use Oracle's CONNECT BY, whichever you prefer.

    68. Re:'scuse my ignorance but... by MattRog · · Score: 1

      "When you say "no support for entity sub/supertypes" it is a specific indictment of SQL not RDBMSs in general."

      Yes, of course. Perhaps I should have clairified -- those are 'features' possible in a RDBMS but not in SQL; hence a SQL limitation.

      --

      Thanks,
      --
      Matt
    69. Re:'scuse my ignorance but... by TheBracket · · Score: 2, Informative
      Ugh, the lameness filter kept complaining about this!

      That's why sequences (as implemented by PostgreSQL and Oracle) are handy. Simply create a sequence, and call NEXTVAL(sequence_name) to get the identity of the next record. It isn't an autoincrementer (but can be used as one with default values), and you get the advantage of knowing what ID you will use before your insert - very handy for inserting a lot of related data at once. You can also do tricks like having sequences with different increment numbers, different base values, or even concatenate them with a string to get multi-master friendly safe replication.

      For a sequence/nextval to be useful, the increment must be absolutely atomic - that is, it must return the next value and increment without any chance of the same number being given to another caller. Oracle and PostgreSQL do this for you.

      You can simulate these in SQL Server with the following stored procedure (original source here: Sequence table: CREATE TABLE sequences ( -- sequence is a reserved word seq varchar(100) primary key, sequence_id int ); MS SQL Server stored procedure: CREATE PROCEDURE nextval @sequence varchar(100), @sequence_id INT OUTPUT AS -- return an error if sequence does not exist -- so we will know if someone truncates the table set @sequence_id = -1 UPDATE sequences SET @sequence_id = sequence_id = sequence_id + 1 WHERE seq = @sequence RETURN @sequence_id I've used this to great effect in SQL server. I've had to emulate it in Access once (yuck! it worked, though!). I've never tried it in MySQL, but it can probably be done.

      This page (see LAST_INSERT_ID(expr) talks about how to do it in MySQL. I don't use MySQL when I can avoid it; it's a great, fast database for high-read environments without triggers and similar, but I've never seemed to end up working in one of those!

      --
      Lead developer, http://wisptools.net
    70. Re:'scuse my ignorance but... by Brandybuck · · Score: 1

      Actually the manager who hired this web developer to do low level systems programming was himself a web developer.

      --
      Don't blame me, I didn't vote for either of them!
    71. Re:'scuse my ignorance but... by SnprBoB86 · · Score: 1

      I have always believed that

      A table is a class
      A row is an object
      A field is a property

      granted I am an OOP programer, but still... I can't even begin to understand why this is wrong for SIMPLE OBJECTS. I could see where concerns can come in with respect to inheritance and etc. But MS WinFS and the corresponding ObjectSpace namespace of the coming .NET 2.0 seems to be carrying the objects in a database idea to extreams and working well in the betas.

      --
      http://brandonbloom.name
    72. Re:'scuse my ignorance but... by boomgopher · · Score: 2, Interesting

      "Here's one that'll make you howl: "sorting is a presentation-tier concern"

      Yeah, well when you have to sort String values using oddball locale-specific methods it sure the heck is.

      --
      Your hybrid is not saving the environment. Its purpose is to make you feel good about buying something.
    73. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      SQL was designed to be English-like, and easy to write simple queries in. It's the sort of language that even a manager can follow. It's also a declarative language - in other words, you don't say *how* a task is done; rather, you describe the result you want, and the interpreter figures out how to do it.

      Unfortunately, SQL fails spectacularly when you progress beyond simple joins. With large queries it can be very, very tough to figure out just what's going on. As a result, SQL is often only slightly more readable than APL. The problem is structure: although the relational model is based on simple, logical rules, SQL lacks a comparable internal logic. English just doesn't cut it when you're basically dealing with logical predicates.

      We probably should be using relational algebra, or maybe relational calculus. Both have a structure on par with normal algebra - in other words, they are simple and clear, and have rules for manipulation and simplification. In fact, early SQL implementations were built on top of RA, and simply translated SQL into RA for execution.

      However, SQL is easier to beginners and managers to understand, so it won the language wars of the 1980s. It was a bad call, but the upside is that very good SQL programmers (like me - not that I'm bragging, or anything :) are rare as hen's teeth, and paid accordingly.

    74. Re:'scuse my ignorance but... by rycamor · · Score: 1

      One thing that everyone should understand: even though Pascal, Date, etc... argue that SQL is a bad implementation of the relational model, the *still* agree that it provides value, and that it is miles better than its hierarchical database predecessors. Since it is their job(s) to provide a reasoned critique of the field, it is only natural for them to rigorously compare SQL to the goals of the relational model. Many of their complaints fall in these categories:
      1. Does too much -- too many ways of doing the same thing, and too many unecessary operations that could be better done another way.
      2. Overly complex -- the SQL 1999 standard was something like 1200 pages.
      3. Allows programmer to circumvent relational integrity. Things like "hidden identifiers", pointers, etc...
      4. Too wrapped up in implementation -- users must spend a lot of time understanding the physical storage, rather than focusing on queries in abstraction.
      5. Many small inconsistencies in SQL itself

      But the problems with SQL are impossible to judge if you only know SQL. It's like the people who used to ask what was wrong with a perfectly good typewriter that made people want to use a wordprocessor. I suggest you do some reading. The absolute best simple explanations for these problems are in a couple online documents by Hugh Darwen at www.thethirdmanifesto.com. Look for "The Askew Wall", and "The Importance of Column Names".

    75. Re:'scuse my ignorance but... by Trinition · · Score: 1

      there just seems to be something about it that drives all the Oo fanboys up the wall

      Yeah, I know what you mean. These kids can't wrap their tiny minds around the following concepts:

      • A table is not a class
      • A row is not an object
      • A column is not a property

      Those points are indeed the problems. All too often those OO-types (wait, I'm an OO-type!) try to treat tables as classes, rows as objects, etc.

      But then again, they think everything is a nail when all they have is a hammer. I find that 90% of the times I encounter RDBMS persistence in code, there is no reason for the data to be in an RDBMS.

      There are a lot disadvantages to using an RDBMS as there are advantages. You have to weigh the cost of the impedance mismatch with the gains of the robustness of an RDBMS.

    76. Re:'scuse my ignorance but... by Tablizer · · Score: 1

      The problem with standards is their's so many to choose from. Or in the case of SQL every vendor seems to think that the standardized language is inadequate and yet they make no roads towards improving the standard. This leads to every vendor having their own superset of the language...

      The solution IMO is to create a simple functional-based relational query language that allows one to extend their RDBMS with functions that match another vendor. SQL's syntax is generally too complex to do this.

      Every relational operation can be defined simply as a function which takes in one or more of:

      1. A table (real or virtual)
      2. An expression which returns a scalar
      3. A scalar

      (Lists of columns would be represented as a small virtual table. I am not sure yet what the best shortcut syntax is for such is.)

      Expressions would then look something like:

      t1 = join(table1, table2, a = b)
      t2 = rowFilter(t1, x > y)

      Which may also be represented as:

      t2 = rowFilter(join(table1, table2, a = b), x > y)

      If vendor B had a relational function that vendor A did not have, then if one can define B in terms of existing relational operations, the DBA or open source community could simply supply an implementation of such a function to A.

      An experimental relational language called "Business System 12" attempted to do something roughly like this, but it appears their syntax was still more complex than need be.

      Either way, SQL is far more syntactically complex than need be. It's parse tree is a monster that only a Cobol mother could love.

    77. Re:'scuse my ignorance but... by Tablizer · · Score: 1

      I had my asbestos typing gloves all donned and was just cracking my knuckles when I got to that ["this was sarcasm"] line...

      I bet that post set a record for number of almost-sent messages being abandoned.

    78. Re:'scuse my ignorance but... by (negative+video) · · Score: 1
      If your properly define a domain that has a specific, meaningfull "NULL" value, you should also provide operators to deal with it, in particular '=' (comparison).
      And that gets you to the "what the hell does this operator do given these data types?" problem seen everywhere that operator overloading is allowed. The permutations can rapidly become overwhelming.
    79. Re:'scuse my ignorance but... by Tablizer · · Score: 1

      Here's one that'll make you howl: "sorting is a presentation-tier concern"

      But it is generally more practical to let a single system (the database) have sorting built-in rather than each client or application. Whether it is part of relational theory or not, having the DBMS do it as a standard utility is helpful. Consider it "post processing" on the database system perhaps to avoid fights over whether it belongs in relational theory. Even in SQL, SORT BY is usually the last action it takes.

    80. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      So then why does this cause the project to go wrong?

      Because they are reluctant to develop classes that don't have an underlying table (for "persistance") and they are reluctant to develop classes that instantiate from or manipulate more than one table (for "encapsulation"). This results in horribly inflexible applications that cannot be developed beyond the original spec without horrible kludges.

    81. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      The only advice I can give is to run tests comparing the performance of different SQL statements that do the same thing to determine which is the fastest. The results may surprise you.

      No no no, rather than doing the join in SQL at all they will select everything from table A, everything from table B into the application, then the application will iterate through them, performing the join a row at a time, in memory, between objects by comparing properties. Forget the database cache, forget the query optimizer, forget indexes. The usual justification for this is "platform independance".

    82. Re:'scuse my ignorance but... by Decaff · · Score: 1


      A table is not a class
      A row is not an object
      A column is not a property


      Good Java and C++ coders aren't saying that. What we are saying is...

      A table can be used to store instances of a class
      A row can be used to store an object
      A column can be used to store a property


      There is nothing inefficient about this - it can be a huge code saver. Instead of tens of lines of JDBC, I can, using Java Data Objects, get a row out using a single 'iterator.next()' call, and the JDO system can cache data for me.

      EJBs are nothing to do with database bottlenecks - they are to do with clustering and transactional integrity. There is nothing intrinsically slow about EJBs, just bad design by the coder.

    83. Re:'scuse my ignorance but... by Anonymous Coward · · Score: 0

      Don't be stupid. Pulling data out of the database to operate on it, when you can let the database give you the correct results is usually faster. Not just that, but it's independent of any particular front-end, which means you don't have to reimplement it if you want to add a web front-end or something.

      Why is it that implementing something in the database is considered "lock-in", but implementing something in a particular language outside the database is not considered lock-in?

      It sounds like you have a major chip on your shoulder with DBAs. Deal with it. Not everybody is a vendor lackey.

    84. Re:'scuse my ignorance but... by JamieF · · Score: 1

      >every vendor seems to think that the standardized language is inadequate
      >and yet they make no roads towards improving the standard.

      That's a pretty bold claim (equivalent to "no SQL database vendors are helping to improve the SQL standard"). I suspect that the major vendors have quite a bit of involvement in the ISO SQL standards process, but that they also have customers asking for features NOW, and demanding backward compatibility in new releases.

      If company A has a proprietary extension that provides a feature you need, and company B scolds you for not being willing to wait several years for a standard to come out that provides access to that feature in a sane and efficient way, what would you do?

      IMO the correct answer is that you should use proprietary hacks now (isolating them in the code so that they can be un-hacked later), and at the same time, you should lean on the vendor to implement the standards that have already been published, and to hurry up with the next version of the standard.

    85. Re:'scuse my ignorance but... by XMyth · · Score: 1

      Haha....I thought the sarcasm was pretty obvious from the get go...but then again I'm usually the only one who can detect my sarcasm, heh.

    86. Re:'scuse my ignorance but... by XMyth · · Score: 1

      Haha...=-P That was my intention...=)

      Strange how you made the same typo I did... "entire" instead of "entirely".

    87. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      It's all about job security with these guys. They've always got FUD prepared like

      Then explain what reinventing the wheel for every application like the EJB camp want to is? You've paid for the database, why not use as many features of it as you can?

      I laugh when the same developers pay lip-service to ideas like "code reuse". By that they mean, "let us write our own libraries and we'll use them on the next project too". Ever look on Sourceforge, wonder why there are so many GUI toolkits and X window managers and little scripting languages and text editors? 'Cos give any developer the chance and they'll choose making tools over using tools to get something useful done.

    88. Re:'scuse my ignorance but... by Khazunga · · Score: 2, Interesting
      Largely it is. If the user hits a table header to sort on the selected row, are we are supposed go back to the database and do a different 'order by'?... I dont think so!
      Then think again. It's far from a black and white question. If data transfer between persistence and presentation tier isn't costly, you're better off passing the sort to the database. It may have the data already ordered, either because the a similar query or subquery is cached, or because the database has an index on the sorted column. Moreover, if you're writing the sorting algorithm yourself, rest assured you won't do as good a job as your db vendor, whose code has already been pounded and cleansed by heavy usage.
      --
      If at first you don't succeed, skydiving is not for you
    89. Re:'scuse my ignorance but... by JeffWhitledge · · Score: 0

      Would you like to share your traversal with us? All you provided is inserts.

      A binary tree is useful for efficiently handling lists of stuff, but RDBMSs already handle the effecient storage of lists, so this example really doesn't suffice.

      Here is an excersize for you: create a table of employees with a reports_to_employee_id column. Then provide a SQL statement that will return everone who reports to a chosen employee, and everyone who reports to them, etc. It's easy to store; not easy to retrieve.

      --
      These comments do express the opinions of my employers, and, personally, I think they're complete rubbish.
    90. Re:'scuse my ignorance but... by tgrigsby · · Score: 1

      They should be tattooed onto the forehead of every OO programmer, so when they're "pair programming" they can read it off each other.

      OHMIGOD that's funny! It's too bad my moderator points expired yesterday or I'd kick down some karma. I'm making that my Scathing Critique For The Week.

      As for making objects that wrap around database concepts, those objects suffice to give RDMS the "relational" aspects they lack. By putting data into objects and using only those objects, they abstract the source of the data from the rest of the program -- a highly desirable application of EJBs. By doing this, the database can be altered dramatically, swapped out with another database, the dialect of SQL in each database can be accounted for, etc., without any modification to the code dependent on those EJBs.

      Perhaps some happy coder will create a truly relational layer to sit atop all these databases with all their quirks to smooth out the differences and provide the functionality that the databases don't. Allowed to control the implementation of schemas and made smart enough to optimize those schemas for each type of database it supports, this might even help with data flow bottlenecks.

      Just an idea. Feel free to get rich off it. Just be sure to thank me in the about box...

      --
      *** *** You're just jealous 'cause the voices talk to me... ***
    91. Re:'scuse my ignorance but... by afidel · · Score: 1

      How about they not implement something until they at least submit it as a draft standard, this way the other vendors can work off the draft standard to insure cross product compatibility.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    92. Re:'scuse my ignorance but... by JamieF · · Score: 1

      Back in the earlier days of Java (1.1.6-1.1.8 era, 1997-98 I think) there was very little in the way of off the shelf persistence management. You had servlets and JDBC and that was about it. Everybody rolled their own database interface layer, and basically what worked well was to have the hand-written SQL hidden in a small set of classes that each know how to various persistence operations for one object. This practice has been canonized as the Data Access Object pattern and works pretty well.

      Even with modern persistence frameworks like Hibernate (highly recommended, BTW) that have caching, you can hide that activity in a DAO and decide later whether you need to replace the autogenerated SQL with some fancy hand coded stuff. Complex object-relational mappings (i.e. not a simple table=class setup) are actually expressed declaratively so that's not something the coders have to worry about.

      The point is that it's encapsulated, so any "impure" decisions that you must make in order to get the system working properly for real-world use are very localized.

      I would say that mapping the object model to the data model directly in such a way that you can't change it later is actually a counterexample of encapsulation. Ideally they should look pretty similar but be mapped by some intermediate layer that can adjust for the differences.

    93. Re:'scuse my ignorance but... by JamieF · · Score: 1

      They can get platform independence (which I can only guess means database product independence in this case) by using a persistence framework that supports multiple back-ends.

      I've seen this sort of thinking before but not at this level of boneheadedness. Application developers in general want to live in application-land and not leave, possibly because application-land involves a ton of complicated technologies that they already understand and they'd rather not have to learn how to make DHTML do table sorting or how to optimize a query or analyze a query plan. That's understandable if not ideal for the project.

      The problem is clearly the "everything looks like a nail" issue. In this case it's even worse because full-featured (="not MySQL") SQL DBMSs have a bunch of features that were added to minimize the pain of client-server architectures, and these features can be useful in modern multitier architectures. I find T-SQL and PL/SQL to be pretty wimpy languages, but it doesn't take a month to learn enough of them to write a trigger or stored proc that can make a gigantic difference in terms of performance.

      The thing that worries me more is the approach of using the database as little more than a transactional flat file, with no constraints and no thought given to the data model other than whatever's easiest for the person writing the load/save code. That sort of approach is very risky from a transactional sense (transactions happen for individual row updates atomically rather than for the whole multi-object operation, forcing developers to use an application-layer transaction manager for what could be a simple transaction at the DB layer) and from the sense of longevity (it's unsafe to touch the database unless you go through the application's persistence layer).

      Usually what this ends up with (in my experience) is a situation where application developers slowly reimplement[1] the stuff that's already in the DBMS, except they do a worse job and miss a whole bunch of stuff that the big SQL DBMS vendors worked through and nailed down in the mid- to late 90s.

      My own solution for this was to take a job working on an application built with a wimpy RAD scripting language, to "see how the other side lives" so to speak, because there's really no way to avoid leaning heavily on the DBMS in such an architecture. That certainly validated my inclination to be anal-retentive about declarative constraints, and actually we only needed two triggers.

      I think the biggest thing that teams can do going forward to avoid this sort of problem is to make DMBS skills a mandatory part of the architect's skill set, and to use hard-skills tests when hiring developers. The architect needs to have SQL skills beyond trivial select/update/insert/delete, and somebody on the team should be a "database developer". To me that role means that you're not an application developer and not a DBA, but somewhere in between - the person who writes (or approves) the hairy SQL statements, writes (or approves) the data model and changes to it, and perhaps most importantly, who knows how to look at a slow chunk of code that hits the DB and can fix it, whether that means a stored proc or changing the application code or data model, or calling the DBA over for some low-level optimization (which in my experience is never the problem). This sort of person is apparently also most likely to be found riding a unicorn because I've not had a lot of success finding them, but they are out there.

      [1] on a project-by-project basis, on their employer's dime, instead of improving the user-facing functionality of the application

    94. Re:'scuse my ignorance but... by JamieF · · Score: 1

      >It's all about job security with these guys. They've always got FUD prepared like
      >* You can't rely on that other vendor's database because it will not scale
      >* You can't rely on that other vendor's database because it will corrupt your data
      >* Don't let developers write DDLs or they will crash the database
      >* Moving business logic out of the database will cause huge performance problems

      Based on your other posts and the fact that you keep sticking up for MySQL and disparaging the features that it lacks, I'd have to guess that in each of these cases the DBA was right and you were wrong.

      MySQL cannot scale up as well as Oracle, DB2, or MS SQL Server. That may not matter much for most projects, but for folks at the UPS and Wal-Mart end of the scale, it matters.

      MySQL even with InnoDB has major flaws and missing features that make it a lot easier to corrupt your data compared to other SQL DBMSs. This would be excusable except for the fact that MySQL happily accepts DDL and DML that uses these features, and quietly does the wrong thing. Would you use a crypto product that claimed 128-bit keys but just used ROT-13? The vendor assures you that most people don't really need encryption, and hey, it's really a lot faster!

      Moving logic out of the database very often can do horrible things to performance. You are aware that in-memory access is faster than getting something over the network, right? DBMSs also use very carefully tuned algorithms and data structures, which brand-new application code typically does not. Even if it does, why are you duplicating code that already works?

      Whether your DDL crashes databases, I can't say. I've never seen DDL that bad. I've never seen a database crash. (I have seen Java code that wedged an E450 so that it was totally unresponsive and frozen and had to be manually power cycled in order to get it running again.)

    95. Re:'scuse my ignorance but... by Chazmyrr · · Score: 1
      Sorry, I thought the traversal was apparent from the structure of the tree. The reports_to_employee_id is i_parent from my example. The tree table is more expensive to initially build than a simple parent child table but much faster to retrieve from.

      add_child 1, 'Bob'
      add_child 2, 'Carol'
      add_child 2, 'Ted'
      add_child 3, 'Alice'
      add_child 4, 'Bill'

      Carol and Ted report to Bob. Alice reports to Carol. Bill reports to Ted.

      Your table looks like:
      i_id i_left i_right i_parent vc_name
      1 1 12 0 ROOT
      2 2 11 1 Bob
      3 3 6 2 Carol
      4 7 10 2 Ted
      5 4 5 3 Alice
      6 8 9 4 Bill
      Here's the retrieve.
      CREATE PROCEDURE list_descendants @ID INT AS

      DECLARE @Left INT, @Right INT

      SELECT @Left = i_left, @Right = i_right FROM table WHERE i_id = @ID
      SELECT * FROM table WHERE i_left > @Left AND i_right < @Right
      Simple, no?

      As a side note, I generally use a clustered index on i_left, i_right with a unique index on
      i_id.
    96. Re:'scuse my ignorance but... by plover · · Score: 1
      That's actually a very common trick in the mainframe world to radically improve performance. It's usually combined with dumping the table to a flat file beforehand. Nested selects suck CPU at the same rate an application would. Raw code, with the same knowledge of the data as is required to perform the join, can be far, far faster than torturing SQL to do what you need.

      A buddy of mine who consults to our mainframe group on performance issues recently was given a deeply nested join (6 or 9 levels) that the predictor estimated would take hundreds of years of CPU to run. By simply replacing the straight SQL with parameterized SQL they dropped it to 25 hours of run time. Still not good enough for a nightly job. Dumping the tables to flat files, performing the function in code, and recreating the table with the data took 20 minutes of CPU, which was finally an acceptable number for the job.

      He's just annoyed that he doesn't get a cut of the $$$ savings when he pulls these rabbits out of his hat.

      --
      John
    97. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      the database can be altered dramatically, swapped out with another database, the dialect of SQL in each database can be accounted for, etc., without any modification to the code dependent on those EJBs

      In around 8 years of developing Oracle and Sybase applications for large companies, I've NEVER come across a case in which switching database mid-stream was even considered. I have however seen thousands upon thousands of man-hours wasted on "database independance". It's a wild goose chase, one of those things that sounds like it might be a good idea, but has no practical worth. You think Ford's engine designers lay awake at night wondering how they can be compatible with Honda exhausts?

    98. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      I would say that mapping the object model to the data model directly in such a way that you can't change it later is actually a counterexample of encapsulation.

      Oh, I quite agree. Doesn't stop people doing it, tho'. Data needs to be stored, but often objects themselves don't - often objects are just handy containers for algorithms, you feed them data (or instantiate them with data), they do some processing, then you store the massaged data or pass it on to the next algorithm. In this case persistence is a red herring and the objects themselves should be stateless. Instead, the objects are made stateful, and the only way to store the data is to persist the object, which is massively overcomplex and performs terribly.

    99. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      somebody on the team should be a "database developer".

      Luckily, my present project does this (I'm one of these people). I've done client-server and n-tier development, and I've been a DBA and a sysadmin, now I'm inbetween the two camps, able to speak to both of them in their language, and (usually) with the authority to say "this should be done in OCI" and "this should be a trigger". But you're right, people who have this broad experience are quite rare, there's far too little cross-pollination between programmers and DBAs. Mostly in fact the two sides see each other as enemies, which is a shame, at the end of the day our job is to deliver an application, not to bicker.

    100. Re:'scuse my ignorance but... by sql*kitten · · Score: 1

      That's actually a very common trick in the mainframe world to radically improve performance.

      I didn't know that. I suspect it's because mainframes have amazing I/O and relatively not much processor. Smaller systems are more balanced, their processors are much faster relative to their I/O subsystems than mainframes. It would make little sense to use this trick on a Sun.

      But, none of the people I've seen do this are old mainframe hands - they're kids who got taught Java in their CS classes and think EJBs are the only way to write serious apps...

    101. Re:'scuse my ignorance but... by tgrigsby · · Score: 1

      I have, or more to the point, I have come across situations where one of our customers decided they were standardizing on a database the rest of our customer didn't have. For example, our software was designed to work with SQL Server, but one of our larger customers standardized on Oracle. It happens, and the way I architected the software allowed for an easy customization. In fact, we didn't have to touch 98% of the code. Not bad, huh?

      --
      *** *** You're just jealous 'cause the voices talk to me... ***
    102. Re:'scuse my ignorance but... by tgrigsby · · Score: 1

      Oh, and as far as "man hours wasted", that same software also works with MySQL, PostgreSQL, mSQL, Pervasive SQL, and Informix, with very little extra effort.

      If you saw thousands of man-hours wasted on database independence, you saw a badly architected system. Try using a class factory and listed templates. If a new database came out with a unique dialect of SQL, I could program in support for it in two to three days, tops. And since there was no SQL hardcoded anywhere in the applications that used the toolkit, all one had to do was recompile.

      Bingo. Database independence.

      Last but not least, I run the Anti-Ford page, so don't get me started about what Ford's engine designers lay awake at night doing....

      To get your free transcript of tonight's program, dial 1-976-MAD-SKLZ. Operators are standing by, have your Visa, Mastercard, American Express, Diner's Club, Chevron, or library card ready....

      --
      *** *** You're just jealous 'cause the voices talk to me... ***
  5. Wht not.. by Anonymous Coward · · Score: 0

    ..use a Relational DBMS like PostgreSQL?

  6. My Two Cents by Doesn't_Comment_Code · · Score: 3, Insightful

    I think relational SQL databases are just fine, easy enough to use, and serve their purpose very well. They DO take some serious thought when designing tables and queries - but we shouldn't be afraid to think. If your head hurts from SQL, keep on it for a couple more minutes and you'll probably have it! If your head hurts from SQL, you've tried thinking about it - and you still don't get it - you're probably in the wrong business. Complex information retrieval is complicated and sometimes difficult to understand.

    On to the next part. XML serves its purpose very well. Although I wouldn't consider XML and SQL to serve the same problem sets equally well. There are certain situations where SQL is perfect. And there are other situations where XML is preferable. If you think of the two as two sides of the same coin, I think you're making a mistake. Likewise, you can't just flip between the two on a whim. Choose the format that's most suited to what you want to do and go forward.

    It aint broke and don't need fixin'

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    1. Re:My Two Cents by Anonymous Coward · · Score: 0

      > we shouldn't be afraid to think.

      Ok, great advice. I think...

    2. Re:My Two Cents by lukewarmfusion · · Score: 1

      I'm surprised that you were modded up - you're defending something that a lot of people here don't like.

      I've used Oracle and MS SQL Server in environments ranging from web sites to enterprise apps. The biggest problems I've seen are not usually problems with these DBMS; they are problems with the application.

      However, a financial guy from a major automotive corporation told me of their latest problem. Each pay period, there are seven transactions that take place for each employee. That chalks up to millions of transactions a week, and they are having problems processing all those. Since the largest databases I've personally experienced involve only a few million records total, I haven't seen the effects of volume processing first hand.

      I think XML is a step backward, if you're using it to replace a SQL Server database.

    3. Re:My Two Cents by Doesn't_Comment_Code · · Score: 2, Interesting

      I've used Oracle and MS SQL Server in environments ranging from web sites to enterprise apps. The biggest problems I've seen are not usually problems with these DBMS; they are problems with the application.

      I've mainly used MySQL for my large projects, and like you, those projects have not had more than several million entries. But my experience with SQL has been wonderful. I have spent several hours pounding my head on a desk trying to understand why something did or didn't work. But all my applications have run as smoothely and as quickly as one can expect.


      I think XML is a step backward, if you're using it to replace a SQL Server database.
      I agree - provided your application is best suited for SQL to begin with. There are some small problems where flexibility (XML) is more important than the great features SQL has to offer. I can't see a large number of XML databases at the core of huge operations though.

      --

      Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    4. Re:My Two Cents by Bozdune · · Score: 1

      Your finance guy is having the same problem that everyone has with database performance when the row count gets high. Commercial database products do not perform efficiently on many millions of records, no matter what they claim.

      Pascal and Date wave their hands at this, blaming vendors for failing on the implementation side - "if only they really understood what they were doing," blah blah blah. Pascal of course then admits, "I'm not a programmer." Pascal sees nothing in relational theory that presupposes that a database should slow down when there are millions of rows. Instead, he blames vendors for "mixing physical and logical models," which is evidently the "fault" of SQL.

      I'm not smart enough to know if he's right or wrong. I am smart enough, though, to figure the chances of a database company starting from scratch and flogging a new query language. "Not good," would be my assessment, unless the product could do something remarkable, like provide reasonable join performance on huge datasets.

      Don't get me wrong: I find Pascal and Date amusing, useful, and precise. I love Date's dismantling of UML and OO terminology. I enjoy Pascal's digs at idiots who clearly don't understand anything. But it's put up or shut up time for both of these guys, and Darwen as well. Let's see them take their ideas into practice. I'd love to see an RDBMS with OLAP performance, for example. If it's so easy, bring it on!

      In the meantime, we'll suffer with what we've got, I guess. Maybe if I make a bazillion bucks someday I'll throw a few million Pascal's way and see what happens. Whatever comes out of it, it's likely to be amusing!

    5. Re:My Two Cents by ciggieposeur · · Score: 1

      Your finance guy is having the same problem that everyone has with database performance when the row count gets high. Commercial database products do not perform efficiently on many millions of records, no matter what they claim.

      That's sort of true, but not entirely. I've worked on DB2 databases that have a few million rows each in several token tables and billions of rows in the star / join tables. With the right indexes in place and of course a schema designed for high volume performance is certainly "reasonable", in which "select * from BIG_TABLE where ... " can return in merely seconds.

      The problem is that at this size you have to resort to vendor-specific features to keep things chugging along. For instance, you pretty much *have* to insert rows using the custom bulk loader that can temporarily turn off constraint checks. You also need to map the physical layout correctly to the medium, so that database pages are aligned on a good boundary for the SCSI-like IO subsystem.

      If you're still curious, there's a great book on DB2 optimization called "DB2 High Performance Design and Tuning". I'm sure Oracle has a similar volume out somewhere.

    6. Re:My Two Cents by Angst+Badger · · Score: 1

      It aint broke and don't need fixin'

      This attitude is so wrong in so many ways that it occasionally baffles me that it's so popular.

      My feet ain't broke, either, but if I want to get from Portland to Atlanta in less than a year, I'm going to take a plane. My next big programming project probably isn't going to be in 6502 assembly language, either.

      A better rule might be, "If it can be improved, improve it," with the caveat, often ignored, "Make sure it's really an improvement." Meanwhile, the don't fix it crowd can write SQL queries by the light of the fire they started by rubbing two sticks together when something better than SQL comes along.

      On to the next part. XML serves its purpose very well.

      Actually, it's fundamentally flawed. It would have been far better to define a specification language for describing the structure of existing binary representations, and some sort of procedural language for making transformations between them, instead of stuffing everything into a hierarchal format -- as if all, or even most, data was merely hierarchal -- that makes uuencoding look like a form of compression by comparison. But we probably do have to live with that now that the whole industry is on the bloody bandwagon, just like we have to accept the more ridiculous excesses of OO, another good idea carried to its reductio ad absurdum by obsessive-compulsive freaks bent on completeness and symmetry at the expense of basic common sense.

      --
      Proud member of the Weirdo-American community.
    7. Re:My Two Cents by Bozdune · · Score: 1

      Sure, and it's also possible to get great performance from SQL Server, Essbase, SAP Data Warehouse, or a dozen other ROLAP, MOLAP, or HOLAP products. Problem is, mere mortals like that poor finance guy aren't going to be able to get there from here. The accounting system vendor drops Crystal Reports in his lap, tells him to "write any report you want," and then he sits there for hours wondering why his first join hasn't finished yet.

      Thanks for the note! I guess Pascal and Date will solve all these problems for us, so no worries... ;-)

    8. Re:My Two Cents by lawpoop · · Score: 1
      "If your head hurts from SQL, keep on it for a couple more minutes and you'll probably have it!"

      Another thing you can do is get an SQL Wizard for your desk. If you ask him any *well phrased* question, the answer will magically come to you in moments! Remember, don't tell him your problem, but rather /ask him a question/.

      --
      Computers are useless. They can only give you answers.
      -- Pablo Picasso
    9. Re:My Two Cents by Anonymous Coward · · Score: 0

      10 out of 10 Terrorists agree - Anybody but Bush in 2004

      Are you kidding? George W. Bush is the best thing that could have ever happened for Osama bin Laden, and vice versa. There will be a (successful?) terrorist attack in the US leading up to the election for this very reason... Dubya wants to be re-elected, and Osama wants to keep his own recruits flooding in.

    10. Re:My Two Cents by leandrod · · Score: 1
      > relational SQL databases

      This is a contradiction in terms.

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
  7. Link to history of SQL.. by MisanthropicProgram · · Score: 5, Informative

    Here is a history of SQL. I wanted to check the article's facts. Also, I was curious... History of SQL

  8. Ignores the good points of SQL by b0lt · · Score: 3, Insightful

    Easy to use, easy to debug, easy to understand, powerful. Isn't this good enough?

    --
    got sig?
    1. Re:Ignores the good points of SQL by jayhawk88 · · Score: 1

      This is a good point I think. Reading through the article, you get the sense that this Pascal dude is a real BOFH type, the King on His Throne, deigning to gift some Holy Computer Knowledge onto the plebians if and when he sees fit.

      Is SQL the perfect database query solution? Obviously not, but show me the perfect solution anywhere in computing. Hell, give me a week and a buzzword dictionary and I can write up a similar article decrying binary as the root of all Original Sin in computing. Looking for a better way is fine, but at some point you have to make concessions to the real world.

  9. SQL sucks? by localman · · Score: 5, Insightful

    From the article:

    It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular

    Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.

    Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap. Then I tried actually building systems with it and realized I was better off with ugly ol' SQL.

    I've been part of too many projects where people pulled out the UML books in favor of a decent First Normal Form DB design and led the team down the tubes.

    I'm not saying these other methods don't have their place -- they do. But they aren't going to displace SQL because it has it's place also. And it's place isn't theoretical, it's been practically demonstrated a million times.

    Cheers.

    1. Re:SQL sucks? by Karamchand · · Score: 1

      Sorry, but while SQL might work - just because it works that doesn't mean it's actually very good at its job.

    2. Re:SQL sucks? by kpharmer · · Score: 5, Insightful

      Keep in mind that Fabian Pascal is generally considered a crackpot purist. He's been insisting for years that there is no such thing as a relational database product - since none implement a purely relational model.

      However, he hasn't delivered an exaple of one, he hasn't clearly articulated the differences between his vision and the commercial options, and he apparently refuses to acknowledge that some problems in life fail to fit well into the relational model (hierarchies, networks, inheritance, etc).

      Much of what he, Celko, and Date complained about were actually responses by vendors to adapt to the real world. They were somewhat successful - and now SQL can be used successfully to solve a far greater set of problems than Pascal has ever admitted exist.

      A perfect example of this nonsense is there insistence that good indexing in a 3NF model outperforms denormalized data in a star schema. Sounds great, absolutely doesn't work. Across eighteen years working with relational databases I've never seen their suggestions work. Of course they have a response to this - the vendors should just "make the databases faster" - like it's fucking magic or something.

      Of course, this isn't to say that he's wrong about xquery - trying to work with unbalanced networks or hierarchies in which the rules change change throughout the schema causes a few problems.

      We already have extensive support for recursion & networks in the more powerful RDMBS (db2 & oracle for sure). But combining that with data structures supporting optional branching, complete lack of declarative constraints, optional rules, etc - sounds like something that will never work well.

      And going back to the days in which you have to spend a day writing code against a hierarchical database in order to answer a simple question sucks.

      Hmmm, haven't people gotten tired of the xml hype yet?

    3. Re:SQL sucks? by StraightTalkExpress · · Score: 1

      Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap.

      To be fair, Date and Pascal hate OO even more than SQL. Their argument is that SQL sucks at modeling relational algebra because it's based on set theory rather than predicate logic. Your point about SQL being too entrenched to be displaced and proven by use may still be correct, though.

    4. Re:SQL sucks? by Tassach · · Score: 4, Insightful
      people pulled out the UML books in favor of a decent First Normal Form DB
      I'm not sure I'd ever use the words "decent" and "First Normal Form" in the same sentence.

      In 15+ years as a database developer, I've never seen a non-trivial problem which could be modelled correctly in 1NF. In my experience, 3NF is pretty universially considered to be the default level of normalization. Any decision to deviate from 3NF, either up or down, should be documented and backed up with a solid engineering case as to why it's necessary.

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    5. Re:SQL sucks? by Doctor+Faustus · · Score: 1

      Much of what he, Celko, and Date complained about were actually responses by vendors to adapt to the real world.

      Joe Celko seems to be particularly fixated on Identity/AutoNumber/Serial/AutoIncrement/<whatever other name is used> fields. I was arguing with him about it on a newsgroup a few weeks ago, here.

      He stopped responding when I said I thought that the dumb keys were better in a relational sense than using real data to link tables, and that I thought using real data violated the first and second normal form rules. I'm not sure if he decided I might have a point, or that I was an idiot who wasn't worth reponding to; I suspect the latter.

    6. Re:SQL sucks? by Guillermito · · Score: 1

      > people pulled out the UML books in favor of a
      > decent First Normal Form DB design

      In favor of a *decent* First Normal Form DB design?

      First normal form is regarded as *very bad* design. You could take that design, transform it (normalize it) and get it up to the Fifth Normal Form. However, I guess you can call a Boyce-Codd Normal Form decent. (Boyce-Codd is between the 3rd and the 4th Normal Forms)

    7. Re:SQL sucks? by jellomizer · · Score: 1

      It depends on the job. Many times your data just needs to be in a grid and you need to weed out the excess data. SQL is great for that. Actually there are a lot of good uses for it. Yes XML is really fancy but for most applications it is actually to complicated for the needs. Plus if you traverse to many layers say in XML it can get confusing to the programmer and make it harder to clean the data in the future.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    8. Re:SQL sucks? by Karamchand · · Score: 1

      Actually I am not saying XML rules (personally I even dislike XML;) but I don't think it's good to say "SQL works, so keep it". In my opinion, even though SQL works for most of the tasks we should continue to search for something better (which certainly exists!), not rest on our laurels.
      Regards.

    9. Re:SQL sucks? by orderb13 · · Score: 1

      Ok, time to expose my ignorance. Being a programmer (and gerenally simply told "This is how the DB is, deal with it), all I ever really need to know is what the first 3 normal forms are, but I'm curious. What are the other normal forms?

    10. Re:SQL sucks? by Luyseyal · · Score: 1
      Like this one?

      -l

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
    11. Re:SQL sucks? by Skjellifetti · · Score: 1
      It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular.

      Date has been hammering on this point for nearly 20 years:
      A word of warning: DB2 and systems like it are "state-of-the-art" products, which is why we discuss them here. However, that staement should not be taken to mean that those systems are ideal in any absolute sense. In particular, they do not support all aspects of the underlying theory (i.e. the relational model -- see Part 3 of this book). In certain respects, most present-day systems are regrettably ad-hoc [pg 96].
      In the late 1980s, Date semed more taken by INGRES which used QUEL as its query language:
      QUEL may be regarded as a fairly "pure" implentation of the relational calculus; as such, it is a considerably less idiosyncratic language than SQL, and indeed is clearly superior to SQL in some ways [pg 209].
      Date worked for IBM until 1983 where he worked on SQL/DS and DB2. He has written a large number of well respected papers on data base theory. Even if you don't agree with what he writes, he is definitely worth reading.

      Date, C.J., An Introduction to Database Systems, Fourth Edition, Reading, MA: Addison-Wesley Publishing Company, 1988.
    12. Re:SQL sucks? by Guillermito · · Score: 1

      See this for a short (and practical) explanation. For an in depth answer use any of the books mentioned in the references. Any decent Database Theory book usually devotes an entire chapter to these higher Normal Forms.

    13. Re:SQL sucks? by Anonymous Coward · · Score: 0

      Maybe instead of dismissing Pascal's comments as a "psuedo-academic argument", you should read some of his books where he fleshes things out in greater detail.

      Try "Practical Issues in Database Management - A Reference for the Thinking Practitioner", by Fabian Pascal (dude who wrote the article). Read from p 94, and learn about how SQL is a wonderful system for working with relations that was royally fucked by people that just didn't understand the principles of data structuring.

      Of course SQL works, it was designed by some brilliant people. It'd work even better if it wasn't for all the design-by-commitee shit that got tacked on afterwards (like allowing duplicate rows)

    14. Re:SQL sucks? by iabervon · · Score: 2, Insightful

      Actually, SQL is the interface to every non-toy scale data storage project. Non-toy databases are not SQL inside, but use SQL as the high-level language that they compile into their execution plans. SQL is the sole interface language supported by databases with efficient internal implementations, and therefore all of SQL's flaws are overwhelmed in the marketplace by the fact that it's what the good software uses. It's like the named.conf format, which is widely used, but only because it's what BIND uses, not because it's actually good.

      If Oracle started supporting a more friendly API, people would use it instead. I've actually written (for a former company) an OO system for putting together relational queries. With this, I wrote a reporting engine such that you could define a report in an XML file, and it would build the right query and organize the data. It ran better than hand-written SQL queries, and was much easier to write. Of course, it used a SQL back-end to interface with Oracle, but it would have been more efficient if it didn't have to build up a huge query string and then have Oracle parse it. It also would have been nice if I weren't constrained by SQL's limitations. Why can't I have a temporary table with completely specified contents nested in a SELECT statement? It would greatly improve my implementation of "average(x), index(range) for y in range in list(ranges)" (This is just a gloss; the actual code was a bunch of Java applied at various points to a query object.)

    15. Re:SQL sucks? by Anonymous Coward · · Score: 0

      Interesting, and proof that Joe Celko is an idiot more interested in academic arguments than real world systems. Any app that uses business data (ISBN, SSN, VIN, UPC, etc) as keys is pretty much guaranteed to break.

    16. Re:SQL sucks? by Anonymous Coward · · Score: 0

      All these comments stating that SQL is ok the way it is miss one thing: mathematically it is an incomplete and inaccurate description of the relational model.

      Those of you that have actually read at least part of Date's textbook on database theory should realize that.

      They are not asserting that SQL doesn't/hasn't worked or is not useful in building production applications.

    17. Re:SQL sucks? by Anonymous Coward · · Score: 0

      Listen up people! We must rewrite everything in Lisp!! It's the only solution to these purity debates.

    18. Re:SQL sucks? by localman · · Score: 1

      I've actually written (for a former company) an OO system for putting together relational queries. With this, I wrote a reporting engine such that you could define a report in an XML file, and it would build the right query and organize the data. It ran better than hand-written SQL queries, and was much easier to write.

      Since I don't know the specifics I really can't comment, but this sounds just like the type of thing I'm very averse to. I'm sure it seemed wonderful to you, maybe a few of your friends, and perhaps even suited the problem well. So did my attempts at such lofty "improved" systems.

      But I'd like to see how people walking in off the street would relate to it. And more importantly (so we're not relying on just the fact that SQL is widely known) I'd like to see how it adapts to a hundred different projects with vastly different needs.

      I'm not trying to put down your work, because I've been there and I'm reasonably good. But don't confuse impressing yourself while solving a specific problem with creating a framework that applies reasonably well to nearly every problem there is.

      Cheers.

    19. Re:SQL sucks? by Baki · · Score: 1

      Hmm, we always use business data as primary key. I don't see the point of using artificial keys except for very few cases (to optimize away a composite key that would get too many elements and be too cumbersome to use).

      I have never understood the use of extra indirection and indirectness by using artificial and not understandable numbers to identify records.

      An AC mentioned that any app that does this is guaranteed to break, but yet our 800 tables application (it is a very large app) has been running and continuously extended for many years without any problems. B.t.w. I did performance measuerements (on oracle) many times comparing access times via short artificial keys as compared to longer composite keys. The performance difference was hardly measurable.

    20. Re:SQL sucks? by localman · · Score: 1

      Right you are. But the fact that there isn't a practical replacement available indicates it is better at it's job than a casual observer might realize.

      It's very easy to say that something could be better. It's much harder to make something better.

      Cheers.

    21. Re:SQL sucks? by localman · · Score: 1

      I guess I'm a slob. I would say that 3rd normal form is about as high as I usually go, but there's always a few bits of 1st normal form hanging around to accomodate some bit of slop where cleanly modelling the real world becomes unweildy. So I consider the DB 1st normal form. Anything higher is nice if you can get it. Anything lower is probably unacceptable.

      In my pragmatic little world, anyways.

      Cheers.

    22. Re:SQL sucks? by localman · · Score: 1

      Maybe instead of dismissing Pascal's comments as a "psuedo-academic argument", you should read some of his books where he fleshes things out in greater detail.

      Maybe I should. But I've been too busy keeping up with the real-world business demands of my employer for the past several years. And during that time (and before) I can't recall ever seeing a good ROI for clever data modeling.

      I'm not saying there's no value in his work or opinions, but I just laugh a bit when I see people going on about how awful SQL is when so much is built upon it with great success.

      Cheers.

    23. Re:SQL sucks? by localman · · Score: 1

      And mathematics is an incomplete and inaccurate description of the real world; and the real world is where I work.

      I'm not saying that justifies making sloppy languages or not trying to get things as clean as you can. My point is that it's incompleteness and inaccuracy don't appear to be a hinderance to it's practicality.

      If Date & crew realize that, great. I wish them much luck in coming up with a new system that I may use and love some day down the road. If it's superiority outweigh the transition costs, that is.

      Cheers.

    24. Re:SQL sucks? by johnnyb · · Score: 3, Interesting

      You can do some really neat tricks if every row is identified by a globally-unique integer (i.e. - not just for the table, but the whole database). For example, I can create a notes table that looks like this:

      create table notes(rownum bigint primary key default nextval('rownums'), the_row bigint, notes text);

      and have the_row be a foreign key to the whole rest of the database (unfortunately, I'm unaware of any DB that has direct support for this). Then, I can use this table to attach notes to ANY record in the whole database. You can use such generics very easily if you have a common format for a primary key.

      In addition, only very rarely do I find a primary key that I know for absolute certain will be unique.

    25. Re:SQL sucks? by joe_plastic · · Score: 1

      I thought all 3NF were by definition also 1NF and 2NF . That each normal form above one added a *additional* requirement.
      1NF each enity occurrence have a fixed number of single-valued attributes.
      2NF is 1NF plus each nonkey attribute in a enity be fully dependent on the entire primary key
      3NF each nonkey attribute in a enity be fully dependent on the entire primary key and nothing but the primary keys
      I wont define Boyce/Codd normal form(BCNF), 4NF, or 5NF here. However usually if you got it to 3NF than it's typically also in 5NF.

    26. Re:SQL sucks? by Anonymous Coward · · Score: 1, Informative
      However, he hasn't delivered an exaple of one, he hasn't clearly articulated the differences between his vision and the commercial options, and he apparently refuses to acknowledge that some problems in life fail to fit well into the relational model (hierarchies, networks, inheritance, etc).

      FP, Date, and Darwen may or may not be extremists, but the above characterization is just plain wrong.

      Date and Darwen have extensively documented the differences their vision and the commerial options in The Third Manifesto Chapter 7 (IIRC) of Pascal's Practical Issues in Database Management is devoted to tree processing in a relational language. There's extensive discussions of tree processing and handling transitive closure in The Third Manifesto as well. The cover hierarchies and networks, and in addition have several chapters devoted to inheritance.

      You may not agree with their critique, but to say they haven't articulated the specifics is just wrong; they've spent 20 years doing exactly that.

    27. Re:SQL sucks? by iabervon · · Score: 2, Interesting

      At the lowest layer, it was essentially a system which let you put together a SQL command incrementally. If you want to do an insert, you create an InsertCommand for the table, and then call setColumn() methods for the columns with the desired values. If you want to do an update, it is similar, but you can also add restriction clauses, either by simply requiring an exact match or with a method taking Operand, Relation, Operand. There was support for, essentially, "insert or update row" (like "create or replace view"), which would do the correct operation for the current state of the database without having you write each case and test them yourself. The support for "select" was somewhere in the middle of the range of functionality supported by different SQL extensions.

      So it actually followed almost exactly the functionality provided by SQL, but it allowed you to put it together incrementally rather than with a string constant, so you could have a method on a different object which would take a query so far and a user and apply to the query the access restrictions for that user. Applicability to different problems should, therefore, by approximately the same as an average SQL implementation. It also had the advantage of being able to generate SQL suitable for different databases, so it had a wider applicabilty than completely portable SQL would.

      Of course, the reporting engine was solving a specific problem in a clever way, using the framework. It is true that I extended the framework in a few ways (to include GROUP BY and nested queries) which writing the reporting engine, so you could argue that, until I used it to solve the third (or so) dissimilar problem, it wasn't yet sufficiently general. Still, it would regularly build up substantially more complex queries than either I or our DBA could have written in SQL.

      A particular point I want to make about it is that it wasn't an OO databse system; it was an OO interface to queries in a relational database system. A lot of OO database systems in OO languages have been written, and these are generally less applicable to a large class of problems than relational databases. What most people write are persistance layers for objects, which are much less useful in a lot of situations.

    28. Re:SQL sucks? by localman · · Score: 1

      Pretty interesting. It sounds somwhat similar to what I did, also an OO interface to relational data. My system would construct the SQL automatically using a small data description struct in each class. Then the class could retrieve objects (or sets of objects) and each field was wrapped in an accessor method, and you could verify the validity of the data through the object and store it back to the DB if it was changed. It was really beautiful, and I received many pats on the back for my work.

      And as the company grew and new needs arose I had to hack it and hack it until it was more unweildy and harder to work with than if I had just grouped a few SQL wraper functions that returned plain old data. Modifying the OO layer became a ongoing maintenance task unto itself. That company failed for other reasons (or maybe not?).

      At the place I work now we've gone with the simpler approach and every developer we hire can quickly grok what's going on and get things done without asking the keeper of the OO system to give him/her more functionality and without having to hack around the system's cleverness. I worried at first that such a ad-hoc system would collapse eventually. Well, four years and thousands of major changes later I think that it is a more adaptable and maintainable system than my clean OO system was. Warts and all.

      It's all just my opinion, but it is backed up a bit by a couple of real world experiences.

      Cheers.

    29. Re:SQL sucks? by Doctor+Faustus · · Score: 1

      I'd never thought of that, but it is a neat trick.

      I don't know of anything that supports a single counter for the entire database (directly -- you can always use a NextID table, but you then have to do iterative inserts), but using a GUID in MS-SQL does seem to be fairly common, and I think it has a function to generate them. The downside is that you then have 128 bit keys rather than 32, which will cause a performance hit.

    30. Re:SQL sucks? by bucknuggets · · Score: 1

      > I don't know of anything that supports a single counter for the entire database (directly -- you can
      > always use a NextID table, but you then have to do
      iterative inserts), but using a GUID in MS-SQL does
      > seem to be fairly common, and I think it has a function to generate them. The downside is that you
      > then have 128 bit keys rather than 32, which will cause a performance hit.

      Actually both Oracle & DB2 support this. It's called a sequence in db2 (vs an identity which is table-specific).

      The other benefit of not having your object ids tied to a physical schema implementation is that you can remodel your schema - without many concerns about having to convert primary keys.

    31. Re:SQL sucks? by geekoid · · Score: 1

      I would say any decision, including 3NF,should be documented and backed up with a solid engineering case as to why it's necessary.

      Yes, 3NF is very widely used, but that doesn't mean it should always be assumed. Better to find out up front that 2NF, or 4NF is better for a particullar task, then to end up fighting it becasuse it was assumed. Plus, in general people will take the least path a resistance, so if 3NF take less paperwork to decide on up front, then it will be chosen, regardless of need.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    32. Re:SQL sucks? by maraist · · Score: 1

      I have one good example of why the 2D data-strutures are not ending their lives just yet. spread-sheets. The world today is not run by databases, but by middle-management that use MS Excel to quickly throw-together vast collections of data and ask questions of that data.

      While MS does provide a rudimentary database system in office, the defacto-standard these days is Excel. Yes it's a HORIBLE data depot (unreliable, corruptable, mangleable), but computers are fast enough and with enough memory that few in management can fully tax it's ability, so they rarely find a need to delegate out an Access or centralized database-system.

      Modern Excel allows you to perform full blown SQL queries on a disconnected set of excel spread-sheets. And the fact that an excel editor is far superior than most front-ends to a database-backend is the real killer.

      Since spread-sheets are fundamentally two dimentional objects, and they are so prolific in even the non programming world, I strongly challenge claims that suggest such facilities are pasee'.

      I'm not sure if your argument is merely about SQL or whether you're referring to 2D flat structures. But even if it is only about SQL, I'd suggest that SQL can be far more expressive than a complex setup of pivot-tables in excel (which essentially performs the same operation as joins on a table). The fact that most GUI's are merely SQL-generators to a common ODBC backend is a testement to this.

      Of course, I'm ignoring that using Excel to ask questions is often done in poor practice (slash had an article about how statistical applications should be used instead). But most OODB / RDBMS system designers don't bother with the statistics either.

      --
      -Michael
    33. Re:SQL sucks? by maraist · · Score: 1

      You could use inheretance. postgres, for example implements inheretance in tables by internally performing the same search across all tables that are derived by the base table.

      So if, for example, you had a base tabe

      create sequence unique_seq;

      create table base (id bigint default nextid('unique_seq'), varchar(255) comment);

      create table foo (my_data varchar(255)) inherents base;

      create table bar (my_data int) inherets base;

      select comment from base*;

      Doing syntax off top of head. Internally this expands to
      union
      select comment from foo;
      select comment from bar;

      Granted since comments might be null, this is a waste of space and a violation of one normal form or another.

      But as another poster pointed out, most modern database systems implement 64/128bit named pk's.

      Cool idea though. :)

      --
      -Michael
    34. Re:SQL sucks? by Anonymous Coward · · Score: 1, Insightful

      Hi, glad to hear your app has held up. The issue is that every business rule always has an exception. Not every part has a number, not every part number is unique, not every person has a SSN, not every SSN is unique (it's true).

      While you can deal with this sort of thing in your appliction, it becomes much more difficult if your missing/duplicate data happens to be physical database key. I wonder how often your users have to work around things like that with your system (by not entering things, or using 'fake' numbers).

    35. Re:SQL sucks? by maraist · · Score: 1

      OJB, torque, and hybernate are excelent well supported 3'rd party OODB tools which provide optimistic/pessimistic locking, Object caching (to speed up object-graph navigation) and near minimalistic support (especially when used with xdoclet).. You literally write the class as you would use it (not even fully mapping all the columns to fields if you don't need them), and create collections wherever you would have a 1:n or m:n association. Then in the javadoc prior to each data-structure, put a one liner in xdoclet form which allows the reflection-based (or intermediate compiler, depending on the technology) interpreter to determine how to associate your OO data with the relational data.

      If you use aspect oriented programming (aspectj or dynaop) you literally never even have to worry about how and when the data gets persisted or retrieved, since that's handled as a separate configuration (typically as a prefix or suffix to all create or remove methods within the model).

      The caching system makes for faster operation than pure SQL when you have more than one query per operation. Yes you can obviously create a single custom query which does everything all at once, and this will be optimal, but for enterprise applications there are often too many unrelated activities going on. Having an intermediate layer cache the most-recently accessed rows (in their post-transformed state) means you don't have to worry about such performance issues... When profiling, you find the hot-spots and do some custom SQL to "pre-fetch" the necessary data so that subsequent sections of the activity are once again fast.

      Probably not a one-size-fits all solution. But after several years of attempting various custom OO layers atom RDBMS's (specifically, postgres, MS SQL server), these open source 3'rd party tools have put me to shame. :)

      --
      -Michael
    36. Re:SQL sucks? by justins · · Score: 1
      Much of what he, Celko, and Date complained about were actually responses by vendors to adapt to the real world.

      Even a cursory reading of the second article indicates that lumping all these guys together is probably a mistake:

      Best-selling author Joe Celko, who says he "built a career on disagreeing with Chris Date," admits there are some things about SQL that could use improving.

      For instance, Celko said he wishes the wild card symbol were not an underscore, now that laser printers are so prevalent. But overall, he said, SQL is logical, and as for NULLs, he sees no way around their use.


      That's a lot less radical than some of the other positions taken by Date or Pascal. Sorry to nitpick but Celko's work has always seemed pretty "real world" to me, what little I've read of it.
      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
    37. Re:SQL sucks? by (negative+video) · · Score: 1
      Hmm, we always use business data as primary key. I don't see the point of using artificial keys except for very few cases (to optimize away a composite key that would get too many elements and be too cumbersome to use).
      Example: Transactions outside the database engine. Typical case would be using an abstract SKU for an e-commerce system, so things don't fall apart when the marketing fools decide to "improve" the model number. SQL CASCADE can't help you there.
    38. Re:SQL sucks? by zenzen667 · · Score: 1

      Works fine if your key is coming from a system that guarentees uniqueness and a 1:1 relationship and is immutable and will always have the same format and that these constraints will never change in the future.

      Unfortunatly, it is a real pain to refactor the db design when it turns out you were not as precient as you thought and the systems outside your database or outside your control evolve in an incompatible way.

    39. Re:SQL sucks? by Tassach · · Score: 1
      I would say any decision, including 3NF,should be documented and backed up with a solid engineering case
      Excellent point. A good engineer should always think about WHY s/he is chosing one architecture or technology over another, and be prepared to back up that decision with hard data.
      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    40. Re:SQL sucks? by iabervon · · Score: 1

      Those are persistance layers for Java, giving you OODB functionality. They're basically useless, however, if you have problems which are relational in nature. If you're doing a lot of complex searches, you really want to have the database implement the query. If you have an OLAP-style fact table, the OODB design doesn't make any sense.

      Torque et al do work well for persistance of objects, which is something that a lot of people want. But that's not the problem we're discussing here; we're discussing OO code for handling relational data, not code for handling object data in a realtional database.

    41. Re:SQL sucks? by iabervon · · Score: 1

      I think that the reason our system was good was that it was a pretty thin layer over SQL. It kept the same model as SQL has, but did little things like having a set of constants for your database tables so that their names (and schemas) could be changed easily. It let you add an additional copy of a table with an unused alias without keeping track yourself of what aliases were used. It let you write a subquery like a query, and use the subquery like a table (without knowing that it wasn't actually a table).

      There was a bunch of trickiness, but that was in a separate layer on top of the query generation layer. I believe we ended up with code that just used the query (and other command) generation directly, code that used a "search with restrictions" layer, and code that used a "build an n-dimensional table of values" layer. I think part of why it worked and didn't drive anyone particularly insane (beyond the fact that arrays of varying dimensionality in Java are a bit insane, and, in the JDK we were using, buggy) was that the layers were kept strictly separate with all of the cleverness out of the way of code that needed different cleverness.

      I'm actually in the process of trying to write something similar for my own use (and GPL, if I ever overcome my shyness about releasing things). My current problem is that I don't have any problems which use a sufficiently complex database to really hone the tools. Oh, and the low copiousness of my spare time, of course.

  10. It would seems SQL is better for RDB than XDB by grunt107 · · Score: 1

    It is rather intensive to do a relational 'hiearchy' search, whereas an XML representation would allow fast 'object' modeling. One example: Bill of Material. An iterative SQL call would be needed unless the overall part depth was known at query time. I like the idea of hybrid DBMS that allow XML-stored data to be accessible like a relational database. Oracle does this somewhat in their newer releases.

    1. Re:It would seems SQL is better for RDB than XDB by TheSunborn · · Score: 3, Informative

      And the problem happens as soon as you want to know which producs uses a specific screw. Doing that query against the xml would be difficult, and/or extreamly slow.

      Martin

    2. Re:It would seems SQL is better for RDB than XDB by kpharmer · · Score: 2, Informative

      > An iterative SQL call would be needed unless the overall part depth was known at query time.

      Both oracle & db2 have very good support for recursion. DB2 in particular can easily handle hierarchies and networks of unknown depth in a single query.

    3. Re:It would seems SQL is better for RDB than XDB by grunt107 · · Score: 1

      Which is why a switchable view is desirable - the 'specific screw' is easy and fast in RDBMS. The parts hierarchy could be very easy and fast in XML representation. Both, of course, require good designers (I have seen some nasty stuff out there).

    4. Re:It would seems SQL is better for RDB than XDB by AKAImBatman · · Score: 1

      And the problem happens as soon as you want to know which producs uses a specific screw

      What makes you say that? Assuming that the table is built as a "Product" XML block contains its parts, you could simply run an XPATH query against your "Product" table, looking for the part name or number. The results should actually be slightly faster than an SQL database since no JOINs are involved.

  11. More detailed articles by deepchasm · · Score: 4, Informative

    Readers interested in what Pascal and Date have to say may be interested in visiting Database Debunkings. It has lots of articles written by one or both of them.

    Personally, I recommend the articles written by Date because they are clearly, concisely, accurately, and calmly written. Pascal's tend to turn into a rant, which I wouldn't mind but he always seems to refer to his books rather than give a detailed justification of his arguments.

  12. What the?!... by __aagmrb7289 · · Score: 5, Insightful

    As a rant, this article does a great job. But here's what is missing - what the heck is he talking about? Everything he says is liberally sprinkled with statements telling us these things are self-evident, when they are anything but. He is constantly is referring to how this will clearly show that, or pointing out that this proves this or that later on, but never gets there.

    Can anyone summarize what is being said here in some sort of logical way? Because I'm confused. I see the title, I see no information supporting the title (unless, perhaps, I was to do the research myself).

    1. Re:What the?!... by OscarGunther · · Score: 4, Interesting
      Ever read any Trotsky? Or Lenin? Pascal sounds like any of the old Communists (not the later totalitarians, but the true believers who were old enough to have known Marx or Engels personally). His diatribe is entirely typical of the species. He gratuitously belittles his targets:
      "Natural" perhaps for those without a grasp of data fundamentals.
      (Yes, Fabian, the co-inventor of SQL probably doesn't have a grasp of data fundamentals.) He sprinkles his text liberally with "quotes" and italics so you can "feel" his anger, his dismay -- indeed, you can almost hear him spitting the words in Chamberlin's face. You can almost hear him chortling to himself as he bangs away on his keyboard, demolishing his opponents.

      He venerates the Founder. Finding a quote that supports your argument settles the matter. Codd the Wise avoided the errors that Chamberlin made; clearly the latter is the inferior intellect. And there's only a small core with the Founder. "We" are the true believers; all others are apostates and heretics.

      Overstatement is a definite tell. Chamberlin's explanation of the difference between SQL and XML data is "unbelievable." The nesting argument is "ridiculous." Industry pronouncements are "incoherent." And most prominent of all is the cutting remark that's meaningless to anyone not in the know or already in agreement:

      Unbelievable. Any wonder that SQL fails so abysmally at relational fidelity? We may not expect the average practitioner to distinguish between pictures of relations, which are "flat" due to the presentation medium, and relations of N cardinality themselves, which are N-dimensional logical structures. But we sure expect "industry experts" to be aware of the difference.
      And I sure expect a polemicist to know enough about his art to understand when he's descended into self-parody.
  13. Good grief by Dracolytch · · Score: 2, Insightful

    I'm a pretty good software developer, but if someone doesn't explain to me what the argument is in plain english without extreme haughtiness, I'm going to write off this whole issue as a pissing contest.

    ~D

    --
    This sig has been enciphered with a one-time pad. It could say almost anything.
    1. Re:Good grief by LostCluster · · Score: 2, Funny

      I'm a pretty good software developer, but if someone doesn't explain to me what the argument is in plain english without extreme haughtiness, I'm going to write off this whole issue as a pissing contest.

      The issues in dispute can't be expressed in plain english. That's why we need to upgrade to haughtiness and phase out plain english immediately.

    2. Re:Good grief by cachorro · · Score: 1

      Okay, here's as much as I can figure out about it.

      Some assert that storage and retrieval of data is best done using a "relational" methodology.

      The gurus of relational theory complain that the most ubiquitous implementation of relational operations, SQL, does not truly represent the relational model due to the fact that the designers of SQL did not truly understand the its subtleties.

      The author asserts that those same designers are making the similar mistakes in implementing XQuery.

      AFAICT, the big mistakes are:

      1) XML is intrinsically a format for representing data across domains, and it is ill-suited to the task of representing and/or exercising the relational logic that may exist between data elements.

      2) As XML is inherently hierarchical, not relational, database functionality developed with XML will trend toward a hierarchical architecture which has long been known to be less than optimal.

      For me, the most salient point was the line: ...why use a data interchange format for data management, something for which it was not intended?

  14. Relation arithmetic by Baldrson · · Score: 2, Interesting
    While at HP's E-speak project I spearheaded some work in reviving Bertrand Russel's relation arithmetic based on the work of Tom Etter, a researcher who had been working at Interval Research on some advanced theories of quantum software. We were trying to solve some basic problems with the way RDF and predication were being pursued in systems like Cyc and the semantic web. Unfortunately, basic research like this is rarely afforded any support at all, and what little support I was able to get wheedle out of the E-speak project dried up after a few months. At least we had a preliminary paper written up for future work.

    I described a general vision for this sort of formalism in a prior slahsdot post. Suffice to say some progress has been made since then -- and work in other areas is starting to converge. There is much yet to be done.

    1. Re:Relation arithmetic by metamatic · · Score: 1

      Is that the same Tom Etter who co-wrote RACTER?

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    2. Re:Relation arithmetic by Baldrson · · Score: 1
      Is that the same Tom Etter who co-wrote RACTER?

      Yeah, that's the Tom Etter who wrote the paper.

    3. Re:Relation arithmetic by Anonymous Coward · · Score: 0

      Cyc is a joke. Its only purpose is to suck down funding from government contracts.

    4. Re:Relation arithmetic by Troy+Baer · · Score: 1
      Cyc is a joke. Its only purpose is to suck down funding from government contracts.
      I thought that was the Globus project's job?
      --
      "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
    5. Re:Relation arithmetic by Baldrson · · Score: 1

      Actually there was a pretty good predicate engine built for Cyc and RDF could easily be attributed to that engine. Agreed, there were fundamental problems with Cyc, which is why the work on relation arithmetic was necessary -- not that there weren't other, less fundamental, problems with the Cyc project. That's beyond the scope of this discussion I think.

  15. Rediculous by TheRealMindChild · · Score: 4, Insightful

    SQL is meant for a relational database. XML is a hiearchial database... have you never worked on a project where your manager HAD to convert a projects database in XML because it was the new shiny buzzword? It NEVER works out very well because they are logically different. The same will go for SQL->XQuery.

    --

    "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    1. Re:Rediculous by TWooster · · Score: 1

      Well, except the article states:

      ...the SQL language retrieves information from SQL databases, not relational ones (the former, Chamberlin's own contribution) due to failure to understand the latter. ...

      (emphasis not mine)

      Makes me wish I understood a bit more, for it's all a bit confusing.

    2. Re:Rediculous by LurkerXXX · · Score: 4, Insightful

      Don't worry, they don't really understand it either. Guys like this bash and bash SQL for not 'truely' being relational (and it's not, but it's the best by far that we have), but they fail miserably at proposing any workable system that *is* truely relational. It's not at all a trivial task. They'll throw a lot of theoretical stuff at you, but never show you a working product that is a feasible replacement for SQL. Tossing SQL/relational stuff into XML is going to get ugly.

    3. Re:Rediculous by Anonymous Coward · · Score: 0

      Argh. Google "Tutorial D". Google "Ap5". Google - perhaps- "postgresql" (which supports most of the "non-sql" extra bits needed for true relational modeling, but is not accepted by these purists either because it supports sql cruft too...)

    4. Re:Rediculous by TheGavster · · Score: 1

      I love how people bash SQL for not following the model, when in my experience SQL follows reality (isn't that the real key here?) better than the model.

      --
      "Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
    5. Re:Rediculous by (negative+video) · · Score: 5, Insightful
      ...the SQL language retrieves information from SQL databases, not relational ones (the former, Chamberlin's own contribution) due to failure to understand the latter. ...
      Makes me wish I understood a bit more, for it's all a bit confusing.
      Fabian Pascal is smart and well-informed, but a zealot. Like all zealots he is willing to sacrifice anything and everything for his vision of technical purity.

      One of his specific complaints is about SQL NULL values being "unrelational". As an example, a real-world designer might use the following table in a genealogy DB:

      CREATE TABLE people (
      person_id INTEGER NOT NULL,
      name VARCHAR(100) NOT NULL,
      birth DATE NOT NULL,
      death DATE
      )
      If a person hasn't died yet, then people.death would be NULL. Well that just isn't relational enough for our friend Pascal. Since relations can be used to express optional values, then by God they have to be:
      CREATE TABLE people (
      person_id INTEGER PRIMARY KEY,
      name VARCHAR(100) NOT NULL,
      birth DATE NOT NULL
      )

      CREATE TABLE deaths (
      person_id UNIQUE INTEGER REFERENCES people,
      death_date DATE NOT NULL
      )

      It's pure, correctly formed, and worse than useless. It causes a profusion of tables: one for every optional value. It turns every simple query ("tell me useful stuff about this person") into a join ("find matching rows from several tables about this person"). The database server has to waste time enforcing deaths.person_id's pointless UNIQUE constraint. Cascading deletion has to be used to clean up deaths when a row is deleted in people.

      The simple fact is that the world is full of optional, single-valued data. NULL-allowed columns express that data efficiently, without confusion, and without breakage. Community college database designers have no trouble using the convention productively. It may be a little inelegant, but it is pragmatic and balanced engineering. The only whining you hear is from zealots like Pascal who heap fire and fury on others, but never seem to deliver the mythic PerfectoRDBMS.

    6. Re:Rediculous by askegg · · Score: 1

      Maybe a directory would be better? Most directories have optional values on objects and many do not store the attribute if there is no value (assuming you count nulls as not valuable).

      It is then trivial to determine who has died based on ldap searches:

      ldap:///o=org??sub?(died=true)
      or
      ldap:///o=org??sub?(objectClass=died) (assuming aux classes)

      --
      I don't make predictions, and I never will.
  16. Error: User doesn't know how to use program. by LostCluster · · Score: 5, Interesting

    Most of the problems that I've seen with SQL commands that are more complex than they really should be are a result of mistaken assumptions made during the design phase of the database. As a result, extra tables get added late, and therefore create new "features" that code then has to be revised to take advantage of...

    XML's going to be no better in this area. Mistakes made during the design phase will always come back to haunt while implementing and using the system. If a single query can't return the results desired, then that should have been thought of while designing the tables of the DB. Trying to get a query to specify "All things that are red" like Pascal suggests is only going to work if all objects implement the "color" property the same way. If somebody uses CMYK, somebody else uses RGB, and a third uses Play-Doh color names, it's still gonna be a mess that requires code to figure out who really matches whom.

    I don't see how this "new model" fixes the real problems with working with SQL between databases that weren't designed to work with each other.

  17. Peddling a better way? by BillsPetMonkey · · Score: 5, Insightful

    A legacy application is one that works. And the same can be said of SQL. Actually XML works too.

    It's important to understand what XML is replacing - binary or proprietary interfaces. This means an acceptable tradeoff between human readability (a hugely underrated requirement of a structured data format) and efficiency.

    An example is EDI vs XML. EDI "efficiency" accrues only to the intermediary that invented the means to setup trading relationships across their proprietary network. XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely. This single fact saves literally millions in kilocharacter and storage charges.

    SQL is what my old compsci prof would call opportunity-cost efficient. It's quick and can be implemented a number of ways - it's a data query framework, not a requirement for databases. There may be better ways to do it but you'll have a tradeoff somewhere.

    And if someone offers something better, then when they've finished telling you how good the new way is, wait and listen for the inevitable sales pitch.

    --
    "It's not your information. It's information about you" - John Ford, Vice President, Equifax
    1. Re:Peddling a better way? by rjamestaylor · · Score: 2, Insightful
      XML doesn't eliminate the intermediary. Eliminating the intermediary eliminates the intermediary.

      I mean, one can kiss of the intermediary with CSV files, financial 5-bit data files, packed decimal data files, string-and-cans dictation systems, etc.

      And one can sign up to use any number of intermediaries with their brand-new XML data files backed with excellent XSDs, etc.

      The data file does not bear on the intermediary.

      I don't have time here but I will postulate what anyone doing data integration with disparate parties knows: XML only solves the parsing questions; data meaning still requires conferences and/or tomes of explanatory text to explain beyond the grammar of XML, XSD, etc.

      Automated data integration is not possible unless one side is willing to do all the coding to match a well-publicized protocol with a test server for honing the solution.
      <rant>

      • In fact, I think XML, with its inherent repetition of "column" and "field" names (as in the repeated-for-each-tuple tag names) is a conspiracy amongst Intel, Western Digital and Level3 to sell more advanced processors, bigger disks and higher bandwidth allotments.

        Doubt me? Name one project having complicated two-way data interchange that did not require a face-to-face meeting between development teams where both sides used all current-state-of-the-art XML tech and tools. Just one.

        The only thing not needed to discuss is "how do you delimit fields containing your field delimiter." Other than that all the old questions are still valid (as to meaning and optional representation) and new ones exist (DTD?, XSD?, doctype? Version of XML? Order of data elements? Attributes? Whitespace handling? Characterset?).

        XML: Big Waste of Time (for data integration). Bigger waste of process cycles, bandwidth and disk storage.

      </rant>
      --
      -- @rjamestaylor on Ello
    2. Re:Peddling a better way? by danharan · · Score: 1
      XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely.
      I used to work for B-Process, the European leader in "electronic bill payment and presentment" (they even use the acronym EBPP when speaking in french... but go figure french suits).

      Anyhow, let me tell you... there isn't any solid agreement on formats yet. A lot of companies are still using EDI, various legacy and other XML formats. So a company can be faced with the task of massaging data of numerous types for every single supplier or buyer.

      Instead, a few companies are setting up that have their own internal (and proprietary) format. They set up two transforms, XYZ->proprietary and proprietary->XYZ. Any 2 companies connected on the platform can send invoices and receipts back to each other, regardless of what legacy system either one might use.

      A universal format would be great... but like all standards, this is as much a political problem as it is a technical one. The potential is there for getting rid of intermediaries. More important to me is that we could automate a lot of book-keeping while saving a lot of paper :)
      --
      Information: "I want to be anthropomorphized"
  18. Complete coincidence I'm sure by ShootThemLater · · Score: 0, Troll
    Let's see... an article the pretty much rips into someone who is "one of IBM's representatives in the W3C XML Query Working Group", "author of two books on the DB2 database system" and "a staff member at IBM's Almaden Research Center".

    At the foot of the page, we have a "sponsored by Oracle" graphic.

    I'm sure those two things are entirely unrelated.

    1. Re:Complete coincidence I'm sure by latham · · Score: 1

      Before you go spouting off, check out that very W3C committee and you'll see that Oracle is on it and a very active participant.

      And if you had checked any of the other pages on the dbazine site you would have seen that the BMC and Oracle logos are part of a standard footer that is appended to all of the pages. Ever hear of advertising?

      Ya gotta do better if you're going to troll.

    2. Re:Complete coincidence I'm sure by ShootThemLater · · Score: 1
      Sorry - it was honestly not a troll. I didn't think anyone would take my comment at all seriously. I guess I just find the occasional mismatch of advertising/sponsorship and editorial content amusing sometimes (in a not all that funny really sort of way).

      Oh, and I had checked the other pages - interestingly, the Oracle logo is not appended to the DB2 section of the site. Very sensitive of them.

      But I'm not seriously suggesting that they are any sort of Oracle stooge, if that was how you read my post.

  19. SQL Has Served Us Well by lenmaster · · Score: 1

    My first professional coding job was to write financial reports in a mix of C and SQL. When I started learning SQL I was amazed how 10 lines of SQL could do what would take dozens of C code easily. I have since seen a number of highly successful projects use SQL-based databases. This article calls SQL a failure -- BS.

    1. Re:SQL Has Served Us Well by Anonymous Coward · · Score: 0

      How many lines of C code do you think that SQL query parser is?

    2. Re:SQL Has Served Us Well by johnnyb · · Score: 1

      I think it's saying is that SQL is a failure compared to what we could have gotten, which is somewhat true. However, I think they're just complaining that tomorrow is not today. However, on the other hand, they are reminding us that we need to be asking our vendors for the tools of tomorrow or they are just going to keep sending us today's tools repackaged over and over again.

  20. And the Better Choice Is? by nyc.!fnord · · Score: 1

    Just wondering if the author had any suggestions to wean me from the SQL/XML data exchange habit.

  21. RDF by j1m+5n0w · · Score: 1

    I've recently been trying to understand the significance of RDF (I picked up the o'reiley book, but haven't gotten around to reading much of it yet). It seems that it allows you to represent data in a graph structure instead of a tree structure, as in XML.

    What do people out there think of RDF? Can it represent the same things that people use relational databases for? What is it good at? What is it not good at?

    -jim

    1. Re:RDF by Baldrson · · Score: 1

      If you read the relation arithmetic paper you'll see there are 3 equality axioms: row, column and value. These three axioms correspond to the "triple stores" that are fundamental not only to RDF but to XML databases generally (as well as other flexible-schema stores such as LDAP). There are a couple of big differences however: Whereas most formal systems start with the idea of "ontology" or "things with names and definitions", relation arithmetic makes possible mathematics without ontology and the idea of negative presence (see phenomenology for the underlying philosophy). Both of these are a radical advance, although the mathematics without ontology is perhaps the more fundamental of the two, as subsequent work has shown.

  22. Let's use a Python query language: by stienman · · Score: 1

    Shopkeeper: You've got a pet database?
    Customer: Yes. I chose him out of thousands. I didn't like the others, they were all too flat.
    Shopkeeper: You must be a looney.

    -Adam

  23. XML Misunderstandings by Decaff · · Score: 4, Interesting

    The author seems to have many serious misunderstandings about XML.

    The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged - semantics - and once they do that, there is no need to repeat the tags in each and every record/document being transmitted

    A major point of XML is that the semantics should be explicit. If you don't repeat the tags, you reply purely on position to indicate meaning. This is a Bad Thing. For example, it does not allow sparse data in which non-default or null values can be excluded.

    Inter-system data exchange requires an agreed-on efficient machine-readable delimited file format.

    XML was designed to avoid the issue of 'yet another machine-readable format'. XML can be read reasonably efficiently, but always preserves meaning, ignoring the horrors of character sets and byte order. Compressed XML is a very efficient way to transmit data.

    An important part of XML design was that it should allow information to be expressed in a way that is independent of the software that uses it. In this way, it has something in common with SQL. The point of 'human reading and writing' is that in the last resort, you still have you data! It also makes data transfer hugely easier to debug.

    There is also a lot of confusion about the order of tags in XML. Its possible to specify in a schema or DTD that some tags are required and should be in a certain order, but its also possible to just not care about order. XML is neutral about this.

    1. Re:XML Misunderstandings by Knos · · Score: 1

      The author is actually actually arguing that tags are syntactic, not semantic.

      --
      . . . . . . . .. . . . . . . .
      may u!sh 2 sm!le at dz!z bad nn.!m!tat!ion
    2. Re:XML Misunderstandings by Decaff · · Score: 2, Insightful

      The author is actually actually arguing that tags are syntactic, not semantic.

      He seems to be not sure what he is arguing, as far as I can tell. After all, XML is fundamentally a semantic markup, using HTML/SGML syntax. He seems to be arguing that sematics can be indicated by position, and that tags are unecessary and Any agree-upon delimited format will do. I strongly disagree with this.

      You can use XML as a syntactical markup, but its rather a waste:

      <record><data>1</data><data>Surname</data></reco rd >

      This is just silly, and is no better than CSV, but seems to be the kind of thing the author is arguing for; positional markup in some format.

      Its far better to have an XML schema that includes semantics:

      <record>
      <data type="int" name="id">1</data>
      <data type="String" name="Surname">Surname</data>
      </record>

      or even better
      <record>
      <id>1</id>
      <surname>Surname</surname>
      </record>

      If you do something like this, semantics ARE included in the tags. With the appropriate namespaces and schema/DTD, you can mark up pretty clearly what the data means in a way that just about anyone can understand.

    3. Re:XML Misunderstandings by ralphbecket · · Score: 1

      A major point of XML is that the semantics should be explicit.

      No, XML is concerned with representation (i.e. syntax). The semantics are associated with an interpretation of that representation.

      If you don't repeat the tags, you reply purely on position to indicate meaning. This is a Bad Thing.

      I agree completely. Fractions like 3/4 always confuse me. If only people would write
      <RATIONAL NUMERATOR=3 DENOMINATOR=4/>

      XML can be read reasonably efficiently,

      No, it can't. Tag-soup parsers are not a solution (have you seen the size of the XML standard?) Parsing XML properly is definitely not cheap. There's no sensible way to seek ahead in the data stream. The list goes on.

      but always preserves meaning, ignoring the horrors of character sets and byte order.

      Yes, these problems had never been solved before XML came along.

      Compressed XML is a very efficient way to transmit data.

      Ha ha ha ha ha! My God, you're serious.

      The point of 'human reading and writing' is that in the last resort, you still have you data!

      It's no harder to write a pretty printer for an arbitrary tree structure than it is to write one for XML (although no sane person would so obscure the real data behind all those tags an attributes the way that XML does.)

      XML is a screaming hack.
    4. Re:XML Misunderstandings by allenthelee · · Score: 0

      Hear, hear. A great quote from an early version of Philip Wadler & Jerome Simeon's _The Essence of XML_ paper goes something like "The essence of XML is this: the problem it tries to solve is not hard, and it doesn't solve the problem well". Which, of course, isn't to say that it's completely useless.

    5. Re:XML Misunderstandings by Decaff · · Score: 1

      No, XML is concerned with representation (i.e. syntax).

      No, XML is semantic. Syntax defines that '<' and >' and '</' are used in XML. The tags include words. Those words can have meaning; hence semantic. Get it? words: meaning.

      I agree completely. Fractions like 3/4 always confuse me. If only people would write..

      This is not what is being discussed. There is a big difference between '3/4' and...

      a,1.5,2,x,zxxx etc. What do these mean? Suppose someone supplies you with a file that has data in columns separated by spaces. Are the spaces simply separators, or are they part of the data? What do tabs mean? What about embedded quotes in text strings? XML is a way to remove this ambiguity.

      There's no sensible way to seek ahead in the data stream.

      You don't need to. XML can easily be handled serially, with apis such as SAX.

      Yes, these problems had never been solved before XML came along.

      I never said that. These problems had been solved, but in proprietary ways. XML is a common, agreed standard.

      Compressed XML is a very efficient way to transmit data.

      Ha ha ha ha ha! My God, you're serious.


      Yes - what a laugh - I am. Note the word 'compressed'. The use of (tee hee) repeated text tags (giggle) means that it can (chuckle) be easily compressed (titter) and is small.

      XML is a screaming hack.

      No. XML is a very carefully designed and widely used specification. You seem to have a bias against usefulness. If you want to use arbitrary and non-portable binary formats, do, but don't expect anyone else to be interested in using that format.

    6. Re:XML Misunderstandings by leomekenkamp · · Score: 1
      If you do something like this, semantics ARE included in the tags.

      The question is: semantics to what? The answer is: the semantics of the meta-data; the semantics of the data itself still stays unclear. Take for instance:
      <x><y type="string">radar</y></x>
      The only thing we know about 'radar' is that it is a string. And more importantly, a machine 'knows' what a string is and how to handle it. What a machine does not know is the meaning of the string itself. We do not know either; the 'y' tag could stand for 'MASHCharacter' or 'MilitaryTechnology', which impacts the semantics in quite a drastic way.

      --
      Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
    7. Re:XML Misunderstandings by Decaff · · Score: 1

      The question is: semantics to what? The answer is: the semantics of the meta-data; the semantics of the data itself still stays unclear.

      I disagree, as XML IS the meta-data.

      In your example:

      <x><y type="string">radar</y></x>

      You say that 'radar' is a string. Yes, it is, but there is also nothing to stop you adding additional attributes to define the meaning:

      <y type="string" objectType="technology">

      (This may look verbose, but compressed, repeated tag names and attributes don't matter much in terms of space).

      This meta-data and semantics discussion is recursive - after all, who defines the definitions? There is an internationally agreed meta-data standard called the Dublin Core, which can be used in schemas and DTD and namespaces to provide an agreed base for all semantics.

    8. Re:XML Misunderstandings by leomekenkamp · · Score: 1

      But objectType="technology" is still meta-data, because it does nothing more than putting a label on 'radar'. There is no way a machine understands what we mean with 'radar', the semantics of 'radar' are out of reach for a machine. A machine can however 'understand' that 'radar' has a label named 'objectType' with a value of 'technology', because it knows the semantics of tags. That's it, that's the best a machine can do with the current state of AI.

      Look at it this way: XML can be verified through a DTD/Schema/whatever. The only verification that takes place (without writing code yourself) is if all meta-data is correct. And that is the only thing a parser can do, because a parser semantically knows how to parse XML. If you change a DTD (and the XML stream) then the XML stream will still parse correctly, but you need to change your code in order to cope with the changed semantics of the actual data itself.

      An XML stream is a combination of both data and meta-data.

      --
      Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
    9. Re:XML Misunderstandings by ElfKnight · · Score: 1
      <record>
      <id>1</id>
      <surname>Surname</surname>
      </record& g t;

      If you do something like this, semantics ARE included in the tags.

      No they aren't. The above XML is precisely equivalent to

      <x>
      <y>1</y>
      <z>Surname</z>
      </x>

      The only reason it appears to have semantics is that the names have English meanings to humans. It means diddly-squat to a machine.

      --
      -- I would have got out of bed earlier...but I was asleep.
    10. Re:XML Misunderstandings by ralphbecket · · Score: 1


      No, XML is semantic. Syntax defines that '<' and >' and '</' are used in XML. The tags include words. Those words can have meaning; hence semantic. Get it? words: meaning.

      No, words are just syntax. You need to have an interpretation of those words to give them a semantics. Communicating parties need to agree upon a common interpretation.

      An example: consider the binary number 111100100101010100101000...
      What does this mean? It could be a phone number, an XML document, an executable binary, next week's winning lottery ticket numbers - anything at all in fact. Even if we treat this number as a character stream and see words within it in some kind of order, that *still* doesn't give it an independent semantics. Until you supply an interpretation it's just a stream of bits - syntax, in other words.

      There is a big difference between '3/4' and... a,1.5,2,x,zxxx etc. What do these mean? Suppose someone supplies you with a file that has data in columns separated by spaces. Are the spaces simply separators, or are they part of the data? What do tabs mean? What about embedded quotes in text strings? XML is a way to remove this ambiguity.

      This isn't an argument against place-value notation (which, by the way, works very well for numbers, words, sequences, fractions, expressions, procedure calls, etc. etc. - look at functional programming if you think there's a problem here.)

      Yes - what a laugh - I am. Note the word 'compressed'. The use of (tee hee) repeated text tags (giggle) means that it can (chuckle) be easily compressed (titter) and is small.

      So efficient to you means (a) turning your data into a huge text stream in which there is more tag information than data, (b) compressing said stream (how? with text compression? That isn't cheap.), then (c) having the receiver decompress the stream (not cheap) and (d) parse the stream (definitely not cheap if you're going to do it properly.)

      Efficient to me means compact and self-parsing. I can easily provide viewing/editing tools for those that need them. Unfortunately it's clear the XML designers couldn't.

      No. XML is a very carefully designed and widely used specification. You seem to have a bias against usefulness. If you want to use arbitrary and non-portable binary formats, do, but don't expect anyone else to be interested in using that format.

      If XML were carefully designed it's specification document would be very thin, rather than a tree-killing pile of volumes detailing hack upon hack to fix problems with the underlying scheme.

      I have no problem with a widely adopted exchange scheme. What I have a problem with is a shockingly badly designed widely adopted exchange scheme. Its unbiquity is the only positive thing I can think of to say about XML.

  24. The Criterion here is no longer machine efficiency by maverick97008 · · Score: 1, Insightful

    The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged -- semantics -- and once they do that, there is no need to repeat the tags in each and every record/document being transmitted. Any agreed-upon delimited format will do, and the criterion here is efficiency,

    WRONG!!!
    The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not. The beauty of XML is "agreed-upon" is optional now. We can understand the data without a formal meeting to come to an agreement.

    As a guy who has spent a significant portion of his life in meetings about data mapping between systems, I love XML because of how easy it is to read, understand, communicate about and map to the structures I need the data for.

  25. Things you use XML for by Decaff · · Score: 2, Interesting

    Use Open Office? unzip that .sxw or .sxc file, and what do you see? XML.

    You probably use XML a lot without knowing it.

  26. XML hasn't been designed to be efficient by hsoft · · Score: 1
    Be that as it may, the way XML "handles" missing data is not an advantage, but a liability for data exchange. In fact, much of the XML being transmitted is tags, not data. And given that tags are actually unnecessary, the inefficiency should be obvious.

    I don't think XML has been designed to be efficient. I don't think any business stores all their data in XML files (unless speed/datasize isn't a problem). XML has been designed to be universal, easily "morphable". I think XML/XSLT is an awesome tool to create reports out of SQL results, and I don't see what kind of format would fit the job better. Compare Wix with raw MSI tables. Which one do you think is the better design tool? XML, of course, because the foreign key links (Components, Files, Shortcut, UI stuff) are assumed by the tags. I'm sure that XML fits for a lot of other kind of jobs I'm not aware of. Thus, we can see XML as a pretty good jack-of-all-trade, not a efficiency champion.

    --
    perception is reality
  27. Never heard so much nonsense.. by wdavies · · Score: 3, Insightful

    ok, aside from efficiencies in implementation and writing, the main concern should be with the expressability.

    Question - can you express the recursive ancestor relation in SQL? You can express a single relation such as grand parent, but not the full relation.

    Reason? Because SQL is not full relational calculus. It is basically propositional calculus (actually I maybe slightly wrong, and it falls in a higher calculus than propositional).

    Prolog for example is closer, but still not true relational calculus (I forget whats missing).

    There's a reason for SQL's limitations, and that's decidability - guaranteeing that the query will terminate... (admittedly in theory, and you can write some pretty horrendous statements).

    Anyway, just wanted to get that off my chest. Doesn't say much about where XML falls in the scheme of thing - I guess strictlt speaking it doesn't - its just a layout. In fact, to compare XML and SQL is a complete misunderstanding. Its XSLT which should be compared.

    1. Re:Never heard so much nonsense.. by FnH · · Score: 2, Informative
      Question - can you express the recursive ancestor relation in SQL? You can express a single relation such as grand parent, but not the full relation.

      I beg to differ:

      Given ParentOf(parent,child)
      The ancestors of 'Mary' are:
      WITH RECURSIVE Ancestor(anc,desc) AS
      ( (SELECT parent as anc, child as desc FROM ParentOf)
      UNION
      (SELECT Ancestor.anc, ParentOf.child as desc
      FROM Ancestor, ParentOf
      WHERE Ancestor.desc = ParentOf.parent) )
      SELECT anc FROM Ancestor WHERE desc = "Mary"

      SQL might still not be true relational calculus (but then again, it might as well be, I don't know, and the parent doesn't know either), but at least it't possible to express the ancestor relationship.

      Now, if only the PostgreSQL guys would implement this ...

    2. Re:Never heard so much nonsense.. by wdavies · · Score: 1

      follow up - I googled RECURSION and SQL, and came back with very little - apparently its only Oracle that's implemented this. I found a little example from MS showing how to use Stored Procedures (heh, cheating SOB's !).

      I guess in theory there is nothing really stopping SQL becoming like Prolog in expressability, after all, they run on regular computersm no magic here. Although personally, I'd *really* like to just use proper relational syntax, instead of having to do unions etc. In logic, if you say:

      p(x,y) := q(x),r(y)

      You can introduce supplementary relations like

      p(x,y) := s(x),t(y)

      arbritarily. To do this in SQL you have to start UNIONing sub views together...

      anyway, this isnt a thread about how Prolog > SQL, but rather that X Query language is going to be a mess...

    3. Re:Never heard so much nonsense.. by jlrobins_uncc · · Score: 1

      Recent SQL specs include recursive queries. Oracle implements recursive queries, but no with the standard syntax. There's a patch for postgres to implement the oracle syntax, but it won't be considered for inclusion into the core because it is rumored to be unclean (doesn't handle large rows, etc.) as well as because the postgres hackers feel that if there's a way to do it in the standard, then don't commit new implementations that differ from said standard.

      See this thread for more.

    4. Re:Never heard so much nonsense.. by wdavies · · Score: 1

      Question: Can you leave desc out? ie, can you do an enumeration of all Ancestors?

      In the back of my mind there is some memory that grounding variables makes recursion safe, but I could have been smoking crack that day in class...

    5. Re:Never heard so much nonsense.. by Anonymous Coward · · Score: 0

      Recursive SQL is available with DB2 V8.

  28. An extreme position, but with good points by MobyDisk · · Score: 3, Interesting

    This guy is really smart. He takes issue with the design of SQL, and with some of the commonly-used relational-database concepts (such as the NULL). His criticisms here are valid, but his position seems to be extreme. With that said, he is absolutely right about XQuery/XPath/XQueryX++/whatever.

    XML is great for data exchange. Schemas are a wonderful way to describe data. But it is completely inappropriate for querying. If you need to query XML, you should import it into a database then query it that way. The very design trade-offs made when building XML were to make it extensible, and hard to query. XPath is nice to have for simple dinky import scripts. But trying to build a whole hierarchical query language on top of XML is silly.

    If someone wants to build a standardized hierarchical query language, that's great. Very few people use hierarchical databases and need such a language, but I wish them luck in that endeavor. But don't pretend that it is appropriate to use it on XML, or that it has anything to do with XML. XML is an interchange format. Leave it where it works well.

    1. Re:An extreme position, but with good points by eraserewind · · Score: 1

      What If I on my Vendor1 RDBMS, export XML data, and send it to you on your Vendor2 RDBMS, where you import it.

      Then I send you my "Vendor1 SQL" Query that works wonderfully on my RDBMS. What do you do? Would it be easier if I sent you some sort of "XML Query" instead?

    2. Re:An extreme position, but with good points by winchester · · Score: 1

      There is no need to import XML in a database in order to run SQL queries over it. You can run SQL queries over everything that is internally structured. Why do you think the S in SQl stands for Structured?

      I used to run SQL queries on my inventory listings which were connected to CAD drawings... works like a charm.

  29. Fabian likes OO? Noo!! by soloport · · Score: 1

    I'm on Slashdot so I haven't read the article, of course. But I have read other articles and books written by Mr. Pascal.

    If I recall correctly, his usual rant is not that SQL isn't object-oriented; Quite the contrary. He feels (or used to feel) that SQL -- the actual RDBMS part of it -- is fine, as is (as per Cobb), but that it are the *vendors* who have missed the mark.

    In fact he goes so far as to say that OO DBMSes are redundant. No one needs an OO DBMS if the real McCoy (RDBMS) were produced by a worthy vendor.

    Perhaps, if MS can open up their code (little by little), I suppose Fabian could come around. But I don't see why he needs to -- as I think the whole OO-RDBMS thing is just another way to make a n extra buck, if you're a vendor.

  30. Solution Today! Alphora Dataphor by Anonymous Coward · · Score: 2, Interesting

    To see the solution to almost all of the problems identified by Pascal, Date, Darwin and Codd please checkout Dataphor from Alphora. I am a user of their project and it is a marvel. It uses any vendor's DBMS for storage and heavy lifting, but adds untop of it true relational lanaguage and constructs. The system is so highly formalized that Dataphor provides *instant* derived user interfaces for all your tables in your existing database. After taking a few days to supply the missing information about your data that you currently cannot do in your DBMS the instant UI knows how to display all tables and forms in a logical workflow. It also handles all form validation and referencial integrity. If you change your database schema the UI is automatically updated because it is *derived*. It is not the result of some lame MS Access style wizard.

  31. My take on it. by gr8_phk · · Score: 1
    I think he's trying to make some sort of arguement that a real RDBMS is to a SQL DBMS as LISP is to C++ or [insert favorite modern language here]. Unfortunately this guy is not Paul Graham, and he doesn't offer any specifics or solutions - he's just saying what's wrong and not what's right.

    I did database work for a few years and understand why 3NF is good (or 4NF perhaps) and I found it really really great for certain things. However I was able to create databases that could easily represent some really complex things that I could not query without writing a mixture of code (VBA for me) and SQL - like traversing a (potentially recursive) data structure. It may be that I didn't know enough, but I think the problem was with SQL - or my expectations for it. I suspect that's along the lines of what he's trying to get at.

  32. Nitpick (OT) by wtrmute · · Score: 2, Informative

    This is off-topic, but it bothers me when someone blasts another for ignorance and then links into an otherwise perfectly good article ("The Myth of Self-Describing XML", by Eric Browne)which contains the following pearl:

    The business logic cannot be constructed post priori!

    Doesn't it bother anyone that someone who doesn't know his Latin (but uses it, anyway, where a perfectly good "after the fact" would suffice) is used as an authoritative source by the author? If you're trying to prove someone's incompetence, you won't want to counteract it with further incompetence. Mr. Browne may be a genius in relational databases, but makes stuff up just the same.

    And in the unlikely case Mr. Browne should read this post, the correct form is a posteriori, as any law student probably knows.

  33. SQL is not a Relational Database by Anonymous Coward · · Score: 2, Interesting

    The consensus clearly is that the Relational Database is an unqualified success, and that is true. But don't confuse SQL and the Relational Database. SQL was an early hack intended for querying the RDB only.

    Part of the problem is that the RDB was such as success that people do confuse the two. SQL, however, has shortcomings, that are addressed in the third manifesto.

    This product http://www.alphora.com/ is a serious attempt ot address the shortcomings of SQL, with a new language for programming an RDB, and a new Query language.

    SQL itself is not complete, and even the RDBs out there today break many of the rules as set out for the correct RDB.

    Fabian Pascal is a bit of a nutter, you wouldn't ask him around to a dinner party, but he is correct.

    XML is a badly thought out hack, but it works to an extent.

    UML is hardly that, the U is really an act of hubris.

    SQL can be improved, made more complete, and RDBs should be made to confrom more closely to the rules of what makes a real RDB.

    1. Re:SQL is not a Relational Database by Anonymous Coward · · Score: 0
      You are right, SQL can be improved. Basically, SQL (as a language) is similar to COBOL: basically english, with little structural elements in the language.

      There is a replacement for SQL, and there is even a converter to standard SQL: NewSQL.

    2. Re:SQL is not a Relational Database by scrytch · · Score: 1

      > UML is hardly that, the U is really an act of hubris.

      It merely Unified OMT and Booch Notation, that's all. It doesn't stand for "Universal". UML is closer to a visual representation of existing OOP languages, and for that reason, actually lacks some of the neater features that OMT had. If you can ever find a copy, _OMT Insights_ by James Rumbaugh has modelling tips for OMT that use said features most elegantly. Not that it's entirely useful today, since there's really no choice but UML (and the giant encrusted edifice of MDA) today, but it's always useful to know the history...

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
  34. Darwin-Pascal ? by alexborges · · Score: 1

    They must be really old by now. But still producing good ideas. I am glad we still have such luminaries in our scientific community.

    --
    NO SIG
  35. What ever happen to Pick by cmcsonar · · Score: 1

    What ever happen to the Pick database?

    1. Re:What ever happen to Pick by Larry+Lightbulb · · Score: 1

      Still out there, being used by thousands of companies in every industry. But as it's not relational, and has always suffered from poor publicity, it hardly ever gets mentioned.

    2. Re:What ever happen to Pick by rttichnor · · Score: 1

      PICK is still alive and kicking.
      D3 database The latest version of the PICK database.

  36. Who the f*ck designs databases in SQL anyway? by Qbertino · · Score: 4, Insightful

    I know some people who do in some cases, but I wouldn't exactly call that a standard procedure. Or call those people DB designers for that matter. 'Cause that is NOT database design.
    You design a DB best with a pen and a large sheet of paper. Or some drawing tool your extremely good at.
    SQL is the language you feed you results into the box so it builds a more or less representative imprint of the abstract reality you've designed. Which can be as relational as you want it to - as long as it meets the physical constraints of non-abstract reality. As soon as you put it onto a computer, you'll have to cut corners. That's the difference between a database _model_ and a database _implementation_. That takes stuff into account like DB load, DB Server Features and data types.

    Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.

    I'm suprised a supposedly db expert guy get's all worked up about this and doesn't seem to be able to keep apples and pears apart.

    Anyone initially designing a non-trivial DB with SQL and - on top of that - bitchering about this DB language not being rational deserves a clobbering.
    My 2 cents.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re:Who the f*ck designs databases in SQL anyway? by period3 · · Score: 1

      Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.

      What? Types are absolutely necessary in a relational database. You can't even have a relation without types. Are you suggesting that the entire theory of relational databases be ignored during modeling?

    2. Re:Who the f*ck designs databases in SQL anyway? by rycamor · · Score: 1

      >Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.

      In an ideal database language, types can be arbitrarily complex, or include user-defined constraints and operators, and can be created as domains over the base types. So in that case, indeed you would want types to be part of your data model. This sort of thing is exactly what Date, Darwen, and Pascal argue should be in a relational language.

      In fact, some SQL systems have this capability, such as Oracle, PostgreSQL, and a few others.

      And, in fact there is a modeling system that can take into account types, domains, and constraints. It is called Object Role Modeling (ORM). See www.orm.net.

  37. An attempt at a better solution. by perlchild · · Score: 4, Interesting

    Until then, SQL works. What more can you say?

    That SQL is mostly a kludge?

    Let me restructure that...

    The experts who know what the heck the relational model is and is not argue that the language we use to query a specific type of relational-like database, that they call the SQL databases, the SQL language, has unsufficient representation power to represent the whole model, and hence can't be used to get the whole power of the model.

    That's certainly interesting, and leaves us to ponder two things:

    1) a) Just how much more power could we get? b) And at what cost?

    2) What about alternatives, can we get that same power elsewhere, cheaper?

    1)a) is beyond my mere predicate logic skills at this time.

    1)b) The cost of a model for data storage, representation and management is directly linked to it's adaptability to the data you represent. The article mentions a lot of errors with NULLs(I remember thinking, while reading the article: a NULL was an attempt by the language developers to simulate an interrupt in a language that doesn't have any, this is of course, an oversimplification on my part, but considering stored procedures and triggers[SQL's own exceptions] weren't around yet, they sound like a good basis for further research.) There are a lot of other hidden "costs" for people who use a relational tool for not-quite-so-relational data, but that's not part of the cost of a relational language, per se.

    2) Brings up a few notions: there are the types of databases relational databases replaced, like network databases, and there are attempted replacements, like object databases. There are also further possibilities that I will explore deeper later. Object databases can certainly be interesting, in the sense that by bundling data with code, you can have data that can handle itself, in the very basic sense that we humans apply it to ourselves. The problem is that we tend to have a very fuzzy, real-world view of such data, and can't work with it that easily(we are using computers to make data easier to work with, so if we had software that could handle real-world data complexity outside of our brains, we wouldn't be having this discussion). Object data is certainly very adept with data that has some broad commonalities, re-usable behaviours, and follows set-rules. We can call those business rules for now. Those business rules imply that a certain subset of "The Universe" interests us more than the rest, and follows predictable commonalities, making our mental models a lot less complex. On the other hand, object methodology is not always well understood, and the documentation and models it generates sometimes dwarf some production systems implemented to solve the same problems.

    Now, at the beginning, Relational Systems were data-handling "toolkits" set to handle specific subsets of data, who also followed business rules.
    That's interesting to my purpose, simply because I can envision, at this time(some vendors have similar concepts, but don't formalize them in any way), a new set of "toolkits" where the relational model is only one of many "toolsets" available.

    Indeed, what is probably the most used sql-based server available(MySQL) has been lacking true relational functionality for most of its life, yet that doesn't make the tool less useful for most of its users. Future toolkits can inspire themselves by focusing on specific uses of technology to solve specific problems, and yet keep the SQL as a sort of security blanket, since that's where most of the training about databases(and indeed, usually most of the training about data, period, is in database classes and perhaps, some algorithmics classes)

    After reading the linked articles about XML's weaknesses, though, I don't think it belongs into any toolkit of that nature. Simply because the tool that belongs in the toolkit is the "self-documenting data", and XML's weakness in that area is evidenced there. XML's early focus as a medium of e

    1. Re:An attempt at a better solution. by justins · · Score: 1
      The experts who know what the heck the relational model is and is not argue that the language we use to query a specific type of relational-like database, that they call the SQL databases, the SQL language, has unsufficient representation power to represent the whole model, and hence can't be used to get the whole power of the model.

      A few of them do.
      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
  38. MultiValue by rttichnor · · Score: 1

    A Multivalue database model is better suited for XML, and better suited for real-world data, period.


    *donning flame proof suit - because I know the 'Relational Zealots' are out to get me*
    __________________________________________________
    Free your mind and your database will follow -- MultiValue

    1. Re:MultiValue by geekoid · · Score: 1

      you know, you'll get less flams if you explain why you beleiveMultiValue is better.
      Just making the statement "x is better" is either a troll or flamebait...possible both.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    2. Re:MultiValue by rttichnor · · Score: 1

      I agree, however, there are so many points to ponder that I'd like to wait until I receive a direction to move towards. I like waiting for someone else to supply the discussion points.
      I'll make an exception this time since I do not have many bites...
      The nested structure of MultiValue fits very well with the same type of structure of XML. Another benefit of MultiValue databases are variable-length fields. OK, that's a start...

    3. Re:MultiValue by Larry+Lightbulb · · Score: 1

      MV databases work well with XML because of already using delimiters to seperate fields, and becuase of treating everything as a string. As for being the best for everything else - well, it's an easily modified data structure and can be changed to suit whatever reality throws at it, without requiring a major redesign. Which is also it's weakness - the programmer has more control and authority than any database designer, analyst, or manager.

    4. Re:MultiValue by rttichnor · · Score: 1

      Which is also it's weakness - the programmer has more control and authority than any database designer, analyst, or manager.
      How is that a weakness? How many times have developers been restricted by the local DBA, database designer, analyst, or manager? Too many.
      I like the idea of having control from beginning to end. Not that I'm such a control freak, but if I have any problems, I have a better idea of the 'Big Picture'. I can resolve the vast majority of any issues without having to rely on someone else.

  39. Totally off topic but... by DeepDarkSky · · Score: 0, Offtopic

    Does anybody find people who end their writing with "Caveat Emptor" pretentious and snobbish? I mean, it's like this: "I am superior enough to recognize a problem with something, and I advise you to be aware, because after all, I know better". "Caveat Emptor" is fine, it's how people use it, and from my perspective, I think it's taking on a snobbish tone, even if used correctly.

    1. Re:Totally off topic but... by SmurfButcher+Bob · · Score: 1

      (Smile while you read this, it isn't an attack. It's fun.)

      You might call it snobbish. I might call it being cynical. When all is said and done, I'll either be proven right... or pleasantly suprised.

      But it isn't snobbery - and it isn't about the topic that precedes such a closing. It's a meta-statement warning that some people are expected to ignore what's said in that topic and will insist that they know better. Such a closing is simply an admission that you expect to be disappointed about being correct about someone's self serving stupidity, yet again, and that the merits of your arguments will be overridden by someone else's fear of looking bad. There's snobbery afoot, for sure - but it isn't the writer, it's the dumb putz who intentially discards what's been said.

      Anyone who was forced to live through the bullshit buzzword bingo of the 90s will know *exactly* what I'm talking about... "Channels!" "PUSH Technology!!" "Three Dimensional Database!" "Client Server!" "SEEEEEQUEEELLL!!!" All of these things will answer *everyone's* problems. Yeah, whatever you say, lady... you're still a moron, and have no clue what you're talking about.

      "Caveat Emptor" is not necessarily a closing meant for the reader... it's more often an expression of frustration. Proven right... or pleasantly surprised. Snobbery? Or experienced.

      --

      help me i've cloned myself and can't remember which one I am

  40. The Problem, stated more accessibly by IBitOBear · · Score: 4, Informative

    The problem is that most "Relational Database Management Systems" only represent one type of "relationship", that being "the table".

    This, in turn, means that every operator (programmer, statement, etc) on the database must _individually_ "already understand" all the relationships that lie outside "the table" before they can act on the data at any significant scope.

    That is, you, the programmer or operator must know, from some source besides the RDBMS itself, how the different tables work with eachother.

    In simpler SQL-biased terms, you have to know, before you start, what is "good" to put in your WHERE CLAUSE to make a join. And then the RDBMS query optimizer needs to guess how to satisfy your needs in something other than glacial time.

    Consider a new verb "EXPOSE thing, thing, thing, thing, thing..." that would fish out of a database the one-or-more relationships between the things, and produce a table-looking vector of "tuples" that consist sets of actual values for those things. This is what the theoretical "perfect" RDBMS would do.

    Given (somewhat denormal 8-):

    Customer ID -> Customer Name
    Customer ID -> Street Address
    Customer ID -> Zip Code
    Customer ID -> Purchase Order ID
    Purchase Order ID -> Part Number, Quantity

    One should be able to "EXPOSE Part Number, Zip Code" and have the database "know the relationship" and produce the correct vector of tuples.

    But that doesnt happen.

    Now all the people bleating about the Higherarchial databases and bad things from the past are doing this harping because they remember the bad-old-days when a database would maintian one tree-structured set of relationships like this. In the higherarchical model, you could basically do this EXPOSE operation, but only if you had, by dint of pre-knowledge, asked for things lying on one linear path through the tree. (* simplified for brevity).

    In essence, SQL requires you, the programmer, to be in the business of making up relationships that should be in the data or schema structure but isn't.

    A magically complete RDBMS would take a series of vectors of the form "Independent Datum (key)->Dependent Datum (value)" (where either side of the arrow could be a list of atomic values). The RDBMS would then assemble and maintian tables or linked lists or whatever with no exposure of SQL-esque "tables" and the accessors would be storage method agnostic. (That is, there would be no such thing as a FROM CLAUSE.)

    For instance, in the above list of relationships, Customer ID, Customer Name, Street Address, and Zip Code *could* all live in a "table", or not, but you would never know that. But a better vector of
    Customer ID -> Street Name, Building Number, Suite to replace "Customer ID -> Street Address" has a table-feel, but would not bias against "EXPOSE Street Name, Part Number".

    The power of this comes from being able to do:

    EXPOSE Building, Part Number, Quantity
    Where Building == "Word Trade Center 2";

    And have the RDBMS already know the sequence of relationships to get from Building to (address elements) to Customer ID, to Part Number without the programmer writing the three stage join across the "uninteresting middle tables".

    (The above presumes you have a building relation that has Street Name, Street Number -> Building or some such.)

    All the XML nonsense is nonsense *_BECAUSE_* the strict-nesting enclosure requirements of XML make it "naturally" become hierarchically organized. But data exists outside the single-inheretance strict parantage trees that the hierarchical model dictates.

    The problem is that SQL got real popular and so the idea of structural inferrence got pared down to tables and Query-Like constraints on tables before anybody had a chance to formalize the idea of living, encoded relationships between arbitrarily stored datum. So we never really got a language or system that could "EXPOSE".

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
    1. Re:The Problem, stated more accessibly by Anonymous Coward · · Score: 0

      In simpler SQL-biased terms, you have to know, before you start, what is "good" to put in your WHERE CLAUSE to make a join. And then the RDBMS query optimizer needs to guess how to satisfy your needs in something other than glacial time.

      NATURAL JOIN -- I never specify join columns if I can help it (which is nice if I decide to change keys). Query optimizer is always going to favor keys and indexes anyway, and I don't tend to join what doesn't fit in natural joins anyway.

      Main problem is that it joins ONLY on column names, and only between two tables -- but that's just syntax, since sql makes you work in pairs, with low-level predicate calculus instead of compiling relations of 3 or more into pairs as required (and believe it, it ain't always required).

    2. Re:The Problem, stated more accessibly by emurphy42 · · Score: 1
      The problem with auto-determining the link from A to B is that sometimes the link is ambiguous. Consider the following:
      Address -> ZIP Code
      Customer -> Billing Address
      Sales Order -> Sales Order Number
      Sales Order -> Customer
      Sales Order -> Shipping Address

      EXPOSE Sales Order Number, ZIP Code
      WHERE Sales Order Number == "12345"
      Are you asking for the ZIP code where the physical goods should be sent, or the ZIP code where the bill should be sent? The system has no way of knowing.

      What you could do is create a meta-system that determines all the possible choices and lists them, then you pick the correct one and it gives you a template of a SQL query to copy+paste. You would need the following:

      1. A system table that contains the definition of each application table (already exists)
      2. A system table that contains the pair-relationships between tables
      3. Appropriate indexes on these system tables
      4. A depth-first search program that reads these system tables, and handles recursive relationships correctly
      You still need to understand basic RDB concepts - PHBs wouldn't know which of the options presented to them is correct. What this would do is allow you to grok a specific RDB with less grunt work on your part, at the expense of more grunt work on the part of the RDB's creator.

      Don't waste time trying to auto-determine pair relationships based on field names, either. Even if the RDB's creator makes an effort to keep field names identical, any non-trivial system is practically guaranteed to have (1) fields that have different names but should still be linked, (2) fields that have the same name but should not be linked (e.g. you're buying widgets from a vendor in ZIP code 98765, and selling whatsits to a customer in ZIP code 98765, but there is no relationship beyond the ZIP codes just happening to be the same), or (3) both. The most elegant approach is, when adding a new table to the design, also adding its relationships ("field X in this new table is linked to field Y in the existing table Z").

    3. Re:The Problem, stated more accessibly by Anonymous Coward · · Score: 0

      Uh... 'Scuse my ignorance, I'm only a LAMP weenie :-)

      What's wrong with:
      EXPOSE Sales Order Number, (Billing Address) ZIP Code WHERE Sales Order Number == "12345"

      Extensible to:
      EXPOSE Sales Order Number, ((Wholesale Customer) Billing Address) Zip Code WHERE Sales Order Number == "12345"

      and so on?

    4. Re:The Problem, stated more accessibly by emurphy42 · · Score: 1
      Nothing in particular, except that knowing enough to specify "Billing Address" is pretty darn close to knowing enough to specify the links yourself:
      SELECT SalesOrderNumber, ZIPCode
      FROM SalesOrder x
      INNER JOIN Address y ON x.BillingAddressID = y.AddressID
      WHERE SalesOrderNumber = '12345'
      I remain strongly convinced that trying to remain ignorant of the database structure is the wrong route, and that making it easier to understand the structure is the right one. Views, stored procedures, and a list of pair-relationships (whether in code or documentation) are all significant steps toward the latter.
    5. Re:The Problem, stated more accessibly by Sven+Tuerpe · · Score: 1
      One should be able to "EXPOSE Part Number, Zip Code" and have the database "know the relationship" and produce the correct vector of tuples.
      But that doesnt happen.

      So what is the correct vector of tuples? Zip codes of all customers that bought [part number]? Zip code of the manufacturer's headquarters? Manufacturing facilities? Warehouse? ...

      Your EXPOSE thingy is just underspecified. You are dreaming AI.

      --
      http://erichsieht.wordpress.com/category/english/
    6. Re:The Problem, stated more accessibly by shic · · Score: 1

      I'm intrigued. I've been well versed in the failures of SQL as an adequate interface to DBMS - and on that score you've been preaching to the choir. However, I remain curious...

      Your example seemed very concrete - with your "EXPOSE" syntax you essentially require a new addressing strategy. Detractors suggesting a problem in identifying the type of zip-code you intend to expose miss the point - clearly you intend that the names uniquely identify the domains you intend to expose - for which there are many approaches - even in SQL.

      I am concerned by the complexity of an algorithm to establish which relations are to be used to (indirectly) join the exposed domains - I guess only a brute force search for a sequence of relations linking two domains would suffice... this might make it very slow to detect impossible expose queries. Another concern is demonstrated by the following example:

      People : PersonID->Name
      Friend : PersonId->PersonId

      Now consider a named person's friends n-times removed. This simple relational model captures the concept - however it is not straightforward to see how this would be addressed in the EXPOSE syntax.

      I was wondering, since I believe the above problems are surmountable, do you have any references regarding a design for EXPOSE - or is this all "off the top of your head?"

    7. Re:The Problem, stated more accessibly by IBitOBear · · Score: 1

      Actually I was doing the "limited theory example" thing. It is off the top of my head, but it is "easily addressable" once you (magically) have the relationships stored in the database. Once the relationships exist as things themselves, the relationship names become, essentially, adverbs.

      It's kind of like the invention of lisp; you need lisp to invent lisp so there is a limping-along period. That is, we'd have to run the examples and pin down some adjcency issues like "nearer" and "further".

      Consider Six Degrees of Keven Bacon...

      EXPOSE 'Kevin Bacon', Name CLOSER THAN 7; // gets the list of name pairs
      EXPOSE CONNECTIONS('Kevin Bacon',Name) CLOSER THAN 7; // gets the list of vectors through their entities.

      Still from the top of my head, you get something like:

      EXPOSE thing

      EXPOSE thing, thing;

      EXPOSE thing , thing THROUGH relation // the chain of reference must pass through relation.

      EXPOSE thing, (thing, thing, thing) // meaning each triplet must be within any sigle entity

      EXPOSE thing, entity(thing, thing, thing) // triplet must come from a spesific entity

      [NOTE that in the last two cases there are for "columns" returned as "entitiy" or the parentheses is a limiter not a "function"

      So, to go to one of other examples in the thread:

      "EXPOSE PartNumber, ZipCode" could could retrieve lots of zip codes from the same company while "EXPOSE PartNumber, ZipCode THROUGH ShippingAddresses" would retrieve just the shipped-to zip codes as would "EXPOSE PartNumber, ShipTo(ZipCode)" where "ShipTo" was an entity name (like a table, etc.)

      Most of the above stuff could be done through an overlay of an existing RDBMS. Consider instead:

      EXPOSE [HOW] thing RELATES [TO] thing; // produces one column

      Which might go like

      EXPOSE PartNumber RELATES ZipCode;
      PartNumber->CustomerOrder->BillingAddres s->ZipCode
      PartNumber->CustomerOrder->ShipTo->ZipCod e
      PartNumber->Supplier->CatalogOrders->ZipCode
      PartNumber->Wherehouse->Address->ZipCode
      PartNu mber->WherehouseBillingAddress->ZipCod e
      (and so on).
      (clearly this would need to be expressed as a functionally meaningful value, but said value could likely then be used as an operand.)

      You end up with a meta-language of sorts.

      The set is quite large and I havn't really written up my musings on the topic in any formal way, but the syntax should be fairly direct, if a tad parentheticical.

      The reason that we retreated into SQL all those years ago is that we weren't really ready for namespaces and named relationshps as a technical community.

      Most of the problems disapear once you relize that the simple names I have been using are, well, simple. In practice you will use something analogous to SQL's "table name" when spesifying a thing that may be otehrwise ambiguous. But rather than being a simple table_name.column_name kind of thing, it would have to chain or nest or something so that you might have domain.named_relation.*.entity or something.

      EXPOSE PartNumber, ZipCode; will do the branching thing that could take you many places.

      Something like:
      EXPOSE WITHIN OrderFulfillment PartNumber, ZipCode; would trim down the possible relationships that could be followed, on the proviso that there were a relation doman named "OrderFulfillment". As would EXPOSE PartNumber, ShippingAddress::ZipCode; or something.

      I really don't have a syntax in mind. I am wandering conceptually through the fields of "what it would mean to have the relationships encoded along with the data and the shema" and what effect that would have to have on the resultant language.

      Do your own... it's fun... 8-)

      --
      Innocent people shouldn't be forced to pay for inferior software development.
      --"Code Complete" Microsoft Press
  41. Re:It's about fscking time! by Anonymous Coward · · Score: 0

    The very first premise in that doesn't make sense.

    If they want freedom why would they join a terror organization?

  42. Theory vs. practice by 14erCleaner · · Score: 2, Insightful
    I'm not really much into data modelling theory and such, but I do have two perspectives from which to view this dispute:

    I've done several years of application programming using SQL

    I've also implemented the (XQuery-derived) query processing module for a native XML database

    In my former life as a application programmer, I really liked SQL. It allowed some pretty complicated computation to be done in the query, and very concisely in many cases compared to doing the same thing in, say, C++. For example, things like grouping are very nice for many application purposes.

    In my current job, I'm hoping to create an XML query language that supports the same sort of capabilities as SQL. Our XML query language implementation has decent path/predicate, sorting, and output structuring capabilities, mostly derived from earlier drafts of XQuery.

    My feeling about XQuery 1.0 is that it is extremely bloated. XML seems really simple; querying it shouldn't be all that complicated, should it? But the XQuery committee has created several hundred pages of specifications for the new language. This seems excessive, to say the least. We basically have implemented a subset of an earlier version (with paths, predicates, sorting, XML construction, a few dozen functions), and stopped tracking what they were doing. This is kind of unfortunate, but we really don't have the resources to support his behemoth in all its awesome grandeur.

    We just want a language that lets programmers efficiently access our database. I think we're on the right track. I'm not at all sure that XQuery is going to wind up as a long-term success, partly because of its bloat factor.

    My favorite illustration of the XQuery bloat is this: early versions (up to about April 2002) of the XQuery language description contained this sentence in the introduction:
    It is designed to be a small, easily implementable language in which queries are concise and easily understood.

    Starting in August 2002, this was changed to:
    It is designed to be a language in which queries are concise and easily understood.

    The "small, easily implementable" part got smothered up by the avalanche of features they were adding.

    --
    Have you read my blog lately?
  43. WITH RECURSIVE by wdavies · · Score: 1

    My bad :) I'd never come across the 1999 SQL spec.

    Interesting, I must say.

    Not quite as concise as say (now its been a while, so no critiquing my Prolog).

    ancestor(x,y) := parent(x,y)
    ancestor(x,z) := ancestor(x,y), parent(y,z)

    I dont know if the WITH RECURSIVE spec has any limits that arent in say Prolog (Prolog isn't full First Order Predicate Calculus either).

    Winton

  44. RPG by stanwirth · · Score: 1

    It would have been a more interesting article if he'd compared SQL and XQuery to some more positive examples. eg RPG.

  45. The academia is forgetting the whole point of XML by anttix · · Score: 3, Interesting

    "XML is not meant to be read by humans, it's a data interchange format and thus meant to be read by machines" - Good heavens!
    Someone saying something like that has really got a BS in BS! Or perhaps even worse: a PhD in BS.
    XML is all about programmers being able to understand the data! Yes, because we are not anywhere near that nirvana of fully semantic systems that can (semi)automatically understand each other. NO! Programmers have to do the work to make the systems fit together and XML gives them the advantage that they do not have to reverse engineer another proprietary data format or dig into a horsepile of documentation. XML makes it easy to understand how to process the information at hand - without any extra work!
    Also it's great format for storing small amounts of constantly changing data (like user preferences) cause it's extensible and with only a tiny bit of effort backward compatible as well.
    Anyone trying to use XML for processing large amount of data (like data warehousing) is either nuts or doesn't give a damn about the speed or costs.
    However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain (anybody who has tried to decode data from some long gone accounting package from the '80-s knows what I am talking about).
    So yes XML is self describing only to humans and that's the whole point of it. Formalizing data semantics is not the goal of XML, has never been and will never be, thats what we have RDF, RDFS, OWL and other nice initiatives from Semantic Web movement for.

  46. Relational Database....pooyee by Danathar · · Score: 1

    Flat files are the wave of the future. If I can't grep it...I don't want it.

    Plain ole ASCII flat files. Best database ever!

  47. NULL by warrax_666 · · Score: 1
    6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)

    That's like trying to do math without a concept of zero.

    No, the "zero" or relational databases is the empty set.
    --
    HAND.
  48. XML? Obligatory Dilbert... by SmurfButcher+Bob · · Score: 1

    Hmmm... what if we made an XML database based on EMail? Then, we could create a whole new schema paradigm using IPSec!

    Wow, suddenly my scalp begins to tingle. I think I'll call a meeting...

    --

    help me i've cloned myself and can't remember which one I am

  49. Conflicting goals by esap · · Score: 2, Interesting

    I think this whole dispute is not necessary. It's just a simple case of conflicting goals. One side wants "efficiency" and another wants "extensibility". What both sides miss is that real systems can't afford to choose just one of these, you have to get both. So you have to have just enough extensibility to allow reasonable extensions, and still attain reasonable levels of efficiency. But trying to get all of either thing will totally lose the game. So the XML camp is wrong to think that extensibility is everything (ever tried parsing XML in a real-time system?). And the "binary transfer format" camp is wrong to think efficiency is everything (ever tried to make two versions of your protocols to interoperate?).

    --
    -- Esa Pulkkinen
  50. Re:It's about fscking time! by MindStalker · · Score: 0, Offtopic

    Because they think they are going to fight for freedom, also realize most of these people join young. Anyways its basically like a cult.

  51. Re:The Criterion here is no longer machine efficie by warrax_666 · · Score: 2, Informative
    The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not. The beauty of XML is "agreed-upon" is optional now. We can understand the data without a formal meeting to come to an agreement.

    No. You. Can't.

    You simply cannot know what the data means without specified semantics (which you have to agree upon somehow; they aren't magically apparent from the XML itself -- that's why such abominations as DTD and XML-Schema exist).
    --
    HAND.
  52. You Are Missing the point of NULLs entirly by IBitOBear · · Score: 5, Insightful

    table: order ID, part number, quantity shipped

    select part number, sum(quantity shiped), avg(quantity shipped)
    Group By Part Number;

    This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.

    If you wan't to throw an exception you can throw the execption or not in your program. In that case you fetch the individual values and do the math yourself and the "that's not a number" that is caused by the null gives you the chance to throw your exception.

    But since, in aggregate operations, your program isn't even interractin with the data yet, where would such an exception go?

    What would the SQL syntax be communicating a list of results PLUS a list of exceptions to your program? Which order would things be processed in?

    Your boolean analogy is also flawed. "You have stopped beating your wife?" is not a yes-or-no question because it carries a predicate around with it that you may not fulfill for serveral reasions (not married; you are hetrosexual female, so you don't have a "wife", you have never beaten your wife so you can't "stop" doing it; etc). There are a surprisingly large number of "real data" that nature. For those of you who have trouble abstracting this, the "real comparason matrix" is "True, False, and Not Applicable". NILL buys you "Not Applicable" so very cheaply.

    In poin of fact, people who don't like NULL, usually because they don't understand its purpose and use, make a hell of a lot of work for themselves.

    My current employer has a large database of test values that grows by huge numbers of elements each day. The programmer "didn't understand" NULLs (ro RDBMS' for that matter) and has "-" in fields that should be NULL.

    Consequently we cannot aggregate. All of our client applications end up haveing to bulk-fetch whole table ranges and run through elaborate statistical routines full of conditionals; or do separate fetches with "field != '-'" in the where clause and run a concordance operation in ram after the repeated bulk fetches.

    This costs bocup time and degrades the quality of the product.

    You call "academic bullshit", I suspect you have never had to work the really large or significant data sets. I suspect that you don't ever ask the server-side to aggregate for you. And I suspect you have never worked time-critical transactions across a "slow" link.

    You can't have. You think of "NULL" in terms of equality.

    I will give you the "syntatic" point that "Where X = NULL" ought to be unversal. But, for instance, the cartesian nightmare of having "NULL == NULL" in a join is beyond idiotic.

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
    1. Re:You Are Missing the point of NULLs entirly by a_n_d_e_r_s · · Score: 1

      table: order ID, part number, quantity shipped

      select part number, sum(quantity shiped), avg(quantity shipped)
      Group By Part Number;

      This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.


      Well your table needs normalization:

      table1: order ID, part number
      table2: order ID, quantity shipped

      That way NULL are not used in any table.

      select table1.part number, sum(table2.quantity shiped), avg(table2.quantity shipped) from table1,table2
      Group By table1.Part Number where table1.order ID = table2.order Id;

      Mostly NULL are used by people who do not normlise the database enough - they are lazy.

      --
      Just saying it like it are.
    2. Re:You Are Missing the point of NULLs entirly by thecorndogofdoom · · Score: 1

      Mostly NULL are used by people who do not normlise the database enough - they are lazy.

      While this is true, there are definitely instances where NULL is necessary (unless one is unreasonably inclined against it and takes ridiculous steps to avoid them). Prime example being the "Date of Death" example above. Some information is not known. Preventing users from making a record entry due to not knowing a non-key field or designing a subtable for each attribute is ridiculous for real world applications.

      Just goes to show some people need to have a glass of reality...I use XML to read from SQL and that works flawlessly for my job. While Puritans scoff, the rest of us are delivering information to end-users.

      Machine cycles are cheap, but performing a 20x INNER JOIN is gonna take a hell of a long time, especially with 20,000+ records (student database at a large state U like mine), compared to grabbing a null (takes zero extra time). Get real!

      --


      -- Tim
      Asst. Mger - Software Team, CSU College of Business
    3. Re:You Are Missing the point of NULLs entirly by IBitOBear · · Score: 2, Interesting

      In your example, you still need part_number in table2 unless you allow only one part per order ID.

      The iterative join needed to then needed to print 10,000 invoices of average-of 20 parts per invoice gets too expensive to be reasonable.

      It can be done, but it becomes anti-helpful.

      Remember, normalization can go too far. By splitting the table you have now doubled your storage requiremet and multiplied the training, implementation, and execution costs since every operator (person or statement) and ad-hoc-query agency now needs to "know" that these two tables are needed to fulfil the "normally dense" query.

      (That is, over the lifetime of the database, most orders will have been shipped so most quantity shipped datum will be present.)

      With NULLs properly used, you can "SELECT COUNT(quantity_shipped)/COUNT(*)" on a single table and the answer will come from the traversal of one table or index. With two tables it would be a join.

      If you actually look at the assembled wisdom of the field, you will note that you are supposed to STOP NORMALIZING when it stops helping and starts causing harm. /sigh

      --
      Innocent people shouldn't be forced to pay for inferior software development.
      --"Code Complete" Microsoft Press
    4. Re:You Are Missing the point of NULLs entirly by Anonymous Coward · · Score: 0

      If you wan't to throw an exception

      "want".
      No apostrophe. Jesus Christ.

    5. Re:You Are Missing the point of NULLs entirly by Threni · · Score: 1

      > Well your table needs normalization:

      It's safe to assume that? What if the table is going to be only ever used to read from? Wouldn't you want to have it in 1 convenient table, indexed appropriately, so that reads from it are fast?

  53. NULLs and Normalization by PatientZero · · Score: 2, Informative
    You can't get rid of NULLs and maintain Normal Form. The two are simply at odds with one another.

    No, they are not, but the way to rectify them is a bit extreme. Keep in mind that -- as Date says again and again -- there is a difference between the logical model and the physical model. I'll summarize the example he used: a EMP_SALARY table.

    Let's start simple:

    EMP SALARY
    ---------------
    Alice 100,000
    Bob NULL
    Chris NULL

    [Sorry, ecode doesn't seem to want to do vertical alignment.]

    Now, what do those two NULLs mean; do they carry value? If Bob is unemployed, we could write 0, but then it could be confused with unemployed and employed for no pay. Perhaps we don't know how much money Chris makes, but we do know that he makes some money.

    You could fix this by adding a TYPE enumeration column that would take on values like EMPLOYED, UNEMPLOYED and UNKNOWN, but you'll still need the NULL value for the UNEMPLOYED and UNKNOWN cases.

    This last part can be solved by logically segregating the table above (sans TYPE column) into three tables (one per type). Both the UNEMPLOYED and UNKNOWN_EMPLOYMENT tables would lack the SALARY column -- they would have only the EMP_ID as there's no more information to add. All rows in the EMPLOYED table would have a known salary.

    NULLs have been removed and the design is further normalized -- some would say to the extreme.

    Now, how you would model that physically without using NULL and still managing some level of performance I do not know. But that at least explains the reasoning behind NULL not being necessary.

    As for myself, after nearly fifteen years of database design and implementation, I'm quite satisfied with using NULL where appropriate. I've never been befuddled by it nor sidelined by its behavior with respect to logical operators. Learn the rules and move on.

    --
    Freedom to fear. Freedom from thought. Freedom to kill.
    I guess the War on Terror really is about freedom!
    1. Re:NULLs and Normalization by AKAImBatman · · Score: 1
      I'm thinking that the above example is even easier to solve given your criteria:
      TABLE: EMP
      ----------
      id (INT)
      name (VARCHAR)
      employedid (INT)

      TABLE: Employed_Lookup
      ----------------------
      id (INT)
      name (VARCHAR)

      TABLE: Salary
      -------------
      empid (INT)
      salary (MONEY)
      The second table would hold Employed, Unemployed, and Unknown. This way a NULL will never appear in your database, merely a lack of information.

      Now that I think of it, I suppose that first normal form wouldn't be violated as long as you ALWAYS separated out the nullable field. Considering the complexity this adds, it's difficult to say whether it is worth it or not.

      Thanks for the discussion. :-)

    2. Re:NULLs and Normalization by PatientZero · · Score: 2, Insightful
      Now that I think of it, I suppose that first normal form wouldn't be violated as long as you ALWAYS separated out the nullable field. Considering the complexity this adds, it's difficult to say whether it is worth it or not.

      Exactly! I've always been in the position of needing to get stuff done, and that has meant focusing on implementation at some point. While some people get all weird with NULL values, I simply look at it as a workable solution.

      I would never fully normalize to the point above, but the discussion is helpful in clarifying the logical design. Who knows, maybe we'll get a "truly" relational DBMS someday. Until then, I'll keep the NULL, thanks! ;)

      --
      Freedom to fear. Freedom from thought. Freedom to kill.
      I guess the War on Terror really is about freedom!
  54. What an arrogant twit by Salamander · · Score: 2, Interesting

    It's one thing to say that either SQL or XQuery have their problems, because they do. It's quite another to say that SQL is bad because it doesn't live up to some arbitrary never-achieved (and perhaps unachievable) standard of relational purity that even Codd himself found superfluous. When Pascal does nothing but the latter, and in addition takes a dozen thoroughly unprofessional swipes at Chamberlin for having been involved in both SQL and XQuery, his professional jealousy is becomes thick enough to choke on. I wish he would, so we would be spared the incessant ranting of someone whose whole career has been marked by a lot of words and not a single deed to back them up.

    --
    Slashdot - News for Herds. Stuff that Splatters.
  55. the problem with stored procedures is the language by GunFodder · · Score: 1

    It is true that in many cases a customer will never change their database, so designing an application for portability seems pointless. However in my direct experience I have had to port an application from Sybase to Oracle, and it was a nightmare due to the number and complexity of the stored procedures.

    The real problem with stored procedures is that PL/SQL is an awful programming language. The syntax is primitive. I didn't recognize it until I saw some examples of discredited languages of a bygone era. the parser is crappy; is returns crummy compilation errors and reformats your source. The DB eats your source, adding a level of indirection between your source and the implementation. And the runtime environment is very finicky.

    And if I spent a couple of years learning about PL/SQL I would still know nothing about how to solve more general computing problems. Instead if I learn Java I can use that knowledge to program almost anything.

  56. FUD FUD FUD by fupeg · · Score: 1

    Case in point! FUD FUD FUD! Or is it just ignorance...

    The first three things you listed have all been MySQL features ever since they added the InnoDB engine (version 4.0 I think, two years ago). Maybe you should RTFM!

    As for "stored procedures, views, triggers, ..." these fall under the category of --USELESS FEATURES--. The only exception there might be views, but even that is debatable.

    As for enterprise class, you can certainly pay somebody for support if you want to. No it's not the vendor, but they have the source code (along with anybody else who wants it.)

    1. Re:FUD FUD FUD by Anonymous Coward · · Score: 0
      Any process that has a client fetching a large volume of data only to distill or summarize it can be improved by having a server cluster do the heavy lifting and just sending the summary to the client. That's why we use query languages in the first place. Sprocs, triggers, and views also have their behavior constrained when they're defined, so the creator can safely give them privileges they don't trust generic clients with.

      The thing about MySQL transactions is that there are still an awful lot of deployments out there that not only don't enforce them but don't even warn you that they'll cheerfully give you wrong answers.

    2. Re:FUD FUD FUD by ianezz · · Score: 2, Informative
      As for "stored procedures, views, triggers, ..." these fall under the category of --USELESS FEATURES--.

      ...useless until you realize you have to connect to the DB from several applications written in several different languages for which you have either to reimplement your way to manipulate the data everywhere, or you have to put in a middle layer of some sort which is able to talk multiple languages (via CORBA, SOAP, plain XML RPC, custom protocol, whatever) and ensure that everyone is accessing data exclusively through it.

      Reimplementing logic everywhere across different languages is usually a bad approach because it doubles the development and testing effort.

      Middle layers are usually hard to get right the first time, much harder than using stored procedures and triggers, since the typical procedural language is not so at ease at manipulating relational data.

      On MySQL: it has a somewhat bad reputation in the field because of the people abusing it: it's fast, it's free, it's easy to set up, but living with these gotchas is definitively too painful for developers more concerned about correctness than speed.

      In the end: firing up Firebird/Oracle/PostgreSQL/SAPDB for simple data is plainly stupid, but often it is done anyways since they do a decent job even in that cases; firing up MySQL for your 30+ GB DB containing your whole network topology which is used by everyone for billing, service assurance, troubleshooting, network planning and whatever is stupid as well. MySQL AB knows that, and in fact it now proposes MaxDB (was: SAPDB) as well.

    3. Re:FUD FUD FUD by JamieF · · Score: 1

      >The first three things you listed have all been MySQL features ever since they added the InnoDB engine

      from Creating InnoDB Tables:
      To create an InnoDB table, you must specify and ENGINE = InnoDB or TYPE = InnoDB option in the table creation SQL statement:

      CREATE TABLE customers (a INT, b CHAR (20), INDEX (a)) ENGINE=InnoDB;
      CREATE TABLE customers (a INT, b CHAR (20), INDEX (a)) TYPE=InnoDB;


      Do I also have to say "pretty please can I have a real database now?"

      >As for "stored procedures, views, triggers, ..." these fall under the category of --USELESS FEATURES--

      OK, you've just revealed yourself as someone who has never worked on a project that involved large data volume. When you get to that point, triggers and stored procs are your friend. Otherwise have fun setting up that gig-E between your MySQL server and application server.

      Oh wait, let me guess... same box?

      Here's a hint: the key to a multitier architecture is that code runs in multiple tiers. There's a reason for that. It's not "useless".

    4. Re:FUD FUD FUD by Fweeky · · Score: 1

      Note that, while you can enable InnoDB to get a few features unsupported by MySQL's default storage engine (MyISAM), MySQL doesn't consider it an error for the table type you requested not to be supported. The really funny bit? MySQL doesn't consider it an error for you to attempt a transaction on a table type that doesn't support it either.

      It's not even warning material, it just silently "OK"'s the transaction. The only time you get a warning is if you try to ROLLBACK and it helpfully mentions that it can't; your unwanted changes are commited.

      Pretty well sums up how seriously MySQL takes data integrity.

  57. How is this new???? by finarfinjge · · Score: 1

    I seem to recall reading such things from Date et. al. for years. My copy of the second edition of the third manifesto (there's a mouthful!) is dated May 2000. This is a half decade old holy war. Not news, not really stuff that matters. For nerds maybe.
    Cheers.
    JE

  58. Re:The Criterion here is no longer machine efficie by geekoid · · Score: 2, Interesting

    "The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not"

    yeah, you tak a file that has 1000 linse, each line has 1024 characters, and 300 fields. Transmitted from Brazilla at 28K.
    Now convert the to an XML format, and then try sending it. suddenly it take 10 times longer to get the data. Literally going from a couple of hours to all day. Not exactly cost effective.

    XML is easy to read, if you know the format and context in which your reading it. It still needs to be designed, you still need to go to meetings, and you still need to create paperwork on the structure.
    It's got its place, but it is not the end all, or even cloce for that matter.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  59. Pick by Anonymous Coward · · Score: 1, Interesting

    I work with a 1960's era throwback of a database that uses Unidata, (or U2), a derivitive of a Pick database. It uses multivalue attributes and sub-multivalue attributes in fields. The technology predates SQL and is dying (unlike BSD).

    This is relevent because this dying technology inherently supports non-flat data. If you want to store a purchase order as a single record you can with as many line items and as many deliveries per line item as you like. Field size is limited only by available disk space.

    After working with this non-flat data for a couple of years, I have concluded that anyone who wants to go back to it is nuts. It does work, you can run a business on it, but it is cumbersome.

    Yes it's true that building up an object is easy because you can get a whole object in one read, but you end up with a bigger problem on the other side. When you want to look at your data differently you end up twisting and bending to get it.

    For example if you wanted all the purchase orders due on a certain day, you could easily end up reading every record to find them. Yes, carefull indexing solves the problem and you can index the value of a multivalue list to get the po number, but flat tables and relations will solve the same problem more simply.

  60. Xplain by Anonymous Coward · · Score: 1, Interesting

    Some years ago, Xplain was a very good solution to the SQL problem (in a project i worked on for the Guadalajara government) Someone uses it today?

  61. *couRDFgh* by Anonymous Coward · · Score: 0
  62. Example of when NULL = NULL would be nice by the-matt-mobile · · Score: 1

    I agree with you that 99% of the time you would never want NULL to equal NULL. "BatMan" shouldn't ever equal "555-5555" the same way an unknown Name (NULL) is not the same as an unknown PhoneNumber (NULL). And my phone number, though you don't know what it is (NULL), is not the same as your phone number thought I don't know what it is (NULL).

    However, since you asked, I can give you an example of when it would be nice to have NULL = NULL. In data warehousing, you have two types of tables - dimension tables and fact tables. There is always a 1-to-many relationship from dimensions to a fact table. Dimensions ALWAYS have a surrogate key (if your data modeler knows anything at all about dimensional modeling) and should only receive an insert if that record in its entirety (except for the surrogate key) does not already exist in the table. In order to check if a new dimensional record needs inserted or not, you have to do something like this:

    insert into dimGeography
    (
    -- ALL COLUMNS FROM dimGeography EXCEPT FOR THE SURROGATE KEY
    -- GeographyId (autonumber surrogate key)
    Country,
    State,
    City
    )
    select distinct
    (
    Country,
    State,
    City
    )
    from
    dimGeography_Stage a left join
    dimGeography b on
    a.Country = b.Country and
    a.State = b.State and
    a.City = b.City
    where
    b.GeographyId is null

    Now where you run into problems is when a field in your join allows NULLs. For example, Mexico does not have states, so when adding a record for Mexico, you may have NULL as your State. So... no matter how many times ('Mexico', NULL, 'Mexico City') shows up in dimGeography_Stage, it will always be inserted into dimGeography as a new record. This is where NULL = NULL would be handy. However, there are some easy workarounds such as saying COALESCE(a.State, 'SOME DISTINCT VALUE') = COALESCE(b.State, 'SOME DISTINCT VALUE') in your join. Or, you could always disallow NULLs in the State column and use the Country as the State when inserting values from Mexico.

  63. Philosophical point by solferino · · Score: 1

    I wonder at the emotional energy tied up in railing against less efficient and elegant solutions. Yes, there are always more powerful and efficient ways of thinking about things. However, the majority of people may not be at a level where they can conceptualise in the way required by these higher technique.

    An example related to the article is where people mistake the representation of relational databases in 2D tables of rows and columns as their underlying logical structure. It may turn out that conceptualising an N-dimensional logical structure is beyond these people.

    However, these people should not then be browbeaten for their less efficient methods. People at whatever level need to work with tools and conceptual models that they can relate to. Thus if people want to use flat databases, let them do so and live by the limitations of that tool.

    It is possible to get too stuck on the importance of efficiency and elegance. It is perhaps like some supposed transcendent being who can teleport instantaneously from one place to another pouring scorn on us lower mortals who still rely on physical movement to move about in space. That is our level of accomplishment and it is proper that we work at that level, whatever it's inefficiency.

    So in summary, work at whatever level of efficiency and elegance is right for you, but be careful of criticising people who are working at a lower level than you. By being critical, you jeopardise the opportunity of appearing as a model for higher levels of ability when those people are ready to look around for more advanced techniques, and instead your criticism can instead help construct emotional blockages in those people against future progress.

  64. The problems of preemptive data typing by ynotds · · Score: 3, Insightful

    (I just used my mod points in another thread, so I gotta hope other mods recognise the parent post.)

    In the world of self taught dabblers, NULL is not well enough understood to be expected to do anything more than cause the kind of problems you alude to with the likes of '-' to (partially) imply what NULL should be used for.

    SQL has to coexist with other components where an empty string and a numeric zero are assumed null and treated accordingly, the quantity shipped example you give being just as easy to understand and implement with zero meaning not shipped as with a separate null (just add "where quanity shipped > 0").

    There are also several possible reasons for a data value to be left NULL or undef, not all of which are mutually exclusive. Is it "not yet", "not known" or "not applicable"? In the real world we sometimes need to pair a status enum and a (numeric or string) value column to properly represent a single logical datum which needs to sometimes take state values not sensibly representable by numbers or strings.

    We used to use a string of 9s in a numeric key field to represent end of data and even today Perl's DBI interface uses the 0E0 kludge to represent a "true" zero.

    --
    -- Our systemic servants do not good masters make.
    1. Re:The problems of preemptive data typing by LoztInSpace · · Score: 1
      (I just used my mod points in another thread, so I gotta hope other mods recognise the parent post.)
      Doh! Me too. Mod GP up!
  65. Yah by sbszine · · Score: 1

    The real world issue is that SQL has built in support for sorting and sorting algorithms in most big databases are highly optimized

    This is why I do sorting in the db. Aesthetically, I would rather do it in object-land so that the logic is kept separate from the data, but obviously the db vendor has more hours to throw at optimisation than I do. Also, I can't assume that my code is the only client hitting the db -- the db may be able to cache common queries / sorts between multiple clients.

    --

    Vino, gyno, and techno -Bruce Sterling

  66. Pascal should go into Theology by Precipitous · · Score: 2, Insightful

    Pascal's entire line of argument fails to explain why I should care. I RTFA'd. I even read some of the links. This is a topic near and dear to me, as I fight for better design, better technologies at work against various technology and platform zealots.

    I'd argue that if you have a thesis or hypothesis, you need to make a testable predication and test it against the real world. That's the gist of science. Pascal tests XML and SQL against the Gospel of Formal Data Model's according to Codd. Pascal's arguments are theological, not practical, not scientific.

    I want to know how XQuery will or will not make my job of designing and implementing solutions that save my company money. How will the problems of XQuery, or the blasphemous SQL implementations cause me problems that cost time? How do these deviations for orthodoxy really make code harder to write, harder to maintain, more prone to error? These are the relevant questions. These are issues that can also be tested against production scenarios.

    I don't give a hoot about how well XML falls into Codd's formal model. I care whether or not it saves me time. I'll happily accept that normalization, taken not too literally, does saves me time. XML also saves me lots of time: makes a great prototype to store data in (gasp!) before I implement the database layer. Even for a somewhat complex schema, I can blast out a small XML document in a few minutes in my text editor of choice, and test out how the components use the data that they store. Sometimes, for small data sets such as configuration information, I even leave it in XML for years before finding a compelling reason to deal with an RDBMS. Web services (the horror!) have made the middle tier much more accessible - I can drive access to a service based on policy now, rather than to whom I can distribute the damn DLL and it's dependencies. None of this is perfect, but XML works where it makes a difference to me.

    In short, regardless of how well or poorly Pascal addresses his questions, they are the wrong questions addressed in the wrong way. He should go into theology or critique Chinese literature and leave us working folks alone.

    --
    My motto: "A cat is no trade for integrity."
  67. um "duh"... ? 8-) by IBitOBear · · Score: 1

    Yes, IN SQL, it is ambigous.

    That is *EXACTLY* the point.

    If SQL were adaquate to represent "everything the relational model represents" then the ambiguity would be resolved by that NON-SQL language that doesn't exist.

    SQL is inadaquate to the task of fully utilizing the RELATIONAL MODEL because it doesn't have any way to EXPRESS RELATIONSHIPS other than "table".

    Hence the reason (which I was trying to explain in clear terms) the author of the article referenced in the news item was expressing lack-of-happiness for the way SQL works, and the way it is being further debased (his opinion, I haven't read XQUERY) by XQuery.

    This also why I didn't use SQL legal identifiers in my examples, I instead used semantic place holders like "Customer ID". *I* havent invented an adaquate language to express the full set of relational operations *either*. 8-)

    The place were we totally lost out is the COMPLETE ABSENCE of ANY standard language or RDBMS which actually has a means to "store relationships".

    So we create tables, and indexes, and then ask the programmers to go look in documents to figure out how the tables interract. And they each come up with their join-of-choice across the SQL-forced pairs-of-tables.

    And the people who really understand the model of what an "magically complete RDBMS" could really do, are forced to sigh and shake their heads.

    As the market exists today, if you did make the non-SQL RDBMS that would answer this deficiency, you couldn't sell it because it *wouldn't* *be* *SQL*.

    And yes, you could make a meta-system over an existing RDBMS that really understood the RELATIONAL MODEL. But you would pay a hefty entropy cost because you would take your higher understanding and have to decompose it into the SQL_Database operations which are not atomic enough to factor out of the transaction.

    So your four-step over-wrapper (which I have toyed with as an OOS project idea) would suffer terrifically (eat memory or be really slow in round-trip costs) exactly because it cannot adaquately communicate with the underlying Query Optimizer (et al), or influence the real storage/communication/duplication cost.

    It's like putting a future-car "coversion kit" on your sedan. It will change the look but it won't make it fly.

    The shape of an N-way chain of inference, natively executed, would be far different than the shape of a series of 2-way executions that are recombined to look N-way-ish.

    The closest you can come is to, in special cases, use sub-selects to create transient tables in the from clause to pre-filter and anti-join you data deep in the heart of your query engine. You can get some real wins, but again, this is you doing this and not the RDBMS.

    If you truely doubt this, check out the Oracle extension that makes self referental queries work. I don't remember the syntax off the top of my head, but it allows you to do depth-first queries on tables that have a structure like:

    table: Object_ID, Parent_ID, etc...

    It it purely and obviously NOT SQL since it uses (if I recall) "PRIOR" (or something) as a verb and SQL returns "Sets" not "ordered sets".

    But this prior/next kind of relationship (which builds an N-Tree within a table) is an example of one of the many kinds of relationships that the Real(tm) RDBMS over-layer would have to create.

    Then the enforcement of relationships would become an endless string of cascading constraints expressed as (slow) trigger procedures.

    So you might as well then just read the data sets into your own store and do your own work, elsewise the cost is multiplicative.

    Hell, the number of systems that wont let you do the (legal) task of taking the database totally out of state mid-transaction is surprising. By that I mean you should be able to construct

    Table1: Object_ID, feature, thing, whatever
    Table2: Object_ID, property, attribute, whatever_else

    and then constrain the two tables *mutually* such that any Ob

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  68. Pascal has a Relational Perpetual Motion Machine by Anonymous Coward · · Score: 0

    The "Pascal should go into Theology" post has a point.
    But my problem with Pascal and his followers is that they beat everyone over the head by claiming that a true relational system would be the software equivalent of a perpetual motion machine. Every flaw of SQL or RDBMSs are critiqued by saying, they suck because if they had implemented a _real_ relational database (and query language) it could magically use lots of different kinds of underlying physical storage schemes that would optimize performance in situation X and use a different underlying scheme to optimize performance in situation Y.
    That is like saying - "See how beautifully and elegantly this calculus function computes limits? If you implemented a perfect software solution that matched the theory of limits, it could do limits instantly and accurate to 6 billion digits.
    One can use that type of argument to bash every software solution to every mathematically articulated system.

    Second, for all their railing against NULLs, I believe Codd admited they were a sort of necessary evil.

    As one poster noted, maybe simple predicate binary "Have you stopped beating your wife [Yes or No]?",
    logic doesn't map well to the real world - OMFG! maybe the fact that the relational model is internally consistent doesn't mean that it is perfect when mapped to real world data and real world needs. Godel had something to say about perfectly internally consistent systems.

  69. Re:um "duh"... ? 8-) by adam6 · · Score: 0

    SQL is inadaquate to the task of fully utilizing the RELATIONAL MODEL because it doesn't have any way to EXPRESS RELATIONSHIPS other than "table".

    Isn't this a limitation of the relational datamodel itself? I think only the entity-relationship model can distinguish between relationships and other data. Once you convert you're model to the relational datamodel, you've lost that distinction. This is not an SQL-specific problem.
    At least this is what we learned at our university on a database course.

  70. Relational does NOT have to be "hard" by Tablizer · · Score: 1

    is beyond my mere predicate logic skills at this time.

    Relational "math" as tought is only the underlying principles, not the final end-form. New operations that "wrap" the mathy stuff can be defined from the mathy operations such that they are easier to relate to.

    It is similar to how we use AND, OR, and NOT as our Boolean logic primatives. However, in theory we only need NAND. But NAND is "ugly" to most humans. Thus, we extended the "language" of Boolean to closer match human language. As long as our extensions are defined using the primatives and don't break the rules of our system, we can have our cake and eat it to. True Boolean geeks can still use NAND if they want.

    In other words, Dr. Codd was a brilliant theoretician, but a lousy marketer and packager.

    We just have to agree on and/or find relational operators and syntax that we find more intuitive than those in the original papers. Sometimes I feel that "look-up" would be more intuitive than "Join", for example.

    Relational as a practice is still young.

    1. Re:Relational does NOT have to be "hard" by perlchild · · Score: 2, Interesting
      You quote imprecisely in this.

      I was saying
      1) a) Just how much more power could we get?
      is beyond my mere predicate logic skills at this time.

      Explaining just what power is missing from SQL to explain "the full power of relational models" in the context of the article, where it is said relational algebra can represent tree-like structures etc... is beyond me.

      I am not interested in how hard relational is to use.

      1)a) was referring to expressing, just what's missing in all the SQL servers, that could be reached if they were truly relational. In this case it has a lot more to do with how we define what you can AND on, a lot more than what operations you can use. I can understand AND just fine, I have a lot more problems with understanding how the experts say on one breath "it's not flat tables anymore" and on the next, no explanation on what operations you can use on a non-flat table... Say is the AND of a pair of arrays of floats the matrix product of the ands of the arrays? In what precision? Or are they just saying "the tables aren't flat, they can be linked" but you gotta restrict your operations on the virtual flat tables that overlay your more esoteric structures?

      ADABAS D had arrays and such(it's now MySQL MaxDB I believe) inside sql-addressable tables, but that was a relatively poorly supported feature, once which wasn't well modeled( you can see from my examples why it wasn't well modeled, I think, it brings up questions about the domains of each operation)

      We can certainly agree on the lousy marketer and packager, but you know what? The marketers won't help you write your complex queries either, Codd's theories just might(well again, they might not, but that's what the original post was about).

      I'm sure I misunderstand more about predicate logic than you've understood in the time it took you to read these lines, but I was aware of those limitations, and kept my post in line with that.

      You however chose to limit the relational model's power, to the logical operators it uses, without mentioning that the relational model also includes rules on how you can organize what objects you can apply those operators TO, which is a rather important distinction, and probably closer to weaknesses of SQL, as evidenced in the article anyways. Anyone can write relational algebra if you got scalar booleans, but if you got an array of complex numbers to multiply by a matrix of floats, and all that needs union(or any other type of join etc...) with userids, full names and addresses, it gets a little bit more fun.

      Thus, we extended the "language" of Boolean to closer match human language. As long as our extensions are defined using the primatives and don't break the rules of our system, we can have our cake and eat it to. True Boolean geeks can still use NAND if they want.


      Well "the rules of our system" also include just what we can NAND (or AND or XOR) ON, not just those operations. The definition domain of those operations is also important, that's why in SQL's fuzzy logic, ANDing with a NULL gives you a NULL.
      Now if there's an impedence mismatch between sql and relational theory, it's probably on the definition domain of the operations(NULLs, non-scalars values, strings) and not on AND itself, which is pretty much a "simple" operator, at least when you limit yourself to scalar values like booleans, and logical predicates.
    2. Re:Relational does NOT have to be "hard" by Tablizer · · Score: 1

      I am not much into static typing (another holy war for another day), so I wan't comment on extending which "types" one can do relational operations on.

      Regarding trees, there are some dielects that have added tree-oriented operations to SQL. As far as I know, it has not been ruled out as being against relational theory.

  71. Re:um "duh"... ? 8-) by emurphy42 · · Score: 1

    No system, SQL or otherwise, can automatically resolve a question that is conceptually ambiguous.

    Again, I point to the example of "give me the ZIP code of sales order #12345". Even if you have a system that efficiently specifies and prioritizes relationships between tables, you've still got to explicitly tell it whether you mean the billing address or the shipping address. And the only ways to do that are (1) tell it to use BillingAddressID rather than ShippingAddressID when linking to the address table, or (2) give up normalization and store BillToZIPCode and ShipToZIPCode directly in the sales order table.

    Yes, there are plenty of things that could be added to SQL, or added by tossing out some parts of SQL and revising them. I'm just pointing out that there are also plenty of things that absolutely require the designer to provide more information.

  72. SQL is the COBOL of relational languages by Tablizer · · Score: 1

    Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.

    SQL is a lot like COBOL. It works and it is practical and runs the world. Few dispute that. However, most relational affectionados (me included) feel that we can do better. Rather than live with the COBOL of query languages (SQL) forever and ever, let's experiment a bit. We have thousands of procedural and OOP languages, but only a handful of relational query languages to learn and play with. It is time for a Cambrian Explosion of relational languages. The hardware is now up to it.

    I tend to kick around a "dynamic relational" view of relational theory, as apposed to the "static" version offered by others in the field. Just as imperative languages tend to fit a dichotomy of compiled (static typed) or interpreted (dynamically typed), I feel that there is room for dynamic relational also. It may serve better in rapidly-changing ad-hoc and RAD environments.

    1. Re:SQL is the COBOL of relational languages by localman · · Score: 1

      I wish you luck... I didn't intend to sound like there was anything wrong with other relational models, or that SQL was the ultimate relational language.

      But I do think that anything that is proven to work in practice is a lot more impressive than something that would work better in theory. I guess I'm just saying that even with all it's flaws, I am impressed what SQL can do and has done.

      I hope there is more development in relational languages. And when some of the new ideas rise to the top and supplant SQL because they are so compelling, then I'll extol their virtues as well.

      Cheers.

  73. Put these arguments in perspective, please by rycamor · · Score: 4, Insightful

    One thing that everyone should understand: even though Pascal, Date, etc... argue that SQL is a bad implementation of the relational model, they *still* agree that it provides value, and that it is miles better than its hierarchical database predecessors. Since it is their job(s) to provide a reasoned critique of the field, it is only natural for them to rigorously compare SQL to the goals of the relational model. Many of their complaints fall in these categories (although there is much more):
    1. Does too much -- too many ways of doing the same thing, and too many unecessary operations that could be better done another way.
    2. Overly complex -- the SQL 1999 standard was something like 1200 pages.
    3. Allows programmer to circumvent relational integrity. Things like "hidden identifiers", pointers, etc...
    4. Too wrapped up in implementation -- users must spend a lot of time understanding the physical storage, rather than focusing on queries in abstraction.
    5. Many small inconsistencies in SQL itself

    But the problems with SQL are impossible to judge if you only know SQL. It's like the people who used to ask what was wrong with a perfectly good typewriter that made people want to use a word processor. To any who are curious, I suggest you do some reading. The absolute best simple introductions for these problems are in a two online documents by Hugh Darwen at www.thethirdmanifesto.com. Look for "The Askew Wall", and "The Importance of Column Names".

  74. NULL is a nuisance by CaptainZapp · · Score: 1
    NULL columns do have a couple specific problems in a relational database:

    It's a not defined value (comparable with a division by zero). One of the consequences (for example) is that you can't join on NULL columns. Theoretically you can't even sort (ORDER BY) on null values.

    It can be bad for performance. Sybase for example (they still exist and partially in some very performant databases, i.e. in the financial and telecom sector) internally represents NULL columns as VARCHAR, which can be a bad idea since it might invalidate fairly cheap in-place updates.

    That said however, you are right that it doesn't make sense to implement an anal-retentive model in most physical database schemas in order to be compliant with clarity and purity as defined in a number of mathematical textbooks.

    If you have to design a database able to handle 30 million bookings over a 24 hour period, then it might make a lot of sense to avoid NULLs. In your example to replace the NULL by an impossible death date (i.e. 31-Dec-9999) and let the application deal with it.

    (And now, I don't want to argue about the Y10K problem...)

    --
    ich bin der musikant

    mit taschenrechner in der hand

    kraftwerk

    1. Re:NULL is a nuisance by Flaming+Foobar · · Score: 1

      It's a not defined value (comparable with a division by zero). One of the consequences (for example) is that you can't join on NULL columns. Theoretically you can't even sort (ORDER BY) on null values.

      COALESCE(col,0) or something similar works most of the time. If a col is null, it most often also works correctly in queries - the rows that have a null value are simply left out. Logical and for the most part efficient.

      In your example to replace the NULL by an impossible death date (i.e. 31-Dec-9999) and let the application deal with it.

      That's a kludge that sort of works. But it is almost guaranteed to cause problems at some stage. And you could always just say SELECT foo FROM bar WHERE mycol IS [NOT] NULL.

      --
      while true;do echo -e -n "\033[s\n\033[u\134_\033[B";done
    2. Re:NULL is a nuisance by CaptainZapp · · Score: 1
      That's a kludge that sort of works. But it is almost guaranteed to cause problems at some stage. And you could always just say SELECT foo FROM bar WHERE mycol IS [NOT] NULL.

      Which reminds me of something else on which NULL columns are very bad. As (part of) an index column :).

      Again: In most cases allowing NULL columns works just fine (if I wasn't clear on that). What I wanted to point out is that there are certain applications where you might have to think very carefully about allowing them and where even a kludge (as you accurately describe it) might be better. This is certainly an exception, but it happens.

      In addition (and to contradict myself): Usually I'm a very big fan of delegating all kinds of constaints to the database, since letting the application folks deal with them is a clear reciepe for desaster.

      Physical implementation of a logical data model is never clear cut and straight forward. You usually have to compromise one way or the other.

      --
      ich bin der musikant

      mit taschenrechner in der hand

      kraftwerk

    3. Re:NULL is a nuisance by anvil+{UK} · · Score: 1

      using 'impossible values' has a couple of specific problems of its own that nulls, or rigorous denormalisation, don't have.

      first the impossible happens, as pratchett states 9 times out of 10.

      second wrongly skewing data - even worse using the wrong datatype and then skewing the data - really really makes life hard for the query optimizers. at this point database designers usually curse the product for its abysmal performance when they are asking software to understand that 'Dec 31 9999' means 'Unknown', but 'Dec 31 1999' means 'Dead'.

  75. No, NULLs are fine but ALL is missing by RedLaggedTeut · · Score: 1

    NULLs make a lot of sense if you understand fields as sets; so a field with value 11 would really represent the set {11}. Now NULL makes sense because NULL = {} the empty set. However, sometimes NULL is supposed to mean "unknown" in the meaning of "This value could be anything". What is missing from SQL is therefore the opposite of NULL, the value ALL, which would indicate e.g. for an integer field the set containing all numbers. This is where the logic of SQL is indeed often broken, because people use NULL to indicate "unknown" and then questions like "Who COULD have murdered Jane ?" can't be answered by "SELECT name, motive, place FROM motives, whereabouts WHERE motives.name=whereabouts.name AND whereabouts.place='house'" But maybe you can correct my SQL :-)

    --
    I'm still trying to figure out what people mean by 'social skills' here.
  76. NULLs are fine, ALL is missing (reformatted) by RedLaggedTeut · · Score: 1

    NULLs make a lot of sense if you understand fields as sets; so a field with value 11 would really represent the set {11}.
    Now NULL makes sense because NULL = {} the empty set.

    However, sometimes NULL is supposed to mean "unknown" in the meaning of "This value could be anything".

    What is missing from SQL is therefore the opposite of NULL, the value ALL, which would indicate e.g. for an integer field the set containing all numbers.
    This is where the logic of SQL is indeed often broken, because people use NULL to indicate "unknown" and then questions like "Who COULD have murdered Jane ?" can't be answered by

    "SELECT name, motive, place FROM motives, whereabouts WHERE motives.name=whereabouts.name AND whereabouts.place='house'"

    But maybe you can correct my SQL :-)

    (Sorry I had the wrong format selected, so I resubmitted .. should have used preview)

    --
    I'm still trying to figure out what people mean by 'social skills' here.
  77. Identity and the standard by tarvin · · Score: 2, Informative

    You are wrong. The SQL:2003 standard specifies IDENTITY. See http://troels.arvin.dk/db/rdbms/#mix-identity

  78. Recursion and SQL by tarvin · · Score: 1
    I googled RECURSION and SQL, and came back with very little - apparently its only Oracle that's implemented this

    Wrong. DB2 is rather close to implementing SQL:1999's recursiveness. There is an article comparing Oracle and DB2's recursive features.
    -And a patch exists for adding Oracle-style recursion to PostgreSQL.

    Whether WITH RECURSIVE or CONNECT BY work effectively is another question. I haven't seen any experimental articles considering this.

    Without using recursive SQL, different encoding schemes exist for the purpose:

    • most well-known: adjacency list
    • nested set
    • nested intervals
    • materialized path

    Joe Celko has recently published a book dedicated to the subject.

    See my RDBMS links page for more on this.

  79. Re:The academia is forgetting the whole point of X by leomekenkamp · · Score: 2, Insightful

    XML is all about programmers being able to understand the data!

    No. XML is all about storing meta-data alongside the data. And you are implying that programmers will not understand data that is not easy readable, like XML; this implication is an oversimplification.

    (...) reverse engineer another proprietary data format (...)

    XML is certainly not the only well-documented interchange format.

    (...) dig into a horsepile of documentation.

    Is is fairly easy to create XML documents that require horsepiles of documentation, just like any other format. Just look at the XML output of Microsoft Office; XML is no magic bullet.
    (...) (like user preferences) (...)

    No, one should use an API for that and be independent of storage meganism.

    However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain

    I would not like to retrieve all information from abovementioned Office XML files right now, let alone in ten years time.

    So yes XML is self describing only to humans and that's the whole point of it.

    Ehhm, no. Human readability is a side effect that probably led to its widespread adoption. Work is underway to make certain XML far less 'readable' to humans, but smaller and faster to process.

    Formalizing data semantics is not the goal of XML (...)

    No, because that would require AI, but formalizing meta-data semantics most certainly was a target set for XML to achieve the goal of better data interchange.

    --
    Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
  80. The Data man. problem must be solved at O/S level. by master_p · · Score: 2, Insightful

    One foundamental error in today's operating systems is that they are datatype-agnostic. They simply don't know what the data they handle is. This task is left completely to applications, and each application usually provides its own way of managing data. This causes incompatibilities between applications, and these incompatibilities are not solved either by SQL or XML (that are nothing more than human-readable representations).

    In my opinion, an operating system must primarily provide a data management solution. It must provide the common ways to organize, store, retrieve and process data. This means that the application should only care about the logic behind the data, not how data management is implemented. There are only a few methods of data organization anyway, and it is a shame that these methods are not available when an O/S is installed.

    The algorithms that concern the data types should also be available along with the data types. This means that an operating system not only should provide data management, but it should be object-oriented: each "data node" in the system should be available as a class in the chosen programming language (if it supports such a concept).

    The availability of data and their types on the O/S level would also boost security and safety, as it would not be possible for a 'devious' application to approach the data in any other way other than the intended one.

    Finally, the concept of 'application' is also wrong, and I am saying this in the context of data: in our day and age, data not only multiply fast, but the types of data are frequently modified. The liquid status of data (and their data types) makes the concept of an 'application' (thousands of source code lines, cast in the stone, with a huge degree of coupling between them) a huge obstacle in really making computers useful. Applications need to be replaced by a live system of persistent objects that do simple jobs and inform the world (through events) about changes in their state or the results of their computations; the O/S should be responsible of organizing how objects communicate with each other (either in the same memory space, in different memory spaces, or in different computers).

    Since the current situation is not exactly orthogonal (as described above), there are many misunderstandings and problems in defining concepts clearly; many thousands of dollars are spent in re-inventing the wheel, and many work hours and brain power is consumed in creating what should already be there...(and thus we can have a nice /. discussion when half of the posters say that SQL sucks and the other half saying that XML sucks!)

  81. Have you stopped beating your wife? by sebster · · Score: 0, Offtopic

    Maps perfectly to the real world:

    1) You actually did beat your wife and have stopped doing so... answer the question with a YES.

    2) You actually did beat your wife and still doe, so you didn't stop beating your wife... the answer is NO.

    3) Any other case (like you don't have a wife, or never STARTED beating her, so you can't possibly stop), the answer must be NO.

    In my case, I'd have to answer with a NO ;-) If that doesn't give you the information you needed, you asked the wrong question.

    Greetings,
    Sebster

  82. OT: Re:The Data man. problem must be solved... by AveryT · · Score: 1

    One foundamental error in today's operating systems is that they are datatype-agnostic.

    Sorry but this (mis)use of the word "agnostic" really irks me. Surely you don't mean that today's operating systems think that the existence of datatypes is "unknowable"? Why can't you just say datatype-independent?

  83. What is truly amazing... by blorg · · Score: 1

    ...is that you *needed* to append that final line for people to realise that you weren't deadly serious. Only on Slashdot...

    1. Re:What is truly amazing... by Anonymous Coward · · Score: 0

      Well, to be fair, here on Slashdot those few of us who try to read the articles do get to see a pretty large number of SQL servers crashing...

  84. Oh dear by Ankh · · Score: 3, Informative

    I'm not going to try and reply in detail, but since I participate in the W3C XML Query Working Group and am also the w3C XML Activity Lead, a few comments may be useful.

    The article seems to says "I don't like SQL and I don't like XML and I think XML Query is about mergin them although I don't understand it very well, so the people working on XML Query must be stupid, and in any case it's easier to attack people than understand a specification".

    Perhaps that's unfair, but it's clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years.

    In both cases the story is about interoperability.

    If you look at the XML Query Home Page you'll see approximately two dozen implementations of the XML Query draft, including a number of open source ones. If you look at the public mailing list for comments, you'll see we received over 1100 detailed technical comments at the last public review. So there's a lot of interest in this work.

    Why is that? One reason is that, like Web services and SOAP, XML Query is able to replace a lot of proprietary and hard-to-maitain middleware. Another reason is that for the first time we'll have a standard way to search over multiple kinds of data source.

    Don is the primary editor of the XQuery language, but the technical decisions reflected in the specification are a result of collaboration, and are agreed on by aconsensus process by a much larger number of particpants. The goal is to make a language that people agree to implement and to use. With support announced by Microsoft, Oracle, IBM, BEA and others (see Web page mentioned above) and judging by the public interest, I think it's fair to say that's going to happen.

    It's pretty rare to see a large complex system that everyone is happy with. It's actually pretty rare to see a small system that everyone is happy with. There are people who are unhappy with some features in the Unix cat program, but it's better to have cat in every Unix system than to have millions of shell scripts break on systems where it's missing! The trick, then, is often to include features that will lead to massively wider adoption, even if some people would rather be without them.

    Then we have (as part of W3C Process) a public call for implementations so that we can test to see how confident we are that all the major features can be implemented compatibly (i.e. interoperably) in multiple independent implementations.

    Features that were not implemented get removed before the specifications are final.

    Is XML Query a waste of time? Is XML evil? Is SQL evil? A lot of people think otherwise, and some of them are pretty smart, so if you are concerned, take the time to read the specs and decide for yourself. :-)

    --
    Live barefoot!
    free engravings/woodcuts
  85. 'Ol Papa Bear remembers this shoot out long ago... by LifesABeach · · Score: 0

    Yes. Consider the mathmatics of this question. Its far easier to convert SQL to XML, than it is from XML to SQL.

    There's some history behind this subject. XML is the love child of the CODASYL standards created back in the 1970's. In the business shoot out for data bases was IBM's SQL(relational) winning over UNIVAC's ADML(hiearchial); only because IBM did its home work better by using math, and marketing. Computers back then were VERY expensive, and SQL proved to be cheaper than ADML. Hardware cost has changed since then.

    But our universe is more Hiearchial, than Relational. In the fullness of time, we will see the imergence of XML over SQL, if only because of the cost of developement, and maintanence.

  86. You have no concept of "conceptual" do you? 8-) by IBitOBear · · Score: 1

    You are still missing the point. You are mistaking the simple conceptual example I originally provided for a complete and potent solution. Try to be a little more abstract.

    There would have to be a way to disambiguate the full model. The SQL modle is disambiguated by the table name. A properly relational model would have "Some Other Mechanisim(tm)". I made no attempt to suggest what that mechanisim-of-syntax might be.

    Now, as a BONUS, in your original discussion of ambiguity you *WRONGLY* assume that the ambiguous result would be incorrect. This is an example of the SQL polution the article writer was talking aobut. There are completely valid reasons to want to fetch, in one EXPOSE, both branches of the abmiguous tree. In SQL this compositing is done by the UNION operator.

    So you state:

    Address -> ZIP Code
    Customer -> Billing Address
    Sales Order -> Sales Order Number
    Sales Order -> Customer
    Sales Order -> Shipping Address

    EXPOSE Sales Order Number, ZIP Code
    WHERE Sales Order Number == "12345"

    Are you asking for the ZIP code where the physical goods should be sent, or the ZIP code where the bill should be sent? The system has no way of knowing.

    And yes, if I am a clod and ask this general question when I want a spesific answer, I will be disapointed.

    On the other hand, if I were doing a marketing distribution graph, I would *EXACTLY* want the general answer that one might get from
    "EXPOSE Part Nubmer, Zip Code" which would "naturally" find all the Zip Codes that were related to the part number. That way my map-of-the-US-grpah would have a push-pin in it for every address it could find, giving me the billing, shipping, contact, and corporate address zip codes.

    Since the discussion at the conceptual level is beyond you, the "real language" implementation of expose would need limiters. So something like:

    EXPOSE Sales Order Number, ZIP Code THROUGH Billing Address
    WHERE Sales Order Number == "12345"

    Or some such.

    I can imagine several better syntax and a good number of operators that this "Real Language" would need. Things like "DISTANCE BETWEEN thing AND thing" to filter on how many Degrees Of Kevin Bacon are interesting to me, or concordancing where one of the "things" in an EXPOSE were an expression like street name(Billing Address) or even (street number, street name, zip code)(Billing Address) or some-such.

    THE POINT was not for me to offer up a complete language, ready an whole, to be nominated as a replacement for SQL. It was just to *EXPLAIN* about the fact that SQL, and the HABBITS OF THOUGHT THAT SQL ENGENDERS IN ITS USERS (habbits like those you have demonstrated in your objection), discard whole-scale many of the things that the "relational model" says "ought to be(tm)" stored in the database along with the shema and the data.

    Those "things" being the relationships themselves.

    As long as the language of access doesn't allow you to think about storing and exploring "relationships" then the RDBMS you are using is just a bunch of lists with Sadly Basic Set Theory Operations on top of it.

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
    1. Re:You have no concept of "conceptual" do you? 8-) by emurphy42 · · Score: 0
      in your original discussion of ambiguity you *WRONGLY* assume that the ambiguous result would be incorrect. This is an example of the SQL polution the article writer was talking aobut. There are completely valid reasons to want to fetch, in one EXPOSE, both branches of the abmiguous tree. In SQL this compositing is done by the UNION operator.
      And you wrongly assume that the ambiguous result would be correct. (That is, you are right in some cases and wrong in others, just as I am right in some cases and wrong in others.)
      On the other hand, if I were doing a marketing distribution graph, I would *EXACTLY* want the general answer that one might get from "EXPOSE Part Nubmer, Zip Code" which would "naturally" find all the Zip Codes that were related to the part number.
      I suppose some marketers might want this. The vast majority of my work involves cases in which you want the results from exactly one relation.
      EXPOSE Sales Order Number, ZIP Code THROUGH Billing Address WHERE Sales Order Number == "12345"
      As I said in another branch of the discussion, I fail to see how this requires significantly less knowledge than what SQL requires.
      SQL, and the HABBITS OF THOUGHT THAT SQL ENGENDERS IN ITS USERS (habbits like those you have demonstrated in your objection), discard whole-scale many of the things that the "relational model" says "ought to be(tm)" stored in the database along with the shema and the data.

      Those "things" being the relationships themselves.

      As long as the language of access doesn't allow you to think about storing and exploring "relationships" then the RDBMS you are using is just a bunch of lists with Sadly Basic Set Theory Operations on top of it.

      I didn't discard the concept of storing relationships in the database. Hell, I pointed out an existing feature (SQL views) that's in the ballpark. I also suggested some tools for implementing such storage in a much more general fashion, and I would welcome the availability of such tools on certain occasions. (The majority of my work involves a small handful of tables that I've used umpteen thousand times before, and thus know like the back of my hand.)

      What I'm arguing with is the idea that storing relationships in the database is sufficient to auto-disambiguate ambiguous queries (when disambiguation is desired). I acknowledge now that you didn't intend to put forth that idea.

  87. Having never had a model for storing relationships by IBitOBear · · Score: 1

    You are right that the problem isn't SQL spesific. The original article pointed to SQL as a sort-of poison-pill that prevented the development of proper relational modeling.

    As a simple example, it is "impossible" to model "Six Degrees of Keven Bacon" entirely within an SQL query, or any "SQL Database". Oracle has an extension that lets you to walk a parentage tree (c.f. that "PRIOR"(?) stuff).

    There is nothing, however, that limits us in the Relational Model itself that prevents the simple entity relationship "Movie ->> Actor" from providing us with all the information we need to find do the six-degrees operation.

    We just can't do that operation if we think of the entity relationship as a table (directed vector) the way all of our tools have taught us to think.

    You have, like the rest of us, learned to think about the subset of the model available in your tools as if it were the model itself. You think of the entity relationship as existing in a vertical table. In so doing you have deprived yourself of horizontal associations and N-dimensional bindings.

    Six degrees of Kevin Bacon is a rooted tree; a mesh floating in space "anchored by" Kevin; once you really understand that both "Movie" and "Actor" are fully valid Candidate Keys. If you do a depth-first traversal of that tree following Movie to Actor AND Actor to Movie as a fully fleshed out "magically complete Relational Database" would allow, the problem becomes trivial. Further, there is no reason that the mesh needs to be anchored with Kevin; any node would do.

    The problem is that everyone on the planet has been conditioned to see those table thingies as the core relationship even though the table isn't actually central to the model. The model does offer up the table as a natural core representation, in that we are predisposed to understand lists and any valid group of entities can be listed. But mistaking the representation of the data as a list for the constraint of having to access the data only in list-order is a mistake.

    Since the tools constrain the thoughts, you tend to become blind to larger possible associations. You know that you *COULD* write a program to build the Six-Degrees mapping, but it would be a pain so you look for other ways to solve the problem.

    So we live with the core abstractions, and every now and again someone tosses in an extension like "PRIOR" or "soundex indices" and we all update statistics for our query optimizer from time to time. But the relationships still exsit in our heads instead of our data base engines and the subtle data is hidden from all but the most subtle operators.

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  88. NULL is a neccesity by DingoBueno · · Score: 1

    The simple reason for NULL being an integral part of relational databases comes from the foundations of the concept: relations are sets, and the whole idea is based on set theory.

    A table is a set of rows, and a row is a set of attributes. Every set contains the null set as an element. Therefore, without changing the relational model to be based on some other premise, null must exist.

    Granted, there are some problems, but it is what it is...

    --
    ascii art
    1. Re:NULL is a neccesity by Anonymous Coward · · Score: 0

      NO! NULL != Empty Set. The Empty Set is a precisely defined mathematical concept. NULL is not.

  89. Please learn how to make links. by Anonymous Coward · · Score: 0
    Please learn how to make links.
    <a href="http://shelter.nu/art-007.html">RDF and Topic Maps XTM</a>
    yields: RDF and Topic Maps XTM

    If you need more help, please ask whoever made your sig. (If you made your sig, then change "Please learn how to make links" to "Stop being such a lazy bastard.")

    Finally, your article at that link contains numerous grammatical and spelling errors (e.g., "Here is a How to Topic Maps", "Bare with me", etc.). Based on your URL, English is not your first language, so you should ask someone whose first language is English to proofread your articles (although, based on what I've seen on here and on other web sites, even people whose first language is English seem somewhat challenged in the grammar and spelling departments).
  90. Kind of off-topic, but... by Anonymous Coward · · Score: 0

    Mexico does not have states

    Actually, it does. In fact, I have seen it referred to as "The United States of Mexico" in places.

    1. Re:Kind of off-topic, but... by Anonymous Coward · · Score: 0

      Yep, in fact its proper name is "Estados Unidos Mexicanos" (Mexican United States or United States of Mexico).

  91. The Evil of Nulls by Christopher+B.+Brown · · Score: 1
    The problem with NULL values is that they wind up weakly doing too many things all at once, and not being implemented consistently, even within a single SQL implementation.
    • They're sort of like zero, except that there are already perfectly good notions of zero there to use already.

      For instance, MySQL generally takes the approach is that NULL is "zero," and typically can't tell the difference between the two.

    • Alternatively, NULL indicates that something is "unknown."

      It would be reasonable to interpret, from that, that any time a NULL enters into a calculation, it should be treated as a "contagion" that makes the result unknown.

      Thus, if field "quantity" contains any nulls, then select sum(quantity) from some_table should return NULL .

    The fact that these two treatments are not used consistently either within or across products means that the use of NULLs tends to make it difficult to predict application behaviour.

    The more you use NULLs, the more trouble they tend to cause.

    --
    If you're not part of the solution, you're part of the precipitate.
  92. But what problem does it solve? by LWATCDR · · Score: 1

    Really what does it do that you can not do in SQL? New is not always better. As to more efficient that is a big maybe. The only real way to tell would be try it in a real world application.
    I really worry that XML is becoming the univeral hammer and the world is starting to look like one big nail.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  93. SQL NULL vs. empty set by brlewis · · Score: 1

    Every set has the empty set as a subset, not as an element.

    Declaring SQL's concept of NULL to be an element of every set is problematic, because equality won't be reflexive on any set. (NULL = NULL) is always false.

    SQL NULL is an unspecified value. I don't know of any analogue to it in set theory.

  94. Ok SQL is wrong...and?? by cjb110 · · Score: 2, Informative

    I've read the article (maybe i've misread it but its 8:30am and a saturday:)) and although I can see lots of bashing of SQL, XML and nulls going on I can't see any 'alternative' or solutions suggested.

    --
    ----- I refuse to have an argument with an unarmed person
  95. A directory solves nothing by Humblistofall · · Score: 1

    "Maybe a directory would be better"?! This is throwing the baby out with the bath water. Never mind that the industry supposedly figured out that hierarchical data stores where a bad idea 30+ years ago (yet they still persist... excuse the pun), changing data models is not going to solve the age-old problems concerning missing data. These problems have to do with fundamental logic and predate computers.

    The more rigid relational design (with dead people in a separate table from all people) has several advantages including presenting clearer meaning... not just to the user but to the system. Note that the base tables involved can always be re-combined in a view to hide the "complexity" of having multiple tables from the logical perspective. Introducing nullable columns shifts what starts as a design problem, back to an implementation problem. In the case of the death date in the person table, the person table can no longer be "trusted" to state a single thing and must always be tested for what it means (is this person dead... if so).

    As for performance... these are entirely logical issues so how can we even address performance without talking about a specific system. This said, lest someone assert that *no* DBMS will perform as well in the join case, let me note that: a) the assumption is that in general, users want to see the death information for a person, if the converse is true (and it probably is for all but the most morbid of applications) then retrieving only the person information will perform better on most systems; b) I have seen at least one DBMS with an architecture that allows it to performs just as well when joining tables as when selecting from base tables.