SQL, XML, and the Relational Database Model

'scuse my ignorance but... by fiannaFailMan · 2004-06-28 08:10 · Score: 4, Interesting

It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular; and SQL was hardly developed in accordance with good language design principles.

What exactly is the problem with SQL?

--
Drill baby drill - on Mars

Re:'scuse my ignorance but... by Brandybuck · 2004-06-28 08:56 · Score: 2, Interesting

As a OO fanboy, I must protest your mischaracterizatoin of us. Unlike the DB fanboys, we never claimed that OO is suitable for every problem domain.

What drives me nuts aobut the DB fanboys is that they have to use DB for everything. For example, I maintain an embedded system where some nitwit ex-web-developer decided to implement the process table with MySQL. Huh?

DB is suitable when you have massive amounts of uniform data. OO is suitable when you have heterogenous structured data. Often these two areas overlap, so you can choose whichever you prefer. You can even mix the two if you like. But where they don't overlap, stick with the appropriate paradigm.

--
Don't blame me, I didn't vote for either of them!
Re:'scuse my ignorance but... by fupeg · 2004-06-28 10:08 · Score: 2, Interesting
Ch-ching! If I had a nickle for every DBA I've had to deal with who loved to say crap like :
- If you would let The Database make that calculation, it would be faster.
- The Database has a feature that can do that for you
- If you're going to use that programming language, use this extension of it that The Database supports
- Re-write that query as a stored procedure and it will be so much faster
DBAs love to lock you into the database. Put more logic there. Use extensions so you're tied to a vendor. Squeeze out the extra millisecond of performance so that you'll have to continue to use Oracle or DB2 forever. Blah blah blah.

The truth is that these guys want you locked in their product, because they have training on that produce. It's all about job security. These guys hate object-relational techniques because they turn databases into storage/retrieval devices. They REALLY hate simplified databases (especially MySQL) that don't have built-in XQuery support or Java stored procedures or binary content indexing or --INSERT USELESS FEATURE HERE--.

It's all about job security with these guys. They've always got FUD prepared like
- You can't rely on that other vendor's database because it will not scale
- You can't rely on that other vendor's database because it will corrupt your data
- Don't let developers write DDLs or they will crash the database
- Moving business logic out of the database will cause huge performance problems
It never ends...
Re:'scuse my ignorance but... by amorsen · 2004-06-28 10:10 · Score: 2, Interesting

Recursing through a classic SQL tree with parent-"pointers" is ghastly slow. It gets slightly less slow if you use a stored procedure, but if you do that you have confined yourself to one database vendor. (So much for SQL being a standard.)
With "modified preorder tree traversal" you can do the whole recursive query, whether up or down, in one non-recursive SQL statement. Very neat, but it is a pain to implement and you better be careful about the atomicity of updates. So far I have not found a way to avoid locking the whole table for writes for some operations, even with databases which do proper transactions. The same algorithm could be implemented cheaply and with more fine-grained locking in the database itself.

--
Finally! A year of moderation! Ready for 2019?
Re:'scuse my ignorance but... by Unordained · 2004-06-28 10:16 · Score: 2, Interesting

Technically, it's not a relational-database concern. Relation variables (tables) aren't sorted, order doesn't matter. Any time your query asks the output to be sorted, it's asking for something extra that isn't appropriate for relations, but -is- appropriate for list/vector/array/what-have-you.

Yes, it's more efficiently done server-side. But a relational database server could, academically, be a full-blown RDBMS and not have any sorting abilities whatsoever.
Re:'scuse my ignorance but... by D-Cypell · 2004-06-28 11:49 · Score: 2, Interesting

sorting algorithms in most big databases are highly optimized

Efficient sorting algorithms are hardly a closely guarded secret, it is possible to implement the exact same 'optimized' algoritm in any of the tiers. With this in mind, the performance costs of the two sorts cancels and you are left with the additional cost of the SQL parser. So assuming the programmer is capable of either A) writing an efficient sort or B) using the efficient sorts provided in the libraries that are supplied with most mainstream languages this option will always be more efficient.

the cost of ordering a recordset will almost always be minimal compared to the cost of generating that recordset

Exactly! So request the record set once, load it into memory and do whatever sorts are required. If I need to sort the data on different columns for different reasons the costs of generating the record sets begin to multiply. Maybe the data is in the database cache after the first query, but maybe its not and needs to be regenerated.

'Order by' has been useful to me in situations where the query is combined with another clause such as one that limits the size of the dataset. Something like 'load the top 10, highest priority bug reports'. If I did this in code I would need to load them all then sort and discard, which is clearly not very bright, but using 'order by' just to sort data has always proved problematic in my experience. YMMV I guess.

i noticed that the "order by" clause made it into EJBQL 2.0 as well, despite the howls of protests from object purists.

The howls of the object purists tend to go beyond this and recommend against EJB altogether. Even so, I dont see any reason for not including it in the EJBQL spec, I am sure there are cases where it can be very useful. Its a feature that is provided by the database and not including it 'on principal' would be pretty daft, but using it 'because its there' doesnt constitute a strong argument in my book.
Re:'scuse my ignorance but... by cybergrue · 2004-06-28 11:55 · Score: 2, Interesting

I've met people who think that joins are best done in the middle tier, on tables with hundreds of thousands of rows!

Have you run test to confirm that this is false? As part of a project to improve the performance of an app that I developed at work, a colleague and I rewrote several key select statements in different manners, and found the select statements using joins were the fastest by far (in a very large table). We think that the DBA team has the database tuned in such a way to cause this, but I have also heard that there is a query optimizer built into the engine (sybase) that works much beter with joins then nested selects.

The only advice I can give is to run tests comparing the performance of different SQL statements that do the same thing to determine which is the fastest. The results may surprise you.
Re:'scuse my ignorance but... by boomgopher · 2004-06-28 12:41 · Score: 2, Interesting

"Here's one that'll make you howl: "sorting is a presentation-tier concern"

Yeah, well when you have to sort String values using oddball locale-specific methods it sure the heck is.

--
Your hybrid is not saving the environment. Its purpose is to make you feel good about buying something.
Re:'scuse my ignorance but... by Khazunga · 2004-06-29 01:56 · Score: 2, Interesting

Largely it is. If the user hits a table header to sort on the selected row, are we are supposed go back to the database and do a different 'order by'?... I dont think so!
Then think again. It's far from a black and white question. If data transfer between persistence and presentation tier isn't costly, you're better off passing the sort to the database. It may have the data already ordered, either because the a similar query or subquery is cached, or because the database has an index on the sorted column. Moreover, if you're writing the sorting algorithm yourself, rest assured you won't do as good a job as your db vendor, whose code has already been pounded and cleansed by heavy usage.

--
If at first you don't succeed, skydiving is not for you

Re:"NULLS are bad." quote by MattRog · 2004-06-28 08:15 · Score: 5, Interesting

To those not "in the know" here's some further clarification:

"The use of the terms "flat tables" or "2D tables" to describe data stored in a relational database is wrong, he added."

Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure. The problem is that current SQL DBMS products do NOT do this and so we have the associated performance problems with normalized schemas. If the DBMS was truly a RDBMS then it could optimize the physical storage to improve performance.

When asked if the relational model was implemented soundly in today's systems, Craig Mullins' instant reply was "no," but he doesn't think the situation is as bad as Date says it is.
"We're doing production work and delivering value," Mullins said. "Isn't that what it is all about?"

The question is not "Are current SQL systems providing value" because certainly they are. They overthrew the hierarchic DBMS products for good reason - they were better. The real question is "Are the current SQL systems providing all the value they can". One can simply look at the wide array of DBMS offshoot products like XML DBMS, so-called "Multivalued DBMS" etc. to know that there are significant limitations of SQL products - ones which Date/Pascal/Darwin stress are not limitations of the Relational Model but merely these SQL products. To put words in their mouth, but I don't think they'd disagree at my summation, they'd suggest that if someone were to implement a Truly Relational Database Management System that these other products would quickly become obsolete.

--

Thanks,
--
Matt

Relation arithmetic by Baldrson · 2004-06-28 08:22 · Score: 2, Interesting

While at HP's E-speak project I spearheaded some work in reviving Bertrand Russel's relation arithmetic based on the work of Tom Etter, a researcher who had been working at Interval Research on some advanced theories of quantum software. We were trying to solve some basic problems with the way RDF and predication were being pursued in systems like Cyc and the semantic web. Unfortunately, basic research like this is rarely afforded any support at all, and what little support I was able to get wheedle out of the E-speak project dried up after a few months. At least we had a preliminary paper written up for future work.

I described a general vision for this sort of formalism in a prior slahsdot post. Suffice to say some progress has been made since then -- and work in other areas is starting to converge. There is much yet to be done.

--
Seastead this.

Error: User doesn't know how to use program. by LostCluster · 2004-06-28 08:24 · Score: 5, Interesting

Most of the problems that I've seen with SQL commands that are more complex than they really should be are a result of mistaken assumptions made during the design phase of the database. As a result, extra tables get added late, and therefore create new "features" that code then has to be revised to take advantage of...

XML's going to be no better in this area. Mistakes made during the design phase will always come back to haunt while implementing and using the system. If a single query can't return the results desired, then that should have been thought of while designing the tables of the DB. Trying to get a query to specify "All things that are red" like Pascal suggests is only going to work if all objects implement the "color" property the same way. If somebody uses CMYK, somebody else uses RGB, and a third uses Play-Doh color names, it's still gonna be a mess that requires code to figure out who really matches whom.

I don't see how this "new model" fixes the real problems with working with SQL between databases that weren't designed to work with each other.

Re:Isn't XML semi-object oriented? by Doctor+Faustus · 2004-06-28 08:24 · Score: 4, Interesting

Of course XML is going to be hard to represent in a relational database.

Generic XML, sure, but you can always layout your XML in a relational style, like this:

<root>
<Table1>
<Table1Row Table1RowID="1"/>
<Table1Row Table1RowID="2"/>
</Table1>
<Table2>
<Table2Row Table2RowID="1" Table1RowID="1"/>
<Table2Row Table2RowID="2" Table1RowID="1"/>
</Table1>
</root>

Join support would be nice for simple jobs, but this works really well for more complicated jobs in XSLT. You can use a for-each on "/root/Table2/Table2Row", calling a template and drill down to "/root/Table1/Table1Row[@Table1RowID='$Table1ID'] within the template. This lets you use whatever hierarchy you want, rather than being stuck with the one hierarchy the original designer chose. Just like real relational databases.

Re:Isn't XML semi-object oriented? by tcopeland · 2004-06-28 08:39 · Score: 2, Interesting

> XPATH queries take a bit of getting
> used to, but you can query on

Right on. Another nice thing about XPath is that it can be mapped onto other hierarchical structures. For example, the Java static analysis utility PMD uses XPath to query Java source code for problems. This XPath query checks for empty if statements:

//IfStatement/Statement/Block[count(*) = 0]

Good stuff; more XPath rules are here. Props to the Jaxen and SaxPath guys for their fine work!

--
The Army reading list

Re:"NULLS are bad." quote by MattRog · 2004-06-28 08:48 · Score: 2, Interesting

I'm not quite sure what you're getting at but if you've read his work he's never saying that the concept of a missing attribute is "bad" just that the "NULL" method of representing it as such is. He proposes several different methods in his book which I think I mentioned somewhere else.

If you're interested, it's Chapter 10 - "What You Don't Know Can Hurt You: Missing Information". Take a look at it for a more precise definition.

--

Thanks,
--
Matt

XML Misunderstandings by Decaff · 2004-06-28 08:50 · Score: 4, Interesting

The author seems to have many serious misunderstandings about XML.

The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged - semantics - and once they do that, there is no need to repeat the tags in each and every record/document being transmitted

A major point of XML is that the semantics should be explicit. If you don't repeat the tags, you reply purely on position to indicate meaning. This is a Bad Thing. For example, it does not allow sparse data in which non-default or null values can be excluded.

Inter-system data exchange requires an agreed-on efficient machine-readable delimited file format.

XML was designed to avoid the issue of 'yet another machine-readable format'. XML can be read reasonably efficiently, but always preserves meaning, ignoring the horrors of character sets and byte order. Compressed XML is a very efficient way to transmit data.

An important part of XML design was that it should allow information to be expressed in a way that is independent of the software that uses it. In this way, it has something in common with SQL. The point of 'human reading and writing' is that in the last resort, you still have you data! It also makes data transfer hugely easier to debug.

There is also a lot of confusion about the order of tags in XML. Its possible to specify in a schema or DTD that some tags are required and should be in a certain order, but its also possible to just not care about order. XML is neutral about this.

Things you use XML for by Decaff · 2004-06-28 08:53 · Score: 2, Interesting

Use Open Office? unzip that .sxw or .sxc file, and what do you see? XML.

You probably use XML a lot without knowing it.

Re:My Two Cents by Doesn't_Comment_Code · 2004-06-28 08:53 · Score: 2, Interesting

I've used Oracle and MS SQL Server in environments ranging from web sites to enterprise apps. The biggest problems I've seen are not usually problems with these DBMS; they are problems with the application.

I've mainly used MySQL for my large projects, and like you, those projects have not had more than several million entries. But my experience with SQL has been wonderful. I have spent several hours pounding my head on a desk trying to understand why something did or didn't work. But all my applications have run as smoothely and as quickly as one can expect.

I think XML is a step backward, if you're using it to replace a SQL Server database.
I agree - provided your application is best suited for SQL to begin with. There are some small problems where flexibility (XML) is more important than the great features SQL has to offer. I can't see a large number of XML databases at the core of huge operations though.

--

Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.

An extreme position, but with good points by MobyDisk · 2004-06-28 09:03 · Score: 3, Interesting

This guy is really smart. He takes issue with the design of SQL, and with some of the commonly-used relational-database concepts (such as the NULL). His criticisms here are valid, but his position seems to be extreme. With that said, he is absolutely right about XQuery/XPath/XQueryX++/whatever.

XML is great for data exchange. Schemas are a wonderful way to describe data. But it is completely inappropriate for querying. If you need to query XML, you should import it into a database then query it that way. The very design trade-offs made when building XML were to make it extensible, and hard to query. XPath is nice to have for simple dinky import scripts. But trying to build a whole hierarchical query language on top of XML is silly.

If someone wants to build a standardized hierarchical query language, that's great. Very few people use hierarchical databases and need such a language, but I wish them luck in that endeavor. But don't pretend that it is appropriate to use it on XML, or that it has anything to do with XML. XML is an interchange format. Leave it where it works well.

Re:Isn't XML semi-object oriented? by pacc · 2004-06-28 09:07 · Score: 2, Interesting

Sure, this is a nice syntax to describe one-off problem. Though for a database you'd hope that it is done once to transform it to something useful instead of 100 times per second.

//*[@color = "Red"] is taken as a good example in your sense but is actually the worst case in the article. A database would hopefully aleady have track of "red things" and shouldn't have to traverse a tree full of other useless data. Multiple trees spanning the same data is only a contradiction in XML.

Solution Today! Alphora Dataphor by Anonymous Coward · 2004-06-28 09:07 · Score: 2, Interesting

To see the solution to almost all of the problems identified by Pascal, Date, Darwin and Codd please checkout Dataphor from Alphora. I am a user of their project and it is a marvel. It uses any vendor's DBMS for storage and heavy lifting, but adds untop of it true relational lanaguage and constructs. The system is so highly formalized that Dataphor provides *instant* derived user interfaces for all your tables in your existing database. After taking a few days to supply the missing information about your data that you currently cannot do in your DBMS the instant UI knows how to display all tables and forms in a logical workflow. It also handles all form validation and referencial integrity. If you change your database schema the UI is automatically updated because it is *derived*. It is not the result of some lame MS Access style wizard.

SQL is not a Relational Database by Anonymous Coward · 2004-06-28 09:14 · Score: 2, Interesting

The consensus clearly is that the Relational Database is an unqualified success, and that is true. But don't confuse SQL and the Relational Database. SQL was an early hack intended for querying the RDB only.

Part of the problem is that the RDB was such as success that people do confuse the two. SQL, however, has shortcomings, that are addressed in the third manifesto.

This product http://www.alphora.com/ is a serious attempt ot address the shortcomings of SQL, with a new language for programming an RDB, and a new Query language.

SQL itself is not complete, and even the RDBs out there today break many of the rules as set out for the correct RDB.

Fabian Pascal is a bit of a nutter, you wouldn't ask him around to a dinner party, but he is correct.

XML is a badly thought out hack, but it works to an extent.

UML is hardly that, the U is really an act of hubris.

SQL can be improved, made more complete, and RDBs should be made to confrom more closely to the rules of what makes a real RDB.

An attempt at a better solution. by perlchild · 2004-06-28 09:24 · Score: 4, Interesting

Until then, SQL works. What more can you say?

That SQL is mostly a kludge?

Let me restructure that...

The experts who know what the heck the relational model is and is not argue that the language we use to query a specific type of relational-like database, that they call the SQL databases, the SQL language, has unsufficient representation power to represent the whole model, and hence can't be used to get the whole power of the model.

That's certainly interesting, and leaves us to ponder two things:

1) a) Just how much more power could we get? b) And at what cost?

2) What about alternatives, can we get that same power elsewhere, cheaper?

1)a) is beyond my mere predicate logic skills at this time.

1)b) The cost of a model for data storage, representation and management is directly linked to it's adaptability to the data you represent. The article mentions a lot of errors with NULLs(I remember thinking, while reading the article: a NULL was an attempt by the language developers to simulate an interrupt in a language that doesn't have any, this is of course, an oversimplification on my part, but considering stored procedures and triggers[SQL's own exceptions] weren't around yet, they sound like a good basis for further research.) There are a lot of other hidden "costs" for people who use a relational tool for not-quite-so-relational data, but that's not part of the cost of a relational language, per se.

2) Brings up a few notions: there are the types of databases relational databases replaced, like network databases, and there are attempted replacements, like object databases. There are also further possibilities that I will explore deeper later. Object databases can certainly be interesting, in the sense that by bundling data with code, you can have data that can handle itself, in the very basic sense that we humans apply it to ourselves. The problem is that we tend to have a very fuzzy, real-world view of such data, and can't work with it that easily(we are using computers to make data easier to work with, so if we had software that could handle real-world data complexity outside of our brains, we wouldn't be having this discussion). Object data is certainly very adept with data that has some broad commonalities, re-usable behaviours, and follows set-rules. We can call those business rules for now. Those business rules imply that a certain subset of "The Universe" interests us more than the rest, and follows predictable commonalities, making our mental models a lot less complex. On the other hand, object methodology is not always well understood, and the documentation and models it generates sometimes dwarf some production systems implemented to solve the same problems.

Now, at the beginning, Relational Systems were data-handling "toolkits" set to handle specific subsets of data, who also followed business rules.
That's interesting to my purpose, simply because I can envision, at this time(some vendors have similar concepts, but don't formalize them in any way), a new set of "toolkits" where the relational model is only one of many "toolsets" available.

Indeed, what is probably the most used sql-based server available(MySQL) has been lacking true relational functionality for most of its life, yet that doesn't make the tool less useful for most of its users. Future toolkits can inspire themselves by focusing on specific uses of technology to solve specific problems, and yet keep the SQL as a sort of security blanket, since that's where most of the training about databases(and indeed, usually most of the training about data, period, is in database classes and perhaps, some algorithmics classes)

After reading the linked articles about XML's weaknesses, though, I don't think it belongs into any toolkit of that nature. Simply because the tool that belongs in the toolkit is the "self-documenting data", and XML's weakness in that area is evidenced there. XML's early focus as a medium of e

Re:Isn't XML semi-object oriented? by mcc · 2004-06-28 09:30 · Score: 2, Interesting

The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer

Um... I haven't read this article yet, but to respond to your comment.. um, no, not really. Here's a trivial possible setup to describe an arbitrary xml tree:

Table tagname: int tagname_id, varchar name
Table attrname: int attrname_id, varchar name
Table tag: int tag_id, tagname_id tagname, tag_id parent
Table attr: attrname_id attrname, varchar value, tag_id parent

What's so hard about that? I mean, given, you'd have to do a little bit more work to describe non-tag data or attributes with types other than string. You'd need an "order" column on the tag table if order of xml elements is significant. And it would kind of suck to get too much data out of this with SQL, you'd need either a rather large join or a big flurry of small sql selects (I assume that's what the article has to do with). But the relational model holds up just fine here. I mean, really, XML is just tagged heirarchal data, nothing that special...

--
Irritable, left-wing and possibly humorous bumper stickers and t-shirts

Re:Isn't XML semi-object oriented? by AKAImBatman · 2004-06-28 09:32 · Score: 2, Interesting

A database would hopefully aleady have track of "red things" and shouldn't have to traverse a tree full of other useless data.

In fact, that's exactly what a good XML database will do. By indexing all the accessible fields, it's able to provide fast access to database wide queries like the one you suggested. Of course, these indexes do come at a cost of disk space and memory, so not everything is quite rosy in Denmark. :-( The upshot is that with processing and storage power at all time highs many (most?) databases have no need to concern themselves with running out of storage resources. More than enough resources for a multi-gigabyte database exist on even the smallest modern PC.

--
Javascript + Nintendo DSi = DSiCade

The academia is forgetting the whole point of XML by anttix · 2004-06-28 10:19 · Score: 3, Interesting

"XML is not meant to be read by humans, it's a data interchange format and thus meant to be read by machines" - Good heavens!
Someone saying something like that has really got a BS in BS! Or perhaps even worse: a PhD in BS.
XML is all about programmers being able to understand the data! Yes, because we are not anywhere near that nirvana of fully semantic systems that can (semi)automatically understand each other. NO! Programmers have to do the work to make the systems fit together and XML gives them the advantage that they do not have to reverse engineer another proprietary data format or dig into a horsepile of documentation. XML makes it easy to understand how to process the information at hand - without any extra work!
Also it's great format for storing small amounts of constantly changing data (like user preferences) cause it's extensible and with only a tiny bit of effort backward compatible as well.
Anyone trying to use XML for processing large amount of data (like data warehousing) is either nuts or doesn't give a damn about the speed or costs.
However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain (anybody who has tried to decode data from some long gone accounting package from the '80-s knows what I am talking about).
So yes XML is self describing only to humans and that's the whole point of it. Formalizing data semantics is not the goal of XML, has never been and will never be, thats what we have RDF, RDFS, OWL and other nice initiatives from Semantic Web movement for.

Re:What the?!... by OscarGunther · 2004-06-28 10:38 · Score: 4, Interesting

Ever read any Trotsky? Or Lenin? Pascal sounds like any of the old Communists (not the later totalitarians, but the true believers who were old enough to have known Marx or Engels personally). His diatribe is entirely typical of the species. He gratuitously belittles his targets:

"Natural" perhaps for those without a grasp of data fundamentals.

(Yes, Fabian, the co-inventor of SQL probably doesn't have a grasp of data fundamentals.) He sprinkles his text liberally with "quotes" and italics so you can "feel" his anger, his dismay -- indeed, you can almost hear him spitting the words in Chamberlin's face. You can almost hear him chortling to himself as he bangs away on his keyboard, demolishing his opponents.

He venerates the Founder. Finding a quote that supports your argument settles the matter. Codd the Wise avoided the errors that Chamberlin made; clearly the latter is the inferior intellect. And there's only a small core with the Founder. "We" are the true believers; all others are apostates and heretics.

Overstatement is a definite tell. Chamberlin's explanation of the difference between SQL and XML data is "unbelievable." The nesting argument is "ridiculous." Industry pronouncements are "incoherent." And most prominent of all is the cutting remark that's meaningless to anyone not in the know or already in agreement:

Unbelievable. Any wonder that SQL fails so abysmally at relational fidelity? We may not expect the average practitioner to distinguish between pictures of relations, which are "flat" due to the presentation medium, and relations of N cardinality themselves, which are N-dimensional logical structures. But we sure expect "industry experts" to be aware of the difference.

And I sure expect a polemicist to know enough about his art to understand when he's descended into self-parody.

Re:SQL sucks? by johnnyb · 2004-06-28 10:55 · Score: 3, Interesting

You can do some really neat tricks if every row is identified by a globally-unique integer (i.e. - not just for the table, but the whole database). For example, I can create a notes table that looks like this:

create table notes(rownum bigint primary key default nextval('rownums'), the_row bigint, notes text);

and have the_row be a foreign key to the whole rest of the database (unfortunately, I'm unaware of any DB that has direct support for this). Then, I can use this table to attach notes to ANY record in the whole database. You can use such generics very easily if you have a common format for a primary key.

In addition, only very rarely do I find a primary key that I know for absolute certain will be unique.

--
Engineering and the Ultimate

Conflicting goals by esap · 2004-06-28 11:12 · Score: 2, Interesting

I think this whole dispute is not necessary. It's just a simple case of conflicting goals. One side wants "efficiency" and another wants "extensibility". What both sides miss is that real systems can't afford to choose just one of these, you have to get both. So you have to have just enough extensibility to allow reasonable extensions, and still attain reasonable levels of efficiency. But trying to get all of either thing will totally lose the game. So the XML camp is wrong to think that extensibility is everything (ever tried parsing XML in a real-time system?). And the "binary transfer format" camp is wrong to think efficiency is everything (ever tried to make two versions of your protocols to interoperate?).

--
-- Esa Pulkkinen

Re:SQL sucks? by iabervon · 2004-06-28 11:18 · Score: 2, Interesting

At the lowest layer, it was essentially a system which let you put together a SQL command incrementally. If you want to do an insert, you create an InsertCommand for the table, and then call setColumn() methods for the columns with the desired values. If you want to do an update, it is similar, but you can also add restriction clauses, either by simply requiring an exact match or with a method taking Operand, Relation, Operand. There was support for, essentially, "insert or update row" (like "create or replace view"), which would do the correct operation for the current state of the database without having you write each case and test them yourself. The support for "select" was somewhere in the middle of the range of functionality supported by different SQL extensions.

So it actually followed almost exactly the functionality provided by SQL, but it allowed you to put it together incrementally rather than with a string constant, so you could have a method on a different object which would take a query so far and a user and apply to the query the access restrictions for that user. Applicability to different problems should, therefore, by approximately the same as an average SQL implementation. It also had the advantage of being able to generate SQL suitable for different databases, so it had a wider applicabilty than completely portable SQL would.

Of course, the reporting engine was solving a specific problem in a clever way, using the framework. It is true that I extended the framework in a few ways (to include GROUP BY and nested queries) which writing the reporting engine, so you could argue that, until I used it to solve the third (or so) dissimilar problem, it wasn't yet sufficiently general. Still, it would regularly build up substantially more complex queries than either I or our DBA could have written in SQL.

A particular point I want to make about it is that it wasn't an OO databse system; it was an OO interface to queries in a relational database system. A lot of OO database systems in OO languages have been written, and these are generally less applicable to a large class of problems than relational databases. What most people write are persistance layers for objects, which are much less useful in a lot of situations.

Re:Isn't XML semi-object oriented? by shelterit · 2004-06-28 11:42 · Score: 2, Interesting

Actually, many such things exist, RDF and Topic Maps XTM (http://shelter.nu/art-007.html) being two popular ones.

--
-- Home, James - it doesn't matter where that thing has b

Re:"NULLS are bad." quote by kcbrown · 2004-06-28 12:09 · Score: 2, Interesting

SQL NULLs are the worst thing since unslicable bread. They break boolean logic. You would think that if (X = Y) is false, then (X != Y) would be true. With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.

No wonder you think SQL NULLs are the worst thing -- you can't even get the semantics right.

When either X or Y (or both) are NULL, then any expression evaluating it is NULL. Not false. Not true. NULL. Any SQL implementation that behaves differently is broken.

It's not the SQL standard's fault if your code's logic can't handle that case. Nor is it the standard's fault that you can't see fit to NOT USE the NULL feature when you don't want to (and any reasonable database even goes so far as to give you the option of making *sure* you don't use it -- that's what the NOT NULL declaration when defining a column is for).

Getting rid of NULL isn't going to help you when you suddenly discover that you really DO need to be able to represent "missing data" somehow.

--
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.

What an arrogant twit by Salamander · 2004-06-28 12:15 · Score: 2, Interesting

It's one thing to say that either SQL or XQuery have their problems, because they do. It's quite another to say that SQL is bad because it doesn't live up to some arbitrary never-achieved (and perhaps unachievable) standard of relational purity that even Codd himself found superfluous. When Pascal does nothing but the latter, and in addition takes a dozen thoroughly unprofessional swipes at Chamberlin for having been involved in both SQL and XQuery, his professional jealousy is becomes thick enough to choke on. I wish he would, so we would be spared the incessant ranting of someone whose whole career has been marked by a lot of words and not a single deed to back them up.

--
Slashdot - News for Herds. Stuff that Splatters.

Re:The Criterion here is no longer machine efficie by geekoid · 2004-06-28 12:42 · Score: 2, Interesting

"The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not"

yeah, you tak a file that has 1000 linse, each line has 1024 characters, and 300 fields. Transmitted from Brazilla at 28K.
Now convert the to an XML format, and then try sending it. suddenly it take 10 times longer to get the data. Literally going from a couple of hours to all day. Not exactly cost effective.

XML is easy to read, if you know the format and context in which your reading it. It still needs to be designed, you still need to go to meetings, and you still need to create paperwork on the structure.
It's got its place, but it is not the end all, or even cloce for that matter.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Pick by Anonymous Coward · 2004-06-28 12:44 · Score: 1, Interesting

I work with a 1960's era throwback of a database that uses Unidata, (or U2), a derivitive of a Pick database. It uses multivalue attributes and sub-multivalue attributes in fields. The technology predates SQL and is dying (unlike BSD).

This is relevent because this dying technology inherently supports non-flat data. If you want to store a purchase order as a single record you can with as many line items and as many deliveries per line item as you like. Field size is limited only by available disk space.

After working with this non-flat data for a couple of years, I have concluded that anyone who wants to go back to it is nuts. It does work, you can run a business on it, but it is cumbersome.

Yes it's true that building up an object is easy because you can get a whole object in one read, but you end up with a bigger problem on the other side. When you want to look at your data differently you end up twisting and bending to get it.

For example if you wanted all the purchase orders due on a certain day, you could easily end up reading every record to find them. Yes, carefull indexing solves the problem and you can index the value of a multivalue list to get the po number, but flat tables and relations will solve the same problem more simply.

Xplain by Anonymous Coward · 2004-06-28 12:59 · Score: 1, Interesting

Some years ago, Xplain was a very good solution to the SQL problem (in a project i worked on for the Guadalajara government) Someone uses it today?

Re:Isn't XML semi-object oriented? by rycamor · 2004-06-28 13:48 · Score: 2, Interesting

Thank you... yes. While Fabian Pascal may be arrogant at times, that is just a personal thing. What he says should be evaluated for whether it is true, not whether it is arrogant.

And yes, the relational model is indeed *much* more than just flat tables with keyed relationships. I thought this upon my first encouters with the relational model, but after actually studying the concepts, I realized that there is in fact something much deeper going on, and that it has a conceptual integrity that I only wish could exist in the most of the rest of computing.

Re:Isn't XML semi-object oriented? by dcam · 2004-06-28 16:50 · Score: 2, Interesting

Hmmm. You make some very good points. I may have to revise my position on XML databases. Indexing may certainly solve some of parsing problems inherent in XML, although this is likely to have other side effects. The database would need to maintain an internal index on each record/row/whatever, which might well add to the size of the database. I'm just thinking aloud here.

I also see the benefit of the flexibility of XML. Black boxes objects and all that. For example when you reach a user object, just pass in the contents of the tag to some sort of class factory (or an equivalent mechanism). As more information is added only the user object needs to change.

Arguably performance problems with XML may drop as more and more people implement systems using XML, something similar happened for RDBMSes, so long as there isn't some feature of XML that is inherent in its design that makes it perform poorly. Your comment on indexing counters what I believed was an XML performance killer.

Thanks for the comments, this has given me quite a bit to chew over. I think I need to do some further research on this and work it through.

--
meh

Re:Relational does NOT have to be "hard" by perlchild · 2004-06-29 04:13 · Score: 2, Interesting

You quote imprecisely in this.

I was saying
1) a) Just how much more power could we get?
is beyond my mere predicate logic skills at this time.

Explaining just what power is missing from SQL to explain "the full power of relational models" in the context of the article, where it is said relational algebra can represent tree-like structures etc... is beyond me.

I am not interested in how hard relational is to use.

1)a) was referring to expressing, just what's missing in all the SQL servers, that could be reached if they were truly relational. In this case it has a lot more to do with how we define what you can AND on, a lot more than what operations you can use. I can understand AND just fine, I have a lot more problems with understanding how the experts say on one breath "it's not flat tables anymore" and on the next, no explanation on what operations you can use on a non-flat table... Say is the AND of a pair of arrays of floats the matrix product of the ands of the arrays? In what precision? Or are they just saying "the tables aren't flat, they can be linked" but you gotta restrict your operations on the virtual flat tables that overlay your more esoteric structures?

ADABAS D had arrays and such(it's now MySQL MaxDB I believe) inside sql-addressable tables, but that was a relatively poorly supported feature, once which wasn't well modeled( you can see from my examples why it wasn't well modeled, I think, it brings up questions about the domains of each operation)

We can certainly agree on the lousy marketer and packager, but you know what? The marketers won't help you write your complex queries either, Codd's theories just might(well again, they might not, but that's what the original post was about).

I'm sure I misunderstand more about predicate logic than you've understood in the time it took you to read these lines, but I was aware of those limitations, and kept my post in line with that.

You however chose to limit the relational model's power, to the logical operators it uses, without mentioning that the relational model also includes rules on how you can organize what objects you can apply those operators TO, which is a rather important distinction, and probably closer to weaknesses of SQL, as evidenced in the article anyways. Anyone can write relational algebra if you got scalar booleans, but if you got an array of complex numbers to multiply by a matrix of floats, and all that needs union(or any other type of join etc...) with userids, full names and addresses, it gets a little bit more fun.

Thus, we extended the "language" of Boolean to closer match human language. As long as our extensions are defined using the primatives and don't break the rules of our system, we can have our cake and eat it to. True Boolean geeks can still use NAND if they want.

Well "the rules of our system" also include just what we can NAND (or AND or XOR) ON, not just those operations. The definition domain of those operations is also important, that's why in SQL's fuzzy logic, ANDing with a NULL gives you a NULL.
Now if there's an impedence mismatch between sql and relational theory, it's probably on the definition domain of the operations(NULLs, non-scalars values, strings) and not on AND itself, which is pretty much a "simple" operator, at least when you limit yourself to scalar values like booleans, and logical predicates.

Re:You Are Missing the point of NULLs entirly by IBitOBear · 2004-06-29 10:08 · Score: 2, Interesting

In your example, you still need part_number in table2 unless you allow only one part per order ID.

The iterative join needed to then needed to print 10,000 invoices of average-of 20 parts per invoice gets too expensive to be reasonable.

It can be done, but it becomes anti-helpful.

Remember, normalization can go too far. By splitting the table you have now doubled your storage requiremet and multiplied the training, implementation, and execution costs since every operator (person or statement) and ad-hoc-query agency now needs to "know" that these two tables are needed to fulfil the "normally dense" query.

(That is, over the lifetime of the database, most orders will have been shipped so most quantity shipped datum will be present.)

With NULLs properly used, you can "SELECT COUNT(quantity_shipped)/COUNT(*)" on a single table and the answer will come from the traversal of one table or index. With two tables it would be a join.

If you actually look at the assembled wisdom of the field, you will note that you are supposed to STOP NORMALIZING when it stops helping and starts causing harm. /sigh

--
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press

Slashdot Mirror

SQL, XML, and the Relational Database Model

40 of 453 comments (clear)