Slashdot Mirror


What Do You Know About Databases And XML?

Dare Obasanjo writes: "XML has become a pervasive part of significant segments of software development in a relatively short time. From file formats to network protocols to programming languages, the influence of XML has been felt. I have written an overview of XML schemas, XML querying languages, XML-Enabled databases and native XML databases. Below is a shortened version of the article." Obasanjo's original OODBMS article has been updated to reflect more of the disadvantages of picking an OODBMS over an RDBMS.

8 of 257 comments

  1. xml is an interchange format, not a storage format by TechnoVooDooDaddy · · Score: 5, Interesting

    Databases are for storing data. End of Story.
    Oracle is taking some BIGTIME performance hits for stacking all that OO crap in there, and MS SQL Server is seeing the same thing now that they've got the XML in theirs. Don't believe me?
    Why is NASA switching to MySQL from Oracle and noticing speed increases?

    Don't get me wrong, I'm a big fan of XML, as a data interchange format. But when I want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.
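A minimal sketch of the "XML for interchange, RDBMS for storage" split described above, assuming a hypothetical `<orders>` feed: the XML is parsed once at the boundary and its contents are loaded into normalized SQLite tables, so later retrieval is ordinary SQL rather than re-reading XML text. The table and element names are illustrative, not from any real system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical interchange document received from a trading partner.
FEED = """
<orders>
  <order id="1"><customer>Acme</customer><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>
  <order id="2"><customer>Acme</customer><item sku="A1" qty="5"/></order>
</orders>
"""

def load_feed(conn: sqlite3.Connection, xml_text: str) -> None:
    """Shred the XML feed into normalized tables (customers, orders, order_items)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
        CREATE TABLE IF NOT EXISTS order_items (order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER);
    """)
    root = ET.fromstring(xml_text)
    for order in root.iter("order"):
        order_id = int(order.get("id"))
        name = order.findtext("customer")
        # Each customer is stored once; orders refer to it by key.
        conn.execute("INSERT OR IGNORE INTO customers (name) VALUES (?)", (name,))
        (cust_id,) = conn.execute("SELECT id FROM customers WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (?, ?)", (order_id, cust_id))
        for item in order.iter("item"):
            conn.execute("INSERT INTO order_items VALUES (?, ?, ?)",
                         (order_id, item.get("sku"), int(item.get("qty"))))
    conn.commit()

conn = sqlite3.connect(":memory:")
load_feed(conn, FEED)
# Retrieval now runs against the relational tables, not the XML text.
total_a1 = conn.execute("SELECT SUM(qty) FROM order_items WHERE sku = 'A1'").fetchone()[0]
print(total_a1)
```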

  2. Re:Super short intro to XML by sporty · · Score: 3, Interesting
    Well, it really depends on what you are doing and how you are trying to do things. A perfect example is internal documentation: it usually isn't in a 2D (tabular) format.


    Lord knows how annoying it is to write a document generic enough that it can be translated into other forms. XML is the perfect format, since there is always some middleware that can turn XML into, say, PDFs or HTML. For HTML you have XSLT; it's a no-brainer. For a PDF, you can use another scripting language to process the XML and write out the PDF binary. Now we can create a handbook and have some cool stuff on the web without destroying the site.
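To illustrate the single-source idea, here is a minimal sketch that renders one XML handbook to two targets. Python's standard library has no XSLT processor, so this walks the tree with `xml.etree.ElementTree` as a stand-in for a stylesheet; the `handbook`/`section`/`title`/`para` element names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical handbook source; element names are illustrative only.
SOURCE = """
<handbook>
  <section><title>Getting Started</title><para>Read this first.</para></section>
  <section><title>Tools</title><para>Use the build script &amp; the linter.</para></section>
</handbook>
"""

def to_html(root: ET.Element) -> str:
    """Render the handbook as HTML (the job an XSLT stylesheet would normally do)."""
    parts = ["<html><body>"]
    for sec in root.iter("section"):
        parts.append(f"<h2>{html.escape(sec.findtext('title', ''))}</h2>")
        for para in sec.iter("para"):
            parts.append(f"<p>{html.escape(para.text or '')}</p>")
    parts.append("</body></html>")
    return "".join(parts)

def to_text(root: ET.Element) -> str:
    """Render the same source as plain text, a second output target."""
    lines = []
    for sec in root.iter("section"):
        title = sec.findtext("title", "")
        lines += [title, "=" * len(title)]
        lines += [para.text or "" for para in sec.iter("para")]
        lines.append("")
    return "\n".join(lines)

doc = ET.fromstring(SOURCE)
print(to_html(doc))
print(to_text(doc))
```

The content lives in one file; adding a target (such as a PDF generator) means adding another renderer, not editing the handbook.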


    XML can also have internal uses, say, for templating. Using XSLT, you can build a template that would do cool stuff like

    [html][body]Hi [username/][/body][/html]

    which would be translated into something like

    [html]
    [body]
    [script language="php"]
    getUsername();
    [/script]
    [/body]
    [/html]

    VERY nice stuff for designers to use.

    Yes, I know my PHP tags and HTML open/close entities are arcane/wrong, but this is to make it easier for me to type :)
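With real angle brackets, the template rewrite above might look like the following minimal sketch. It uses `xml.etree.ElementTree` instead of XSLT, and it emits a standard `<?php ... ?>` processing instruction rather than the poster's `script` tag; `getUsername()` is the poster's hypothetical PHP helper.

```python
import xml.etree.ElementTree as ET

# Designer-facing template, written with real angle brackets.
TEMPLATE = "<html><body>Hi <username/></body></html>"

def expand(template: str) -> str:
    """Replace every <username/> element with a PHP call, keeping surrounding text."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "username":
                # getUsername() is the poster's hypothetical PHP helper.
                pi = ET.ProcessingInstruction("php", "echo getUsername();")
                pi.tail = child.tail  # keep any text that followed the tag
                parent.remove(child)
                parent.insert(index, pi)
    return ET.tostring(root, encoding="unicode")

print(expand(TEMPLATE))
# → <html><body>Hi <?php echo getUsername();?></body></html>
```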

    --

    -
    ping -f 255.255.255.255 # if only

  3. Re:xml is an interchange format, not a storage for by illtud · · Score: 3, Interesting
    Don't get me wrong, I'm a big fan of XML, as a data interchange format. But when I want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.

    But what if your data representation is already an XML schema? And a pretty complicated one at that? For example, look at METS: The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.

    Have a look at that schema and tell me how you'd store that in a traditional RDBMS. (I'd be interested if you could, because I know SQL but not OODBMSs or XML repositories; this is painful for me.) Databases have been for storing data, but when your data is already a complex XML representation of an object, there's little point in saying "don't use an OODBMS."
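One generic answer to "how would you store that in an RDBMS" is to shred any XML tree into a node table (id, parent, position, tag, text) plus an attribute table, often called an edge table. The minimal sketch below is not METS-specific: the sample uses simplified, un-namespaced element names, and tail text (mixed content) is not preserved. It also shows the cost: rebuilding even a small piece of structure needs several joins.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store any XML tree as rows in a generic node/attribute schema (an "edge table")."""
    conn.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, pos INTEGER, tag TEXT, text TEXT);
        CREATE TABLE attr (node_id INTEGER, name TEXT, value TEXT);
    """)
    counter = 0

    def visit(elem: ET.Element, parent_id, pos: int) -> None:
        nonlocal counter
        counter += 1
        node_id = counter
        # Only leading text is kept; text after child elements (tails) is dropped here.
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)",
                     (node_id, parent_id, pos, elem.tag, (elem.text or "").strip()))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attr VALUES (?, ?, ?)", (node_id, name, value))
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)

# Simplified, METS-flavoured sample (real METS uses namespaces and many more elements).
SAMPLE = """
<mets>
  <metsHdr>
    <agent ROLE="CREATOR"><name>Digital Library Unit</name></agent>
    <agent ROLE="DISSEMINATOR"><name>Reading Room</name></agent>
  </metsHdr>
</mets>
"""
conn = sqlite3.connect(":memory:")
shred(conn, SAMPLE)
# The name of the agent whose ROLE is CREATOR: one join per structural step.
creator = conn.execute("""
    SELECT n.text FROM node AS a
    JOIN attr AS r ON r.node_id = a.id AND r.name = 'ROLE' AND r.value = 'CREATOR'
    JOIN node AS n ON n.parent_id = a.id AND n.tag = 'name'
    WHERE a.tag = 'agent'
""").fetchone()[0]
print(creator)
```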

  4. XML + XSL(t) client side database. by jelwell · · Score: 3, Interesting
    I actually wrote a client-side database recently, where all the processing is done on the client. I use JavaScript, XML and XSL(T).

    It requires Netscape 6.(not out yet), IE 6, or Mozilla 0.9.5+ because of its use of XSL Transform functions.

    You can view the page here.
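The core of such an app is a select-and-sort over one XML document, the kind of work an `xsl:for-each` with `xsl:sort` does in the browser. As a rough sketch of that idea only (not the poster's code), here it is in Python with ElementTree's limited XPath subset; the catalog data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical "database" shipped to the client as a single XML document.
CATALOG = """
<catalog>
  <book genre="sf"><title>Dune</title><year>1965</year></book>
  <book genre="mystery"><title>The Big Sleep</title><year>1939</year></book>
  <book genre="sf"><title>Neuromancer</title><year>1984</year></book>
</catalog>
"""

def query(xml_text: str, genre: str) -> list:
    """Return titles of books in `genre`, ordered by year (genre must not contain quotes)."""
    root = ET.fromstring(xml_text)
    matches = root.findall(f".//book[@genre='{genre}']")  # XPath-style attribute filter
    matches.sort(key=lambda b: int(b.findtext("year")))  # plays the role of xsl:sort
    return [b.findtext("title") for b in matches]

print(query(CATALOG, "sf"))
```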

    Joseph Elwell.

  5. RFC (corrected) by Reality+Master+101 · · Score: 5, Interesting

    Can we please, please, please amend the definition of XML to allow "</>" to close whatever the last open tag was?

    That simple change would probably cut the size of the average XML file in half.
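SGML already allows this form (an "empty end tag", part of its SHORTTAG minimization); XML 1.0 dropped it. Nothing stops you from using it in an authoring format that a preprocessor expands before a standard parser sees it. The minimal sketch below keeps a stack of open element names. It assumes the input has no comments, CDATA sections, processing instructions, or `>` inside attribute values; the function name is hypothetical.

```python
import re

# Matches "</>", ordinary end tags, and start tags (including self-closing ones).
TAG = re.compile(r"</>|</([A-Za-z_][\w.\-:]*)\s*>|<([A-Za-z_][\w.\-:]*)[^>]*?(/?)>")

def expand_empty_end_tags(text: str) -> str:
    """Rewrite each "</>" as the end tag of the most recently opened element."""
    stack = []
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        if m.group(0) == "</>":
            if not stack:
                raise ValueError("'</>' with no open element")
            out.append(f"</{stack.pop()}>")
        elif m.group(1):  # explicit end tag: must match the innermost open element
            if not stack or stack[-1] != m.group(1):
                raise ValueError(f"mismatched end tag </{m.group(1)}>")
            stack.pop()
            out.append(m.group(0))
        else:  # start tag; self-closing tags open nothing
            if not m.group(3):
                stack.append(m.group(2))
            out.append(m.group(0))
    out.append(text[pos:])
    return "".join(out)

print(expand_empty_end_tags("<a><b>hi</></>"))
# → <a><b>hi</b></a>
```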

    (corrected post, please moderate my other one down. I have plenty of Karma to spare...)

    --
    Sometimes it's best to just let stupid people be stupid.
  6. XML is the storage format for some things by aspillai · · Score: 3, Interesting

    I've used XML extensively and in some ways agree with people saying XML isn't a storage format. But right now there are lots of applications where XML is the perfect storage format. Example: consider an order-processing company that brokers orders from company to company. One option would be to define a monolithic DB schema that covers whatever each company would like in its order. Another would be to define a really abstract schema to facilitate handling generic order forms. The problem with the first is that each time XYZ wants something added to an order form, you need to change the schema. The second will work, but you'll need exceptionally disciplined and smart programmers to deal with the abstract layer. And this doesn't even address migration issues.

    The solution is XML. You create an XML Schema and start storing stuff. Some company wants more parameters? No problem, extend the schema. If you need to migrate previous XML docs to the current schema, use XSLT. Or you can add the new parameters as optional elements, and every existing document will still conform to the schema.
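A migration of the kind described above might look like this minimal sketch, written with ElementTree as a stand-in for an XSLT stylesheet. The "v2" schema change (a new `<currency>` element after `<customer>`, defaulting to USD) and the `version` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

def migrate_order(xml_text: str, default_currency: str = "USD") -> str:
    """Upgrade a v1 order (no <currency>) to the hypothetical v2 layout; v2 docs pass through."""
    order = ET.fromstring(xml_text)
    if order.find("currency") is None:
        currency = ET.Element("currency")
        currency.text = default_currency
        # Insert right after <customer> so element order matches the newer schema.
        idx = next((i for i, c in enumerate(order) if c.tag == "customer"), -1) + 1
        order.insert(idx, currency)
    order.set("version", "2")
    return ET.tostring(order, encoding="unicode")

old = '<order version="1"><customer>Acme</customer><total>10</total></order>'
new = migrate_order(old)
print(new)
# → <order version="2"><customer>Acme</customer><currency>USD</currency><total>10</total></order>
```

Because the migration is idempotent, it can be run over a mixed archive of old and new documents.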

    Speed in XML is an issue. But people who think you need to read the entire XML document to process it don't know what they're talking about. You can do modular processing. You can also do smart indexing to increase speed. And in a production environment, you turn schema checking off unless you're getting documents from untrusted sources. Will XML ever be as fast as an RDBMS? Probably not. But XML doesn't store relational data. And with current research in XML query languages, I'm sure XML's speed will be good enough for most future applications that deal with fuzzy schemas. (If you need a high-performance DB, then you have to bite the bullet and use an RDBMS.)
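"Modular processing" can mean streaming: handle each record as the parser finishes it instead of building the whole document tree first. A minimal sketch with `xml.etree.ElementTree.iterparse`, over a hypothetical orders feed:

```python
import io
import xml.etree.ElementTree as ET

def total_by_customer(stream) -> dict:
    """Sum order totals one <order> at a time instead of loading the whole tree up front."""
    totals = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "order":
            cust = elem.findtext("customer")
            totals[cust] = totals.get(cust, 0) + int(elem.findtext("total"))
            elem.clear()  # discard the finished order's children and text
    return totals

feed = io.BytesIO(b"<orders>"
                  b"<order><customer>Acme</customer><total>10</total></order>"
                  b"<order><customer>Bolt</customer><total>4</total></order>"
                  b"<order><customer>Acme</customer><total>5</total></order>"
                  b"</orders>")
result = total_by_customer(feed)
print(result)
```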

    My two cents.

  7. Triple stores by macpeep · · Score: 4, Interesting

    At my previous job, I implemented an experimental app that was inspired by RDF (Resource Description Framework) and triple stores.

    In a triple store, you have objects that are defined by a set of properties. The word "triple" comes from the fact that you have triples of objects, properties and property values. For example, you could have a person, John Q, who has age 37, phone number 1234 and employer Foo Ltd. Foo Ltd. in turn has phone number 5678 and any number of other properties. This forms the following triples: John Q --age--> 37, John Q --phone number--> 1234, John Q --employer--> Foo Ltd., Foo Ltd. --phone number--> 5678.

    When you look at these, you can see that Foo Ltd. is both the employer of John Q (a property value) and an object in itself that is described by a set of properties. In RDF, the triples form a graph that describes your data. The graph is typically serialized as XML.
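The John Q example can be held as plain (subject, property, value) tuples. In this minimal in-memory sketch, Foo Ltd. appears both as a value and as a subject, so a lookup can follow an edge and keep walking the graph. The helper name `values` is hypothetical.

```python
# The triples from the example, as (subject, property, value) tuples.
TRIPLES = [
    ("John Q", "age", "37"),
    ("John Q", "phone number", "1234"),
    ("John Q", "employer", "Foo Ltd."),
    ("Foo Ltd.", "phone number", "5678"),
]

def values(subject: str, prop: str) -> list:
    """All values of `prop` for `subject`; a subject may have any number of them."""
    return [v for s, p, v in TRIPLES if s == subject and p == prop]

# Follow John Q --employer--> Foo Ltd. --phone number--> ...
employer_phones = [phone
                   for emp in values("John Q", "employer")
                   for phone in values(emp, "phone number")]
print(employer_phones)
# → ['5678']
```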

    At first, it would seem that this lends itself very well to relational databases. A row in a table would be the object to be described and the columns would be the properties; the intersection is the value. However, the problem, and the strength, of RDF is that an object can have any number of properties. Basically, you could have any number of columns, and sometimes the property value is not just a value: it can be a database row in itself, or even a set of rows, or a set of values.
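A common relational mapping that sidesteps the "any number of columns" problem, though not necessarily the one the poster built, is a single three-column table: properties become rows, so new properties need no schema change and multi-valued properties are just extra rows. The price is that every hop along the graph costs another self-join. A minimal SQLite sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per triple: new properties need no schema change, and a subject
# can have several values for the same property.
conn.execute("CREATE TABLE triple (subject TEXT, property TEXT, value TEXT)")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
    ("John Q", "age", "37"),
    ("John Q", "phone number", "1234"),
    ("John Q", "employer", "Foo Ltd."),
    ("Foo Ltd.", "phone number", "5678"),
])
conn.execute("CREATE INDEX triple_sp ON triple (subject, property)")

# "Phone number of John Q's employer": each hop along the graph is another self-join.
row = conn.execute("""
    SELECT t2.value
    FROM triple AS t1
    JOIN triple AS t2 ON t2.subject = t1.value AND t2.property = 'phone number'
    WHERE t1.subject = 'John Q' AND t1.property = 'employer'
""").fetchone()
print(row[0])
# → 5678
```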

    The app I wrote mapped arbitrary RDF files to relational databases and back, and provided an API to perform queries on the data. The results of the queries were themselves RDF graphs.

    While this was quite cool, it turned out to be quite difficult to turn the query result graphs into something meaningful in a user interface. Also, queries on the RDF graphs could turn into extremely complex SQL queries. Most of these problems were eventually solved, but the code was never used directly in a real-world app, except in heavily modified form as a metadata database for a web publishing system.

  8. A markup weenie rebuts. by rodentia · · Score: 4, Interesting

    A large number of otherwise intelligent posters would seem to have been hit by the runaway XML hype train. Examples culled from various posts:

    ...[not a] major advance in computer science.

    ...[bogus] contribution to programming language design (re: XSL)

    ...[transfer data between businesses,] which is the problem XML aims to solve.

    But these are critiques directed at the hype machine, not the specification. This is really distressing me. The machine is so efficient that there are APIs for XML (which shall remain nameless) being written and optimized for message passing that, by design, cannot handle mixed content. As though it were somehow so useful in this area that a section of the spec should be tossed to make it efficient. As though there weren't already gallons of ink being spilled on EDI, etc.
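Mixed content means character data interleaved with child elements, as in a sentence with inline markup. This small illustration uses Python's ElementTree. The dict-based "record" view is a hypothetical stand-in for a message-oriented API, not any particular library.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>Press <key>Ctrl</key>+<key>C</key> to copy.</p>")

# A document-oriented API keeps text and elements interleaved (ElementTree stores the
# text after a child in that child's .tail), so the sentence survives intact.
full_text = "".join(para.itertext())

# A record-oriented view that only maps child tags to their text loses the prose
# around them, and the second <key> silently overwrites the first.
as_record = {child.tag: child.text for child in para}

print(full_text)  # → Press Ctrl+C to copy.
print(as_record)  # → {'key': 'C'}
```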

    XML was not designed to replace S-expressions, to facilitate cross-platform communications, to revolutionize EDI or DBMSs, to theorize about language design, yada, yada. XML is just that, an Extensible bloody Markup Language, a document tagging scheme. In this regard it is a tremendous advance. It is 80% less suck, by volume, than what went before. If you think your XML parser is bloated, have a look at any SGML parser. Part of what gets stripped out is tag minimization, the absence of which another poster complained about.

    Hey, it's text and not binary because I need to write it and read it. Yes, Virginia, I've got 400 users tagging XML in flat-file editors. They complained about the loss of tag minimization, too. But my svelte little Xerces needs a hand to stay so lean.

    The goal is to get structural and semantic information into my documents. (Yes, it's data, but a special kind of data called a document. You can call the message you're passing a document and use XML to format it, but there is some overhead the hype machine may not have emphasised in its rush to market.) I also strive to eliminate formatting or presentation instructions from the document (or hide them in PIs) to facilitate multi-target outputs. This lets my typesetters typeset and my data-entry people enter data.

    XML is designed to bring something of this model to the web. HTML is too presentation-oriented. SGML is too bulky. That's what it do, babe. I take a single source file from somewhere on the filesystem, incorporate pieces from elsewhere (entity resolution, DB queries, etc.), and turn it into one of five possible outputs. I use two different pagination engines with different proprietary formatting macros, XSL(T|FO), or a trap door on the bottom to dump pretty-printed ASCII. It's a publishing tool.

    --
    illegitimii non ingravare