Slashdot Mirror


What Do You Know About Databases And XML?

Dare Obasanjo writes: "XML has become a pervasive part of significant segments of software development in a relatively short time. From file formats to network protocols to programming languages, the influence of XML has been felt. I have written an overview of XML schemas, XML querying languages, XML-Enabled databases and native XML databases. Below is a shortened version of the article." Obasanjo's original OODBMS article has been updated to reflect more of the disadvantages of picking an OODBMS over an RDBMS.

26 of 257 comments

  1. Super short intro to XML by Ars-Fartsica · · Score: 3, Flamebait
    XML solves the interchange problem.

    By this, it is meant that XML allows two systems that do not share a predetermined data exchange protocol to share data.

    That's it.

    Where two systems share a common predetermined protocol, it is almost always more efficient than XML.

    Applications of XML to programming lang design (XSL) and other domains are largely a waste of time and won't last.

    1. Re:Super short intro to XML by Skapare · · Score: 3, Insightful

      So if someone designs a new (not like XML) format for exchanging data, and manages to get it standardized, then won't this also allow two systems that do not share a predetermined data exchange protocol to share data? One could also be careful in this design and make sure it is more efficient than XML, not only in space and bandwidth, but also in CPU time and programming time. Now does such a format need to be text based as XML is?

      --
      now we need to go OSS in diesel cars
    2. Re:Super short intro to XML by Rogerborg · · Score: 5, Funny
      • Where two systems share a common predetermined protocol, it is almost always more efficient than XML

      I hear you. The product that I'm working on right now is XML heavy. It's using entirely proprietary data formats, and the XML processing is taking up 80% of the query time. After achieving full buzzword compliance, we decided that the system is way too slow, and now have to strip the whole bloody lot back out again.

      Note that there was no reason to use XML in the first place, other than some designers wanted to put it on their resumes. I kid you not.

      --
      If you were blocking sigs, you wouldn't have to read this.
    3. Re:Super short intro to XML by sporty · · Score: 3, Interesting
      Well it really depends on what you are doing and how you are trying to do things. A perfect example is internal documentation. It isn't in a 2D format (usually).


      Lord knows how annoying it is to write a document so generic that translating it to other forms is possible. XML is the perfect format since there is always some middleware that can turn XML into, say, PDFs or HTML. For HTML, you have XSLT; it's a no-brainer. But for, say, a PDF, you can use another scripting language to process the XML and write out the PDF binary. Now we can create a handbook and have some cool stuff on the web without destroying the site.


      XML can also have internal uses for, say, templating. Using XSLT, you can build a template that would do cool stuff like

      [html][body]Hi [username/][/body][/html]

      which would be translated into something like

      [html]
      [body]
      [script language="php"]
      getUsername();
      [/script]
      [/body]
      [/html]

      VERY nice stuff for designers to use.

      Yes, I know my php tags and html open/close entities are arcane/wrong... but this is to make it easier to type on my part :)

      --

      -
      ping -f 255.255.255.255 # if only

    4. Re:Super short intro to XML by maraist · · Score: 3, Informative

      The obvious answer is because first you have the hit of decompressing

      You're thinking LZ or Huffman. But you could very easily use tag-id, data-length, data.
      If tag-id and data-length are binary integers, then you reduce the overhead of any tag to 8 bytes (which is shorter, except for single-character tag names). It most definitely produces faster read times, since you read entire chunks without lexical comparisons.

      For 1-level-deep data structures, this is pretty good. You can even reduce the tag size down to 1 byte (and thus have only 5 bytes of overhead per CDATA). This is especially good for protocols between web-server apps, and the like. For multi-level data structures, you have the choice of either combining all the levels into new tag types (though this doesn't allow for recursion), or having the reader keep track of state.

      Since this can easily be converted back and forth between XML, what this could mean is that externally XML is used, internally compressed XML is used.

      Note that even this has limited usefulness; it is only at all useful when interacting with 3rd-party apps, or when being saved to disk (to allow vi-modification).
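
      A minimal sketch of the tag-id, data-length, data idea for a 1-level-deep structure, in Python. The tag table, the 1-byte tag id, the 4-byte big-endian length and the "record" root name are all assumptions made for illustration; the post does not fix any of them.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical tag table; both ends would have to agree on it in advance.
TAG_IDS = {"name": 1, "email": 2}
TAG_NAMES = {tag_id: name for name, tag_id in TAG_IDS.items()}

# 1-byte tag id + 4-byte big-endian length = 5 bytes of overhead per field.
HEADER = struct.Struct(">BI")


def encode(xml_text):
    """Pack each child of the root element as tag-id, data-length, data."""
    root = ET.fromstring(xml_text)
    out = bytearray()
    for child in root:
        data = (child.text or "").encode("utf-8")
        out += HEADER.pack(TAG_IDS[child.tag], len(data)) + data
    return bytes(out)


def decode(blob, root_tag="record"):
    """Rebuild the flat XML document from the packed fields."""
    root = ET.Element(root_tag)
    pos = 0
    while pos < len(blob):
        tag_id, length = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        field = ET.SubElement(root, TAG_NAMES[tag_id])
        field.text = blob[pos:pos + length].decode("utf-8")
        pos += length
    return ET.tostring(root, encoding="unicode")
```

      The reader skips from header to header using the length field, so it never compares tag names character by character. Nested structures would need either extra tag types or reader-side state, as described above.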

      -Michael

      --
      -Michael
  2. Same article on kuro5hin by blamario · · Score: 3, Funny

    And they have some intelligent discussion over there too. Please leave it that way.

  3. Another interesting link by Anonymous Coward · · Score: 3, Informative

    There was a good discussion on XML databases on the XML-Dev mailing list, which is summarized pretty well by Leigh Dodds in "XML and Databases? Follow Your Nose."

  4. xml is an interchange format, not a storage format by TechnoVooDooDaddy · · Score: 5, Interesting

    Databases are for storing data. End of Story.
    Oracle is taking some BIGTIME performance hits for stacking all that OO crap in there, and MS SQL Server is seeing the same thing now that they've got the XML in theirs. Don't believe me?
    Why is NASA switching to MySQL from Oracle and noticing speed increases?

    Don't get me wrong, I'm a big fan of XML.. as a data interchange format.. but when I want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.

  5. Re:xml is an interchange format, not a storage for by sphealey · · Score: 5, Insightful
    Why is NASA switching to MySQL from Oracle [fcw.com] and noticing speed increases?
    I will defer to you on the advantages/disadvantages of using databases to store OO data.

    However, citing NASA as a source for technology or trends is a bit silly, for a number of reasons. The primary one is this: NASA is so large, and so diverse, that at one of their sites/on one of their projects they use one of just about every technology product you can name.

    I was once running two back-to-back software evaluations for products in the $20-million range. For both applications, the top ten vendors all claimed that their system was "used by NASA for the Space Shuttle". We checked up and guess what - they were all telling the truth.

    So you need a better example.

    sPh

  6. The problem with XML is... by gillbates · · Score: 5, Funny
    that it incurs quite a bit of processing overhead. Not only this, but in order for a validating parser to parse XML, it must read the entire document. This is simply not practical for even modestly sized databases, as most current XML parsers will attempt to read the entire file into memory.

    Granted, XML has some advantages. Data interchange among dissimilar clients, for one. But storing XML in a database is a gross waste of space and processing power, and is realistically impossible for all but the smallest of databases.

    --
    The society for a thought-free internet welcomes you.
  7. Re:xml is an interchange format, not a storage for by illtud · · Score: 3, Interesting
    Don't get me wrong, I'm a big fan of XML.. as a data interchange format.. but when I want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.

    But what if your data representation is already an XML schema? And a pretty complicated one at that? For example, look at METS: The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.

    Have a look at that schema and tell me how you'd store that in a traditional RDBMS (I'd be interested if you could, because I know SQL, I don't know OODBMS or XML repositories - this is painful for me). Databases have been for storing data, but when your data is already a complex XML representation of an object, there's little use in saying don't use OODBMS.

  8. Re:xml is an interchange format, not a storage for by Skapare · · Score: 3, Insightful

    So what do you think of using XML for system configurations? On UNIX systems that tends to be a lot of separate files, traditionally edited with vi, although today the tools are getting more and more dummy-friendly and have a smaller space of possibilities.

    --
    now we need to go OSS in diesel cars
  9. Re:xml is an interchange format, not a storage for by ergo98 · · Score: 5, Informative

    xml is an interchange format, not a storage format

    Absolutely, positively agree. Not only is XML only an interchange format, but it only makes sense in some situations (for instance if we have an embedded piece of hardware that we have to communicate with, and we're communicating to it from a Windows box, and there is no shared common data encapsulation format, I'd greatly prefer XML (with XSD) vastly over Jimmy the Programmer making up his own data encapsulation format/documentation method/extraction system, but if I have two Windows machines running SQL Server and they're in a common security context and they'll never change, I'd use DTS or replication, not XML).

    and MS SQL Server is seeing the same thing now that they've got the XML in theirs

    The XML "in" SQL Server is surface fluff (I love SQL Server and I'm saying this as a good thing, not a bad thing). i.e. Some modules that'll convert an XML query to an underlying DB query, and the results back to XML, and some basic XML importing and exporting routines. This hasn't affected the underlying operations of SQL Server whatsoever.

  10. XML + XSL(t) client side database. by jelwell · · Score: 3, Interesting
    I actually wrote a client side database recently, where all the processing is done on the client. I use Javascript, XML and XSL(T).

    It requires Netscape 6.(not out yet), IE 6, or Mozilla 0.9.5+ because of its use of XSL Transform functions.

    You can view the page here.

    Joseph Elwell.

  11. Re:xml is an interchange format, not a storage for by BroadbandBradley · · Score: 3, Informative

    Linuxfromscratch.com has a project that aims to automate the process of building your own Linux setup, storing configuration files in XML. According to the intro page, they propose that you could go to a website and fill out a survey-type form to define your system, which would create a configuration file that could build everything correctly. It sounds to me like a huge undertaking, but if distros chimed in on this and contributed the tools and expertise they have in how to install a Linux system automagically, Automated Linux From Scratch could become a standard tool used by anyone wanting to set up Linux on anything. As for going one step further and converting my /etc directory to MPXML (My Penguin XML... I made that up), well, I don't know if this would be a good thing.

  12. Oracle vs. MySQL performance by Raul+Acevedo · · Score: 3, Insightful

    Comparing Oracle and MySQL performance in the context of XML is silly. It is a well-known fact that MySQL is significantly faster than Oracle, but not because of XML, Java, or other "OO crap". It is simply because MySQL doesn't have transactional support, and probably a host of other non-OO high end RDBMS features.

    I wouldn't be surprised if "OO crap" does indeed slow down Oracle, but I know the JVM for Oracle is completely optional. I can't speak to any XML features in Oracle, I'm not familiar with them.

    --
    In a real emergency, we would have all fled in terror, and you would not have been notified.
  13. Closed minded people sadden me... by Sean+Starkey · · Score: 3, Informative

    It makes me sad to see all of these closed minded people when it comes to XML. They just haven't seen what XML can do and have been turned away from previous work in XML. XML can be used for data storage, and has many advantages.

    XML allows data to be stored with context. For example if you have the data element "CmdrTaco", that doesn't mean much. But with xml, you can store this bit of information with context:

    <SlashDot>
    <Editor>
    <Name>CmdrTaco</Name>
    </Editor>
    </SlashDot>

    Isn't that more informative?

    It is surprising to me that people who like OO don't like XML. OO allows you to have functionality attached to your data. XML allows you to put context (and even functionality) around your data.

    Another big advantage of XML databases is the lack of a schema. If you want to have a dynamic database in the relational world, you are looking at a large schema migration. An XML database allows you to just add the information with no migration at all.

    Advanced storing techniques allow queries of the XML database to be just as fast as a relational database. How can that be? The XML is stored in a specialized indexed form that allows for fast retrieval.

    Sure, there are applications where it doesn't make sense to use an XML database. Using an XML database to store relational data doesn't make sense; that's what relational databases are for. But if you can think outside the mold, and store your data in a new way, XML databases are for you.

    I might be a little biased in this area, since I work for an XML database company (http://www.neocore.com). I have seen XML in action, and it is more than just a data transport. I hope that I can convince at least one person to look at this advanced technology.

  14. RFC (corrected) by Reality+Master+101 · · Score: 5, Interesting

    Can we please, please, please amend the definition of XML to allow "</>" to close whatever the last tag was?

    That simple change would probably cut the size of the average XML file in half.
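
    How much a given file would actually shrink depends on how long its tag names are relative to its content. A rough way to measure it, sketched in Python: the function name is made up, and the regex is naive (it would also rewrite anything that looks like a closing tag inside a comment or CDATA section).

```python
import re


def size_ratio_with_short_close(xml_text):
    """Return new_size / old_size if every named closing tag became '</>'."""
    shortened = re.sub(r"</[^>]+>", "</>", xml_text)
    return len(shortened) / len(xml_text)


# Example document; try it on your own files to see where the ratio lands.
doc = "<SlashDot><Editor><Name>CmdrTaco</Name></Editor></SlashDot>"
print(round(size_ratio_with_short_close(doc), 2))
```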

    (corrected post, please moderate my other one down. I have plenty of Karma to spare...)

    --
    Sometimes it's best to just let stupid people be stupid.
  15. XML is the storage format for some things by aspillai · · Score: 3, Interesting

    I've used XML extensively and in some ways agree with people saying XML isn't a storage format. But right now there are lots of applications where XML is the perfect storage format. Example: Consider an order-processing company that brokers orders from company to company. One option would be to define a monolithic DB schema to take care of what each company would like in their order. Another would be to define a really abstract schema to facilitate handling generic order forms. The problem with the first is that each time XYZ wants something added to an order form, you need to change the schema. The second will work, but you'll need exceptionally disciplined and smart programmers to deal with the abstract layer. This doesn't even deal with migration issues.

    The solution is XML. You create an XML Schema and start storing stuff. Some company wants more parameters? No problem, extend the schema. If you need to migrate previous XML docs to adhere to the current schema, use XSLT. Or you can add these as optional parameters and every document that exists already will conform to the schema.

    Speed in XML is an issue. But people who think you need to read the entire XML document to process it don't know what they're talking about. You can do modular processing. Also, you can do smart indexing to increase speed. And in a production environment, you turn Schema checking off unless you're getting documents from untrusted sources. Will XML ever be as fast as RDBMS? Probably not. But XML doesn't store relational data. And with current research in XML Query languages, I'm sure XML's speed will be good enough for most applications in the future that deal with fuzzy schemas. (If you need a high-performance DB, then you have to bite the bullet and use an RDBMS).
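
    One way to read "modular processing" is incremental parsing: handle each order as soon as its closing tag arrives and then throw it away. A small sketch with Python's standard xml.etree.ElementTree.iterparse; the order and quantity element names are invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def total_quantity(source):
    """Sum <quantity> values one order at a time, without keeping the whole tree."""
    total = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "order":
            total += int(elem.findtext("quantity", default="0"))
            elem.clear()  # drop the finished order's children to limit memory use
    return total


sample = io.BytesIO(
    b"<orders>"
    b"<order><quantity>2</quantity></order>"
    b"<order><quantity>5</quantity></order>"
    b"</orders>"
)
print(total_quantity(sample))  # prints 7
```

    The same loop works on a file path or an open file, so the size of the document matters much less than the size of a single order.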

    My two cents.

  16. Database storage in XML format is fine, if... by jlowery · · Score: 4, Insightful

    Of course, this is not an easy question to answer, but the right answer involves knowing three things:

    1) Can certain records be considered 'atomic'?
    This is similar to the RDBMS question of whether or not it makes sense to construct a view or not. View definitions represent a common query. If you consider a query as a means of tying together disparate data from many tables into a single, denormalized set of records, the record could just as easily be expressed in some XML format.

    Now, if that record represents some physical or conceptual entity in the data model, it is in fact a set of properties about an object. This is what XML is good at representing. Decomposing that set of object data (record) into normalized relations may not make sense if such 'objects' are frequently requested; but there are other considerations...

    2) Ad hoc queries are difficult when data is stored internally in XML, because each XML blob has to be parsed and checked for the query values. If you don't know in advance if the XML structure even has the fields you're looking for, then you must do an exhaustive search. Some have used indexed XPath information to work around this issue. Since we're mentioning indexes...

    3) How do you find the XML blobs you're looking for? We've used an ORDBMS for our XML data, and indexed on the ID or key values (as defined in an XML Schema) for each element stored in the database. This makes looking up element instances easier. It also makes relating them easier, too, if you use IDREF or keyrefs as your foreign keys.

    Now every XML document has a single root element. If you're storing that document in a database, you could choose to store just that one root element instance. More likely, you'll want to decompose the root so that accessing subelements by ID or key in the database will be easier.
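
    A toy version of the ID-indexed layout, using SQLite from Python's standard library as a stand-in for the ORDBMS. The table name, the sample document, and the use of a plain "author" attribute as the IDREF are all assumptions for the sake of the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <author id="a1"><name>Ann</name></author>
  <book id="b1" author="a1"><title>XML Storage</title></book>
</catalog>"""


def store(conn, xml_text):
    """Store every element that carries an id as its own indexed row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS element (id TEXT PRIMARY KEY, tag TEXT, xml TEXT)"
    )
    for elem in ET.fromstring(xml_text).iter():
        if "id" in elem.attrib:
            conn.execute(
                "INSERT INTO element VALUES (?, ?, ?)",
                (elem.get("id"), elem.tag, ET.tostring(elem, encoding="unicode")),
            )


def fetch(conn, element_id):
    """Look an element up by key instead of parsing every stored blob."""
    row = conn.execute(
        "SELECT xml FROM element WHERE id = ?", (element_id,)
    ).fetchone()
    return ET.fromstring(row[0]) if row else None
```

    Following a reference is then two key lookups: fetch the book, read its author attribute, and fetch that id.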

    Got to run off now,

    Jeff Lowery

    --
    If you post it, they will read.
  17. Missing the Big Picture by SuperKendall · · Score: 5, Insightful

    I can't believe no-one has posted my standard response to someone who thinks XML is just for "interchange".

    The interesting thing about XML to me is NOT that it solves the interchange problem (though it helps with that). The great thing is that it solves the PARSING problem. No longer do I have to write a parser every time I have some simple task of reading in something externally.

    What XML does is define for you a standard means of parsing, and by defining the API for parsing and the structure of the documents lets you think about how you want to structure external information, not how you're going to read it in.

    Also, because the API for parsing is now hiding the engine details below, parsers can be specialized depending on what kind of task you have. Parsing thousands of 1k XML documents would seem to demand a different processor altogether from a few multi-GB documents, but you only have to know one parser (Ok, really two - SAX and the DOM interface). You could even have specialized XML processors that did write the stream out in a weird custom binary format for compactness and read it back in with the normal DOM API so clients wouldn't have to adjust. I'll grant you that there don't seem to be many specialized XML processors - yet.
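
    To make the SAX/DOM distinction concrete, here is the same small job done both ways with Python's standard library. The users document and the function names are made up for the example.

```python
import xml.sax
from xml.dom import minidom

DOC = "<users><user name='ann'/><user name='bob'/></users>"


def names_with_dom(text):
    """DOM: load the whole tree into memory, then walk it."""
    dom = minidom.parseString(text)
    return [u.getAttribute("name") for u in dom.getElementsByTagName("user")]


class NameHandler(xml.sax.ContentHandler):
    """SAX: react to each element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "user":
            self.names.append(attrs.getValue("name"))


def names_with_sax(text):
    handler = NameHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names
```

    Neither function contains any tokenizing code of its own, which is the point: the caller only decides what to do with elements, not how to find them.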

    I also like the robustness of XML exchanges (here I'm getting more into your main point). If you add or drop attributes from an XML document, clients that read that document are less likely to break (unless of course they relied entirely on the node(s) you have removed!). That is especially true of XSL, where missing nodes of a document simply correspond to missing parts of output (which can also be a useful effect).

    You might think of XSL as a useless language, but I'll be happy to make a counter-prediction that it will grow and thrive. It's simply too useful a transformation tool to do anything else. I know the syntax seems overbearing, but for the kinds of short transformational work it's normally put to that's not much of an issue and you get used to it quickly.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  18. Triple stores by macpeep · · Score: 4, Interesting

    At my previous job, I implemented an experimental app that was inspired by RDF (Resource Description Framework) and triple stores.

    In a triple store, you have objects that are defined by a set of properties. The word "triple" comes from the fact that you have triples of objects, properties and property values. For example, you could have a person, John Q, who has an age 37, a phone number 1234 and an employer Foo Ltd. Foo Ltd. in turn has a phone number 5678 and any number of other properties. This forms the following triples: John Q --age--> 37, John Q --phone number--> 1234, John Q --employer--> Foo Ltd., Foo Ltd. --phone number--> 5678.

    When you look at these, you can see that Foo Ltd. is both the employer of John Q (a property value) and an object in itself that is described by a set of properties. In RDF, the triples form a graph that describes your data. The graph is typically serialized as XML.
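
    The example can be written down directly as data. This is only an illustrative Python sketch, not RDF proper (no URIs, no XML serialization), and the helper names are invented.

```python
# The example triples as (object, property, value) tuples.
TRIPLES = [
    ("John Q", "age", 37),
    ("John Q", "phone number", 1234),
    ("John Q", "employer", "Foo Ltd."),
    ("Foo Ltd.", "phone number", 5678),
]


def properties(subject, triples=TRIPLES):
    """Collect the properties of one object; there is no fixed set of columns."""
    return {prop: value for subj, prop, value in triples if subj == subject}


def employer_phone(person, triples=TRIPLES):
    """Follow a property value that is itself an object in the graph."""
    employer = properties(person, triples).get("employer")
    return properties(employer, triples).get("phone number")
```

    Here employer_phone("John Q") walks one edge from John Q to Foo Ltd. and then reads a property of Foo Ltd., which is exactly the "value that is also an object" case.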

    At first, it would seem that this lends itself very well to relational databases. A row in a table would be the object to be described and columns are the properties. The intersection is the value. However, the problem - and strength of RDF - is that you can have any number of properties for an object. Basically, you could have any number of columns and sometimes, the property value is not just a value - it can be a database row in itself or even a set of rows.. or a set of values.

    The app I wrote mapped arbitrary RDF files to relational databases and back as well as provided an API to perform queries on the data. The result of the queries were RDF graphs in themselves.

    While this was quite cool, it turned out to be quite difficult to turn the query result graphs into meaningful stuff in a user interface. Also, queries on the RDF graphs could turn out to be extremely complex SQL queries... Most of these problems were eventually solved but the code wasn't used directly for any real world app, except heavily modified as a metadata database for a web publishing system.
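
    The post does not say how the mapping was done. One common approach, assumed here purely for illustration, is a single three-column table with one row per triple; the sketch below uses SQLite from Python's standard library and shows why even a simple graph question turns into a self-join.

```python
import sqlite3


def load(triples):
    """Put triples in one three-column table instead of one column per property."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (subject TEXT, property TEXT, value TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    return conn


def employer_phone(conn, person):
    """Each hop through the graph costs one more self-join."""
    row = conn.execute(
        """
        SELECT phone.value
        FROM triple AS emp
        JOIN triple AS phone ON phone.subject = emp.value
        WHERE emp.subject = ? AND emp.property = 'employer'
          AND phone.property = 'phone number'
        """,
        (person,),
    ).fetchone()
    return row[0] if row else None
```

    A query that follows three or four properties needs three or four joins on the same table, which gives some idea of how the generated SQL can get out of hand.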

  19. Web-Apps need XML by webmaven · · Score: 5, Informative

    Separation of content, logic, and presentation is very difficult to do in current web-app development environments.

    The breakdown is not on the logic/content side of the equation, or the presentation/content side, but mainly in the presentation/logic arena.

    Imagine an HTML designer who has mocked up a page for a web-app, and hands it off to the dev team for them to add in the necessary logic to dynamically include the user-name, current balance, contents of the shopping cart, etc. Depending on the exact paradigm that their tools use, they will either:

    a) Chop up the page and include various fragments in the programs that are designed to emit said fragments at the opportune times to be assembled into a text stream eventually received by a browser

    or b) Stick various bits of logic into the page in order to parameterize and/or conditionalize it, using either some sort of special tagging format or actual inlined blocks of code.

    Whichever approach the dev team's tools use, the result is the same: the designer can no longer change the altered page.

    Even in case b), which maintains some semblance of a coherent 'page', the designer cannot load the page-with-logic into their favorite visual editor and see anything resembling the actual page. They certainly can't edit it to change the look-and-feel without breaking the carefully constructed logic.

    The end result is that the designer has no recourse other than to take their page design, change it, and hand it over to the dev-team again for them to re-include (in some cases re-code) all of their logic.

    This is obviously a very wasteful approach.

    Amazingly, there actually is a solution to this problem. It's called Template Attribute Language (TAL), and it solves the problem by adding programming directives to the page via XHTML attributes on the existing tags. The language is deliberately designed to only be suitable for presentation logic, relegating business logic code to some other objects, where the designer can't see them. This helps enforce the appropriate distinction between presentation logic and business logic that most current development environments ignore, thus encouraging their admixture.

    Currently, TAL (and the related specifications TALES and METAL) are only implemented in one environment, but the language has been deliberately designed to be as platform agnostic as possible. Other implementations of the specification are possible, and even desirable.

    Articles:

    Zope Page Templates: Getting Started

    Zope Page Templates: Advanced Usage

    Using Zope with Amaya, Dreamweaver, and other WYSIWYG Tools

    --
    The real Webmaven is user ID 27463. I don't rate an imposter, because my ID is such a lame-ass high number.
    1. Re:Web-Apps need XML by dolanh · · Score: 3, Informative

      Not to mention Apache Cocoon.

  20. A markup weenie rebuts. by rodentia · · Score: 4, Interesting

    A large number of otherwise intelligent posters would seem to have been hit by the runaway XML hype train. Examples culled from various posts:

    ...[not a] major advance in computer science.

    ...[bogus] contribution to programming language design (re: XSL)

    ...[transfer data between businesses,] which is the problem XML aims to solve.

    But these are critiques directed at the hype machine, not the specification. This is really distressing me. The machine is so efficient that there are APIs for XML (which shall remain nameless) being written and optimized for message passing which cannot handle mixed content as a matter of design. As though it were somehow so useful in this area that a section of the spec should be tossed to make it efficient. As though there weren't already gallons of ink being spilled on EDI, etc.

    XML was not designed to replace S-expressions, to facilitate cross-platform communications, revolutionize EDI or DBMs, to theorize about language design, yada, yada. XML is just that, an Extensible bloody Markup Language, a document tagging scheme. In this regard it is a tremendous advance. It is 80% less suck, by volume, than what went before. If you think your XML parser is bloated, have a look at any SGML parser. Part of what gets stripped out is tag minimization, the absence of which another poster complained about.

    Hey, it's text and not binary because I need to write it and read it. Yes, Virginia, I've got 400 users tagging XML in flat-file editors. They complained about the loss of tag-minimization, too. But my svelte little Xerces needs a hand to stay so lean.

    The goal is to get structural and semantic information into my documents. (Yes, it's data, but a special kind of data called a document. You can call the message you're passing a document, and use XML to format it, but there is some overhead the hype machine may not have emphasised in their rush to market.) I also strive to eliminate formatting or presentation instructions from the document (or hide them in PIs) to facilitate multi-target outputs. This lets my typesetters typeset and my data-entry people enter data.

    XML is designed to bring something of this model to the web. HTML is too presentation oriented. SGML is too bulky. That's what it do, babe. I take a single source file from somewhere on the filesystem, incorporate pieces from elsewhere (entity resolution, DB queries, etc.), turn it into one of five possible outputs. I use two different pagination engines with different proprietary formatting macros, XSL(T|FO), or a trap door on the bottom to dump pretty-printed ASCII. It's a publishing tool.

    --
    illegitimii non ingravare
  21. performance by csbruce · · Score: 3, Funny

    1-GHz Pentium-III + Java + XSLT == 1-MHz 6502.