Slashdot Mirror


Choosing the Right XML Database?

Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"

65 comments

  1. your xml by Anonymous Coward · · Score: 5, Funny

    <post>
    first
    </post>
    <!-- take that beyotches -->

    1. Re:your xml by Mas3 · · Score: 1

      What is the major advantage of an XML-Database ??
      --
      Stefan

      DevCounter - An open, free & independent developer pool
      created to help developers find other developers, help, testers and new project members.

    2. Re:your xml by bsartist · · Score: 1

      What is the major advantage of an XML-Database?

      Optimized, open, standards-based, buzzword compliance.

      --
      Lost: Sig, white with black letters. No collar. Reward if found!
    3. Re:your xml by Anonymous Coward · · Score: 0

      Mr BIG wanked over XML, you don't get the sack.

    4. Re:your xml by SnapShot · · Score: 1

      Let's rephrase the question. What requests does one make of an XML database that are difficult or impossible to make of a RDBMS?

      Are you trying to determine Elements that contain a given Attribute value or name? Are you trying to return Text nodes that contain a particular search string? Or, are you simply storing small Blobs of XML data organized through some higher level data?

      As a starting point for discussion (and having not researched XML databases), here is a simple table structure:

      [ElementMapper]
      ParentID
      ChildID

      [Element]
      ID
      Tag

      [AttributeMapper]
      ParentElemID
      AttributeID

      [Attribute]
      ID
      Name
      Value

      [Text]
      ID
      ParentID
      Value

      So is an XML database optimized to convert things like:

      <address type="business">
      <street>1234 Main Street</street>
      </address>

      into:

      [Element]
      ID Tag
      1 address
      2 street

      [ElementMapper]
      ParentID ChildID
      1 2

      [Attribute]
      ID Name Value
      3 type business

      [AttributeMapper]
      ParentElemID AttributeID
      1 3

      [Text]
      ID ParentID Value
      4 2 1234 Main Street

      I'm no SQL expert, but I suppose as this gets complicated the performance of the SQL queries required to answer questions like (in pseudo-code) SELECT Address WHERE Street Contains "Main Street" becomes difficult. So, is that what XML databases are supposed to solve? Are they simply RDBMS that have been optimized to handle these types of queries?
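A minimal sketch of the shredding described above, assuming Python's standard `sqlite3` and `xml.etree.ElementTree` modules (neither is mentioned in the thread). The table and column names follow the post; the `shred` helper and `ADDRESS_QUERY` are hypothetical names, and each table keeps its own ID sequence here rather than the single shared sequence in the example rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE Element         (ID INTEGER PRIMARY KEY, Tag TEXT);
CREATE TABLE ElementMapper   (ParentID INTEGER, ChildID INTEGER);
CREATE TABLE Attribute       (ID INTEGER PRIMARY KEY, Name TEXT, Value TEXT);
CREATE TABLE AttributeMapper (ParentElemID INTEGER, AttributeID INTEGER);
CREATE TABLE Text            (ID INTEGER PRIMARY KEY, ParentID INTEGER, Value TEXT);
"""

def shred(conn, node, parent_id=None):
    """Recursively store one element with its attributes, text, and children."""
    elem_id = conn.execute(
        "INSERT INTO Element (Tag) VALUES (?)", (node.tag,)).lastrowid
    if parent_id is not None:
        conn.execute("INSERT INTO ElementMapper VALUES (?, ?)", (parent_id, elem_id))
    for name, value in node.attrib.items():
        attr_id = conn.execute(
            "INSERT INTO Attribute (Name, Value) VALUES (?, ?)", (name, value)).lastrowid
        conn.execute("INSERT INTO AttributeMapper VALUES (?, ?)", (elem_id, attr_id))
    if node.text and node.text.strip():
        conn.execute(
            "INSERT INTO Text (ParentID, Value) VALUES (?, ?)", (elem_id, node.text.strip()))
    for child in node:
        shred(conn, child, elem_id)
    return elem_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, ET.fromstring(
    '<address type="business"><street>1234 Main Street</street></address>'))

# The pseudo-query "SELECT Address WHERE Street Contains 'Main Street'" as joins.
ADDRESS_QUERY = """
SELECT addr.ID
FROM Element AS addr
JOIN ElementMapper AS m ON m.ParentID = addr.ID
JOIN Element AS street ON street.ID = m.ChildID AND street.Tag = 'street'
JOIN Text AS t ON t.ParentID = street.ID
WHERE addr.Tag = 'address' AND t.Value LIKE '%Main Street%'
"""
matches = [row[0] for row in conn.execute(ADDRESS_QUERY)]
```

Even this one-level lookup needs three joins, which is the growth in query complexity the post worries about; deeper paths would add a join per level.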

      --
      Waltz, nymph, for quick jigs vex Bud.
    5. Re:your xml by Anonymous Coward · · Score: 0

      ummm... you can use it for XML.. duh

    6. Re:your xml by Tailhook · · Score: 1

      ...but I suppose as this gets complicated the performance of the SQL queries required to answer questions like (in pseudo-code) SELECT Address WHERE Street Contains "Main Street" becomes difficult...

      It shouldn't have to. Some RDBMS systems allow you to optimize locality of rows from multiple tables based on some key. Oracle calls this a "table cluster". Rows from multiple tables are stored together on disk when their keys match across tables. They behave like normal tables in most other respects. Query a particular row from any cluster member and the engine will pick up the related rows automatically in one IO operation.

      This is a very efficient way to create table-like structures at runtime. You create an "attribute" table for each significant data-type and cluster all the tables together, then you just create views to make the separate attribute rows appear as a single row in a query.

      I can't imagine why it wouldn't apply to XML. I'll bet it's done this way in real XML database implementations.
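The "attribute table per data type plus a view" part of this idea can be sketched with Python's `sqlite3`, with the loud caveat that SQLite has no equivalent of Oracle's table clusters, so the on-disk locality is not shown at all. The `Entity`, `StrAttr`, `IntAttr`, and `Address` names and the zip column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entity  (ID INTEGER PRIMARY KEY);
CREATE TABLE StrAttr (EntityID INTEGER, Name TEXT, Value TEXT);     -- string-valued attributes
CREATE TABLE IntAttr (EntityID INTEGER, Name TEXT, Value INTEGER);  -- integer-valued attributes

-- The view makes the separate attribute rows look like one ordinary row.
CREATE VIEW Address AS
SELECT e.ID AS ID, s.Value AS Street, z.Value AS Zip
FROM Entity AS e
JOIN StrAttr AS s ON s.EntityID = e.ID AND s.Name = 'street'
JOIN IntAttr AS z ON z.EntityID = e.ID AND z.Name = 'zip';

INSERT INTO Entity  VALUES (1);
INSERT INTO StrAttr VALUES (1, 'street', '1234 Main Street');
INSERT INTO IntAttr VALUES (1, 'zip', 12345);
""")

# Callers query the view as if it were a plain table.
rows = conn.execute(
    "SELECT ID, Street, Zip FROM Address WHERE Street LIKE '%Main Street%'").fetchall()
```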

      --
      Maw! Fire up the karma burner!
    7. Re:your xml by wideBlueSkies · · Score: 1

      So it's kind of like a virtual table on disk. Or a pre-joined join.

      What's the advantage? You're spending runtime (disk I/O, data verification) joining every row together as you load the tables, not knowing if you're going to need every row joined later on.

      As opposed to just doing a join for what you need later when you're pulling data out of the tables.

      Maybe I just don't understand.....

      --
      Huh?
  2. Berkley DB XML also an option by kzeddy · · Score: 4, Informative

    Berkeley DB XML is a new product. I have not tested it, though, so this is not a recommendation.

    1. Re:Berkley DB XML also an option by Anonymous Coward · · Score: 5, Informative

      Yup I was going to mention that one. I've tested it and it works great. Basically regular Berkeley DB which rocks the house already, with an XML-aware layer on top.

      If you have lots of small XML documents this is definitely the best choice. Dunno about big reports. Berkeley scales to any size, but maybe he should split his big documents into "metadata.xml" and "report.xml".. then store and index metadata.xml in the database and put report.xml on disk. I believe there is a standard for XML Includes now, so he could have the metadata.xml actually point to the report.
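A rough sketch of the metadata/report split using XInclude, assuming Python's `xml.etree.ElementInclude` module. The file name `report-42.xml`, the element names, and the in-memory `REPORTS` dictionary standing in for files on disk are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-in for large report files kept on disk, outside the database.
REPORTS = {"report-42.xml": "<report><section>Q1 totals</section></report>"}

# Small, indexable metadata document that only points at the report.
METADATA = """<metadata xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Quarterly report</title>
  <xi:include href="report-42.xml" parse="xml"/>
</metadata>"""

def loader(href, parse, encoding=None):
    """Resolve an xi:include href; here from a dict instead of the filesystem."""
    if parse != "xml":
        raise ValueError("only XML includes are handled in this sketch")
    return ET.fromstring(REPORTS[href])

root = ET.fromstring(METADATA)
ElementInclude.include(root, loader=loader)  # replaces xi:include with the report
section_text = root.findtext("report/section")
```

The database would only ever see the small metadata document; the include is expanded when a reader actually needs the full report.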

      Lots of ideas. Check out Berkeley DB though, it beats Xindice (especially since it's not written in Java, which pretty much ruled it out for my purposes.)

    2. Re:Berkley DB XML also an option by stonebeat.org · · Score: 2, Informative

      Looks good, but it doesn't have a Java API. My app is going to use Apache Cocoon, which runs on Tomcat, so I would prefer a DB that has a Java API.

    3. Re:Berkley DB XML also an option by Anonymous Coward · · Score: 5, Informative

      It does have a Java API! Did you check it out? Comes with C/C++, Java, Perl, Python, and Tcl support out of the box. It's just not *written* in Java, which makes it more flexible. Since it's still "prerelease" you have to sign up to get the software, but that's not a big deal.

    4. Re:Berkley DB XML also an option by stonebeat.org · · Score: 1

      then i should try it out. will do it tonite.

    5. Re:Berkley DB XML also an option by beowulf_26 · · Score: 2, Informative

      For what it's worth, at my workplace at the moment, we're doing the exact same thing, but already have a ton of data that we need to get ingested. The pointy haired boss hired These Guys who know their stuff pretty well, and prefer to use Xindice. The only problem is that it's, well, quite slow.

      Other commercial alternatives are Ipedo or Tamino if your development house has the cash. Education discounts of 99% are available, I believe, from Tamino, but the Ipedo people aren't as forthcoming with what they're willing to deal on.

      Sadly, there just isn't a hands down winner in this market, but if you're looking for something to go with Cocoon, Xindice looks to be the best OSS solution for the moment.

      --

      --I hate big sigs.
    6. Re:Berkley DB XML also an option by SnapShot · · Score: 1
      The intro page to Xindice has the following item:

      The benefit of a native solution is that you don't have to worry about mapping your XML to some other data structure. You just insert the data as XML and retrieve it as XML. You also gain a lot of flexibility through the semi-structured nature of XML and the schema independent model used by Xindice. This is especially valuable when you have very complex XML structures that would be difficult or impossible to map to a more structured database.


      Is this ever really true? Is there an XML structure that CAN NOT be implemented in a RDBMS? Or are there simply difficult and/or inefficient ways of storing XML data in an RDBMS?
      --
      Waltz, nymph, for quick jigs vex Bud.
    7. Re:Berkley DB XML also an option by beowulf_26 · · Score: 1

      Figuring out by hand a way to store XML relationally can be a nightmare, especially as schemas grow more complex. XML databases solve this problem.

      Impossible? Not at all, ever. There's always a way to represent it in an RDBMS, but it usually makes it quite hairy to retrieve in a meaningful fashion. It may not be efficient, but it's always possible.

      Basically, that whole paragraph describes ALL XML databases (although many do it better), not just Xindice. The benefit of native XML databases is that you don't have to figure out a way to represent the XML relationally, so it can be searched through more easily.

      Most of them just parse the XML and use their magic to index the files relationally or by some other means.

      --

      --I hate big sigs.
  3. why an xml database? by jeffdill · · Score: 5, Insightful

    To pick the right database, you need to analyze the structure of your data and the operations you intend to perform on it. XML is a useful general format for interchange of serialized data, but just because you have some data represented in XML doesn't mean you should store it in XML. What is the structure of the data? What will you do with it? Why is a relational database or a object database a bad choice for your application?

    1. Re:why an xml database? by stonebeat.org · · Score: 1

      The XML that I will be storing is a tree-like structure, with lots of children, so mapping that to a relational database was not easy.

      The other option for me was to use a pure OODB like Objectivity, which I have used in another project. I could still use it for this project.

      But I thought it would be better if I use some engine that support XPath and XQuery.

    2. Re:why an xml database? by Anonymous Coward · · Score: 5, Informative

      Even if the data you're storing is XML formatted, it might be better to map certain tags to relational columns and just store the XML doc itself as part of a normal relational table. The searches are guaranteed to be more efficient, especially with decent indexing. This won't work if you really need to do searches involving parent/child/sibling relationships between nodes.
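A minimal sketch of this hybrid layout, assuming Python's `sqlite3` and `xml.etree.ElementTree`. The `Report` table, its `Author` and `Year` columns, and the sample documents are hypothetical; the point is only that a few tags are promoted to indexed columns while the document itself is stored untouched.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Report (
    ID     INTEGER PRIMARY KEY,
    Author TEXT,      -- copied out of <author>
    Year   INTEGER,   -- copied out of <year>
    Body   TEXT       -- the full XML document, stored as-is
);
CREATE INDEX ReportByAuthorYear ON Report (Author, Year);
""")

def store_report(xml_text):
    """Parse once on insert to fill the searchable columns."""
    doc = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO Report (Author, Year, Body) VALUES (?, ?, ?)",
        (doc.findtext("author"), int(doc.findtext("year")), xml_text))

store_report("<report><author>alice</author><year>2003</year><body>Q1 totals</body></report>")
store_report("<report><author>bob</author><year>2002</year><body>Inventory</body></report>")

# Search on the relational columns, then hand back the original XML.
hits = [ET.fromstring(body) for (body,) in conn.execute(
    "SELECT Body FROM Report WHERE Author = ? AND Year = ?", ("alice", 2003))]
```

As the post says, this stops helping once a search depends on parent/child/sibling relationships that were never copied into columns.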

      At the minimum make sure there's good XQuery support. XPath just won't cut it if you need to scale.

      DB2 has decent XML support currently, and great XML support coming along the pipe at some point afaik. My experiences with it have been very positive.

    3. Re:why an xml database? by rycamor · · Score: 4, Informative

      For more opinions to make you think:

      http://www.dbazine.com/pascal9.html
      http://www.dbazine.com/pascal8.html

      And here, C.J. Date argues that a truly relational DBMS should be able to support an XML data type:

      http://www.dbdebunk.com/lauri1.htm

      (PostgreSQL is an example of a DBMS with extensible types)

    4. Re:why an xml database? by zaqattack911 · · Score: 1

      How does that work? My XML knowledge goes no further than simply representing a datastructure/object/whatever in a TAG-like format.

      What use is it to store the XML in a table? wouldn't that just be storing a string?

      please help :)

      --moi

    5. Re:why an xml database? by rycamor · · Score: 2, Informative

      >wouldn't that just be storing a string?

      Oh no. I mean, yes, you could just store XML as a string in a BLOB column, but that's no better than just storing as a file.

      A custom XML datatype would not treat the XML as a blob, but actually parse the XML upon input into the table, storing an internal representation (probably as an associative array) which would allow custom operators to traverse the tree, visit nodes, etc...

      But, it would also allow you to perform relational queries and place integrity constraints on your XML documents.

      To explain further, I will use a specific example: PostgreSQL allows you to create custom datatypes, even importing C functions to handle the input and output of these types. Thus, theoretically, you should be able to create a custom datatype called "xmldoc", and use code from a standard C library which handles XML, such as libxml, or expat, which will parse the XML string into an internal data structure, and vice-versa upon output.

      (I must stress, this is theoretical. I haven't had the time or need for such a thing, but according to the documentation, it should be possible)
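The PostgreSQL version is left theoretical in the post, so the following is only an analogy in miniature: Python's `sqlite3` adapter/converter hooks, which parse XML on the way in and out of a column declared as `xmldoc`. It does not give the database engine any XML-aware operators or constraints. The `XmlDoc` wrapper class and the `Docs` table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

class XmlDoc:
    """Hypothetical wrapper: holds a parsed tree instead of a raw string."""
    def __init__(self, root):
        self.root = root

def adapt_xmldoc(doc):
    # Called when an XmlDoc is bound as a query parameter (output to storage).
    return ET.tostring(doc.root, encoding="unicode")

def convert_xmldoc(raw):
    # Called when a column declared as "xmldoc" is read; raw arrives as bytes.
    return XmlDoc(ET.fromstring(raw))

sqlite3.register_adapter(XmlDoc, adapt_xmldoc)
sqlite3.register_converter("xmldoc", convert_xmldoc)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE Docs (ID INTEGER PRIMARY KEY, Doc xmldoc)")

# Malformed XML fails in ET.fromstring here, before anything reaches the table.
conn.execute("INSERT INTO Docs (Doc) VALUES (?)",
             (XmlDoc(ET.fromstring("<post><title>first</title></post>")),))

loaded = conn.execute("SELECT Doc FROM Docs").fetchone()[0]
title = loaded.root.findtext("title")
```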

  4. Take the easyst way by angel'o'sphere · · Score: 5, Insightful

    These are the things I would question first:

    a) Does it use XQuery/XPath to access the DB, or another standard way, or is it proprietary?
    b) Does it support your programming language of choice?
    c) Where do you get a running prototype fastest?

    c) is the most important point IMHO. You only know whether you have chosen the right DB AFTER you have implemented your application. (Well, you can try to find test cases and try to predict whether the DB is the right one by trying to scale tests up.) Note: I used the word "try" several times, because such an approach is only trial and error.

    OK, if you can just start coding (that was point c) and a standard like a) is supported, then you should easily be able to hide the actual DB behind a suitable interface.

    b) is only a matter of your flexibility ....

    I would guess the application has more constraints which will likely limit you, or challenge you to overcome them, than the DB used behind it.

    I once read an article in a German magazine where they had put a DOM writer and a DOM reader as stored procedures into an SQL database.

    And all XML was stored in a few tables, element, attribute and such ... it was very fast ... and well, you programmed your XML manipulation by directly manipulating "virtual" DOM trees inside of the DB. In SQL and in a relational DB, of course.

    So much for "relational" won't fit your needs :-)

    Regards,
    angel'o'sphere

    P.S. You did not give many hints about why you need an XML database. An XML database only makes sense if your natural document format is ... XML. It makes no sense when you think you need to use XML because of hype or something ....

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    1. Re:Take the easyst way by stonebeat.org · · Score: 1

      Thanks for the reply :) Actually the data that will be stored is XML, and is very well suited for XML. Plus the application that will retrieve it later will be expecting well-formed and valid XML as the input. Apache's Xindice also supports storing everything in a relational DB.

    2. Re:Take the easyst way by NineNine · · Score: 1

      Actually, I'd approach it differently... What do you need to do? Saying an "XML" database is already pretty limiting. Hell, a database might not even be the right answer. In some cases, a few flat files will do the trick, or a pipe, or other things.

    3. Re:Take the easyst way by Lechter · · Score: 1

      If you need to hit the DB from some type of programming environment I'd recommend using a DB with an implementation of the XML:DB API. I've been looking at Xindice, and Software AG's Tamino, both of which support the Java XML:DB API, which actually seems rather nice.

      As for the speed, I can't comment from personal experience, but according to the Software AG folks it's quite fast even for their customers who are indexing terabytes of data. Of course, that's pr bunny speak so it's to be taken with a grain of salt.

      I'm not sure exactly how native XML DBs work, but from my research it seems that implementations are based on hierarchical databases: e.g. Adabas -> Tamino.

      --
      credo quia absurdum
  5. someone please explain... by tongue · · Score: 3, Interesting

    Frankly, I don't think I understand why relational is considered a poor choice for this. Would someone please explain this? (This is not a troll, I really don't know.) Is it just the work involved in storing an object in a set of tables?

    1. Re:someone please explain... by stonebeat.org · · Score: 1

      tree like structure of XML vs tabular format of RDBMS.
      An ORDBMS might work in some situations.

    2. Re:someone please explain... by Anonymous Coward · · Score: 5, Insightful

      Well, if you're just sticking the entire XML document into the table like a blob of text, then yeah, there's no problem. But then you can't really do anything with it other than retrieve it (i.e., you can't run a query on parts of it).

      But if you want the database to be aware of the *structure* of the data, you have to decompose the data into pieces, stick them in various tables, keep the integrity between the tables, and, oh yeah, write some code to convert the data back into XML when you want to get the whole document.

      For instance if you are storing an XML document that's made with one-or-more Chapters, Paragraphs, or Sentences and each Chapter has one-or-more Paragraphs, and each Paragraph can contain Sentences .. etc.. you have a complex many-to-one structure you have to store in multiple tables .. how would you do it? Well, you'd make a document table, a chapter table, a sentence table, and link them all together with unique id's .. etc.. you get the point I hope, that the XML doc's rich structure has to be decomposed into rows and columns.

      XML databases take care of this automatically and also can *index* the various parts of the document so that queries (XPath or otherwise) run faster (i.e., give me the documents that contain sentences beginning with "Hello").
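To make the decomposition concrete, here is a sketch of what has to be written by hand when a relational store is used for the chapter/paragraph/sentence example, assuming Python's `sqlite3` and `xml.etree.ElementTree`. The table names, the `title` attribute, and the `store` helper are hypothetical. An index on the sentence text is created, but whether a given engine's planner actually uses it for a prefix match is not something this sketch demonstrates.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Document  (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Chapter   (ID INTEGER PRIMARY KEY, DocumentID INTEGER REFERENCES Document(ID));
CREATE TABLE Paragraph (ID INTEGER PRIMARY KEY, ChapterID INTEGER REFERENCES Chapter(ID));
CREATE TABLE Sentence  (ID INTEGER PRIMARY KEY, ParagraphID INTEGER REFERENCES Paragraph(ID),
                        Body TEXT);
CREATE INDEX SentenceByBody ON Sentence (Body);
""")

def store(xml_text):
    """Decompose one document into rows, one table per nesting level."""
    doc = ET.fromstring(xml_text)
    doc_id = conn.execute(
        "INSERT INTO Document (Title) VALUES (?)", (doc.get("title"),)).lastrowid
    for chapter in doc.findall("chapter"):
        chap_id = conn.execute(
            "INSERT INTO Chapter (DocumentID) VALUES (?)", (doc_id,)).lastrowid
        for para in chapter.findall("paragraph"):
            para_id = conn.execute(
                "INSERT INTO Paragraph (ChapterID) VALUES (?)", (chap_id,)).lastrowid
            for sentence in para.findall("sentence"):
                conn.execute(
                    "INSERT INTO Sentence (ParagraphID, Body) VALUES (?, ?)",
                    (para_id, sentence.text))
    return doc_id

store('<document title="greeting"><chapter><paragraph>'
      '<sentence>Hello there.</sentence><sentence>Goodbye.</sentence>'
      '</paragraph></chapter></document>')
store('<document title="farewell"><chapter><paragraph>'
      '<sentence>So long.</sentence></paragraph></chapter></document>')

# "Give me the documents that contain sentences beginning with Hello."
titles = [row[0] for row in conn.execute("""
    SELECT DISTINCT d.Title
    FROM Document AS d
    JOIN Chapter   AS c ON c.DocumentID  = d.ID
    JOIN Paragraph AS p ON p.ChapterID   = c.ID
    JOIN Sentence  AS s ON s.ParagraphID = p.ID
    WHERE s.Body LIKE 'Hello%'
    ORDER BY d.Title""")]
```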

    3. Re:someone please explain... by Tablizer · · Score: 3, Insightful

      XML databases take care of this automatically

      Take care of what? Parsing? That is a parser, not a database. How about a specific example.

      Relational is pretty flexible if you just know how to use it. (I agree that existing commercial relational systems could use some adjustments, but let's not throw the Cray out with the bathwater.)

      Too much of this XML database stuff sounds like a return to the "navigational" databases of the 1960's. Do we really want that? Dr. Codd rescued us from those. Now you want to be un-rescued?

    4. Re:someone please explain... by MeanMF · · Score: 1

      tree like structure of XML vs tabular format of RDBMS. An ORDBMS might work in some situations.

      Trees are easy to implement in an RDBMS. Just think of it as a series of one-to-many relationships. Just because your data is in an XML format doesn't mean you need to store it that way. XML is just another file format, and it's a horribly inefficient one for data storage and retrieval. It's the data that you really need to worry about, not the XML code wrapped around it. Generating XML on the fly from a relational database gives you all sorts of flexibility.
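One way to read "a series of one-to-many relationships" is a single self-referencing table, with the XML generated on the fly when it is needed. This is a sketch under that assumption, using Python's `sqlite3` and `xml.etree.ElementTree`; the `Node` table, its columns, and the `build` helper are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Node (
    ID       INTEGER PRIMARY KEY,
    ParentID INTEGER REFERENCES Node(ID),   -- NULL for the root
    Tag      TEXT,
    Body     TEXT
);
INSERT INTO Node VALUES (1, NULL, 'report',  NULL);
INSERT INTO Node VALUES (2, 1,    'section', NULL);
INSERT INTO Node VALUES (3, 2,    'para',    'Sales rose.');
INSERT INTO Node VALUES (4, 2,    'para',    'Costs fell.');
""")

def build(node_id):
    """Rebuild the element for one row, then recurse into its child rows."""
    tag, body = conn.execute(
        "SELECT Tag, Body FROM Node WHERE ID = ?", (node_id,)).fetchone()
    elem = ET.Element(tag)
    elem.text = body
    children = conn.execute(
        "SELECT ID FROM Node WHERE ParentID = ? ORDER BY ID", (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(build(child_id))
    return elem

xml_out = ET.tostring(build(1), encoding="unicode")
```

Note that this issues two queries per node, so rebuilding a large document this way costs one round of application-side iteration per level of nesting.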

    5. Re:someone please explain... by battjt · · Score: 1

      <orderlist>
      <order id="1" customer="Aunt Bea">
      <apple type="golden" color="yellow" />
      <orange />
      </order>
      <order id="2" customer="Bob">
      <car type="pinto" color="yellow" />
      </order>
      </orderlist>

      What sort of relational schema do you use to save the above data? How do I query for orders with 2 items? orders with yellow items? yellow apples? How about the items that a customer who bought a chair and a yellow apple in possibly different trips has bought? XPath and XQuery do these sorts of things. The down side is most implementations don't use indexes, so they are slow on huge datasets.
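For comparison, a sketch of the first three questions asked against the XML directly, assuming Python's `xml.etree.ElementTree`, which supports only a limited XPath subset. Full XPath could express the item count inside the path itself; here it is done in Python because ElementTree has no `count()` function. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

ORDERS = """<orderlist>
  <order id="1" customer="Aunt Bea">
    <apple type="golden" color="yellow" />
    <orange />
  </order>
  <order id="2" customer="Bob">
    <car type="pinto" color="yellow" />
  </order>
</orderlist>"""

root = ET.fromstring(ORDERS)
orders = root.findall("order")

# Orders with exactly 2 items: len() of an element is its number of children.
two_item_orders = [o.get("id") for o in orders if len(o) == 2]

# Orders with any yellow item, whatever its element name.
yellow_item_orders = [o.get("id") for o in orders
                      if o.find("*[@color='yellow']") is not None]

# Orders with a yellow apple specifically.
yellow_apple_orders = [o.get("id") for o in orders
                       if o.find("apple[@color='yellow']") is not None]
```

No schema was declared for apples, oranges, or cars, which is the flexibility being described; each call is also a full scan of the tree, which is the "no indexes" downside.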

      The beauty of XML is the flexibility. It is just like using an OODB without having to first define the classes. That's also why it doesn't efficiently fit in a relational DB.

      To get efficiency, the schema needs to be well defined, then you might as well stick it into a relational database. Sometimes, it is best to define the schema later in the game, after you figure out what you are doing.

      Joe

      --
      Joe Batt Solid Design
    6. Re:someone please explain... by alienmole · · Score: 2, Interesting
      Actually, I've implemented relational databases with schemas exactly like the one in your example. Of course, you'd have Customer, Order, and OrderItem tables. The Product table would be generic and primarily contain a unique ID for each product, whether it's a car, apple, orange, whatever. This table might also have some other generic fields like Description, Price, etc.

      To handle the specific attributes of each product, one way to do it is to have a separate table for each product type that has unique attributes, and use a type selector field in the Product table. However, this is somewhat non-relational and may not scale well to large numbers of products. Another way to do it, which is more flexible, is to have a generic ProductAttribute table with fields like (ProductID, AttributeID, Value) - details would vary depending on what you're trying to achieve, e.g. whether you want a distinct ProductID for each unique set of attributes, or want to select attributes only per order (if you're custom-building based on orders).

      All the queries you mention are perfectly doable. Orders with 2 items? "select OrderID from OrderItems group by OrderID having count(*)=2". Orders with yellow items? "select ProductID from ProductAttributes where AttributeID=COLOR_ATTRIBUTE and Value='yellow'" would give a list of all yellow products. You could extend this request with joins into the Order table, or whatever it is you need. "The items that a customer who bought a chair and a yellow apple in possibly different trips has bought": pretty simple, determine the product ids as above, join to the Orders table, and filter on the customer you're interested in using "where CustomerID=$custid".
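A runnable sketch of these queries, assuming Python's `sqlite3` and a generic ProductAttribute-style layout. The exact columns, the use of the string `'color'` in place of a `COLOR_ATTRIBUTE` constant, and the sample customer IDs are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders            (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE OrderItems        (OrderID INTEGER, ProductID INTEGER);
CREATE TABLE Product           (ProductID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE ProductAttributes (ProductID INTEGER, AttributeID TEXT, Value TEXT);

INSERT INTO Product VALUES (1, 'apple'), (2, 'orange'), (3, 'car');
INSERT INTO ProductAttributes VALUES
    (1, 'color', 'yellow'), (1, 'type', 'golden'),
    (3, 'color', 'yellow'), (3, 'type', 'pinto');
INSERT INTO Orders     VALUES (1, 100), (2, 200);   -- customer 100 = Aunt Bea, 200 = Bob
INSERT INTO OrderItems VALUES (1, 1), (1, 2), (2, 3);
""")

# Orders with exactly 2 items.
two_item_orders = [r[0] for r in conn.execute(
    "SELECT OrderID FROM OrderItems GROUP BY OrderID HAVING COUNT(*) = 2")]

# Orders containing at least one yellow product.
yellow_orders = [r[0] for r in conn.execute("""
    SELECT DISTINCT oi.OrderID
    FROM OrderItems AS oi
    JOIN ProductAttributes AS pa ON pa.ProductID = oi.ProductID
    WHERE pa.AttributeID = 'color' AND pa.Value = 'yellow'
    ORDER BY oi.OrderID""")]

# Everything one customer has bought, across all of their orders.
bea_items = [r[0] for r in conn.execute("""
    SELECT p.Description
    FROM Orders AS o
    JOIN OrderItems AS oi ON oi.OrderID = o.OrderID
    JOIN Product    AS p  ON p.ProductID = oi.ProductID
    WHERE o.CustomerID = ?
    ORDER BY p.Description""", (100,))]
```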

      Sometimes, it is best to define the schema later in the game, after you figure out what you are doing.

      More likely, this is the road to disaster. I've seen companies that have painted themselves into some seriously small corners by doing this, and then spent millions on maintaining a system that just doesn't do what they need. Careful and detailed upfront analysis can save a huge amount of time and money. What you're really saying is that XML can be a substitute for upfront design. Maybe in small systems, but otherwise, that's just irresponsible.

      I use XML plenty - as a transmission format for data in web apps, as a metadata representation format, for small domain-specific languages, and for document-oriented applications. But thinking of XML as a way to avoid having to actually figure out what you're doing - I guess it'll lead to job security for someone in the future, when all that has to be thrown out and replaced. Probably won't be fun jobs though.

    7. Re:someone please explain... by Tablizer · · Score: 1

      What sort of relational schema do you use to save the above data? How do I query for orders with 2 items?

      I will leave the SQL for such a query as a reader exercise because that kind of query tends to vary per dialect. It will probably involve a GROUP BY and a COUNT operation, or perhaps a correlated subquery. (SQL is not the ideal relational language IMO.)

      Here is one schema approach. Note that it may vary per business.

      Table: Customers
      ----------------
      custID
      nameMI
      lastName
      etc...

      Table: Products
      ---------------
      prodID
      prodDescript
      etc...

      Table: ProductVariants
      ------------
      variantID
      prodRef (f.k. to Products)
      variantDescr
      etc...

      Table: Orders
      -------------
      orderID
      custRef
      orderDate
      fullFilledDate
      etc...

      Table: OrderItems
      --------------
      itemID
      orderRef
      productRef
      variantRef
      customerRef
      quantity
      etc...
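A sketch of this schema as SQLite DDL, with the correlated-subquery form of the "orders with 2 items" query, assuming Python's `sqlite3`. Only a subset of the listed columns is declared, the "etc..." fields are left out rather than guessed, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (custID INTEGER PRIMARY KEY, lastName TEXT);
CREATE TABLE Products  (prodID INTEGER PRIMARY KEY, prodDescript TEXT);
CREATE TABLE ProductVariants (
    variantID    INTEGER PRIMARY KEY,
    prodRef      INTEGER REFERENCES Products(prodID),
    variantDescr TEXT
);
CREATE TABLE Orders (
    orderID   INTEGER PRIMARY KEY,
    custRef   INTEGER REFERENCES Customers(custID),
    orderDate TEXT
);
CREATE TABLE OrderItems (
    itemID     INTEGER PRIMARY KEY,
    orderRef   INTEGER REFERENCES Orders(orderID),
    productRef INTEGER REFERENCES Products(prodID),
    variantRef INTEGER REFERENCES ProductVariants(variantID),
    quantity   INTEGER
);

INSERT INTO Customers VALUES (1, 'Bea'), (2, 'Bob');
INSERT INTO Products  VALUES (1, 'apple'), (2, 'orange'), (3, 'car');
INSERT INTO ProductVariants VALUES (1, 1, 'golden, yellow'), (2, 3, 'pinto, yellow');
INSERT INTO Orders     VALUES (1, 1, '2003-01-15'), (2, 2, '2003-01-16');
INSERT INTO OrderItems VALUES (1, 1, 1, 1, 1), (2, 1, 2, NULL, 1), (3, 2, 3, 2, 1);
""")

# Correlated subquery: count the item rows belonging to each order.
two_item_orders = [r[0] for r in conn.execute("""
    SELECT o.orderID
    FROM Orders AS o
    WHERE (SELECT COUNT(*) FROM OrderItems AS i WHERE i.orderRef = o.orderID) = 2""")]
```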

      Sometimes, it is best to define the schema later in the game, after you figure out what you are doing.

      In my observation, if you don't have enough info to create a starting schema, then you need to do some more analysis. If you organically "grow" your app with a bunch of local names, then you will have a lot of work to later clean it up so that you don't have similar but different names, for one. I am not saying that the first version of the schema will be perfect, but it is better than waiting until the end.

    8. Re:someone please explain... by battjt · · Score: 2, Insightful

      You can store anything in a SQL database, but you do have to take the time to design it and migrate the data as the schema changes.

      Spending lots of time and money designing a system that the customer can not imagine is a waste of money, because you will have to change the design as the business units focus on what they want, normally after they see your initial results.

      Sometimes you have to use duct tape.

      I have one app in production that uses XML files as data stores. There are about 24 users. I also have apps in production with 1000s of users that use 20GB+ SQL databases. I use apps that utilize lisp dumps for fast read only datastores (in addition to emacs). There is a place for everything.

      Joe

      --
      Joe Batt Solid Design
    9. Re:someone please explain... by battjt · · Score: 2, Insightful

      In my observation, if you don't have enough info to create a starting schema, then you need to do some more analysis.

      This is exactly the problem. How do you get any analysis if the customer doesn't know what to ask for? Applications evolve. The flexibility offered by an unstructured data store like XML lets you evolve the data model like the rest of the application.

      You gloss over the hard part with "etc..." Attributes or even structured child tags cannot all be anticipated and built into the schema; by the time you have done that, you've just built an XML database.

      --
      Joe Batt Solid Design
    10. Re:someone please explain... by Tablizer · · Score: 1

      This is exactly the problem. How do you get any analysis if the customer doesn't know what to ask for.

      You have to "probe" them. Study the manual process. Look at their manual reports. Look at other systems for similar companies. Make some sample screens and reports for the client to jog their mind. Ask them questions like, "Is there only one address per employee, or could they have multiple addresses/contacts?"

      XML is NOT going to make up for a lack of understanding of what is needed. You can make an organic attribute system using relational also. It is just not the best course of action in most cases.

      You gloss over the hard part with "etc..."

      What is an example that you think is glossed over, but hard? I believe I covered the same fields as your example.

    11. Re:someone please explain... by dpt · · Score: 1

      Too much of this XML database stuff sounds like a return to the "navigational" databases of the 1960's. Do we really want that? Dr. Codd rescued us from those. Now you want to be un-rescued?

      I think the advantage would be its "hierarchical", not "navigational", nature. And this is the problem with relational databases for the kind of problems I encounter. Ever tried to store complex *inherently* hierarchical data in them? It's just the wrong idiom, and shouldn't even be attempted. Of course, some clown always comes along and says "we're going to use Oracle for persistence!". Sigh. And J2EE's "entity bean" actually codifies this madness!

      Of course, they (relational databases) are well suited to certain applications, and I wouldn't dream of using anything else for, say, a billing system. But like all 4GLs, they've sacrificed flexibility for power and ease-of-use in their domain. But that's all right, I imagine it was a conscious design choice.

    12. Re:someone please explain... by battjt · · Score: 1

      Apparently we won't come to an agreement on software development. My experience indicates that no amount of research will best quickly evolving software (XP, prototyping, whatever today's buzzwords are). XML databases are better suited to support evolving software than relational databases, due to the lack of or the flexibility of the schema. Relational databases can be better optimized than XML databases. OO databases (or hierarchical) can be best optimized, but in my opinion are rarely needed.

      You can make an organic attribute system using relational also. It is just not the best course of action in most cases.

      Right. Then you have the functionality of an XML database, which you sometimes need. Just buy the XML database.

      What is an example that you think is glossed over, but hard?
      Changing schemas or designing 'an organic attribute system'. (As a side note, I've seen a shrink wrapped telephone network management system that used Oracle to store the configuration... in one table (name char(24), value char(24)). The company is now out of business, but over 20 years did quite well.)

      BTW, nice website. You are wrong :-), but nice site. (My background is large company IT projects.)

      --
      Joe Batt Solid Design
    13. Re:someone please explain... by angel'o'sphere · · Score: 0, Flamebait

      Why does Tablizer always get "Insightfuls" for his ranting?

      XML data is in general not well held in RDBMSs, Tablizer. There are exceptions, of course, as I pointed out in my post above.

      An RDBMS returns, on an SQL query, what? A textual table starting with a header of column names, followed by rows of text.

      That is not XML, is it?

      Furthermore: to query a full document, for regenerating XML in the querying application, you need to make several queries one after the other, based on the data returned from earlier queries.

      That means you need to do application-side programming ... iterating over result sets and sending "sub-queries" to the DB to get nested elements.

      An XML database manages a set of documents with different DTDs. Often in practice those DTDs allow documents of different types to contain elements of the same type. Imagine an element type [address] which might be contained inside of a document of type [delivery] and in a document of type [order].

      I can now post a query to an XML database asking for all documents containing a specific element of type [address]. And I get back well-formed XML documents, not a set of data from SQL tables.
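A toy sketch of that kind of cross-document query, assuming Python's `xml.etree.ElementTree` and an in-memory dictionary standing in for the database's document collection. The document names, the `city` child element, and the `documents_with_address` helper are hypothetical, and a real XML database would presumably answer this from an index rather than by parsing every document.

```python
import xml.etree.ElementTree as ET

# Documents of different types that share the same [address] element type.
COLLECTION = {
    "order-1.xml":
        "<order><customer>Bea</customer>"
        "<address><city>Springfield</city></address></order>",
    "delivery-7.xml":
        "<delivery><address><city>Springfield</city></address>"
        "<van>3</van></delivery>",
    "order-2.xml":
        "<order><customer>Bob</customer>"
        "<address><city>Shelbyville</city></address></order>",
}

def documents_with_address(city):
    """Return whole documents, of any type, containing a matching address."""
    hits = {}
    for name, text in COLLECTION.items():
        root = ET.fromstring(text)
        # ".//address[city='...']" matches an address at any depth whose
        # <city> child has the given text.
        if root.find(f".//address[city='{city}']") is not None:
            hits[name] = text
    return hits

springfield_docs = sorted(documents_with_address("Springfield"))
```

The result is a set of complete XML documents, an order and a delivery alike, rather than rows that the application has to reassemble.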

      Furthermore, the query language is XMLish, so you can manipulate the queries, the stored data, the returned data etc. with XSLT.

      If you don't see that this is a difference for the programmer of the application, then please SHUT UP.

      I take it that you are an expert in relational database management (however, I saw one post of yours where a true expert disrupted all your points about how to use an RDBMS efficiently and correctly) ... but you have no clue about XML and no clue about OO. So why can't you stay out of discussions you have no clue about?

      To prove that moderators have no clue either and give you even "Insightfuls" for nonsense, like your post above?

      angel'o'sphere

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    14. Re:someone please explain... by Tablizer · · Score: 1

      XML databases are better suited to support evolving software than relational databases, due to the lack of or the flexibility of the schema.

      Adding a new column is a snap on some systems. What is the complaint? I realize that some shops have rather static rules WRT schemas, but that is a political issue, not a technical one.

      BTW, nice website. You are wrong :-), but nice site.

      If wrong, then show it.

      (My background is large company IT projects.)

      So? A wide variety of techniques are used on large projects (however "large" is defined, since the boundaries are usually fuzzy in the shops I observe). Applications should usually be partitioned based on the targeted users, not the size of the database they feed off of nor the company size. Big EXEs are often a sign of bad design, not "big project skill", IMO.

    15. Re:someone please explain... by Tablizer · · Score: 1

      I think the advantage would be its "hierarchical", not "navigational", nature. And this is the problem with relational databases for the kind of problems I encounter. Ever tried to store complex *inherently* hierarchical data in them?

      If you look at the needs of most "complex hierarchical structures", it often turns out that trees are the wrong "structure" to begin with. Trees are easy for managers and users to grok, but they simply don't reflect the complexities of the real world relationships and changes very well IMO. Thus, I tend to use them for "newbie facades", but generally not underlying internal structures, at least not the primary ones.

      We would probably have to explore a specific case study to settle this. These debates often get into rather deep philosophical issues of taxonomies and the nature of things that would give even Plato headaches. If you wish to take this elsewhere so as not to bloat up Slashdot, I would be happy to.

    16. Re:someone please explain... by Anonymous Coward · · Score: 0

      Why does Tablizer always get "Insightfuls" for his ranting?

      Usually I find that I get slapped with "troll" ratings after a while. There is a moderator out there who does not like my philosophies. (Whatever the hell "troll" really means. Pest? Wrong? Obsessed?)

      Furthermore, to query a full document

      Most documents are not in XML, at least not in the shops I see. However, I will agree that document indexing is *not* one of the strong-points of RDBMS, at least in its current state.

      A well-rounded document indexing system would probably deal with more than just XML. DOC and Excel formats are not going away anytime soon, for example. Thus, we would need a "document database" or "document management system". XML is too narrowly used so far except for special niches.

      however, I saw one post of yours where a true expert disrupted all your points about how to use an RDBMS efficiently and correctly

      Gee, I missed it. Care to link? That would be a more professional way to attack my viewpoint. The reader has no way to verify your claims against me. You keep talking about all these alleged slippages of mine in the past, but never produce anything verifiable.

    17. Re:someone please explain... by dpt · · Score: 2, Insightful

      If you look at the needs of most "complex hierarchical structures", it often turns out that trees are the wrong "structure" to begin with

      What about cases where entities can contain instances of *themselves*? Or where the depth and width of the nesting is not necessarily known up front?

      You end up creating these artificial "id" fields, and in so doing build a "tree" on top of the relational database, which is a very silly thing to do.

      And what about cases where ordering of contained elements is important?

      We would probably have to explore a specific case study to settle this.

      I don't think it can be "settled". Apples and oranges, and all that. Sure, you can *force* a relational database to represent anything, it's just often like putting a round peg in a square hole, particularly for engineering, scientific, and mathematical problem domains.

      If you wish to take this elsewhere so as not to bloat up Slashdot, I would be happy to

      Take it where? I suggest your journal.

      The worst case I have had was one in which ordering was important. Relational databases just can't handle that very well, being based on set theory IIRC. Sure, you can add a field containing an ordinal to represent the "order", but that breaks badly if you need to insert or remove items.
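
      A small sketch of the ordinal-column approach, in Python with the standard sqlite3 module, to show where the cost comes from. The steps table, its pos column and the insert_at helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "pos" is the ordinal that encodes the order of the steps.
conn.execute("CREATE TABLE steps (pos INTEGER, name TEXT)")
conn.executemany("INSERT INTO steps VALUES (?, ?)",
                 [(1, "cut"), (2, "weld"), (3, "paint")])

def insert_at(conn, pos, name):
    """Insert a step at position pos, shifting every later step down by one."""
    with conn:  # both statements commit together or not at all
        conn.execute("UPDATE steps SET pos = pos + 1 WHERE pos >= ?", (pos,))
        conn.execute("INSERT INTO steps VALUES (?, ?)", (pos, name))

insert_at(conn, 2, "drill")
ordered = [name for (name,) in conn.execute("SELECT name FROM steps ORDER BY pos")]
print(ordered)
```

      It works, but one logical insert rewrites every row after the insertion point, and removals need the mirror-image renumbering.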

    18. Re:someone please explain... by Tablizer · · Score: 1

      Take it where? I suggest your journal.

      Sure! I have yet to use it. Now is as good a time as any to try it I guess.

      particularly for engineering, scientific, and mathematical problem domains.

      I deal most in the custom biz app domain. Maybe math is different. I never said relational was always the best solution. But, it is often bashed or passed up for the wrong reasons IMO.

    19. Re:someone please explain... by dpt · · Score: 1

      Sure! I have yet to use it. Now is as good a time as any to try it I guess

      Well, create an entry and we'll start slugging it out :)

      I'd certainly be interested to see if ordered data can be represented easily in a relational model. My suspicion is that it can't, and since I do a lot of "modelling" and "infrastructure software" (that needs persistifiable state to ensure QOS), ordered whole-part relationships come up a lot. And scoping. And nesting.

      I deal most in the custom biz app domain

      That's where I think the relational model works best, as ordering and containment don't come up a whole lot and everything's usually got a physical counterpart and a natural unique id.

      Maybe math is different

      Well, I just threw maths in there as logically, anything not malleable by set theory is just not going to work very well. I think.

      But engineering (say, network management, fault tolerant reliable messaging) and scientific (eg. DNA processing) problems also often fall outside of what can be elegantly represented "relationally". In my experience.

      Not that I'm saying XML databases can do better, of course. I'm just interested in different approaches.

      I never said relational was always the best solution. But, it is often bashed or passed up for the wrong reasons IMO

      I completely agree. But it's also used where inappropriate, too.

      Or for the worst of both object and relational worlds, look at J2EE entity beans. All the work of relational, and none of the advantages of powerful SQL queries and sensible and normalized table design :(

  6. two articles on the subject by DevilM · · Score: 2, Interesting

    http://builder.com.com/article.jhtml?id=u00320030306gcn01.htm

    http://www.devx.com/xml/article/9796

  7. Transformation views by Opiuman · · Score: 1

    Does anyone know if any of the above can maintain XSLT transformations of the data as views, much like you can create SQL views? That would be a useful feature.

  8. Don't count out object databases by mattc58 · · Score: 3, Informative

    It's interesting that you bring this up.

    I just finished writing an article for an online magazine on object databases and .NET. You might want to look into Matisse. It's got bindings for all the popular languages, it's an object database, and it's got SQL interfaces. Nice.

    And I'll point everybody to my article when it's published.

  9. Tamino by Sam+Lowry · · Score: 1

    If you have a lot of money, try Tamino.

  10. logical versus physical by Frans+Faase · · Score: 2, Interesting

    It seems that nowadays most people have a great problem distinguishing between the logical and the physical representation/storage of data. (Personally, I think that XML sucks from a logical point of view, because its semantics are rather weak and limited.) What we lack is tools for mapping logical representations to physical representations. I think that the main reason why we do not have such tools is that from a marketing perspective they would be very undesirable. (No serious commercial company likes to adhere to an open standard, as that would make it very easy for a customer to switch.)

    1. Re:logical versus physical by stonebeat.org · · Score: 1

      Is there an IEEE standard for mapping logical representation to physical representation?

  11. An XML Database by alixnet · · Score: 1

    From the ground up, ObjectStore was built as a purely XML database.

    1. Re:An XML Database by SamDrake · · Score: 1

      Blatantly false. ObjectStore has been around since WAY before XML was.

  12. Re:why an xml database? -- There are many reasons by tizzyD · · Score: 1

    Take the simple instance of a BOM relationship.

    For those not sure what a BOM is, it stands for bill of materials. In those relationships, you have a part. It is made up of other parts. Each of those parts is made up of parts, etc. etc. The end result for large, complex parts is an SQL join of indeterminate depth. Say you need to find how many screws you need for a car. It's a nasty issue for relationals. XML systems, OTOH, handle it beautifully. XPath would do that query simply, pulling out a single part wherever it's used. Bingo, you have the part count.

    That type of indeterminately nested relationship brings relational DBs to their knees -- all of them.
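
    As a rough illustration of the XPath side of this claim, here is a sketch using Python's standard ElementTree module rather than a real XML database. The BOM document is invented, and it assumes every physical part is written out as its own element.

```python
import xml.etree.ElementTree as ET

# Hypothetical BOM: every physical part is its own element, nested in its parent.
bom = ET.fromstring("""
<part name="car">
  <part name="door">
    <part name="hinge"><part name="screw"/><part name="screw"/></part>
    <part name="screw"/>
  </part>
  <part name="screw"/>
</part>
""")

# ".//" descends to any depth, so the nesting level need not be known up front.
screws = bom.findall(".//part[@name='screw']")
screw_count = len(screws)
print(screw_count)
```

    If the BOM stored quantities as attributes instead of repeating elements, a plain count would no longer be enough; the quantities along each path would have to be multiplied.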

    I teach this subject to MSc and PhD students at a university, and I tell them not to buy into the hype. Use the best DB type for your data. Relationals are one type, not the only type.

    --
    ...tizzyd
  13. Re:why an xml database? -- There are many reasons by HunkyBrewster · · Score: 2, Insightful
    How is this at all a problem for the relational model?

    The relational model is a logical model and I challenge you to find any example of data that cannot be represented quite easily in the relational model. In your example, you have traded any notion of data integrity for what you assume will be faster data access. In fact, since the relational model makes no recommendations on how data is physically stored, this is not necessarily the case.

    How would XPath enforce your rules on how parts can relate to other parts? Why don't you just try flat files and grep?
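
    One way to picture the integrity rules in question is a sketch in Python with the standard sqlite3 module. The part/bom tables and the try_insert helper are hypothetical, and SQLite only stands in here for a relational engine in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE part (name TEXT PRIMARY KEY);
    CREATE TABLE bom (
        parent TEXT NOT NULL REFERENCES part(name),
        child  TEXT NOT NULL REFERENCES part(name),
        qty    INTEGER NOT NULL CHECK (qty > 0),
        CHECK (parent <> child)           -- a part may not directly contain itself
    );
    INSERT INTO part VALUES ('WIDGET1'), ('screw');
""")

def try_insert(parent, child, qty):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO bom VALUES (?, ?, ?)", (parent, child, qty))
        return True
    except sqlite3.IntegrityError:
        return False

ok_valid = try_insert("WIDGET1", "screw", 10)
ok_unknown_child = try_insert("WIDGET1", "gizmo", 1)     # 'gizmo' is not a known part
ok_self_reference = try_insert("WIDGET1", "WIDGET1", 1)  # violates the CHECK
print(ok_valid, ok_unknown_child, ok_self_reference)
```

    The rules live in the schema, so every application that touches the data gets them for free. A path expression on its own only selects nodes; validation would have to come from a DTD or schema layered on top.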

    XML is a perfectly acceptable means of data representation. It does not however by any means provide a formal, coherent theory of data management.

    I really hope you were kidding about teaching anyone anything. You have a lot of learning to do.

  14. Here's an option: by John+Harrison · · Score: 1

    IMS is a hierarchical database from IBM. The structure of the DB matches up with XML nicely and it is super fast. Of course it is also one of the oldest software products in existence...

  15. Re:why an xml database? -- There are many reasons by jd10131 · · Score: 1

    Take the example of a BOM, given above, with subassemblies (components which are not raw materials). This type of layout is common in manufacturing situations.

    I can build something called WIDGET1, for example.

    WIDGET1 uses 10 screws and 3 of WIDGET2.

    WIDGET2 uses another 20 screws and 4 more of WIDGET3.

    Write an SQL query that can express this information. The only requirement placed upon your tables is that they must be at least 2NF.

    You're going to end up rejoining your BOM file at least a couple times for that one. The problem is, the subassemblies can recurse an arbitrary number of times. That kind of thing cannot be expressed in a single query, since SQL can't recurse.

    That's the kind of situation where you need something like an XMLDB.

    OTOH. You can express this problem with a series of queries, and if that is the only problem in your app that an RDB can't solve, but there are many other places where it's a better fit, you make it work. Thus is the way of the real world.
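
    For what it's worth, SQL:1999 did standardize recursive queries (WITH RECURSIVE), although support varies from product to product. The sketch below runs the WIDGET example through one in Python's standard sqlite3 module, which assumes a bundled SQLite recent enough to support recursive common table expressions. The bom table layout and the explode name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical BOM table: one row per (assembly, component, quantity) edge.
conn.executescript("""
    CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER);
    INSERT INTO bom VALUES
        ('WIDGET1', 'screw',   10),
        ('WIDGET1', 'WIDGET2',  3),
        ('WIDGET2', 'screw',   20),
        ('WIDGET2', 'WIDGET3',  4);
""")

EXPLODE = """
WITH RECURSIVE explode(part, qty) AS (
    SELECT ?, 1                                  -- start with one top-level unit
    UNION ALL
    SELECT bom.child, explode.qty * bom.qty      -- multiply quantities down the tree
    FROM explode JOIN bom ON bom.parent = explode.part
)
SELECT part, SUM(qty) FROM explode GROUP BY part ORDER BY part
"""

# Each result row is a (part, total) pair, so dict() can consume the cursor directly.
totals = dict(conn.execute(EXPLODE, ("WIDGET1",)))
print(totals)
```

    On engines without recursive queries, the series-of-queries approach above is still what you are left with. Note also that a cycle in the BOM table would make this particular query recurse without end.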

  16. Update: Just found another good DB by beowulf_26 · · Score: 1
    So, like I said earlier, we've been doing this at work, and we found some new stuff since my post yesterday.

    I believe we've found our solution (hope I'm not speaking too soon). We happened upon eXist as an XML database solution. While SourceForge lists it as alpha, the current version number is 0.9 and it seems rather mature, and FAR faster than Xindice. It looks to be a really good solution and is easy to administer. It also boasts Cocoon interoperability. Since you're going to be using Java anyway, it shouldn't be a problem that it's based on Java 1.3/1.4, which also makes it cross-platform to boot.

    Also, you said you were worrying about DBs having a Java API. Quoted from the eXist homepage:

    Java developers should have a look at the XML:DB API, which provides a common interface to access XML database services.
    Cheers.
    --

    --I hate big sigs.
  17. I don't see the point... by megajini · · Score: 2, Interesting

    I don't see exactly where I would need that kind of XML database... My applications usually have a big load of model objects which represent the structure of my data at "work-time". This is a very beautiful and elegant way of building applications.

    The real problem (in terms of flexibility and time) is the massive work needed for fetching data from a relational DB (everything is working in Java, using JDBC2-compatible connection pools), getting it into the data model, and the way back...

    So there are two choices: Using an object-oriented db or using some xml-db features. The latter would be great, having some nifty utilities to produce xml out of object's data and then send this xml directly to the db.

    But I definitely don't see why an XML-DB should automatically do everything (it's like saving Word-XML to a database). I think the entire concept of fast queries and stability would be destroyed at once. Those XML documents are application-specific data written in standard XML (XML is only a structured language; it's like writing a document in English or Hungarian, and of course English will be more "useful" for the world), so it's the business of the application to care about it.
    There is virtually no product out there being able to do that magic trick. The only cool software I know that does this (not with xml) is Lotus Notes and even there a load of additional information is required...

  18. IBM's by Dave21212 · · Score: 1

    Try the generic IBM XML page also.

    --
    "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech."--Benjamin Franklin
  19. IBM's Domino is well suited by Dave21212 · · Score: 1



    It's not relational; it's been described as 'document oriented', which is perfect for storing and retrieving XML docs. It's also extremely flexible, extremely secure (NSA, CIA, FAA, and 80+ million other users), fast to program with (RAD), and supports tons of open standards. For you fans of "Model View Controller" - Domino has been using this architecture for over 15 years now...

    The XML classes are built in (or easily extend your own classes using LotusScript, Java, C++, COM, anything really!!!). There is an intro on the dev site that describes the classes. Check out the demo code in the sandbox, or surf from the main product page. By the way, it runs on almost any OS/platform (AIX, OS/400, Linux, Solaris, Windows).

    Personally, I would use Domino if I was going to create a repository of reports in XML. The model fits like a glove and it's a pleasure to program/maintain.

    Here's a few random Domino related URLs for you...

    Gary Devendorf is the Product Manager for the AppDev portion of the Domino product and he has a section on one of the dev sites with XML references.
    Off topic, but you can run your blog on the side (graphically challenged site warning) check out the links to Domino people, especially Libby !
    And there's even an Open Source group of Domino developers.

    --
    "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech."--Benjamin Franklin
  20. Re:why an xml database? -- There are many reasons by HunkyBrewster · · Score: 1
    I believe you and your example are confused.

    First, you use SQL and relational interchangeably. That is incorrect.

    Second, you fail to provide a coherent logical model of your data - something that is necessary regardless of your preference for an RDBMS or an "XMLDB".

    For example, you refer to WIDGET1 as an entity when really it is a type. In your database, you will need to track the instances of WIDGET1. Something is a WIDGET1 because it needs to relate to 10 screws and 3 instances of WIDGET2.

    So far we are talking about

    1 WIDGET1
    3 WIDGET2s
    12 WIDGET3s (4 per WIDGET2)
    70 screws (assuming the screws used in a WIDGET1 and a WIDGET2 are the same)

    Perhaps some clarification would sort this out. Do you mean to say that a WIDGET is a WIDGET and that it becomes a "WIDGET1" only when it is put together with 10 screws and 3 other WIDGETs? In that case, why are the 3 other WIDGETs not miraculously transformed into "WIDGET1s"? After all, they too are related to 3 other WIDGETs and 10 screws.

    You must create a logical model of this - what are the components of your data, how do they relate, what are the rules that govern data manipulation. This is where relational technology comes in. Relational technology allows you to use predicate logic to describe the specifics of your data to an RDBMS.

  21. Tamino by Software AG by munkinut · · Score: 3, Interesting

    When I worked on the Ananova project, we started off using Tamino by Software AG, which was great while we were in development, but we had trouble scaling from tens of stories per day to dealing with thousands of stories per day when we went live. Backing up, moving data between versions, and restoring onto higher spec boxes proved to be a nightmare, and we soon moved to Oracle instead. This was 3 years ago however, and the product may have matured since then. It would meet your requirements as stated certainly, and would be worth checking out. There are also Netbeans modules to aid development in Java.

    --
    re-invent wheels ... you never know
  22. The major question is... by Millennium · · Score: 1

    Is an XML database the right tool for the job?

    I'm not a relational-zealot like the sorts found at dbdebunk.com; I don't worship the table and the join, but neither do I worship the DTD and the entity. If you're just starting a project, think long and hard about your options. Maybe an XML database will be the best tool for the job, or maybe a relational database will, or maybe an OODBMS will work better, or maybe you'd be better off with an object-persistence system such as Prevayler.

    I can't know the answer, and neither can anyone else here, since you haven't supplied (and probably wouldn't be allowed to supply) enough data on which to base a decision. So just think about it. You have an opportunity to do it right the first time, so make sure you know what that is before doing anything.