What Do You Know About Databases And XML?
Dare Obasanjo writes: "XML has become a pervasive part of significant
segments of software development in a relatively short
time. From file formats to network protocols to
programming languages, the influence of XML has been
felt. I have written an
overview of XML schemas, XML querying languages,
XML-Enabled databases and native XML databases.
Below is a shortened version of the article." Obasanjo's original OODBMS
article
has been updated to reflect more of the disadvantages
of picking an OODBMS over an RDBMS.
By this, it is meant that XML allows two systems that do not share a predetermined data exchange protocol to share data.
That's it.
Where two systems share a common predetermined protocol, it is almost always more efficient than XML.
Applications of XML to programming language design (XSL) and other domains are largely a waste of time and won't last.
You can find more on OODBMS and their benefits here.
And they have some intelligent discussion over there too. Please leave it that way.
My company (www.phoenixca.cm) deals with things like Documentum, iMan, domino.doc and things along those lines. We are doing things that basically have never been done before. We are mainly doing integrations between different software packages and such, and we use XML a lot; it makes moving data between different systems (client/server, server/server, ...) simple. So to be honest I am not surprised that XML has taken off like it has.
my 2 cents plus 2 more
There was a good discussion on XML databases on the XML-Dev mailing list, which is summarized pretty well by Leigh Dodds in "XML and Databases? Follow Your Nose."
Databases are for storing data. End of Story.
Oracle is taking some BIGTIME performance hits for stacking all that OO crap in there, and MS SQL Server is seeing the same thing now that they've got the XML in theirs. Don't believe me?
Why is NASA switching to MySQL from Oracle and noticing speed increases?
Don't get me wrong, I'm a big fan of XML.. as a data interchange format.. but when i want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.
Databases are also for querying data :P heh
As far as I know, SQL Server is hardware bound. That means, add better hardware on it and u get a perf increase to match.
XML is great for representing data, especially manipulating it with the BCLs in C#. Easy as pie.
As with everything DB, DB operations are always more expensive than any other storage operations.
DB operations are always the performance bottlenecks of any server system.
The only real way to improve on that is partitioning the DB and clustering.
----- Whats wrong with this picture? http://www.revoh.org:1234/whatswrong
However, citing NASA as a source for technology or trends is a bit silly, for a number of reasons. The primary one is this: NASA is so large, and so diverse, that at one of their sites/on one of their projects they use one of just about every technology product you can name.
I was once running two back-to-back software evaluations for products in the $20-million range. For both applications, the top ten vendors all claimed that their system was "used by NASA for the Space Shuttle". We checked up and guess what - they were all telling the truth.
So you need a better example.
sPh
It's also a matter of need. Some things need OODB more than RDB. Some things need RDB more than OODB. I've found some things may well work better with a well designed mix.
now we need to go OSS in diesel cars
If you want tight storage and quick retrieval, why don't you save it in a spreadsheet? Excel or whatever? No DB overhead, data remains as tables (sheets) etc.
----- Whats wrong with this picture? http://www.revoh.org:1234/whatswrong
The new tools like XPath and XQuery are pretty useful, but do you know of any tool that reads in XML and then allows you to access it via standard SQL? I know it would be a bit of a stretch to make it fit the SQL model, but I think it would be very useful, as lots of people out there are used to using SQL. Anybody doing this?
"I don't know half of you half as well as I should like, and I like less than half of you half as well as you deserve."
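One way to get SQL access to XML, as asked above, is to flatten the document into a table first. This is only a minimal sketch using Python's standard library; the sample document, the `book` table and the `load_into_sqlite` helper are made up for illustration, and it assumes the XML is regular enough to map one element to one row.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for this sketch.
XML_DOC = """
<books>
  <book><title>SQL Basics</title><year>1999</year></book>
  <book><title>XML in Practice</title><year>2001</year></book>
</books>
"""

def load_into_sqlite(xml_text):
    """Flatten each <book> element into one row of an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
    root = ET.fromstring(xml_text.strip())
    rows = [(b.findtext("title"), int(b.findtext("year")))
            for b in root.iter("book")]
    conn.executemany("INSERT INTO book VALUES (?, ?)", rows)
    return conn

conn = load_into_sqlite(XML_DOC)
# Ordinary SQL now works against data that started life as XML.
recent = conn.execute("SELECT title FROM book WHERE year > 2000").fetchall()
print(recent)
```

The stretch the poster mentions shows up as soon as the XML is nested or irregular: deeper structure needs more tables or a parent column, which this sketch does not attempt.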
Granted, XML has some advantages. Data interchange among dissimilar clients, for one. But storing XML in a database is a gross waste of space and processing power, and is realistically impossible for all but the smallest of databases.
The society for a thought-free internet welcomes you.
After several weeks of dealing with growing pains and general brokenness, my manager wisely decided to transition our systems back to a UNIX environment. I worked in the group that was responsible for this, and after obtaining source code to several of our accounting and inventory applications, we moved the operation over to a Linux 2.2 (Debian potato) system. Things have worked flawlessly since then, and the OODBMS and Java developers are long gone. The promise of an OO architecture was great, but it just didn't work out in the real world - Linux was the solution for us.
-CT
But what if your data representation is already an XML schema? And a pretty complicated one at that? For example, look at METS : The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.
Have a look at that schema and tell me how you'd store that in a traditional RDBMS (I'd be interested if you could, because I know SQL, I don't know OODBMS or XML repositories - this is painful for me). Databases have been for storing data, but when your data is already a complex XML representation of an object, there's little use in saying don't use OODBMS.
I can't see any advantage to an OODB over an RDBMS. RDB is optimized and streamlined. OODB is new and bloated.
I can see a lot of 'gee whiz' about OODBs, but RDBs are proven, stable, and have a lot of accumulated knowledge. Seems that nobody really knows how to write an OODB. All that object stuff belongs in UML diagrams, anyway.
and your shit funnel was "used by NASA for the Space Shuttle", fucking velocity of crap outa you would put anything in orbit. now for fucksake donta take your dads cock outa your arse or we'll end up in orbit beside charon.
Basically, use the storage format that best suits your data and operations.
Don't just give a generic response like "Oh, let's stick it in a DB".
----- Whats wrong with this picture? http://www.revoh.org:1234/whatswrong
There was an interesting discussion on XML databases recently on the XML-Dev mailing list, with several XML experts giving pretty interesting and not too biased opinions on the subject. You can find a summary of it by Leigh Dodds in "XML and Databases? Follow Your Nose."
Look, that's why there's rules, understand? So that you think before you break 'em. (Terry Pratchett)
So what do you think of using XML for system configurations? In UNIX systems that tends to be a lot of separate files, traditionally edited with vi, although today the tools are getting more and more dummy-friendly and have a smaller space of possibilities.
now we need to go OSS in diesel cars
.NET applications already do that:
appname.exe.config, which is XML instead of darn .INI files and regkeys.
----- Whats wrong with this picture? http://www.revoh.org:1234/whatswrong
Well, databases are for storing data, that's right. But what you've got inside your XML document is data as well, so there's no reason why you shouldn't store it in a database.
Of course you'll get into performance trouble if you import your XML data into a normalized RDBMS (converting it) and export it afterwards (converting it again). That's exactly the reason why there's a need for XML-native databases (and it's also the reason why "XML-enabled" does not help very much).
xml is an interchange format, not a storage format
Absolutely, positively agree. Not only is XML only an interchange format, but it only makes sense in some situations (for instance if we have an embedded piece of hardware that we have to communicate with, and we're communicating to it from a Windows box, and there is no shared common data encapsulation format, I'd greatly prefer XML (with XSD) vastly over Jimmy the Programmer making up his own data encapsulation format/documentation method/extraction system, but if I have two Windows machines running SQL Server and they're in a common security context and they'll never change, I'd use DTS or replication, not XML).
and MS SQL Server is seeing the same thing now that they've got the XML in theirs
The XML "in" SQL Server is surface fluff (I love SQL Server and I'm saying this as a good thing, not a bad thing). i.e. Some modules that'll convert an XML query to an underlying DB query, and the results back to XML, and some basic XML importing and exporting routines. This hasn't affected the underlying operations of SQL Server whatsoever.
XML doesn't solve anything:
it is an immature and clunky form of Lisp, and Lisp overcame these "interchange" problems decades ago. And Lisp has turned out to be a fine platform for programming language design and "other domains". XML is a buzzword, plain and simple, and will be forgotten in a number of years.
XML's problem is that it has no provisions for a programming language's semantics. And adding them would be rather pointless, since Lisp has been doing this for far longer than most of you were alive, and in a much more elegant fashion.
But a lot of effort has gone into XML, and we can afford the extra overhead now, and it is standard and widely available for most languages and platforms. It isn't time to throw that away. I would use XML for all new application development; however, the benefits of migrating old applications and their datatypes to XML are marginal - why fix something that isn't broken?
Or you use the best of both worlds - a database system like Cheshire which parses XML into a flat RDBMS (in this case the open Berkeley DB3). Cheshire is Free, as in almost. (Not-for-profit licence from UCB)
We use Cheshire for serving large documents which you can search based on the indexes built at database load time. While in theory you may want to search on arbitrary XML paths, 99% of the time what you really want is a simple named search. (author, title, subject, keyword, full text etcetc.) so by reducing the XML into a flat format you don't lose any significant functionality. 99.99% of people would be confused by searching on a 'tag' or 'XPATH' -- they have a concept in mind of what they're looking for, and how you represent that in your underlying data is irrelevant to them, as it should be.
-- Azaroth
It requires Netscape 6.(not out yet), IE 6, or Mozilla 0.9.5+ because of its use of XSL Transform functions.
You can view the page here.
Joseph Elwell.
well, i'd cite my own performance tests run here at this company, but my NDA prevents me from publishing anything.. so, i used the NASA reference. I encourage anyone with the resources to do the same, and use what works best for you...
Linuxfromscratch.com has a project that aims to automate the process of building your own Linux setup, storing configuration files in XML. Read the intro page: they propose you could go to a website and fill out a survey-type form to define your system, which would create a configuration file that could build everything correctly. It sounds to me like a huge undertaking, but if distros chimed in on this and contributed the tools and expertise they have in how to install a Linux system automagically, Automated Linux From Scratch could become a standard tool used by anyone wanting to set up Linux on anything. To go one step further and convert my /etc directory to MPXML (My Penguin XML... I made that up), well, I don't know if this would be a good thing.
"The Most Fun Possible on 4 wheels" is at SunBuggy in Las Vegas
Comparing Oracle and MySQL performance in the context of XML is silly. It is a well-known fact that MySQL is significantly faster than Oracle, but not because of XML, Java, or other "OO crap". It is simply because MySQL doesn't have transactional support, and probably a host of other non-OO high end RDBMS features.
I wouldn't be surprised if "OO crap" does indeed slow down Oracle, but I know the JVM for Oracle is completely optional. I can't speak to any XML features in Oracle, I'm not familiar with them.
In a real emergency, we would have all fled in terror, and you would not have been notified.
Anyone explain this to me :
Linux brainstorm
Broken Hearts are for Assholes. - Frank Zappa
It makes me sad to see all of these closed minded people when it comes to XML. They just haven't seen what XML can do and have been turned away from previous work in XML. XML can be used for data storage, and has many advantages.
XML allows data to be stored with context. For example if you have the data element "CmdrTaco", that doesn't mean much. But with xml, you can store this bit of information with context:
<SlashDot>
<Editor>
<Name>CmdrTaco</Name>
</Editor>
</SlashDot>
Isn't that more informative?
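To make the "context" point concrete, here is a rough sketch that parses a document of this shape and prints each leaf value together with the path that leads to it. The `paths` helper is hypothetical, and the snippet uses its own well-formed copy of the markup, since XML tag names are case-sensitive and a closing tag has to match its opening tag exactly.

```python
import xml.etree.ElementTree as ET

doc = "<SlashDot><Editor><Name>CmdrTaco</Name></Editor></SlashDot>"
root = ET.fromstring(doc)

def paths(elem, prefix=""):
    """Yield (path, text) for every leaf element under elem."""
    here = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield here, (elem.text or "").strip()
    for child in children:
        yield from paths(child, here)

leaves = list(paths(root))
print(leaves)
```

The bare value "CmdrTaco" comes back paired with "/SlashDot/Editor/Name", which is the context the markup carries.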
It is surprising to me that people who like OO don't like XML. OO allows you to have functionality attached to your data. XML allows you to put context (and even functionality) around your data.
Another big advantage of XML databases is the lack of a schema. If you want to have a dynamic database in the relational world, you are looking at a large schema migration. An XML database allows you to just add the information with no migration at all.
Advanced storage techniques allow queries of the XML database to be just as fast as a relational database. How can that be? The XML is stored in a specialized indexed form that allows for fast retrieval.
Sure, there are applications where it doesn't make sense to use an XML database. Using an XML database to store relational data doesn't make sense, that's what relational databases are for. But if you can think outside the mold, and store your data in a new way, XML databases are for you.
I might be a little biased in this area, since I work for an XML database company (http://www.neocore.com). I have seen XML in action, and it is more than just a data transport. I hope that I can convince at least one person to look at this advanced technology.
real db tables aren't so different from csv files... it's the way that they are accessed by the db app. i think that really good dbms's are tuned so that they take advantage of page sizes in primary storage devices, while spreadsheets will not do that.
logging and being able to roll back changes are other huge plusses to a real dbms.
A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
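The rollback point a couple of lines up is easy to show with SQLite from Python's standard library. This is only a sketch; the `account` table and `transfer` helper are invented for illustration. A spreadsheet or CSV file gives you nothing comparable when an update fails halfway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically; undo the partial update if it would overdraw."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

ok = transfer(conn, "alice", "bob", 500)   # fails, first UPDATE is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```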
Doesn't anyone else think that a pretty bad job was done on XML?
Don't get me wrong - saving data as plaintext with tags to mark the structure is ok as long as you don't have to care too much about footprint. This was done long before XML was specified. Chomsky did all the basic work on generative grammars ages ago. It was simply necessary to put this old stuff into a simple standard.
IMO they did a bad job. Probably SGML was specified before UTF was. But when a new standard is created, it really shouldn't support dozens of old encodings. And allow changing them on the fly. It makes the parsers big, slow and complicated. And I simply hate how CDATA looks. Plus the DTDs obviously aren't/weren't a contender.
All in all I think specifying XML was a pretty trivial job. And the result is lousy.
Can we please, please, please append the definition of XML to allow "</>" to close whatever the last tag was?
That simple change would probably cut the size of the average XML file in half.
(corrected post, please moderate my other one down. I have plenty of Karma to spare...)
Sometimes it's best to just let stupid people be stupid.
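The size claim can be checked with a little arithmetic. The sketch below rewrites every named closing tag to the proposed "</>" shorthand and compares lengths; the sample string is an arbitrary tag-heavy fragment, and the result is not valid XML 1.0 (as far as I know, SGML's optional SHORTTAG feature had an empty end tag of this kind).

```python
import re

sample = "<foo><bar>baz</bar><mumble>grumble</mumble></foo>"

# Replace each closing tag such as </bar> with the proposed "</>" shorthand.
short = re.sub(r"</[^>]+>", "</>", sample)

saving = 1 - len(short) / len(sample)
print(short)
print(f"{len(sample)} -> {len(short)} characters, saving {saving:.0%}")
```

For this tiny sample the saving is about a quarter (49 characters down to 37); documents with long tag names and little text would save more, text-heavy documents less.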
This is another characteristically flawed discussion on the promises of XML.
Platform independence. Nope. This depends on the commitment of hardware and software manufacturers to build compatible cross-platform technologies -- and commit to them for long periods of time. Parsers, browsers, OS's, databases, languages, etc. What are we learning with Java? Sun and Microsoft can't even come to agreement on it. Just how many browsers out there support the full range of XML implementation? Who's going to dominate this battle? Who's going to lose?
Data interchange vs. data storage. Things like XML Schema, XQuery, etc. are all duplicates of technologies which have existed in RDBMS technologies for 20 years. Most of the XML apps I've seen look like a trip backward in time, to the days of flat and denormalized databases.
The glorious w3c. It's $50,000/year to join, and dominated by Fortune 1000's. Is this the best drumbeat that we should all march to?
The Layered Network Model. XML is a little problematic for me in that, if used to send tags to a browser, it seems to violate the basic separation of presentation, session, and application layers. The definitive data source must be server-side, and HTTP (with a presentation layer!) is hardly the best place to be doing data conversions. Probably a dedicated socket elsewhere. Why involve the client and a browser in the exchange of data formats? XML re-invents the wheel and puts it on the wrong place of the carriage.
I have a fuller discussion on the problems of XML here: http://www.tc.umn.edu/~brams006/mortality.html.
is hardware bound. That means, add better hardware on it and u get a perf increase to match
Huh?
What tasks don't perform faster when you run them on faster hardware? Are you trying to say that the code and architecture are absolutely optimal, and no performance gains are possible without a hardware upgrade? Not likely.
XML is for exchanging data (for sure, i'm not the first to figure this out), but Oracle has really nice XML support for importing and exporting data via the XSU (XML SQL Utility). Just give it an XML file and the data is inserted into the DB, and exporting is even easier. That's why I don't like MySQL (even with PHP).
Microsoft said "XML Good" and the VB disciples followed.
It's one of those things your boss tells you to implement cause it's the thing to do, but nobody really has any idea of what it's all about. Personally I think it's a big load of BS.
This is not a file format, it's just a syntax, how is it going to help any exchange? Maybe it will be easier to reverse engineer the Word format.. at the expense of some serious data bloat.
Dump the hype and get busy doing some real coding !
Oracle takes it up the backend processor.
They are optimised for insertions.
Christ you've got to be kidding. We're talking about REAL datastores here. Sure if you're talking about 1-10mb worth of information over a trusted user base, Excel might be fine. But in a real datasource model, with gigs of information at an undefined userbase, Excel has no abilities. Not to mention having no data integrity.
--
Does anyone remember
There's this German textile management system called Koppermann that I was curious about - it's really flexible as far as I could tell. Well, I fired up the ODBC browser to take a look in its MS-SQL tables and, I kid you not: THEY IMPLEMENTED AN OO DATABASE IN A FLAT TABLE DATABASE. They had a giant user interface table that had records like this:
ControllID
ParentControllID
DataType
FormLocationX
FormLocationY
Then they had a giant data table like this
DataID
ParentDataID
ControllID
Data
Argh! The madness of it all. Everything of substance was in these two tables. I'll admit that it's a nice hack, and they can tell all their clients that their data is 'easily exported into a CSV file.', but good grief! It reminds me of those people who made so many #define macros in C as to make it look like Pascal.
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
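For anyone who hasn't met the pattern described above, it is a parent-pointer (adjacency list) tree kept in one table. Here is a rough sketch of what reading such a table back involves, using SQLite; the column names follow the post, while the `data_rows` table name, the sample rows and the `subtree` helper are invented, and nothing here is taken from the actual product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_rows ("
    "DataID INTEGER PRIMARY KEY, ParentDataID INTEGER, "
    "ControllID INTEGER, Data TEXT)"
)
# Invented rows: an order with two children, one of which has a child.
conn.executemany(
    "INSERT INTO data_rows VALUES (?, ?, ?, ?)",
    [(1, None, 10, "order"), (2, 1, 11, "customer"),
     (3, 1, 12, "item"), (4, 3, 13, "colour=blue")],
)

def subtree(conn, root_id):
    """Return (depth, Data) for root_id and all of its descendants."""
    sql = """
        WITH RECURSIVE tree(id, depth, payload) AS (
            SELECT DataID, 0, Data FROM data_rows WHERE DataID = ?
            UNION ALL
            SELECT d.DataID, t.depth + 1, d.Data
            FROM data_rows d JOIN tree t ON d.ParentDataID = t.id
        )
        SELECT depth, payload FROM tree ORDER BY id
    """
    return conn.execute(sql, (root_id,)).fetchall()

print(subtree(conn, 1))
```

Every object read turns into a recursive walk like this, which is the price of keeping everything "of substance" in two generic tables.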
I also know of people who have had bad experiences of using OODBMS but you seem to be mixing up several issues in your post.
I don't think the fact that your company developed a poor Java application running against an OODBMS can justify statements like "The promise of an OO architecture was great, but it just didn't work out in the real world - Linux was the solution for us."
I've seen many fast and stable OO Java Applications running under Linux/Apache/Resin and connecting to a RDBMS. The applications are easy to develop/maintain and are very reliable.
A system is only as reliable as its weakest link, be this an OODBMS, poor coding or Windows/IIS.
Matt.
There's a good article here called "Why Not MySQL?" by Ben Adida, part of the OpenACS Project [openacs.org], on why MySQL wasn't the right choice for OpenACS (at the time). It's quite out of date (and is recognised as such by the author), but still worth a read, and there are many interesting submitted comments. Take a look at some decent free RDBMS alternatives such as Firebird (open-source free Interbase) or PostgreSQL while you're at it. Oh, and there's plenty more discussion on MySQL in a previous Slashdot article here.
Stef
I got tight storage and retrieval facilities, MY ARSE.
There are several database systems that do not scale well. The point the poster was trying to make was probably that it does scale well.
- Michael T. Babcock (Yes, I blog)
Now that fud is dead, don't forget to enter our big URL giveaway, so you'll be ready to play into the gnu millennium. Includes a year's free hosting. Whatever.
On the other hand, a lot of data is semi-structured. It can benefit from being searchable in a straightforward manner, as XML can be, but how it will be searched isn't clear enough ahead of time to optimize an RDBMS schema for it. Furthermore, there is no reason why it needs to be accessed with a high level of performance. TPS don't matter, it's not going to be hit that hard and that long.
Just because an RDBMS may be best for an airline ticketing system, doesn't make it best for a medical record system.
What's a sig?
Your tax dollars at work: not wanting to play favorite, the government buys from everybody, regardless of whether the bought stuff even works. Right?
OO databases mixed with XML == Very bad performance
This may be great for academia, or perhaps small projects, but in "The Real World"(tm) this won't fly. As a performance guy working on a big system, I can tell you that using OO databases and/or XML queries/storage will butcher performance.
For most of our clients, performance is the #1 concern, as that is what dictates hardware. Buying one 32-way p680 for a typical RDBMS solution vs. two for a fancy OO/XML solution isn't much of a choice.
"The market alone cannot provide sufficient constraints on corporation's penchant to cause harm." -- Joel Bakan
... because if they did, people might realize that:
<foo> <bar>baz</bar> <mumble>grumble</mumble></foo>
is equivalent to
<foo> <bar>baz</> <mumble>grumble</></>
which is semantically equivalent to
(foo (bar "baz") (mumble "grumble"))
And if they did that, they might have to admit that XML is semantically equivalent to Lisp S-expressions, and not a major advance in computer science after all.
And they'd never do that.
To a Lisp hacker, XML is S-expressions in drag.
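The equivalence claimed above is mechanical enough to sketch in a few lines. The `to_sexpr` function is a hypothetical helper; it ignores attributes, mixed content and text that follows a child element, so it only covers the simple element-and-text case used in the post.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as an S-expression string (attributes ignored)."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

xml = "<foo> <bar>baz</bar> <mumble>grumble</mumble></foo>"
sexpr = to_sexpr(ET.fromstring(xml))
print(sexpr)  # (foo (bar "baz") (mumble "grumble"))
```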
The other end of the situation is where software is flawed and has problems scaling. In this case more hardware may not have a noticeable effect.
Of course most of us understand the idea of structured data. The point is there are already better ways of storing and retrieving structured data on a server, and very little compelling reason to send content-oriented data through to the web client, where presentation rules and all attempts at disintermediating presentation have fallen flat.
I've found that the ODBC Socket Server (odbc.sourceforge.net)
From the site: ODBC Socket Server is an open source database access toolkit that exposes Windows ODBC data sources with an XML-based TCP/IP interface.
You make the query in XML, it converts the query back into SQL, and then returns XML again. Lets you use XML when you want to, and SQL when you want to. Also makes a great tool for accessing M$ DB's from a Unix environment. It's fast, stable, and Open Source. Check it out.
Driven by 100% sarcasm - fueled by the need to be heard.
I've used XML extensively and in some ways agree with people saying XML isn't a storage format. But right now there are lots of applications where XML is the perfect storage format. Example: Consider an order processing company that brokers orders from company to company. One option would be to define a monolithic db schema to take care of what each company would like in their order. Another would be to define a really abstract schema to facilitate handling generic order forms. The problem with the first is, each time XYZ wants something added to an order form, you need to change the schema. With the second, it'll work but you'll need exceptionally disciplined and smart programmers to deal with the abstract layer. This doesn't even deal with migration issues.
The solution is XML. You create an XML Schema and start storing stuff. Some company wants more parameters - no problem, extend the schema. If you need to migrate previous XML docs to adhere to the current schema, use XSLT. Or you can add these as optional parameters and every document that exists already will conform to the schema.
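The "optional parameters" route is the cheaper of the two, and can be sketched as follows. The order documents, the `giftwrap` element and the `read_order` helper are all hypothetical; the point is only that a reader which treats the new element as optional keeps working on documents stored before the schema was extended.

```python
import xml.etree.ElementTree as ET

# Hypothetical order documents: the second carries a field added later.
old_order = "<order><sku>A-1</sku><qty>2</qty></order>"
new_order = "<order><sku>B-7</sku><qty>1</qty><giftwrap>yes</giftwrap></order>"

def read_order(xml_text):
    """Read an order, treating the newer <giftwrap> element as optional."""
    root = ET.fromstring(xml_text)
    return {
        "sku": root.findtext("sku"),
        "qty": int(root.findtext("qty")),
        # Older documents simply lack the element, so fall back to a default.
        "giftwrap": root.findtext("giftwrap", default="no"),
    }

print(read_order(old_order))
print(read_order(new_order))
```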
Speed in XML is an issue. But people who think you need to read the entire XML document to process it don't know what they're talking about. You can do modular processing. Also, you can do smart indexing to increase speed. And in a production environment, you turn Schema checking off unless you're getting documents from untrusted sources. Will XML ever be as fast as RDBMS? Probably not. But XML doesn't store relational data. And with current research in XML Query languages, I'm sure XML's speed will be good enough for most applications in the future that deal with fuzzy schemas. (If you need a high performance DB, then you have to bite the bullet and use an RDBMS).
My two cents.
Of course, this is not an easy question to answer, but the right answer involves knowing three things:
1) Can certain records be considered 'atomic'?
This is similar to the RDBMS question of whether or not it makes sense to construct a view. View definitions represent a common query. If you consider a query as a means of tying together disparate data from many tables into a single, denormalized set of records, the record could just as easily be expressed in some XML format.
Now, if that record represents some physical or conceptual entity in the data model, it is in fact a set of properties about an object. This is what XML is good at representing. Decomposing that set of object data (record) into normalized relations may not make sense if such 'objects' are frequently requested; but there are other considerations...
2) Ad hoc queries are difficult when data is stored internally in XML, because each XML blob has to be parsed and checked for the query values. If you don't know in advance if the XML structure even has the fields you're looking for, then you must do an exhaustive search. Some have used indexed XPath information to work around this issue. Since we're mentioning indexes...
3) How do you find the XML blobs you're looking for? We've used an ORDBMS for our XML data, and indexed on the ID or key values (as defined in an XML Schema) for each element stored in the database. This makes looking up element instances easier. It also makes relating them easier, too, if you use IDREF or keyrefs as your foreign keys.
Now every XML document has a single root element. If you're storing that document in a database, you could choose to store just that one root element instance. More likely, you'll want to decompose the root so that accessing subelements by ID or key in the database will be easier.
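The decomposition described in point 3 and the paragraph above can be sketched roughly like this. It is not the poster's system: it uses SQLite rather than an ORDBMS, assumes every interesting element carries an `id` attribute, and the `element` table, `store` and `fetch` helpers are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# The primary key gives an index on id, so lookups avoid scanning every blob.
conn.execute("CREATE TABLE element (id TEXT PRIMARY KEY, parent_id TEXT, xml TEXT)")

def store(elem, parent_id=None):
    """Store every element that has an id attribute, keyed by that id."""
    elem_id = elem.get("id")
    if elem_id is not None:
        conn.execute("INSERT INTO element VALUES (?, ?, ?)",
                     (elem_id, parent_id, ET.tostring(elem, encoding="unicode")))
        parent_id = elem_id  # descendants point back at the nearest stored ancestor
    for child in elem:
        store(child, parent_id)

def fetch(elem_id):
    """Look an element up by id and parse it back into a tree."""
    row = conn.execute("SELECT xml FROM element WHERE id = ?", (elem_id,)).fetchone()
    return ET.fromstring(row[0]) if row else None

doc = ET.fromstring(
    '<library id="lib1">'
    '<book id="b1"><title>Dune</title></book>'
    '<book id="b2"><title>Emma</title></book>'
    '</library>'
)
store(doc)
print(fetch("b2").findtext("title"))
```

The `parent_id` column plays the role the post assigns to IDREF or keyref values: it lets the database relate stored fragments without parsing them.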
Got to run off now,
Jeff Lowery
If you post it, they will read.
Databases are for storing data. End of Story.
Exactly, and XML is a format for encoding structured data. There are many kinds of documents that live their entire lives as XML, from XHTML documents to configuration files to myriad kinds of XML documents that exist today.
Why is NASA switching to MySQL from Oracle [fcw.com] and noticing speed increases?
If all you want is speed then MySQL is all you need. Similarly I can quote how much faster TUX is than Apache but that means nothing if I have dynamic database driven content that I want to use JSP or Perl to access.
There is more to picking a database than how quickly it performs some SQL queries.
Don't get me wrong, I'm a big fan of XML.. as a data interchange format.. but when i want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.
This means you're suggesting that people shred XML documents into relational data to store them in the DB and then reassemble them whenever they retrieve them. This is massive overhead and error prone since you're depending on your developers to come up with custom ways of doing this for each application. It is also typically very difficult to ensure that the XML that was stored in the DB can be accurately reconstructed (what happens to comments, processing instructions, entities, etc).
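The reconstruction problem is easy to reproduce even without a database. As a stand-in for a naive shredding step, the sketch below round-trips a document through Python's default ElementTree parser, which keeps elements, attributes and text but drops comments; the sample document is made up.

```python
import xml.etree.ElementTree as ET

original = "<note><!-- reviewed by legal --><body>hello</body></note>"

# Parse and serialise again, as a store-then-reassemble cycle would.
round_tripped = ET.tostring(ET.fromstring(original), encoding="unicode")

print(round_tripped)
print(round_tripped == original)
```

The element data survives, but the comment is gone, so the stored document is no longer identical to what was submitted.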
when you start to grasp the implications of a beautiful language like Lua: http://www.lua.org
simple, tiny, fast, descriptive, extensible.
Meta-Features.
when your data is code, it only makes sense that it's programmed in a language that makes parsing it as simple as execution.
XML tends to be good for hierarchical, widely-parseable data. In this sense, XML is good for configuration files, because many of the more advanced ones need some type of hierarchy to be sane. Also, it makes it easy to have one editing mode for many different configuration files, and configurations can be displayed/queried in a more universal manner.
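As a small illustration of querying a hierarchical configuration "in a more universal manner", here is a sketch with an invented config format; the element names, attributes and the `ports_by_server` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; element and attribute names are invented.
CONFIG = """
<config>
  <server name="web">
    <listen port="8080"/>
    <log level="info"/>
  </server>
  <server name="mail">
    <listen port="25"/>
  </server>
</config>
"""

def ports_by_server(xml_text):
    """Map each server name to the port of its <listen> child."""
    root = ET.fromstring(xml_text.strip())
    return {s.get("name"): int(s.find("listen").get("port"))
            for s in root.iter("server")}

print(ports_by_server(CONFIG))
```

The same generic parser and the same kind of query would work for any other file that follows this nesting, which is the advantage being claimed over one-off config syntaxes.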
Yeah, right, all XML and SGML parsers have to read the entire document before anything can be done. All of those SAX parsers are a figment of my fevered imagination. *rolleyes*
Now, if you're pointing out that XML provides no mechanism for indexing so you'll have to scan the file *until* you reach the record you're interested in, I agree. But as others have pointed out, nobody uses XML as the storage format for anything but the smallest databases. (E.g., configuration files.) But the translation to/from XML format for queries no more breaks its 'purity' than converting SQL "insert" clauses into binary data stored in B-tree or ISAM tables breaks its relational purity.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
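The streaming point above can be sketched with `iterparse` from Python's standard library, which plays a similar role to a SAX parser here: it hands over elements as they are completed, so the scan can stop at the wanted record instead of building the whole tree. The record format and the `find_record` helper are invented for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical record stream of 1000 entries.
data = b"<log>" + b"".join(
    b"<rec id='%d'>entry %d</rec>" % (i, i) for i in range(1000)
) + b"</log>"

def find_record(stream, wanted_id):
    """Scan incrementally; return (text, records_seen) once the id matches."""
    seen = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "rec":
            continue
        seen += 1
        if elem.get("id") == wanted_id:
            return elem.text, seen
        elem.clear()  # release the content of records already passed
    return None, seen

text, seen = find_record(io.BytesIO(data), "5")
print(text, seen)
```

As the post says, there is still no index: the scan is linear up to the record of interest, it just does not have to continue past it.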
Ok, that's probably what he meant. I didn't immediately get that out of what he said....
Yes it is true that some software does not scale well but that's not nearly enough information to mean anything. Does that mean that if you add another machine you get more performance? Another CPU? More memory? Software that "scales well" in one environment (say on 4-8 CPU x86 machines) may not scale well in other environments (large mainframes).
Another point. Say that the performance of the software scales linearly, so your performance is multiplied by the number of whatever hardware devices you have. You could argue that that software scales well, but if said software has a slow section of code in its main execution path, optimization of that code (or removal if it's a fluff feature) shifts your whole curve. There is no reason that a piece of software can't both architecturally scale well and perform like crap at the same time.
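A toy calculation makes the "shifts your whole curve" point concrete. The numbers are invented: 100 requests per second per node before optimization, 400 after, with ideal linear scaling assumed in both cases.

```python
def throughput(nodes, per_node_rate):
    """Ideal linear scaling: total rate grows in proportion to node count."""
    return nodes * per_node_rate

node_counts = (1, 2, 4, 8)
slow = [throughput(n, 100) for n in node_counts]  # hypothetical unoptimized code
fast = [throughput(n, 400) for n in node_counts]  # same code after a 4x speedup

print(slow)
print(fast)
```

Both versions scale equally well, yet one node running the optimized code matches four nodes running the slow one, so "scales well" says nothing about where the curve starts.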
I have no experience with SQL server, so I cannot if this is the case or not. I do know that I would not be able to make a decision about it only knowing that is scales well.
When you normalize, you force the database server to recombine an object from its spread-out parts at runtime. This is why database companies have spent the last 20 years working on transaction optimization...so they could match the transaction performance of databases based upon a navigational data model.
Relational databases were originally conceived to help with decision-support processing - like data warehouses where queries are the majority of processing. Navigational data models (like object models) are optimal for on-line transactions because the records usually have pointers that take you directly to the data you want to access - rather than have the DBMS server perform join-processing.
Please, don't get me wrong, I got Normalization-fever just like the poster. You just don't do it to speed things up... Normalization generally has the opposite effect....except perhaps for updates on highly denormalized systems.
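The recombination cost can be sketched with Python's bundled sqlite3 module. The customer/orders/line schema and the load_order function are hypothetical, invented only to show that a normalized order has to be put back together by joins at read time. It says nothing about how any particular server optimizes this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER, placed TEXT);
    CREATE TABLE line     (order_id INTEGER, product TEXT, qty INTEGER);
    INSERT INTO customer VALUES (1, 'Foo Ltd');
    INSERT INTO orders   VALUES (10, 1, '2001-11-01');
    INSERT INTO line     VALUES (10, 'bolt', 100);
    INSERT INTO line     VALUES (10, 'nut', 250);
""")


def load_order(conn, order_id):
    """Reassemble one order object from its three normalized tables."""
    rows = conn.execute(
        """
        SELECT c.name, o.placed, l.product, l.qty
        FROM orders o
        JOIN customer c ON c.id = o.customer_id
        JOIN line l     ON l.order_id = o.id
        WHERE o.id = ?
        ORDER BY l.product
        """,
        (order_id,),
    ).fetchall()
    if not rows:
        return None
    return {
        "customer": rows[0][0],
        "placed": rows[0][1],
        "lines": [(product, qty) for _, _, product, qty in rows],
    }
```

A navigational or denormalized store would hand back the same order by following pointers or reading a single record. The flip side is that renaming the customer here is a one-row update.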
I want to be alone with the sandwich
I'm writing an app that will use input from many different sources that I cannot spec at this time, so writing a DTD to allow any database jockey to dump data into XML seems like a simple answer to me. The user can debug their input, the data set sizes will fit in a PC's footprint, and I can just bolt in Apache's Xerces parser to make sure the input is all according to Hoyle. This leaves me to work on the guts of my code, not writing YAHP (Yet Another Half-debugged Parser).
-peace
I find the tags are a major hindrance to proper editing technique. If the tool is vi, I have to deal with the tags manually. If the tool hides the tags, then it has to be interpreting them and presenting some logical construct. But I've yet to see any tool that can let me do all I want with config files. How would /etc/rc look in XML?
now we need to go OSS in diesel cars
Well, I am fairly atrocious at my understanding of SQL (clueful db admins have always coddled me), but it does not strike me as hard to store hierarchical data in databases. Each tuple would just have a reference to its parent.
As for attributes, a different table per type of node.
I have no idea about performance properties of such a contraption, but I imagine it could really be improved by modifying the XML to lend itself to such a relational representation.
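A rough sketch of the parent-reference idea, using Python's sqlite3 module. The node table, its columns, the sample rows and path_to_root are assumptions made up for the example, and attributes are left out entirely. Each row points at its parent, and a recursive query walks back up to the root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL for the root element
        tag       TEXT NOT NULL
    );
    INSERT INTO node VALUES (1, NULL, 'config');
    INSERT INTO node VALUES (2, 1,    'network');
    INSERT INTO node VALUES (3, 2,    'interface');
    INSERT INTO node VALUES (4, 1,    'logging');
""")


def path_to_root(conn, node_id):
    """Follow parent references upward and return the path as a/b/c."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent_id, tag, depth) AS (
            SELECT id, parent_id, tag, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.parent_id, n.tag, c.depth + 1
            FROM node n JOIN chain c ON n.id = c.parent_id
        )
        SELECT tag FROM chain ORDER BY depth DESC
        """,
        (node_id,),
    ).fetchall()
    return "/".join(tag for (tag,) in rows)
```

Here path_to_root(conn, 3) gives config/network/interface. Every level of depth costs another step of the recursive join, which is probably where the performance question would be decided.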
I'm not just talking out of my ass either. I've worked with EDI systems(data in binary format means you need proprietary software on both ends), XML, and plain old text files. I've used all 3 in the context of transferring data between businesses, which is what XML aims to solve. My feeling is that plain old text files, along with a descriptive file of how the text file is laid out, is overall the best solution for most data interchanges between businesses.
One really good example of this is using diff. Suppose your supplier maintains a database of products you can order, and this data changes daily. Using text files you can easily diff today's file with the one you retrieved the day before and get a much smaller file to use to update your internal database. I can't imagine a more elegant solution using XML.
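The diff approach is easy to sketch with Python's standard difflib module. The pipe-delimited product lines and the changed_lines helper are invented for illustration; a real feed would use whatever layout the descriptive file specifies.

```python
import difflib

# Hypothetical daily product files, one record per line: id|name|price
yesterday = [
    "1001|Widget|4.50",
    "1002|Gadget|12.00",
    "1003|Sprocket|0.75",
]
today = [
    "1001|Widget|4.75",
    "1002|Gadget|12.00",
    "1004|Flange|3.20",
]


def changed_lines(old, new):
    """Return (removed, added) records, skipping everything that is unchanged."""
    removed, added = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # lines starting with "  " are unchanged; "? " lines are ndiff hints
    return removed, added
```

Only the changed records come back, so the update applied to the internal database is much smaller than the full file. This works because each record sits on exactly one line.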
I have found one good solution that uses XML - outputting XML on the fly over the net in response to a query. If you have customers that query your data regularly over the web, any change to the HTML will throw their queries off if they are "screen scraping" to get at your data. XML solves this problem nicely, even if new fields are added or if the XML page layout changes in some way. I don't see the logic of actually storing XML in the database though.
My experience of being in a business where data interchanges take place on a regular basis with other businesses, is that formatted text files are still the best way overall. They are easier to deal with and faster than XML ever will be.
No, Thursday's out. How about never - is never good for you?
XML on a DB seems foolish to me, just storing lots of extra data. What I want to know is: When will the hype on XML die the death it deserves? I am all for XML, it solves real business problems of data conversion and self description.
I am just sick of the answer for everything being XML-can't talk to a business partner, oh we should use XML. Doesn't matter what you are trying to do.
For instance, if it's just two parties talking, it probably makes more sense to do some sort of flatfile, as opposed to XML (time to value is much quicker).
looks like the site can't handle the slashdot load of people.. *sigh*
XML can be preferable to an RDBMS for storage when long-term preservation is a goal. Store the preservation copy of the data in XML, then use other more efficient formats for immediate processing needs. The San Diego Supercomputer Center has done some work in this direction. If I want someone to be able to access data 50 years from now, I'm much more worried about software dependence than performance.
Actually, to get good retrieval performance you want to query denormalized data. To get good input performance, you want to write normalized data.
Yes, couldn't agree more. XML is just a particularly annoying way of writing S-expressions.
I really don't get people who complain about Lisp syntax and then tell me how wonderful XML is - XML is 10x more annoying than Lisp!
Also, if you want to deal with XML in a semi-sane way, may I recommend just transforming it into Scheme, processing it with the normal LISPy tricks, then pretty-printing it back out... See here for the best way to deal with XML weenies.
Choice of masters is not freedom.
Converting to PDF is easy too. Just use XSL-FO. Apache has an implementation of this. Currently they only create PDFs, but it could easily be sent to a printer directly. You can also convert directly to TeX:
For my small projects, using DBXML has been a joy. There are certain things for which using XML makes a lot more sense. Some data models just fit more naturally into hierarchical structures, for example users and groups. If you have unique usernames, you can pull data on a user, then pull their group quite easily without the need for a reference table simply by pulling the user's parent.
This isn't to say I think XML databases are the answer to everything. One of the largest problems I find so far is that queries that are relatively easy in SQL can get a bit tedious in XPath. Also, as of yet there doesn't seem to be any truly standard query language. This is understandable, given how new the designs are, but it is a bit difficult to decide how to do things sometimes. Do you check in a document, or XUpdate it? Play with DBXML and you'll see what I'm talking about.
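The users-and-groups case can be sketched without DBXML at all, using the limited XPath support in Python's xml.etree.ElementTree. The document layout and the group_of helper are assumptions for illustration, not anything DBXML prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: each user element lives inside its group element.
doc = ET.fromstring("""
<groups>
  <group name="admins">
    <user name="alice"/>
  </group>
  <group name="staff">
    <user name="bob"/>
    <user name="carol"/>
  </group>
</groups>
""")


def group_of(root, username):
    """Find the user by its unique name, then step up to the parent group."""
    # Naive string formatting: fine for a demo, not for untrusted input.
    group = root.find(f".//user[@name='{username}']/..")
    return None if group is None else group.get("name")
```

So group_of(doc, "bob") comes back as staff with no membership table anywhere; the hierarchy itself is the reference.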
For those of you complaining about XML not being an efficient way of data storage because of the high memory cost of keeping documents in memory, bear in mind that there are more parsers out there than just DOM and its relatives. SAX is quite efficient, and even if you're using DOM it is entirely possible to pull fragments out of the document as you see fit; in fact XPath makes this quite easy.
I may be crazy, but I eventually see XML databases providing solid competition to standard RDBMS systems. I've seen complaints about performance -- I think much of this is lodged in the fact that a lot of these systems are not native XML databases -- they are RDBMSs with XML capabilities thrown on top. One way or another, it should be interesting to see how things pan out.
End rant.
just my blog and pix
agreed. linux is shit.
I've been properly brainwashed in the Open Source way, and I use XML all the time as an interchange mechanism, but you'll have to pry Crystal Reports from my cold, dead fingers.
I have spent a lot of time training non-technical users to get their own damn reports from databases. It's hard to imagine putting data--any data--into a system where the tools to get it out haven't been written yet.
a "langauge", but the needle broke off when I accidently plugged it into a gigabit switch.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
because one cannot normalize a spreadsheet
I can't believe no-one has posted my standard response to someone who thinks XML is just for "interchange".
The interesting thing about XML to me is NOT that it solves the interchange problem (though it helps with that). The great thing is that it solves the PARSING problem. No longer do I have to write a parser every time I have some simple task of reading in something externally.
What XML does is define for you a standard means of parsing, and by defining the API for parsing and the structure of the documents lets you think about how you want to structure external information, not how you're going to read it in.
Also, because the API for parsing is now hiding the engine details below, parsers can be specialized depending on what kind of task you have. Parsing thousands of 1k XML documents would seem to demand a different processor altogether from a few multi-GB documents, but you only have to know one parser (Ok, really two - SAX and the DOM interface). You could even have specialized XML processors that did write the stream out in a weird custom binary format for compactness and read it back in with the normal DOM API so clients wouldn't have to adjust. I'll grant you that there don't seem to be many specialized XML processors - yet.
I also like the robustness of XML exchanges (here I'm getting more into your main point). If you add or drop attributes from an XML document, clients that read that document are less likely to break (unless of course they relied entirely on the node(s) you have removed!). That is especially true of XSL, where missing nodes of a document simply correspond to missing parts of output (which can also be a useful effect).
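A small sketch of that robustness, using Python's xml.etree.ElementTree. The order document, its fields and read_order are hypothetical. The client looks things up by name, so fields it has never heard of are simply ignored, and a missing optional field falls back to a default.

```python
import xml.etree.ElementTree as ET


def read_order(xml_text):
    """Pull out only the fields this client knows about, by name."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        # A dropped <total> degrades to 0.0 instead of breaking the reader.
        "total": float(root.findtext("total", default="0")),
    }


# The original format, and a later revision with an extra attribute and element.
v1 = "<order id='7'><customer>Foo Ltd</customer><total>19.5</total></order>"
v2 = (
    "<order id='7' channel='web'><priority>high</priority>"
    "<customer>Foo Ltd</customer><total>19.5</total></order>"
)
```

Both versions give the same result from read_order. A positional flat-file reader would have needed a new layout description the day the priority field showed up.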
You might think of XSL as a useless language, but I'll be happy to make a counter-prediction that it will grow and thrive. It's simply too useful a transformation tool to do anything else. I know the syntax seems overbearing, but for the kinds of short transformational work it's normally put to that's not much of an issue and you get used to it quickly.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
At my previous job, I implemented an experimental app that was inspired by RDF (Resource Description Framework) and triple stores.
In a triple store, you have objects that are defined by a set of properties. The word "triple" comes from the fact that you have triples of objects, properties and property values. For example, you could have a person; John Q, who has an age 37, a phone number 1234 and an employer Foo Ltd. Foo Ltd. in turn has a phone number 5678 and any number of other properties. This forms the following triples: John Q --age--> 37, John Q --phone number--> 1234. John Q --employer--> Foo Ltd. Foo Ltd --phone number--> 5678.
When you look at these, you can see that Foo Ltd. is both the employer of John Q (a property value) but also an object in itself that is described by a set of properties. In RDF, the triples form a graph that describes your data. The graph is typically serialized as XML.
At first, it would seem that this lends itself very well to relational databases. A row in a table would be the object to be described and columns are the properties. The intersection is the value. However, the problem - and strength of RDF - is that you can have any number of properties for an object. Basically, you could have any number of columns and sometimes, the property value is not just a value - it can be a database row in itself or even a set of rows.. or a set of values.
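The John Q example can be written down as a toy triple store in plain Python. This is only a sketch of the data model: match and employer_phone are made-up helpers, and it has nothing to do with the RDF-to-relational mapping the app actually used.

```python
# Each fact is one (object, property, value) triple, as in the example above.
triples = [
    ("John Q", "age", 37),
    ("John Q", "phone number", "1234"),
    ("John Q", "employer", "Foo Ltd"),
    ("Foo Ltd", "phone number", "5678"),
]


def match(triples, subject=None, prop=None, value=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


def employer_phone(triples, person):
    """Follow person --employer--> company --phone number--> value."""
    phones = []
    for _, _, company in match(triples, subject=person, prop="employer"):
        for _, _, phone in match(triples, subject=company, prop="phone number"):
            phones.append(phone)
    return phones
```

Adding a new property is just another tuple, with no column to add. The cost shows up in employer_phone: Foo Ltd is a value in one triple and an object in another, so even this two-hop question is a self-join, which hints at why the generated SQL got complicated.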
The app I wrote mapped arbitrary RDF files to relational databases and back as well as provided an API to perform queries on the data. The results of the queries were RDF graphs in themselves.
While this was quite cool, it turned out to be quite difficult to turn the query result graphs into meaningful stuff in a user interface. Also, queries on the RDF graphs could turn out to be extremely complex SQL queries... Most of these problems were eventually solved but the code wasn't used directly for any real world app, except heavily modified as a metadata database for a web publishing system.
Killing the enemy solves the problem only if none are left. If you stop short of a clean-cut genocide you have to make some prosperous peace afterwards or the thing will start again a generation later.
WW2 was possible because of the merciless conditions after WW1. OTOH germans are reliable allies for the last 60 years because of American economic aid.
On the same scale, Iraq is an American disaster. There still is no McDonalds in Baghdad, so these guys still have nothing to lose.
Post tenebras lux. Post fenestras tux.
Of course, one of the "ideas" of XML is that you can just strip out all of the tags and have a document you can sort of read. That would be anathema to a Lisp person, and for good reason. Lisp is all about simple, minimalistic expression and manipulation of hierarchical data. XML is about an underspecified hodgepodge of structure and free form data.
Which is not to say that it's not useful, regardless.
- jon
Ganymede, a GPL'ed metadirectory for UNIX
Ironic... kuro5hin.org seems to be ahead of /. once again. Earlier this morning, they ran this story, eerily similar to the one I am replying to now.
/.
Although, I do have to admit that I was thrown off for a moment: the reversal of "XML" and "Databases" in the headline tricked me.
Try again,
heyitsme
"scaling well" doesn't get you anything if you don't have anything to scale to.
A Pirate and a Puritan look the same on a balance sheet.
alright, the first posts _are_ gettin more creative :)
you have to make some prosperous peace afterwards or the thing will start again a generation later.
I agree, but...
OTOH germans are reliable allies for the last 60 years because of American economic aid.
American economic aid AND the fact that we installed a stable democracy. What's going to make the middle east complicated (and ironic) is that we will probably be able to install a stable government and lots of aid into Afghanistan, but a lot of the other countries are still relatively backward governments. Saudi Arabia, while our friend, is still a dictatorship with relatively little freedom.
I think this war is going to solve a lot of problems, but it won't eliminate them. The problems won't go away until all the countries become modern democracies with self-sustaining economies.
On the same scale, Iraq is an American disaster.
In terms of the Gulf War, it was a success. The mission was to kick Iraq out of Kuwait in order to preserve the free flow of oil. That was clearly successful. I think Bush felt that exceeding our UN mandate and launching a full-blown invasion of Iraq would've created a lot more problems than it solved, and he was probably right (we'll never know of course). He decided to gamble that Iraq was weak enough for Hussein to be overthrown. Unfortunately, the gamble didn't pay off.
Sometimes it's best to just let stupid people be stupid.
You keep referring to "SQL Server". Which one? PostgreSQL? MySQL? Sybase? There were several last time I checked, even for MS.
Warning: Can't connect to local MySQL server (111) in /usr/www/htdocs/amdpower/mainfile.php4 on line 32
Unable to select database
yeah - seems reliable
Wanna put some money on that?
Just treat each sheet in an excel file as a new table, and then normalize as usual. Pretty ugly, and I don't really know what the point is though.
I'm a loser baby, so why don't you kill me.
Excel seems to be able to scale for one user up to around 70megs. Beyond that, it is better to just use it as a querying and analysis tool front ended to an Access/ODBC/ADO database.
I'm a loser baby, so why don't you kill me.
my experimentation points to the same stuff
,libxslt and XML::LibXML,XML::LibXSLT
....blehck!
xml is interchange
database is storage
they do have to talk together though,
and generating a xml doc based off of database info and then parsing via
xslt is so trivial i dont think it needs repeating
i think the biggest prob in xml is the incompatibilities
between the parser & lexicon engines and well as interoperability too
of course html browsers are like this already
and i loathe it as much as Tim Berners-Lee!!
the best modules / libs ive found for this has been libxml
as far as really good xml to db interchanges oracle was working on XSQL[?]
which would allow for db connections via an xml file, it was only in java though
back in the day we didnt have no old school
Separation of content, logic, and presentation is very difficult to do in current web-app development environments.
The breakdown is not on the logic/content side of the equation, or the presentation/content side, but mainly in the presentation/logic arena.
Imagine an HTML designer who has mocked up a page for a web-app, and hands it off to the dev team for them to add in the necessary logic to dynamically include the user-name, current balance, contents of the shopping cart, etc. Depending on the exact paradigm that their tools use, they will either:
a) Chop up the page and include various fragments in the programs that are designed to emit said fragments at the opportune times to be assembled into a text stream eventually received by a browser
or b) Stick various bits of logic into the page in order to parameterize and/or conditionalize it, using either some sort of special tagging format or actual inlined blocks of code.
Whichever approach the dev team's tools use, the result is the same: the designer can no longer change the altered page.
Even in case b), which maintains some semblance of a coherent 'page', the designer cannot load the page-with-logic into their favorite visual editor and see anything resembling the actual page. They certainly can't edit it to change the look-and-feel without breaking the carefully constructed logic.
The end result is that the designer has no recourse other than to take their page design, change it, and hand it over to the dev-team again for them to re-include (in some cases re-code) all of their logic.
This is obviously a very wasteful approach.
Amazingly, there actually is a solution to this problem. It's called Template Attribute Language (TAL), and it solves the problem by adding programming directives to the page via XHTML attributes on the existing tags. The language is deliberately designed to only be suitable for presentation logic, relegating business logic code to some other objects, where the designer can't see them. This helps enforce the appropriate distinction between presentation logic and business logic that most current development environments ignore, thus encouraging their admixture.
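To show the flavour of the attribute approach, here is a toy renderer in Python. It is emphatically not the Zope implementation. It handles only a tal:content attribute on text-only elements, looks the value up in a plain dict rather than evaluating a TALES expression, and the namespace URI is the one I believe Zope Page Templates use, so treat it as an assumption. The template, the variable names and render are all invented for the example.

```python
import xml.etree.ElementTree as ET

TAL_NS = "http://xml.zope.org/namespaces/tal"  # assumed TAL namespace URI
TAL_CONTENT = f"{{{TAL_NS}}}content"           # ElementTree's {uri}name form

# The designer's mock-up: still a valid page, with dummy text in place.
template = f"""
<p xmlns:tal="{TAL_NS}">
  Welcome back, <b tal:content="user_name">Jane Placeholder</b>.
  Your balance is <span tal:content="balance">$0.00</span>.
</p>
"""


def render(template_text, values):
    """Replace the dummy text of every tal:content element with a real value."""
    root = ET.fromstring(template_text)
    for elem in root.iter():
        key = elem.attrib.pop(TAL_CONTENT, None)
        if key is not None:
            # Toy rule: the attribute value is just a key into the values dict.
            elem.text = str(values[key])
    return ET.tostring(root, encoding="unicode")
```

Calling render(template, {"user_name": "Ada", "balance": "$12.34"}) swaps in the real values. The template itself still opens in a visual editor with the placeholder text showing, so the designer can keep restyling the page without touching the logic.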
Currently, TAL (and the related specifications TALES and METAL) are only implemented in one environment, but the language has been deliberately designed to be as platform agnostic as possible. Other implementations of the specification are possible, and even desirable.
Articles:
Zope Page Templates: Getting Started
Zope Page Templates: Advanced Usage
Using Zope with Amaya, Dreamweaver, and other WYSIWYG Tools
The real Webmaven is user ID 27463. I don't rate an imposter, because my ID is such a lame-ass high number.
A large number of otherwise intelligent posters would seem to have been hit by the runaway XML hype train. Examples culled from various posts:
...[not a] major advance in computer science.
...[bogus] contribution to programming language design (re: XSL)
...[transfer data between businesses,] which is the problem XML aims to solve.
But these are critiques directed at the hype machine, not the specification. This is really distressing me. The machine is so efficient that there are APIs for XML (which shall remain nameless) being written and optimized for message passing which cannot handle mixed content as a matter of design. As though it were somehow so useful in this area that a section of the spec should be tossed to make it efficient. As though there weren't already gallons of ink being spilled on EDI, etc.
XML was not designed to replace S-expressions, to facilitate cross-platform communications, revolutionize EDI or DBMs, to theorize about language design, yada, yada. XML is just that, an Extensible bloody Markup Language, a document tagging scheme. In this regard it is a tremendous advance. It is 80% less suck, by volume, than what went before. If you think your XML parser is bloated, have a look at any SGML parser. Part of what gets stripped out is tag minimization, the absence of which another poster complained about.
Hey, it's text and not binary because I need to write it and read it. Yes, Virginia, I've got 400 users tagging XML in flat-file editors. They complained about the loss of tag-minimization, too. But my svelte little Xerces needs a hand to stay so lean.
The goal is to get structural and semantic information into my documents. (Yes, it's data, but a special kind of data called a document. You can call the message you're passing a document, and use XML to format it, but there is some overhead the hype machine may not have emphasised in their rush to market.) I also strive to eliminate formatting or presentation instructions from the document (or hide them in PIs) to facilitate multi-target outputs. This lets my typesetters typeset and my data-entry people enter data.
XML is designed to bring something of this model to the web. HTML is too presentation oriented. SGML is too bulky. That's what it do, babe. I take a single source file from somewhere on the filesystem, incorporate pieces from elsewhere (entity resolution, DB queries, etc.), turn it into one of five possible outputs. I use two different pagination engines with different proprietary formatting macros, XSL(T|FO), or a trap door on the bottom to dump pretty-printed ASCII. It's a publishing tool.
illegitimii non ingravare
but, you are correct sir
XthML doethn't tholve anything:
it ith an immature and clunky form of Lithp, and Lithp hath overcome thethe "interchange" problemth decadeth ago. And Lithp hath turned out to be a fine platform for programming language dethign and "other domainth". XthML ith a buzzword, plain and thimple, and will be forgotten in a number of yearth.
XthML'th problem ith that it hath no provithionth for a programming language'th themanticth. And adding them would be rather pointlethth, thince Lithp hath been doing thith for far longer than motht of you were alive, and in a much more elegant fathhion.
It's a pity that there's no email address...
Poor man's free alternative to Tamino (Askemos) could have been mentioned.
1-GHz Pentium-III + Java + XSLT == 1-MHz 6502.
Oh man you really don't know. It's not just storing the data it's also querying, sorting, and re-arranging the data. It's really tough.
War is necrophilia.
Relational databases are based on the relational (i.e. row and column) data model.
If your data is not easily structured using a fixed relational schema (talk to some biotech people!), you're screwed performance-wise when using relational DBMS, because you have to join your n+1 normalized tables to get a result.
What if additional attributes come up frequently? Regularly modifying the schema for large relational databases is a pain.
The so-called semi-structured data model underlying XML is much more flexible, and there is plenty of opportunity to design databases that support it efficiently (I'm doing my PhD about this, look at our website and download the demo! It's a native XML DBMS prototype for Linux with a file system interface).
Of course relational databases are faster for relational data, but not all data is relational.
C-C
Pretty cool concept, It's a DBMS in the form of a simple header file in PHP that allows SQL queries on data stored in XML... The alpha version's available, and you have to register using a script that uses DBX itself- pretty cute.
The site is here.
Please make sure you read the SQL specification and the Xquery specification WELL before you say something like this
(i.e. that XML queries will be translated into
relational queries).
It's plain false.
Best regards.
Dana