Simplify Apps Using XML With PHP and DB2

IBM slashdotted? by Anonymous Coward · 2005-11-15 21:11 · Score: 0

Looks like IBM's been slashdotted? That's not very desirable from an IT company that is always banging on about scalability!

Re:IBM slashdotted? by BrynM · 2005-11-15 21:28 · Score: 3, Informative

Maybe it got pulled for some reason, or something else happened... When I clicked, it routed the request to www-128.ibm.com, and all of the pictures loaded cleanly for Google's cached copy, so I'm betting they have a little capacity ready to go. After a minute of looking at Google's link to the real page, the problem seems to be the "?ca=dgr-lnxw01DB2xmlPHP" at the end of the posted link. Here's a working link for the IBM copy. Here's the Google cache if you want it too.

--
US Democracy:The best person for the job (among These pre-selected choices...)

So... by Crayon+Kid · 2005-11-15 21:46 · Score: 4, Informative

...they implemented "XML databases" by treating XML as BLOBs and adding XML parsing and updating capabilities to SQL. Hybrid indeed. I'd be rather tempted to call it a bastardization of both concepts (relational and XML).

The thought springs to mind that PHP, as it happens, would be uniquely suited to working with the hypothetical XML databases, due to its rather particular concept of arrays. As many of us undoubtedly know, a PHP array can be used to contain a tree of arbitrary shape and size, with nodes of arbitrary types. A native, one-to-one match to an XML tree.
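
A rough sketch of that tree-in-an-array idea, written in Python with nested dicts and lists standing in for PHP arrays (this is not PHP code, and the `element_to_dict` helper and its `tag`/`attrs`/`text`/`children` keys are hypothetical). For brevity it ignores text that trails a child element (ElementTree's `tail`), so mixed content is not fully preserved.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert an Element into nested dicts/lists (hypothetical layout),
    roughly how a PHP array can hold a tree of arbitrary shape."""
    node = {"tag": elem.tag, "attrs": dict(elem.attrib)}
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    # Recurse into child elements; trailing text (elem.tail) is ignored here.
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node

doc = ET.fromstring(
    '<customer id="42"><name>Ada</name>'
    '<phones><phone type="work">555-0100</phone></phones></customer>'
)
tree = element_to_dict(doc)
print(tree["attrs"]["id"])                          # 42
print(tree["children"][1]["children"][0]["text"])   # 555-0100
```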

A couple of years ago I tried looking for a native XML database. The solutions were either very pricey, or very slow, or both. Nothing that could keep up with an RDBMS. The whole "XMLDBMS" hype eventually died down, as you may recall. Or hasn't it? Are such hybrid solutions all that's left of the concept? It was a great idea.

--
i ate crayons when i was a kid and now i have two braincells and the blue ones taste nicer

Re:So... by Anonymous Coward · 2005-11-15 22:19 · Score: 0

Go back and study the theory.. that's exactly how the relational model is defined. Each attribute is a value of a particular type (any type). In this example, the type is "XML document". Of course, instead of just letting us define our own types ("Customer", "JPEG", etc.), they dole out one new type (XML) and act like it's the second coming of Christ. Can you imagine a programming language where you had to wait for a new release just to get the XML data type? "Next year, maybe we'll get objects, if we're good!"

Here's something for you to think about: isn't an INT column just a BLOB with integer operators?
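
To make the "BLOB with operators" point concrete, here is a minimal Python/SQLite sketch (not DB2): the column is plain TEXT, and registering an XML-aware function with `sqlite3.Connection.create_function` is what turns it into an "XML" column in practice. The `xml_text` function and the `customers` table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def xml_text(doc, path):
    """Hypothetical 'XML operator': text at an ElementTree path, or NULL."""
    if doc is None:
        return None
    try:
        return ET.fromstring(doc).findtext(path)
    except ET.ParseError:
        return None  # not well-formed XML: treat as NULL

conn = sqlite3.connect(":memory:")
# The column type is ordinary TEXT; the registered function supplies the "XML" behaviour.
conn.create_function("xml_text", 2, xml_text)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, info TEXT)")
conn.executemany(
    "INSERT INTO customers (info) VALUES (?)",
    [("<customer><name>Ada</name><city>London</city></customer>",),
     ("<customer><name>Grace</name><city>Arlington</city></customer>",)],
)
rows = conn.execute(
    "SELECT xml_text(info, 'name') FROM customers "
    "WHERE xml_text(info, 'city') = 'London'"
).fetchall()
print(rows)  # [('Ada',)]
```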

If you want to see what a "real" XML database would look like, by the way, just go back to the hierarchic, navigational databases of the 1960s, which were abandoned because they weren't general or powerful enough. Same underlying model.
Re:So... by Crayon+Kid · 2005-11-15 23:58 · Score: 1

[...]that's exactly how the relational model is defined. Each attribute is a value of a particular type (any type). In this example, the type is "XML document".[...] isn't an INT column just a BLOB with integer operators?

When you put it that way, yes, it makes more sense.

I was just enthused by the idea of a database engine that uses XQuery *instead* of SQL, not just for a particular type of field. You can infer the implications.

One of them is that the result sets would be variable trees, which could either be placed in PHP arrays, or implemented as objects, with navigational member functions.

I think there's something to be said about the relative merits of storing data hierarchically as opposed to 2D tables. Just thinking about what a JOIN would mean in a tree space, for multiple trees, is very interesting.

--
i ate crayons when i was a kid and now i have two braincells and the blue ones taste nicer
Re:So... by hey! · 2005-11-16 00:13 · Score: 3, Insightful

...they implemented "XML databases" by treating XML as BLOBs and adding XML parsing and updating capabilities to SQL. Hybrid indeed. I'd be rather tempted to call it a bastardization of both concepts (relational and XML).

The problem lies not in our parsers, but in our application domains -- and ourselves.

It is trivial to represent flat data structures (or sets of tuples if you prefer) using XML. It isn't hard at all to represent hierarchical or even irregularly structured data using SQL. What's hard is doing any useful processing of data after you've chosen a format whose associated tools don't readily do what you need to do with that data.
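
As an illustration of hierarchical data in plain SQL, here is a minimal sketch using an adjacency list and a recursive common table expression, run through Python's `sqlite3` module. The `node` table and its contents are made up for the example, and recursive CTEs need SQLite 3.8.3 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Adjacency list: each row points at its parent (NULL for the root). Hypothetical schema.
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES node(id), name TEXT)"
)
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "catalog"), (2, 1, "books"), (3, 2, "fiction"),
     (4, 1, "music"), (5, 3, "sci-fi")],
)
# Walk the subtree under 'books', tracking depth, with a recursive CTE.
rows = conn.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM node WHERE name = 'books'
        UNION ALL
        SELECT n.id, n.name, s.depth + 1
        FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
""").fetchall()
print(rows)  # [('books', 0), ('fiction', 1), ('sci-fi', 2)]
```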

It's hard to tighten nuts with a hammer or drive nails with a wrench.

A hybrid tool certainly has its potential uses in an application that crosses domain boundaries. Such applications are inherently hard in any case; it's not as simple as choosing a wrench or a hammer, it takes experience. But this is greatly complicated by engineers who will invariably choose the wrong tool (often XML these days), and who, given a hybrid hammer/wrench tool, will insist on always hitting things with the wrench.

To be sure there are borderline cases, but you need to solve the problem with the minimal number of parts.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:So... by ObsessiveMathsFreak · 2005-11-16 00:57 · Score: 1

A couple of years ago I tried looking for a native XML database. The solutions were either very pricey, or very slow, or both. Nothing that could keep up with an RDBMS. The whole "XMLDBMS" hype eventually died down, as you may recall. Or hasn't it? Are such hybrid solutions all that's left of the concept? It was a great idea.

It wasn't so much a great idea as an exceedingly obvious one which no-one has managed to pick up on.

In my humble opinion, XML is essentially crippled without some kind of query/search/XML-Database functionality. What's the point of storing data in a format if there is no clear way of searching through it?

XQuery should have been completed and implemented five years ago. Now we're just stuck with bastardised solutions like the one proposed in TFA. Right now, your best bet for an XML search/query engine is to hack something together with JavaScript using XMLHttpRequest. I'm not joking.

I wish I was, but the FOSS community in particular has dropped the ball on XQuery applications. What solutions exist are enormous database applications, designed for large applications with huge amounts of XML data. For my modest little applications where I want to use just a few XML files, I'm stuck with subpar parsers.

For small applications, the lack of search/query capability stands against using XML. For large applications, the volume of data stands against using XML. For medium applications... what the hell is a medium application? At least, what is a medium application that wasn't either a small application at one point, or won't become a larger application in the future?

--
May the Maths Be with you!
Re:So... by VolciMaster · 2005-11-16 01:32 · Score: 1

The server-side RSS aggregator I use (MagpieRSS) does exactly this. It reads the blog feed, which is effectively a teeny database, and then indexes into the associative array to find the different elements.
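
A Python sketch of the same pattern (this is not MagpieRSS, which is PHP, and it does not reproduce its API): each `<item>` becomes a dict keyed by child tag name, and the result is then indexed like an associative array. The feed content, URLs and `parse_items` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration.
RSS = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

def parse_items(rss_text):
    """Turn each <item> into a dict keyed by its child tag names."""
    channel = ET.fromstring(rss_text).find("channel")
    return [{child.tag: (child.text or "") for child in item}
            for item in channel.findall("item")]

items = parse_items(RSS)
print(items[1]["title"])  # Second post
```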

--
antipaucity
Re:So... by cow-orker · 2005-11-16 02:00 · Score: 1

the FOSS community in particular has dropped the ball on XQuery applications.

And rightly so, because it's crap. Since XML has no regular structure, any XQuery involves a linear scan. Unless some sound theory behind XQuery is discovered and a way to index XML documents, XQuery is useless.

Berkeley DB XML claims to index XML documents. I tested it on a large document. A trivial query took a GB of memory without returning anything but an error. What's the point of this "technology"? It should have been given up on back in 1996.
Re:So... by baadger · 2005-11-16 03:07 · Score: 1

The thought springs to mind that PHP, as it happens, would be uniquely suited to working with the hypothetical XML databases, due to its rather particular concept of arrays. As many of us undoubtedly know, a PHP array can be used to contain a tree of arbitrary shape and size, with nodes of arbitrary types. A native, one-to-one match to an XML tree.

Indeed, this is trivial to achieve using PHP's XML functions (expat, not SimpleXML). However, since you have to parse the entire XML file linearly anyway, it's better to use these functions to set up callbacks for the element-open, element-close and character-data events, which will save a lot of memory compared with parsing big XML streams into an associative array just to extract a few elements.
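
The callback approach looks roughly like this with Python's `xml.parsers.expat`, which binds the same expat library (in PHP, `xml_set_element_handler` and `xml_set_character_data_handler` play the equivalent role). The `extract_titles` helper and the sample stream are hypothetical. Expat may deliver one element's text across several character-data calls, so the text is buffered until the closing tag.

```python
import xml.parsers.expat

def extract_titles(chunks):
    """Collect <title> text from an XML byte stream without building a tree."""
    titles, buf = [], []
    in_title = False

    def start(name, attrs):          # element-open event
        nonlocal in_title
        if name == "title":
            in_title = True
            buf.clear()

    def end(name):                   # element-close event
        nonlocal in_title
        if name == "title":
            in_title = False
            titles.append("".join(buf))

    def chars(data):                 # character-data event (may fire repeatedly)
        if in_title:
            buf.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    for chunk in chunks:
        parser.Parse(chunk, False)   # feed input incrementally
    parser.Parse(b"", True)          # signal end of input
    return titles

# Chunks may split a tag anywhere; expat copes with that.
stream = [b"<feed><entry><title>Hello</ti",
          b"tle></entry><entry><title>World</title></entry></feed>"]
print(extract_titles(stream))  # ['Hello', 'World']
```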
Re:So... by Anonymous Coward · 2005-11-16 04:24 · Score: 0

When you put it that way, yes, it makes more sense.

That's not just a way to put it, that's the theory. Each type has 1) one or more possible textual representations (for instance, the XML string) and 2) a collection of operators that work with it. The RM allows any type at all, and relational (not SQL) databases are supposed to allow you to create your own arbitrary types.

I was just enthused by the idea of a database engine that uses XQuery *instead* of SQL, not just for a particular type of field. You can infer the implications.

Syntax is irrelevant to the underlying model, and is a personal preference. However XQuery is a hierarchic, navigational query language with no support for data integrity, the fundamental purpose of a DBMS (and the various XML schemas are very limited in what kind of constraints they can specify). Instead of expressing your query as a desired result, and letting the DBMS infer the correct operations and path (based on your constraints) to return the data you want, you have to "do the work" yourself. Of course, SQL isn't much better, but at least it has some brains (natural join for instance). XQuery is definitely a step backwards, and the implications are the same as they are for any product not designed to be faithful to the relational model (it will come and go, leaving behind a trail of "time bombs": improperly designed databases tied to their applications).

One of them is that the result sets would be variable trees, which could either be placed in PHP arrays, or implemented as objects, with navigational member functions.

There's no reason trees and objects can't be part of a well-designed relational database (not SQL) since they are just subsets of what the RM can represent. "Sets" are not "trees" by the way, be careful.

I think there's something to be said about the relative merits of storing data hierarchically as opposed to 2D tables.

Relations are N-dimensional, where N is the number of attributes, how do you figure they are "2D"? Is this point in space 2D or 3D: (34, -32, 56)? It is obviously 3D .. the *picture* of it on your screen is 2D, but that's because your screen is 2D.

Just thinking about what a JOIN would mean in a tree space, for multiple trees, is very interesting.

Would it be more interesting than JOINs between sets of values of arbitrary types, related by arbitrary true assertions? :-)

Your thinking has been clouded by years of working with SQL and reading the garbage from vendors!
Re:So... by Unordained · 2005-11-16 06:46 · Score: 1

The thought springs to mind that PHP, as it happens, would be uniquely suited to working with the hypothetical XML databases, due to its rather particular concept of arrays. As many of us undoubtedly know, a PHP array can be used to contain a tree of arbitrary shape and size, with nodes of arbitrary types. A native, one-to-one match to an XML tree.

Welcome to post-relational, multi-dimensional databases! The wave of the future! Purchase Caché today!

[Caché is the latest in a long line of M/Mumps 'global' database products, where arrays are nested in arrays recursively, with data strewn about the structure. Like PHP. Only with SQL, too. Yesterday's hierarchical database, today. I'm no fan. Others are.]

summary of article by Anonymous Coward · 2005-11-15 21:47 · Score: 0, Troll

You can summarize it as: "Apps are simpler if the DBMS directly supports the data types you're using."

Yeah, NO SHIT. You mean there's data out there that's not INT or CHAR?

Codd is rolling in his grave. 30+ years since he developed the relational model, and still nobody's bothered implementing it.

Re:summary of article by ObsessiveMathsFreak · 2005-11-16 03:30 · Score: 3, Interesting

Codd is rolling in his grave. 30+ years since he developed the relational model, and still nobody's bothered implementing it.

The database admins might argue that it's a moot point, given that relational databases are fairly close to the relational model. The concepts are, at any rate, related to one another. Not quite isomorphic, but close enough for most purposes.

In fact, it could be argued that fully implementing the relational model would introduce too much complexity to the fairly simple SQL system, which has served many a programmer well over the years.

--
May the Maths Be with you!
Re:summary of article by eric2hill · 2005-11-16 03:47 · Score: 1

Codd is rolling in his grave. 30+ years since he developed the relational model, and still nobody's bothered implementing it.

Yet another post on this same topic by a troll.

What the fuck do you want implemented that today's RDBMSs don't do? Give me a bulleted list of features that you want added or changed to existing implementations.

I've seen so much bitching and whining that "nobody makes a true relational database", but nobody states what the difference is between an existing database and a "true" database.

Fork over some named features or shut the fuck up.

--
LOAD "SIG",8,1
LOADING...
READY.
RUN
Re:summary of article by Anonymous Coward · 2005-11-16 04:53 · Score: 0

Fork over some named features or shut the fuck up.

Okay, even though you clearly don't know a damn thing about the relational model, let's indulge:

* algebraic syntax: SQL completely hides the underlying algebra. Can you imagine doing math without math operators? How can you do relational algebra without relational operators?

* non-table-oriented storage: why do SQL databases insist on physically storing data grouped in tables? This makes joins expensive. It should be just as efficient to join columns from 20 tables as it is to use one table with 20 attributes. The designers of SQL clearly didn't understand the RM either.

* updateable views: the RM says base relations and derived relations (views) should be indistinguishable. Most SQL implementations don't allow for arbitrarily update-able views. How do you abstract or encapsulate in your database? How do you support two different apps that require two different schemas or column names? You can't, easily. Imagine a programming language without functions/subroutines/methods. Pretty useless, right?

* user-defined types: isn't it strange that programmers use something called an "ORM" that constantly assembles and disassembles composite types? Why can't I store the composite type right in the attribute, like the RM says I should?
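
For what it's worth, here is a client-side approximation of storing a composite value in a single attribute, sketched with Python's `sqlite3` adapters and converters (a hypothetical `Point` type and `place` table). SQLite itself still only sees text here; it cannot apply `Point` operators unless you also register functions for them, which is the gap the post is complaining about.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:  # hypothetical composite type
    x: float
    y: float

# Adapter: Python object -> stored representation ("x;y" text).
sqlite3.register_adapter(Point, lambda p: f"{p.x};{p.y}")

# Converter: stored bytes -> Python object, applied to columns declared as POINT.
def convert_point(raw):
    x, y = map(float, raw.split(b";"))
    return Point(x, y)

sqlite3.register_converter("POINT", convert_point)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE place (name TEXT, location POINT)")
conn.execute("INSERT INTO place VALUES (?, ?)", ("office", Point(1.5, -2.0)))
(loc,) = conn.execute("SELECT location FROM place").fetchone()
print(loc)  # Point(x=1.5, y=-2.0)
```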

Well that's enough for now, entire *books* have been written on this subject.

Developing applications with a properly designed Relational database would be *radically* different than it is today, it's so sad that only a few even know this, let alone *demand it from vendors*.

There is *one* relational database available: Dataphor. Unfortunately it's .NET-centric and very obscure. But people who have used it say it is truly amazing. You just define your data schema and the application can *derive* all the relationships between the various entities.

To put it in programmer terms, imagine being able to write unit tests (assertions), and have the program be written automatically from the tests!
Re:summary of article by eric2hill · 2005-11-16 07:06 · Score: 3, Informative
Wow, some actual bullet points. From an AC no less. And just one personal attack. I'm impressed.
- algebraic syntax: SQL completely hides the underlying algebra. Can you imagine doing math without math operators? How can you do relational algebra without relational operators?
  While SQL does hide *some* of the underlying algebra, it exposes many of the operations such as UNION, INTERSECT, subtraction (EXCEPT in standard SQL, MINUS in Oracle), etc. Set division is probably the biggest missing keyword in SQL, but it can be accomplished through a combination of SQL terms (JOIN ... WHERE ...), so in that sense, SQL fits the algebraic model.
- non-table-oriented storage: why do SQL databases insist on physically storing data grouped in tables? This makes joins expensive. It should be just as efficient to join columns from 20 tables as it is to use one table with 20 attributes. The designers of SQL clearly didn't understand the RM either.
  Tables were born out of the spreadsheet concepts where you have rows and columns of data. Tables do fit the relational model in that you have groups of attributes (columns) in a collection (rows). Tables are simply an easy concept for people to grasp and they work. If you don't like tables, you can use other techniques to store data. Look at Oracle's object extensions. Define an object then store those objects and query them. It's very natural to use and doesn't preclude either the relational model or SQL syntax.
  Secondly, joins are not expensive when using indices properly. As a matter of fact, bitmapped indices can perform joins (or unions or intersections etc.) with *very* little CPU or disk overhead. Indices are a technology to keep speed up when performing operations on large data sets, something the "relational model" doesn't specify. You need to realize that the relational model is really a definition of "what" needs to be accomplished, not "how" it gets accomplished.
- updateable views: the RM says base relations and derived relations (views) should be indistinguishable. Most SQL implementations don't allow for arbitrarily update-able views. How do you abstract or encapsulate in your database? How do you support two different apps that require two different schemas or column names? You can't, easily. Imagine a programming language without functions/subroutines/methods. Pretty useless, right?
  Just because most SQL implementations don't allow for updateable views doesn't mean the SQL standard is flawed. Oracle provides full support for updatable views where it can relate the update to a single row, and you can override that behavior when the update is ambiguous with a set of rules to process the update. SQL Server has the same features I believe.
- user-defined types: isn't it strange that programmers use something called an "ORM" that constantly assembles and disassembles composite types? Why can't I store the composite type right in the attribute, like the RM says I should?
  You absolutely can store composite types right in the attribute. There's nothing in the majority of SQL engines that stops you from doing that. The reason for the disassembly of composite types is twofold: first, you'll get better speed when doing *other* things with the data if the data is broken down into its simplest form; and second, it takes less room on disk to store properly normalized data than to just BLOB the whole mess to disk.
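
Set division can indeed be expressed in SQL. One common formulation is the double `NOT EXISTS` ("no required item is missing"), which differs from the JOIN-based variant the poster alludes to. This is a minimal SQLite sketch with hypothetical `skill`/`has_skill` tables, computing which people have every listed skill.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skill (name TEXT PRIMARY KEY);        -- hypothetical divisor
    CREATE TABLE has_skill (person TEXT, skill TEXT);  -- hypothetical dividend
    INSERT INTO skill VALUES ('SQL'), ('XML'), ('PHP');
    INSERT INTO has_skill VALUES
        ('alice', 'SQL'), ('alice', 'XML'), ('alice', 'PHP'),
        ('bob', 'SQL'), ('bob', 'PHP');
""")
# has_skill divided by skill: people for whom no required skill is missing.
rows = conn.execute("""
    SELECT DISTINCT h.person
    FROM has_skill AS h
    WHERE NOT EXISTS (
        SELECT 1 FROM skill AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM has_skill AS h2
            WHERE h2.person = h.person AND h2.skill = s.name
        )
    )
""").fetchall()
print(rows)  # [('alice',)]
```
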
The Relational Model is this ideal that means different things to different people and there are some solid database engines that implement many of the concepts of the relational model, but saying that there's no database vendor out there that supports the relational model in any way is pure FUD.
I will have to check out Dataphor. It seems like a good idea, but I suspect it's plagued by many of the problems that have been solved in the RDBMS world such as scalability, recoverability, etc. Basically all th
--
LOAD "SIG",8,1
LOADING...
READY.
RUN

XML is not the answer. It's not even the question. by YA_Python_dev · 2005-11-15 23:14 · Score: 3, Interesting

XML is not the answer. It is not even the question. To paraphrase Jamie Zawinski on regular expressions: 'Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.'
-- Phillip J. Eby

Granted, he was talking about Python, not PHP... but still...

--
There's a hidden treasure in Python 3.x: __prepare__()

Re:So...use XPath by Anonymous Coward · 2005-11-16 02:01 · Score: 0

An XSLT processor would do this for you.

Use the document() function to access your various files,
and use XPath to query/search those files/structures.
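
A rough Python approximation of that approach. It is not XSLT: a dict of strings stands in for the separate files that `document()` would load, and ElementTree's `findall` supports only a limited subset of XPath (enough for attribute predicates like `[@status='open']`). File names and contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the separate files an XSLT stylesheet would load with document().
files = {
    "orders-2005.xml": "<orders><order id='1' status='open'><total>20</total></order>"
                       "<order id='2' status='closed'><total>35</total></order></orders>",
    "orders-2006.xml": "<orders><order id='3' status='open'><total>12</total></order></orders>",
}

open_orders = []
for filename, text in files.items():
    root = ET.fromstring(text)
    # ElementTree's limited XPath: descendant search plus an attribute predicate.
    for order in root.findall(".//order[@status='open']"):
        open_orders.append((filename, order.get("id"), int(order.findtext("total"))))

print(open_orders)  # [('orders-2005.xml', '1', 20), ('orders-2006.xml', '3', 12)]
```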

XQuery and XPath in the XML world are functionally equivalent, just XQuery looks a bit like SQL.

Updating the XML files, now that's a different matter...

c'mon by pizza_milkshake · 2005-11-16 02:19 · Score: 0, Flamebait

xml and db2 tend to complicate things more than simplify them

BOGUS by phlamingo · 2005-11-16 05:23 · Score: 1

From the article:

Note: An assumption we are making for this scenario is that the business data is already in XML, even if the database might not have any XML capabilities.

Bogometer Pegged!

If they had made the opposite assumption (that the business data was not in XML format, a much more common scenario), then, the added complexity of translating the data to an XML format would have made the non-DB2 approach look cleaner and more reasonable.

--
I had forgotten how much cooler teenagers look when they are smoking. Oh, wait ...

Wow by freeplatypus · 2005-11-16 06:21 · Score: 1

Simplify, DB2 and XML in one sentence. This is odd.

Funny... by __aaclcg7560 · 2005-11-16 08:56 · Score: 2, Funny

I thought AJAX was the latest alphabet soup fad of the month. Just when you ordered the books for the newest fad, something newer comes out.

buzzwords by Anonymous Coward · 2005-11-16 09:25 · Score: 0

"setting up a PHP environment and integrating DB2 native XML functionality with PHP applications including web services written in PHP and XQuery."

Anyone else get an orgasm from all those buzzwords in the same sentence?
