XML and Transcoding - How Would You Do It?
morzel asks a doozy: "XML is one of these words everybody's talking about yet no-one really knows how to use it in specific applications or server technologies. At the Apache XML Project, some work is being done on integrating XML/XSL in the server itself, but personally I like IBM's idea of a transcoder in between a range of (XML) servers and a range of clients. But... how can it be done?"
"Suppose you have to develop an on-line application, and you'd want to go with XML on the server side, and everyday browsers on the client side. Portable platforms like Palm and WAP-enabled phones will probably be a client platform that is being used frequently.
What tools -open source or commercial- are available to accomplish this?
The elements of the system are:
- XML Enabled Database system: Data is retrieved by the transcoder using HTTP or your favorite protocol
- Transcoding gateway: should translate the XML data using XSL (or another way) to a form readable by the client. The exact translation or the XSL to use can be set by the server (included in the XML source), or be detected by the gateway.
- Browsers of all colours and kinds.
XML is the wave of the future, that's for sure... But what tools are available to actually incorporate XML in a system that can do all things we poor webdesigners dream of?
All suggestions welcome! "
XML is a technology that was created by European socialists, thus it is pro-open source and supportive of the communist GPL. IMHO this represents a great leap forward in our evolution as a social entity. The creators of XML need to hold to their principles and GPL XML. We need to go to the buildings where they work and repeat our glorious mantra to make the world good. We need to (as was put so nicely in a Microsoft Internet Explorer ad) join hands and sing songs about rainbows and free software.
Lisp has been doing this stuff forever. Maybe it'd be a good idea to look into the formats that expert systems use to exchange data; I bet they're pretty generic.
:)
Of course, that won't happen, we'll all make our own stripped-down, human-readable versions, with big gaping flaws, until someone either standardizes it, or hides something nasty and binary with a GUI and dominates the market (*hint* I wonder who wants to use XML and "open standards"....) So let's try to come up with a real open format now, instead.
---
pb Reply or e-mail; don't vaguely moderate.
Well, this is kind of a shameless plug, but I'm developing an XML parser at http://mino.portaldesign.net. It is LGPL. The library can be used in any program, and the parser that comes with it can be used for converting XML files to HTML on the fly.
I'm working on XSL support (so people can easily say what XML tags should become in HTML), so that should be done in the (hopefully) near future. For now, feel free to download the latest alpha and play with it.
In the near future, I plan to have support for databases, CSS, XSL (as mentioned above), and a few other XML-related technologies.
People familiar with C/C++ should easily be able to write custom modules for converting from XML to HTML using the library by looking at the examples in xmlhandlers/. Anyone want to help develop this?
Probably one of the few truly great ideas in the Web development industry. It means freedom from client peculiarities: forget about writing for all those different browsers again and again; just one huge translator template will do (e.g. XML->Opera-compatible HTML, or IE-compatible HTML, or AvantGO, etc.). It means that potentially the same server can be serving not only PCs, laptops, PDAs and the like, but also other software, by reading plain XML, or some subset of it.
;-)...
In the OSS arena, the best example of XML on the server=>HTML (or for that matter anything else) on the client is Cocoon. I played around with Cocoon 1.x a little bit and it's very impressive architecturally, but even the principals agree that the performance isn't there yet. I am eagerly awaiting Cocoon 2, though.
engineers never lie; we just approximate the truth.
1) Don't feed the trolls.
2) All of us free software advocates are commies.
3) We're all gay.
4) We're highly contagious.
5) If you don't want to become a gay commie, DON'T POST HERE!
6) Your Anonymous status will not save you. The slashdot 31337 Gay Commie virus will find you and it will OWN you!!!
Now, what was the topic again?
One thing that I heard the wonderful-world of XML was supposed to allow was data on demand. A user clicks an XML/XSL defined element such as a button or piece of hypertext and the page updates without reloading.
This was the theory anyway...has anybody heard of such an implementation, or does anybody know if it is in a future spec?
One application (which is badly needed on the web, I think) is a dynamic collapsible tree. Imagine, if you will, a SlashDot comments page (not too hard, as you are looking at one!). Now, instead of getting a pageful of comments that takes a healthy amount of time to download (depending on your threshold settings), imagine clicking on a message to expand more comments in the thread, which are fetched dynamically. You could re-sort, change moderation thresholds, and do lots of other nifty dynamic operations without having the server do all the work.
-AP
I thought the return of Christ is to be a joyous time.... Why must we be afraid of it? You're not a very good Jonathan Edwards.
"Dying tickles!" -- Ralph Wiggum
http://msdn.microsoft.com/xml
Ideally, browsers should develop to the point where they understand XML as well as HTML and XSL as well as CSS. There has been significant effort to do this in the Mozilla browser, the XML/CSS combo works quite well, and the person developing an XSLT (XSL Transformations) engine for Mozilla is talking about having something useful around May. Similarly, Internet Explorer 5.0 has a base understanding of XML (styled with CSS), and surely plugins for decent XML/XSL encoding for IE are likely to appear soon after Netscape shows that it's a feature people demand.
In the meantime, there are some Java Servlets out there to do the transformation on the server side. The server will grab the XML and XSL file, do transformations, and output HTML (or whatever format) to the client. I haven't played with them enough to recommend one as being particularly better, but there's some handy stuff out there.
----
Open mind, insert foot.
The reason we use XML in our multi-tier solution is simple. ADO cannot support detached, hierarchical record sets.
;)
In our case, this meant we had to find a way to store that hierarchical information, which is vital to the front end, in an intermediate format that did not put load on the database itself.
The reason for that, of course, is that when you're running a distributed application to potentially thousands of clients, you want any database hit to be as few, fast and clean as possible.
That means we can't sustain connections to the DB.
That means we have to use disconnected record sets.
Disconnected recordsets don't hold hierarchy information, and that means that we have to find some other way of hitting the database once, getting enough data to build the hierarchy externally, then shutting down the DB link.
XML provides the functionality we need to parse a flat recordset back up to a hierarchical structure, without hitting the database again. It also has the added bonus that when it comes to presenting the front end in a browser, we can feed it directly to the browser if it's "XML compliant" (IE5, though there is a patch for IE4).
B.
PS: You'll also find that XSL can do similar things to your XML as CSS does to HTML
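The flat-recordset-to-hierarchy step described in this post could be sketched roughly as follows. This is only an illustration in Python rather than ADO: the row layout (id, parent id, name) and the element names `catalog` and `item` are assumptions made up for the example, not anything the poster described.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows, as a disconnected recordset might return them:
# (id, parent_id, name). A parent_id of None marks a top-level row.
ROWS = [
    (1, None, "Widgets"),
    (2, 1, "Small widgets"),
    (3, 1, "Large widgets"),
    (4, 2, "Blue small widget"),
]

def rows_to_xml(rows):
    """Rebuild the hierarchy in memory, with no further database hits."""
    root = ET.Element("catalog")
    nodes = {}
    # Create every element first so parents can be found whatever the row order.
    for row_id, _parent_id, name in rows:
        nodes[row_id] = ET.Element("item", {"id": str(row_id), "name": name})
    # Then hang each element under its parent, or under the document root.
    for row_id, parent_id, _name in rows:
        parent = root if parent_id is None else nodes[parent_id]
        parent.append(nodes[row_id])
    return root

if __name__ == "__main__":
    print(ET.tostring(rows_to_xml(ROWS), encoding="unicode"))
```

The database is read once to fill `ROWS`; everything after that works on the in-memory tree, which is the property the post is after.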
The widget order fulfilment organization has a server that speaks XML over HTTP. We created a widget on our server to talk XML over HTTP to it. Instead of spending weeks working out how to communicate with some proprietary server in a proprietary format, we spent a few days interfacing our servers.
XML = server to server / business to business killer technology
The consumer may someday directly use XML but I don't see that coming soon on a broad scale. HTML (with Java, Javascript, CSS, etc.) will (IMHO) be the way consumers work the web for the near future.
Of course, I could be wrong.
The XML FAQ is here.
Linux is Windows Manager's Product of the Year and
Distribute.net Cracks CSC
The XML part of IBM's transcoding scheme and the planned developments at xml.apache.org are already present in the ExterXML Server from XMLSolutions (www.xmls.com). It uses cocoon (you don't have to wait for Cocoon 2, version 1.5 is pretty fast). You can specify in your document which XSL stylesheet to use for each browser. The IBM Transcoding stuff looks interesting for HTML, but for XML the transcoding solution from IBM is basically XSL.
Looking at any non-trivial XSL stylesheets, you can see what a generally bad idea it is.
My advice would be to use a real programming language with DOM bindings.
XML.com has a good article regarding XSL: XSL Considered Harmful.
Note that XML.com also has some pro-XSL articles listed, but they aren't nearly as persuasive.
The bottom line is that the W3 "ordained" XSL to be part of the grand scheme of things, although the technology hasn't been developed in response to any particular problem.
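For anyone wondering what the "real programming language with DOM bindings" route looks like, here is a minimal sketch using Python's standard `xml.dom.minidom`. The input document and its `headlines`/`story` element names are invented for the example; the point is only that the transformation is ordinary code walking one DOM and building another.

```python
from xml.dom import minidom

# Hypothetical input document; element names are invented for illustration.
SOURCE = """<headlines>
  <story url="http://example.org/1">XML on the server</story>
  <story url="http://example.org/2">Transcoding for handhelds</story>
</headlines>"""

def headlines_to_html(xml_text):
    """Walk the source DOM and build an HTML list in a second DOM."""
    src = minidom.parseString(xml_text)
    out = minidom.Document()
    ul = out.createElement("ul")
    out.appendChild(ul)
    for story in src.getElementsByTagName("story"):
        li = out.createElement("li")
        a = out.createElement("a")
        a.setAttribute("href", story.getAttribute("url"))
        # Join the text nodes under <story> to get the headline text.
        title = "".join(
            n.data for n in story.childNodes if n.nodeType == n.TEXT_NODE
        )
        a.appendChild(out.createTextNode(title))
        li.appendChild(a)
        ul.appendChild(li)
    return ul.toxml()

if __name__ == "__main__":
    print(headlines_to_html(SOURCE))
```

Whether this is nicer than the equivalent stylesheet is exactly the argument in this thread; loops, variables and debugging work the usual way here, at the cost of the transform no longer being a document itself.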
The question is presented in a somewhat muddled manner, but if I understand correctly, it has to do with converting from XML to various formats. For the record, I don't think this is really an issue of converting from XML (which is relatively easy, given good DTDs and [for human eyes] XSL). The beauty of XML and XSL is that it's supposed to separate the *data* from the *presentation* of the data (unlike this mess we call HTML).
So then, if you intend to use XML to store the data, and XSL to format it, the only part of the equation left is determining which stylesheet applies to which requesting client. I have no experience with XSL (I use XML for machine data, not for human eyes) -- is it possible to determine in the document which stylesheet to use? If so, it's just a matter of writing all the stylesheets.
Of course, this all depends on everyone understanding XML and XSL. If people insist on using legacy clients (like non-XML compliant web-browsers *cough*Netscape*cough*), then there is a need for "transcoders" to do the XML/XSL interpretation and spit out HTML (|| HDML || whatever) that works with that client.
P.S. If you want applications of XML, look in the b2b e-commerce world. I'll avoid the direct plug and not name the company I work for, but the whole industry is based on XML.
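On the question of whether the document itself can say which stylesheet to use: the W3C `xml-stylesheet` processing instruction allows a `media` pseudo-attribute, so one plausible arrangement is for a transcoding gateway to read those instructions and pick the one matching the requesting client. The sketch below only does the picking, not the transformation. The User-Agent table, the file names, and the sample document are all assumptions for illustration.

```python
import re
from xml.dom import minidom

# Hypothetical document carrying one xml-stylesheet instruction per media type.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="desktop.xsl" media="screen"?>
<?xml-stylesheet type="text/xsl" href="palm.xsl" media="handheld"?>
<report><title>Quarterly numbers</title></report>"""

# Assumed mapping from User-Agent substrings to media names. Handheld entries
# come first because such clients may also identify themselves as "Mozilla".
AGENT_MEDIA = [
    ("AvantGo", "handheld"),
    ("UP.Browser", "handheld"),
    ("Mozilla", "screen"),
]

def media_for(user_agent):
    """Guess a media name from the User-Agent header, defaulting to screen."""
    for needle, media in AGENT_MEDIA:
        if needle in user_agent:
            return media
    return "screen"

def pick_stylesheet(xml_text, user_agent):
    """Return the href of the stylesheet whose media matches the client."""
    wanted = media_for(user_agent)
    dom = minidom.parseString(xml_text)
    for node in dom.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction's data is a string of name="value" pairs.
            attrs = dict(re.findall(r'(\w+)="([^"]*)"', node.data))
            if attrs.get("media") == wanted:
                return attrs.get("href")
    return None

if __name__ == "__main__":
    print(pick_stylesheet(DOC, "Mozilla/3.0 (compatible; AvantGo 3.2)"))
```

The gateway would then hand the document and the chosen stylesheet to whatever XSL processor it uses; after that it really is "just a matter of writing all the stylesheets".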
-- I for one do not. BizTalk (MS's XML data schema (or framework for you non-RDBMS implementers)) is just an attempt to corral another 'information-space'. However, they know that they missed the boat with HTML and no-one wanted to implement ActiveX. So MS (IMHO) is trying to claim this new language as their own by trying to be the ones with the biggest and most useful 'dictionary' of often-used 'words' (BizTalk). For instance, if they could model all current business transactions in XML then what is to stop our nightmares of all business - business e-commerce being done using MS's version of XML? This is their plan anyway. Not to say it will come about because schemas are complicated (all the useful ones anyway). But I for one am going to keep a close eye on MS meanwhile.
For a data transfer layer between fully automated intelligent agent systems distributed world wide over high end computing clusters..
just in case you hadn't heard...
-- You summarise the readability advantages of XML very well. BUT XML will simply not work (easily) if you have a multiple-user, record-locking, concurrency-handling, read-WRITE database. All the best database apps I know allow you to update data which you have already read and/or insert new data. This could get VERY messy with XML unless you have an RDBMS at the backend that implemented Transactions and an efficient locking scheme (it would certainly have to scale to the moon). Moreover, because there could be an appreciable delay between updates, this RDBMS would need to store a lot of 'before' images i.e. copies of the data used to create XML resultsets. This rules out all but the beefiest RDBMSes. Thus to implement read-write XML based interfaces to a database, you are going to have to splash out on a serious RDBMS. From my experience, Oracle 8 could handle the transactions (although 8i might be better with the native JVM). Would MS SQL Server be scalable (vis-a-vis record locks)? Maybe not. P.S. you can forget about MySQL.
Believe it or not, the open-source bug has bitten M$ !
Look into M$'s sponsorship of the Schools Interoperability Framework (www.schoolsinterop.org) and maybe you can see how M$ plans to use XML (and its derivative) in real world application.
Muchas Gracias, Señor Edward Snowden !
The key insight into XML is that it should be used only where other solutions fall apart. XML is one of those technologies that is so general, so abstract, and so powerful that you can construct a solution for ANY problem.
The downside is that the solution will involve extra processing steps, extra stuff to be implemented, and impose on you a development model that might not always be convenient (not everything wants to be a document, or a conversion or transcoding between document formats).
However, there are many cases where XML is the only viable solution, and in those cases you're just glad you can solve the problem at all! A typical example is when you have documents coming from multiple sources, and you publish them to multiple targets. It's easy to see what the XML solution would look like--but the problem doesn't even fit into the other ways of doing things.
With WebMacro a common implementation strategy is to drop key XML objects into a template that is otherwise created through ordinary WebMacro HTML template gunk.
The advantage of this approach is that you can create the bread-and-butter stuff like shopping carts, authentication, login/logout, using ordinary Java servlet code and templates. (These things are nasty when you try and force them into a document model).
Then in the middle of your page somewhere you have your XML document, rendered using XSLT or something. You have other targets, besides your servlet, where you publish that same XML document, so the whole thing winds up being a rather pleasant mixture of two different programming paradigms.
Again, the key insight in this strategy is that you use XML for the parts of your problem where it is the only viable solution--and you do everything else the normal way (without the extra costs imposed by XML, since you don't need the extra power).
I worked in an SGML shop for a couple of years, and became smitten with SGML/XML. I set out to do absolutely everything I could in SGML/XML for awhile, before realizing that a traditional template tool (like WebMacro) was far more useful for typical bread and butter servlet programming.
I still use XML a lot, but now I use it intelligently, where it's needed!
--- food for thought ---
XML is very useful in the B2B environment when one company needs to share data with another.
There is a company called WebMethods that produces a server and software that handle the actual translation and mapping between Company A (your company) and Company B (say, Commerce One, Ariba, Concur, etc.).
The box is situated at your location and actively handles all data coming down the wire from Company B that needs to be translated and entered into your database. Very cool piece of hardware/software.
I am working on a project which may accomplish what most of this discusses. I am looking for people to help with specific implementation issues such as setting up autoconf, server programming etc. More info can be found at XMLTP.Org. Cheers, Gavin
Hiya. I'm one of the authors on the cocoon project and I admit my biases upfront. I think, and many of you seem to agree, that the web publishing industry (more generally, the electronic information publishing industry) is in desperate need of a standard way of separating (and mixing) content and design. XML (a generic tree description language) and XSLT (a generic tree merging and transformation language) offer a very elegant way of accomplishing that goal. The cocoon project is currently focused mainly on two goals: creating (and implementing) a standard way to create XML fragments dynamically, and determining (and implementing) the best way to maintain a site back-ended by XML and XSLT. I encourage brave developers to come check it out - the basic stuff (XML+XSLT -> HTML) works very well, the more elaborate stuff (SQL,LDAP,POP3 -> XML+XSLT -> HTML) is coming along very well, and we're playing with a very interesting take on the whole *SP paradigm called XSP - I was personally highly skeptical at first but am beginning to see the light.
As far as IBM's product goes - once you drill down into the technical details, it looks very much like cocoon. Interestingly enough, some of the closed source components that IBM's product relies on were donated a few months back to jump start the xml.apache.org site (namely, the XML4J parser and the Lotus XSLT processor). The main thing that IBM seems to be offering here is its 'transcoder' technology - which may be interesting and certainly bears investigation, but for my money, you're better off checking out (and having a voice in the development of) the open source apache projects.
Hey all,
Anyone who says "XML is one of these words everybody's talking about yet no-one really knows how to use it in specific applications or server technologies." has probably not noticed the whirlwind of activity (including many bona-fide commercial ventures) surrounding XML.
Hundreds of sites today buy syndicated news from central sources (iSyndicate.com and newsreal are two that come to mind) and receive their news feeds via XML. Also, check out webmethods.com -- here's a phenomenally successful company whose entire business model is based on XML-enabling businesses.
xml rocks. every piece of online information should be in xml. usability on the web is horrible right now. the fact that search engines and yahoo-style directories are the main entrances to the web is horrific. the fact that google can't find me a single page on gkrellm (a kick-ass system monitor for linux) pisses me off to no end when i'm bored with my current skin. with everything in xml the extraction of data would be much simpler and therefore the interfaces to the web would be much more effective.
the current problem is that
i'm working on a solution and need help...so it's actually pretty smooth that this article came out in ./ at this point.
in a huge blow to problems #1 and #2 above (as well as quite a few others), i am initiating the creation of Uberbia, the most open source of web sites. the backend is zope, which is a tres cool open source web application environment which can conveniently output its internal data as xml. what this allows is for information to be created in zope and stored in zope's native db format and served up as web pages (for instance) quickly, but then also output as xml. problem #2 solved. and when browsers can handle the xml...shove it out that way.
zope also allows for information to be very easily created and shared. this is one of the main goals of Uberbia.
the idea for Uberbia was born out of the fact that the Open Source community has been living in an environment of relatively closed content management on the internet. Sure, one could create a web page and post a HOWTO they just wrote. And then post a message to a relevant mailing list letting everyone know that resource is available. And then submit the HOWTO to the LDP and wait for it to be approved and posted on the LDP page. Uberbia will remove a lot of this hassle and allow the Open Source community to easily create and manage its content. and the data will go into an xml-aware application. problem #1 solved, at least for the Open Source community. well, okay...so i'm still workin' on it, but it'll get solved, dammit.
on trying to figure out what i was talking about, Ethan (a friend and to-be-developer of Uberbia) wrote:
sounds to me like you want to build an open-content information space. am I totally off-base? Bring "source" up to the next level of abstraction? Collaborative environments of information?
yup. he gets it. but the possibilities that arise from having such a body of contributors and open content in xml are insane. for example, imagine turning on a "newbie" feature in Uberbia that automagically inserted links to the proper entry in the jargon file for every word that was defined there. not difficult with zope and the data in xml
so, essentially i'm responding to this ask slashdot question by calling out for help with an open source project that wants to solve this problem and others. some work has been done, but there's a lot more to do. sourceforge is graciously both hosting the development of this and hosting the project itself. if you are interested at all in the development of something like this or have some really smooth-ass ideas, let me know or join the mailing list.
i hope some of that made sense.
word, Uberdog
It isn't too bad, either.
If no XSL stylesheet is applied then it displays the XML document using a "TreeView" default style sheet.
Also, because the XML parser & XSL thing is COM based you can use it in any language that supports COM - like Javascript/VBScript/ASP. I hate to be a MS lover, but unless you go to Java there isn't much that can do it better than that.
The new XML parser that comes with Win2000 is supposed to be 5 times faster, too. See MSDN.
As far as I know there is no support in IE5 for XML+CSS. I may be wrong, there, though.
You couldn't do it with HTML, either, could you?
Any server that uses stateful connections like that is going to have to be big & powerful.
I think you're not looking at the problem the right way. Typical application development breaks things up into domains. These layers usually include a persistence domain (your database), a business logic domain, an application domain, and a presentation domain.
XML really doesn't change any of the domains EXCEPT the presentation domain. You don't need an XML enabled DB, as you NEVER want to have the outside world talking directly to your DB. XML (combined with HTTP or whatever else) is one way of presenting your application. The various transforms that you would do using XSL are just "aspects" of the same presentation. So this doesn't completely change the way you build applications, just how you do your presentation.
I've written more than a few apps that were available both as GUI applications and web servers. Both versions shared the same code base up until the last layer.
As far as what you need to do an XML system, I think it's a lot like an existing HTML system. With HTML, you need a database server, an app server, and a web server. The web server is normally scripting enabled so you can do handy transforms with the raw data.
With XML, it's basically the same concept, except your "XML server" needs to be using XSL to script transforms of the XML data. What we currently don't have is a very good way of doing this. Ideally you'd actually want the CLIENT to do the transforms as the XML data is usually much terser than whatever the XSL will generate. However, nobody trusts the clients to do this, so you might as well go with the XSL engine on the server.
sigs are a waste of space
There are many tools available to build such a system.
To mention only Open Source projects, I could suggest using Apache JSERV with Apache Cocoon as a framework, Castor or Quick to bind XML data to Java objects, and an OODBMS like ozone or an RDBMS like PostgreSQL.
These are my favorites ;)
They are very powerful and highly flexible, but the price to pay is that they are rather complex to use, that you need time to get up to speed with them, and that you lose focus on the core techniques behind them.
To try to get a good understanding of these core techniques, I have set up some simple examples showing how one can bind XML documents into Java objects, store these objects in an OODBMS, and use them in an XSLT sheet, either in standalone mode or as a servlet.
These examples are available on our web at http://downloads.dyomedea.com/java/ and a mailing list has been created to exchange and discuss such basic tips.
Hope this helps.
Eric van der Vlist
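As a rough idea of what "binding XML documents into objects" means, here is a hand-written sketch in Python. It is not how Castor or Quick work internally, and it is not taken from the examples linked above; the `Book` class and the `book` document layout are made up purely to show the round trip between an element and a typed object.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Book:
    """Hypothetical domain object that the XML is bound to."""
    isbn: str
    title: str
    price: float

def book_from_xml(elem):
    """Unmarshal: read one <book> element into a typed Book object."""
    return Book(
        isbn=elem.get("isbn"),
        title=elem.findtext("title"),
        price=float(elem.findtext("price")),
    )

def book_to_xml(book):
    """Marshal: write a Book object back out as a <book> element."""
    elem = ET.Element("book", {"isbn": book.isbn})
    ET.SubElement(elem, "title").text = book.title
    ET.SubElement(elem, "price").text = f"{book.price:.2f}"
    return elem

if __name__ == "__main__":
    src = ET.fromstring(
        '<book isbn="0-00-000000-0"><title>XML Basics</title>'
        '<price>19.95</price></book>'
    )
    book = book_from_xml(src)
    print(book)
    print(ET.tostring(book_to_xml(book), encoding="unicode"))
```

A binding framework generates or infers this mapping instead of having you write it by hand, which is where both the power and the complexity mentioned above come from.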
i wouldn't say that webmethods is phenomenally successful...check out this item on them filing for IPO on FoRK (think of fork as slashdot for non-trolls). they're another company with shaky financials and a story. anyway point taken about xml being used fairly often now.
-- your knees hurt, don't they?
You might like to check out this page. One of the things they have is an interpreter (X-Tract) that reads a template (written in XML!) and performs pretty much arbitrary transformations on XML input data based on this template. Looks pretty cool and simple to use. X-Tract is free for download. Funny I didn't find any info on license terms though.
I tried doing some very simple stuff with the Linux version, and the only complaints I have are:
You should take a look at MetaHTML (www.metahtml.com), which is a sort of macro-like programming language designed to emit HTML (it was developed before XML was invented). It was developed by Brian Fox and myself when we had a company called Universal Access (ua.com). MetaHTML is superior in some ways to XSL, because it is more a general purpose programming language, yet its evaluator does a lot of the work of parsing XML syntax expressions. We used to use it to do many XML-ish things, such as generate the MetaHTML documentation automatically from a structured representation in the database.
MetaHTML has also been under GNU public license since about 1996.
As someone mentioned earlier, XSLT can often be a real pain to work with, owing to its insistence on being "side-effect free" (so variables aren't really variable, for a start) and its declarative syntax. An alternative which still has the advantage of being written in XML is "XML Script": XML Script homepage (Note that the version on the site is XML Script 1.0 - v1.1 will be out sometime this week, we reckon)
Who is Jonathan Edwards? And please change your sig; in the long list of things that bug me ripping off demotivators ranks in at number 12.
- Nobody knows what XML is, and are just trying to be cool by saying that their product uses it or that they know how to use it.
- XML in fact does not exist. If this were the case, the first supposition could and most likely is still true.
This disturbs me greatly, because the XML people hyped it up to the point where this disturbs me greatly.
That's funny... but my moderato-scope says bad moderator weather. Either they're all scared shitless to moderate that up or have sticks the size of the Empire State Building shoved up their posteriors.
On our project, we have written XSL to transform our XML data into binary outputs. The stylesheets used ran into tens of thousands of lines!
This is supposed to be good? Something is horribly broken. Perhaps a different tool would be more appropriate? How about a parser generator? (see Jikes)
Life's a bitch but somebody's gotta do it.
I work for AssureSoft whose AssureWeb website is live (work out the URL for yourself, it's not obscure but we don't want to be slashdotted). The site provides financial information to subscribers. You have to have a username and password to get the full range of services- we dole out passwords free to British independent financial advisors.
Our first XML-based service is a quotations system which allows users to get a quote for a pension or mortgage from a wide range of companies in real time (typically 5-20 secs).
Why we needed XML
Our problem was that each company had a slightly different way of asking for customer details. We decided to create an XML data type definition, now adopted as industry standard by UK financial standards body Origo. This standard means that we can present pretty much the same input form, with a few optional extras, for any financial product.
The main use of XML is in passing the input data from our web server to the companies' quotes servers.
Layer 1: Client Browser
Layer 2: AssureWeb server
Layer 3: Company Quotes server
The XML goes back and forth between layers 2 and 3. We compile standard CGI GET/POST client requests into XML on the webserver and fire them at the quotes server. The quotes server fires back a response as XML again, and we parse this and present it to the client as a standard HTML web page. There is no XML on the client side.
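The round trip between layers 2 and 3 might look something like the sketch below. It is not AssureWeb's code and does not follow the Origo definition; the `quoteRequest`, `field`, `quoteResponse`, `quote` and `premium` names are invented, and the HTTP call to the quotes server is left out so that only the two conversions are shown.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import parse_qs

def form_to_request_xml(query_string):
    """Turn a CGI-style query string into an XML quote request."""
    root = ET.Element("quoteRequest")  # element names are made up here
    for field, values in parse_qs(query_string).items():
        # Keep only the first value of each form field for simplicity.
        ET.SubElement(root, "field", {"name": field}).text = values[0]
    return ET.tostring(root, encoding="unicode")

def response_xml_to_html(xml_text):
    """Render an XML quote response as a plain HTML table for the browser."""
    rows = []
    for quote in ET.fromstring(xml_text).iter("quote"):
        company = escape(quote.get("company", ""))
        premium = escape(quote.findtext("premium", default=""))
        rows.append(f"<tr><td>{company}</td><td>{premium}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"

if __name__ == "__main__":
    print(form_to_request_xml("product=pension&age=42"))
    print(response_xml_to_html(
        '<quoteResponse><quote company="Acme Life">'
        '<premium>55.20</premium></quote></quoteResponse>'
    ))
```

The browser only ever sees the form and the resulting table, which matches the "no XML on the client side" arrangement described above.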
Provided the company quotes server conforms to our XML standard, we can use that server for quotes. Adding new products or companies becomes a lot easier- typically we can go from scratch to beta with a new product within days. Previously it would have taken many months to write and test each individual product. XML allows us to re-use both code and input/output standards to a level never seen before.
Our next step will be a comparative quotes service. Users will be able to enter one set of data, and fire it at multiple companies. They will then get back multiple quotations, from which they can select the best based on their criteria. Effectively we will be having multiple concurrent layer 3 transactions.
--
Andrew Oakley - www.aoakley.com
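The comparative quotes idea in this post (one set of input data, several concurrent layer 3 requests, pick the best reply) could be sketched as follows. Everything here is hypothetical: `fake_quote_server` stands in for a real HTTP call to a company's quotes server, and the company names, rates and XML layout are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Invented per-company rates, used only by the stand-in server below.
RATES = {"Acme Life": 1.2, "Brit Mutual": 1.1}

def fake_quote_server(company, request_xml):
    """Stand-in for an HTTP round trip to one company's quotes server."""
    age = int(ET.fromstring(request_xml).findtext("age"))
    premium = RATES[company] * age
    return f'<quote company="{company}"><premium>{premium:.2f}</premium></quote>'

def comparative_quotes(request_xml, companies):
    """Send the same request to every company at once, cheapest reply first."""
    with ThreadPoolExecutor(max_workers=len(companies)) as pool:
        replies = list(
            pool.map(lambda c: fake_quote_server(c, request_xml), companies)
        )
    quotes = [ET.fromstring(reply) for reply in replies]
    return sorted(
        ((q.get("company"), float(q.findtext("premium"))) for q in quotes),
        key=lambda pair: pair[1],
    )

if __name__ == "__main__":
    request = "<quoteRequest><age>40</age></quoteRequest>"
    for company, premium in comparative_quotes(request, list(RATES)):
        print(company, premium)
```

Because every server accepts the same request document, fanning out is just a loop over companies; with a 5-20 second response time per server, running the requests in parallel rather than in sequence is what keeps the wait tolerable.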
I've been wrestling with some internal docs at a client site. I was wondering how we could transmit the internal data using standard doc types when I bumped into the following (and learned how wrong I was, for my case in particular).
/ meaning.html
http://www-4.ibm.com/software/developer/library
Joe
Joe Batt Solid Design
IE5 XSLT is very different from the W3C recommendation. It is a partial implementation of a 1998 working draft.
Do not assume this to be a case of embrace & extend. Microsoft just implemented XSL before the spec was finalised. They say they will bring out a compliant version soon.
${YEAR+1} is going to be the year of Linux on the desktop!
There are two parts to XSL.
XSL Transformations:
Transforms any XML document type into another. This can include HTML if it is well formed e.g. XHTML. In reality, it really is not just for "Stylesheets" but can also be used for data to data transformation. The W3C have published a recommendation (their version of a standard) and there are many implementations.
XSL Formatting Objects.
Formats XML for print or screen display. Powerful, complex typesetting-style system, you could use the analogy "PDF/Postscript for XML". Not a standard yet, and only one partial implementation of an old working draft (FOP).
A lot of the guy's criticism in that article refers to the second part of XSL, which is not what people are using or referring to when they discuss XSL here.
I don't find the guy's article that persuasive; it is full of assertions that it never proves. Most of his gripes are directed towards formatting objects, which are complex, but the momentum behind XSL relates to XSL Transformations.
${YEAR+1} is going to be the year of Linux on the desktop!
By the way you're putting the problem, it seems that XSLT is the answer for your questions.
The Apache XML project has an XSLT processor called Xalan that can take care of much of that part (I haven't tested any other XSL processors yet). Just link your XML document / DOM tree to a style sheet and you have a document transformed to the format you like.
The only reason I see that this is needed is that nowadays only IE 5 and Mozilla can work natively with XML files and linked style sheets (and that locks you to CSS for Mozilla), so if you plan to use XML with any other device, AFAIK, you will have to use some kind of transformation processor. It can also be used to transform an XML doc to another XML doc, but that goes beyond presentation.
Just take a look at their page and make some tests. They're pretty nice tools, and quite easy to work with.
--
Marcelo Vanzin
I would just like to say that the learning curve of XSL is due to grasping the concepts of how to use it, rather than the language being cryptically designed.
You create templates to match the different kinds of elements, and work your way down the tree of the document. This approach allows the stylesheet to work with documents with different numbers of elements, or slightly different structure. Some problems are solved with recursion.
You can do a simple approach where you have a fixed structure document, and insert values from the XML at certain points. This works for a lot of problems.
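For anyone who finds the template idea easier to see in a general-purpose language, here is a rough Python analogy. It is not XSLT and not how any XSL processor is implemented. The names template, apply_templates, children and default_template are made up for this sketch, and it ignores mixed content (text that follows a child element).

```python
import xml.etree.ElementTree as ET
from html import escape

# Maps an element name to the function ("template") that renders it.
TEMPLATES = {}


def template(tag):
    """Register a function as the template for elements named `tag`."""
    def register(func):
        TEMPLATES[tag] = func
        return func
    return register


def apply_templates(node):
    """Render one element with its matching template, or the default rule."""
    handler = TEMPLATES.get(node.tag, default_template)
    return handler(node)


def children(node):
    """Recurse: render every child element in document order."""
    return "".join(apply_templates(child) for child in node)


def default_template(node):
    # Loosely like XSLT's built-in rule: emit text and keep descending.
    return escape(node.text or "") + children(node)


@template("catalog")
def catalog(node):
    return "<ul>" + children(node) + "</ul>"


@template("item")
def item(node):
    return "<li>" + escape(node.findtext("name", default="")) + "</li>"


flat = ET.fromstring(
    "<catalog><item><name>Tea</name></item><item><name>Coffee</name></item></catalog>"
)
nested = ET.fromstring(
    "<catalog><group><item><name>Tea</name></item></group></catalog>"
)
flat_html = apply_templates(flat)
nested_html = apply_templates(nested)
```

The second document has an extra group level that no template mentions, yet the same two templates still produce a list. That is the sense in which a template-driven stylesheet copes with slightly different structures.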
The main problem I had when learning XSL was study material. The specifications don't function as a tutorial. I recommend http://metalab.unc.edu/xml/books/bible/updates/14.html. It is a version of the chapter on XSL from the XML Bible, updated for the W3C recommendation. I wish I had found it sooner (I have the book, by the way, very good).
${YEAR+1} is going to be the year of Linux on the desktop!
People have discussed the database connection. I came across this article at Javaworld (via the Javalobby site). It describes a different way of using XML with databases.
http://www.javaworld.com/javaworld/jw-01-2000/jw-01-dbxml.html
Instead of converting the entire database to an XML file, which consumes a lot of resources and has synchronization issues, this approach places an XML API frontend on the JDBC system. This creates a "virtual" XML document that other XML tools can access via DOM or SAX.
For example, they create a SAX frontend for JDBC, and use it with a SAX-based XSL tool (XT) to transform the data to HTML. So, for example, where the database encounters a column for CustomerName, the template for a CustomerName element in the XSL sheet is triggered. To the XSL tool and stylesheets it seems as if they are accessing an XML document.
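The "virtual document" idea can be sketched in a few lines of Python. This is not the article's code: it uses Python's sqlite3 and xml.sax modules in place of JDBC and XT. The function name rows_to_sax, the rows/row wrapper element names and the customers table are assumptions for illustration.

```python
import io
import sqlite3
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def rows_to_sax(cursor, handler, table_tag="rows", row_tag="row"):
    """Replay a query result as SAX events; no XML file is ever built."""
    no_attrs = AttributesImpl({})
    columns = [description[0] for description in cursor.description]
    handler.startDocument()
    handler.startElement(table_tag, no_attrs)
    for row in cursor:
        handler.startElement(row_tag, no_attrs)
        for column, value in zip(columns, row):
            # Each column becomes an element named after the column.
            handler.startElement(column, no_attrs)
            handler.characters("" if value is None else str(value))
            handler.endElement(column)
        handler.endElement(row_tag)
    handler.endElement(table_tag)
    handler.endDocument()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustomerName TEXT, City TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada & Co', 'London')")
cursor = conn.execute("SELECT CustomerName, City FROM customers")

# XMLGenerator is just one possible consumer; any ContentHandler would do.
out = io.StringIO()
rows_to_sax(cursor, XMLGenerator(out, encoding="utf-8"))
xml_text = out.getvalue()
```

Whatever sits behind the handler only sees startElement/characters/endElement calls, so it cannot tell whether they came from a parser reading a file or from a result set.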
${YEAR+1} is going to be the year of Linux on the desktop!
A small warning for those thinking about moving down the XML/XSL route who haven't done any testing on it:
It's slow. VERY slow.
Most XSL implementations have significant performance and scalability issues as compared to more common custom technology for producing dynamic web pages.
There's no argument that it's a better technology, but I've known several commercial web sites that have spent considerable resources developing XML/XSL implementations and have had to roll back the technology when they discovered they needed four or five times the number of servers to be able to use it.
Anyone know of any top-tier sites that are actually using the technology?
If someone wants that they can either use NNTP or develop it using the current mod_perl + HTML route - there's no need for XML there.
XML should be used where it's appropriate. I'm unconvinced that client-side transformations are the right thing.
Matt. Want XML + Apache + Stylesheets? Get AxKit.
One approach might be to treat DTDs similarly to interface definitions (as in IDL) and keep them in repositories by ORB-like intermediaries. XML documents are, after all, just instances of a particular DTD.
:-)
This has the advantage of reusing existing (ORB) technology for new purposes, and fits into an existing ideology that many already understand.
You would put a client of one of these XML ORBs into Apache or your browser client, and be able to exchange documents and DTDs freely just as with code objects and traditional ORBs.
Or so I would hope.
--binkley
Here at imediation we developed an XML-based web application, and we find it very difficult to do a _good_ DB mapping. Basically you can easily find a JDBC/XML mapping to translate a table into XML, but when it comes to more sophisticated mapping, we could not find anything good: for instance, expressing an entire query in "pure" XML (i.e. without an embedded SQL statement in the XML doc) with joins, expressing insert/update queries, and so on. So we developed our own, but this is not efficient, as it is not standard. Any pointers/ideas about generic/powerful XML/SQL mapping, or an upcoming standard for that?
--- Jerome Bonnet
I don't mean printing the XML itself, but using XML to fit data into a template which is then printed?
I'm thinking of something along the lines of Formscape which can format data for invoices and purchase orders and such.
Mind if I ask why you're doing this? XML parsers are off-the-shelf free commodity tools now.
Spend your time working with those tools (XML4C, expat, rxp to name a few) to create higher level tools. Don't re-implement an XML parser - I can guarantee you it will be full of obscure bugs where you didn't understand the spec, didn't understand how to cope with character encodings, or just did something wrong. This stuff, despite the XML spec suggesting that a graduate could write a parser in a matter of weeks, is hard, and experienced people (such as James Clark) have put out excellent products for all to use under non-restrictive licences. There's even an LGPL parser already out there called libxml (ships with gnome).
If you don't believe you'll create a broken parser, see the recent XML conformance tests on XML.com.
I'd also love to see you move from a non-working XML parser to something supporting XSL "in the near future". I appreciate your enthusiasm, but the XPath spec has some tough little nuts to crack (I know - I'm cracking them right now) and then implementing XSLT from an 80-odd page spec - wow - good luck to you!
(I'm not trying to poo-poo your project, but so many people start working on stuff that's already being worked on in the open-source community that it's just wasted effort).
Matt. Want XML + Apache + Stylesheets? Get AxKit.
I do a lot of work with these up and coming ecommerce companies, all of whom say they "do" XML. The most popular interface in the products they are developing is what I commonly refer to as a "repository model", where systems spit data into your XML-enabled system in whatever format you want (EDI proprietary formats, regular HTTP, ASCII etc.), you play around with it in XML (a lot better for data manipulation and content management purposes), and then you spit it back out in whatever flavor your suppliers/customers want it in. This is what I am getting a sense for in the IBM model. This is not so much a "new" way of doing things as increasingly the standard.
Fact is, XML is great for data interchange, plugging large amounts of standard information into standard forms (POs, RFQs and other business docs) as well as putting some muscle into search engines via context-based searching (via XML metadata), but there are way too many standards out there.
- BizTalk - This is the standard, open nonetheless, that MSFT is developing to standardize XML. It is an open standard, but the obvious benefit to MSFT is that they can plug Biztalk functionality right into all of their product lines for interoperability across a platform.
- OASIS's XML.org - OASIS, a non-affiliated standards body, much like W3C, set out to develop a standardized set of XML schemas and DTDs (document type definitions) however, MSFT beat them to the punch by launching their BizTalk site a day before OASIS, ahhh Microsoft, finds a way to compete even in open standards.
- RosettaNet - These guys set out to "map" all common business processes and to make an open standard for XML in the business world, but, alas, mapping entire processes takes a long time. A lot of notoriety here, not as much substance.
These are just a few examples; there are others, but my guess is that you'll hear the most about these folks. To make things even more complicated, although these guys seem to be "competing", they are almost all members of each others' groups, in a sort of "coopetition" model. So, overall, it is no wonder that the big push is for standards repositories and related translation to and from various formats.
That's my $.02
"I'm disrespectful to dirt! Can you see I am serious?"
On performance, it really matters what kind of parser you use. There are two standard parser interfaces:
- SAX (an event driven interface) and
- DOM, the good old document object model that is tree based.
Both XSL (XSLT + XSL FO) and DOM look at an XML document as a tree to be manipulated appropriately, while SAX treats the document as a stream of tags to be managed by handlers. DOM is powerful, subtle, and (in many cases) slow. If you build an application around a DOM centered parser (and most are), you may have performance issues. YMMV, as always. SAX is not as powerful, you have to code more, but it is faster. More than one project in the B2B area that started with a DOM parser is looking now at SAX. There is nothing wrong with DOM based parsers and we use them a lot - but watch out for performance.
There has been a lot of argument this year over whether or not to use XSL to style XML documents. I think the jury is still out on this -- at least as far as pure display style is concerned. (There are a lot of CSS loyalists out there as well.) But XSLT as a transformation language for XML is a real winner. One of the reasons is simple but profound -- XSLT is XML and is parseable and transformable just like any other XML document. You can create a stylesheet by using another specialized XSLT sheet to transform an XML or XSL document into the stylesheet you want. This can be very powerful, but difficult to debug.
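To show the difference in coding style between the two interfaces (not to benchmark them), here is a small Python sketch that sums the same values once with a SAX handler and once with a DOM tree. The orders/order/total vocabulary and the function names are invented for the example.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<orders><order id='1'><total>10.50</total></order>"
    "<order id='2'><total>4.25</total></order></orders>"
)


class TotalHandler(xml.sax.ContentHandler):
    """SAX style: react to events as they stream past, keeping only a running sum."""

    def __init__(self):
        super().__init__()
        self.in_total = False
        self.buffer = []
        self.total = 0.0

    def startElement(self, name, attrs):
        if name == "total":
            self.in_total = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self.in_total:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "total":
            self.total += float("".join(self.buffer))
            self.in_total = False


def total_with_sax(text):
    handler = TotalHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.total


def total_with_dom(text):
    """DOM style: build the whole tree in memory first, then query it."""
    dom = minidom.parseString(text)
    return sum(float(node.firstChild.data) for node in dom.getElementsByTagName("total"))


sax_total = total_with_sax(DOC)
dom_total = total_with_dom(DOC)
```

The DOM version is shorter, which is the "you have to code more" trade-off mentioned above; the SAX version never holds more than one value in memory at a time.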
Finally, I am surprised that nobody on this site has mentioned the expat (stream based) parser by James Clark that is an almost standard part of the modules for Perl5. I am learning Perl using the ActiveState port on NT and am having a whale (camel?) of a time, and the expat parser is clean and fast and fun.
Oh, and one final note -- while there are some really useful books on XML, I suggest you keep to the basic reference type (Neil Bradley's The XML Companion is next to me on my desk right now, and there is a second edition out) and use the net as your basic resource, especially lists like XML-DEV. Things are moving way too fast.
SGML or XML would seem to be perfect for an open source word processor. One of the biggest obstacles of exchanging information in business is the many proprietary document formats. It would seem that if such a program could become the standard (I know that's a big if), it could be a potential killer app for linux in the business world. Especially if it came out on linux first. But even if it didn't, the linux version could be free whereas a windows version would most likely be proprietary. And I would place far more trust in an open source application complying with standards than I would one which is closed.
I know word processing isn't fun or sexy, but it's an extremely important part of computing and should receive more attention than it has.
Check out AbiWord.
This question is what the people on the Apache XML project spend more or less all their time on, not just talking about it but building stuff. If you care, join up.
Having said that, XSLT may be magic, but "old-fashioned" solutions like PHP and Zope and plain old perl-backed CGIs (perl includes an excellent XML parser) ain't going away anytime soon.
XML is the wave of the future, that's for sure... But what tools are available to actually incorporate XML in a system that can do all things we poor webdesigners dream of?
What tools are available? Apache's Cocoon is probably the most advanced XML-based web publishing tool available today.
All of this seems to refer to Cocoon by name. Database access is possible using the SQL processor as well as the new XSP (the extensible server pages).
XML is one of these words everybody's talking about yet no-one really knows how to use it in specific applications or server technologies
I disagree. Check out the W3C's SVG standard. This is for real.
If you've ever had to muck about with all of the different proprietary flavors of vector graphics formats, you know what a great thing this will be.
That said, I personally *don't* believe in across-the-board XML standardization panacea. Some things deserve standardization, others don't.
Accountants all adhere to accepted standard accounting practices. This is what makes it possible to encapsulate their work into shrink-wrapped database products that pretty much any accountant can use. But this only works because the process is so well known.
So I disagree vehemently that business-to-business transactions, for example, are ripe for XML standardization. Why? Because who the heck is such an expert on these kinds of transactions to be telling everyone else how to do it? There's a lot of trial-and-error to go through before anyone should start proposing standards.
And remember: "You can't vote for anarchy". ;~)
--Lawrence Lessig for Congress!
-- err on second thoughts no they aren't. Unfettered capitalism tends to monopolies in the long run (especially state-granted monopolies on intellectual property, which have positive network effects all their own). Ergo capitalism is not always such a good thing. Tim Berners-Lee invented HTML as a way of information interchange while he was at C.E.R.N. Now remind me where would we all be without the web???????????
-- that's why XML and the ability to send 'packets' of data, all wrapped in a hierarchical form, could be quite useful. However, the next logical step is surely to then write back the data. Now we come back to RDBMSes. You have to figure out a way of removing the statelessness of all HTTP communications. My point was that this might only be possible with RDBMSes which in effect take a snapshot of a resultset at the moment you read it, keep the snapshot (using rollback segments in the case of Oracle) and then allow updates / deletes etc. to be carried out when you send some XML back in. To sum up, I think that you can't fully realise the data-sharing power of XML until you figure out how data can be read and, more importantly, updated across the web (and XML is the only candidate at the moment as a 'transfer agent').
-- see title.
My reason for going on about multi-user, record locking databases is this :- Assume you build a good web site, nice and fast and so on, used by many people. I would suspect that as, per the old adage 'No good deed goes unpunished', your boss would then ask you to build a more interactive site.
Then you realise to your horror that XML doesn't really help at all when it comes time to trying to re-mesh updated/changed XML 'data bursts' back in to the main DB.
Another thing that just occurred to me - Surely the queries needed to get the hierarchical data have to be expressed in SQL. If so, surely the cost in terms of logical/physical reads (i.e. the cost to the server of doing the queries) will be the same whether you do them all at once, to build your XML 'data burst' or whether you run them just as the user requests them.
In Oracle you can keep open connections to the server at all times (and even pre-start some at DB startup) i.e. the connection latency is very small. I think SQL Server would have to be configured to pool connections in some way. Does SQL Server 7 let you do this? Does MTS let you do this? I'm not sure.
BTW what are your feelings on MS having to delay the In-Memory Database and COM+ (component) load balancing. As I remember they had to drop them from basic W2000 Server and have said you'll get them in the W2000 Datacenter edition. It might be that without these features your DCOM and MTS architecture might run out of steam. (You might even have to tell your boss to splash out on W2000 datacenter edition as well!).
Just some thoughts.
Does XSL encourage reuse through its syntax? No
Does XSL base its constructs on proven language design ideas picked up in the last twenty years? No
I have no idea why people are so ga-ga over a language that predates Algol-6x in its design.
For someone who uses a language like Python or Java, I can't imagine why they would find anything compelling about XSL. It really is a dog language. Most people are just too ga-ga over the fact that it is encoded in XML to see how lame it really is.
Thankfully, few people are rallying behind it.
I was reading a book that said SGML was developed in 1974! Sure seems old and outdated to me!
Isn't this what the Cocoon project does? You list the stylesheets at the top of the code and then Cocoon selects the proper one based on the client ID.
Very cool.
If you "already do this [convert XML->HTML or XML->WAP]", how does that work? Is it custom?
I have written an article that will help you XML-newbies get up to speed on the idea of XML and some of the sub-specs. The Promise of XML.
I believe eventually we are going to get to a point where server-side transcoding will not be necessary. However, this will be several years, and we are going to have to learn how to do all of this efficiently.
I am even developing my own transcoding software process because I believe I have a better method of doing it than what is currently available. If and when I do succeed it will be closed-source because I want to make money off of my product, not just give away all my hard work.
Anyway, the next few years are going to be very interesting.
E
EverCode
This format in particular offers no modularity or reuse features, and there is nothing about XML that strictly forbids such features.
I'm wondering if anyone's come across good tools for taking an XML document and doing insert/delete/update kind of things on individual elements and attributes? We've got XML CLOBs stored in Oracle 8i, and thus far altering the XML document has meant rewriting the whole thing.... Any bright ideas? Curt.
Call me pedantic, but I have some issues with the following statement:
HTML and XML are related formats; in fact, HTML can be defined as a subset of XML.
This is a bit of a peeve of mine. HTML is an application of SGML, not a subset of SGML, and definitely not a subset of XML.
A lot of stuff that's in HTML is not legal in XML, like the IMG tag and the OPTION tag:
Which is why XHTML was created.
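A quick way to check this for yourself is to feed both spellings to any XML parser. The sketch below uses Python's standard ElementTree parser; the helper name is_well_formed and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML habits: an IMG with no end tag, an OPTION with a bare attribute.
html_img = '<p><img src="logo.gif" alt="logo"></p>'
html_option = "<select><option selected>One</select>"

# The XHTML spellings of the same markup.
xhtml_img = '<p><img src="logo.gif" alt="logo" /></p>'
xhtml_option = '<select><option selected="selected">One</option></select>'

results = {
    "html_img": is_well_formed(html_img),
    "html_option": is_well_formed(html_option),
    "xhtml_img": is_well_formed(xhtml_img),
    "xhtml_option": is_well_formed(xhtml_option),
}
```

The tag names themselves are not the problem; it is the unclosed elements and valueless attributes that an XML parser rejects.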
-- and another thing, Mosaic was rendering HTML pages well before B.G. knew much about it. Powerful computers are not needed to browse web content. So it matters little whether or not you are currently sitting next to a 200, 300, 400, 500 or whatever MHz box. 500,000,000 processor cycles a second, 30,000,000,000 processor cycles a minute, are NOT required to read HTML. Running IE5 with its bloat, well, maybe that's another story (but then it comes down to RAM here anyway).
XML only solves the problem of data formatting.
There are some doc-heads out there that are trying to wrap XSL, XQL, XPath, and some of the other proto-standards into one cohesive view of the world, but it really isn't there yet.
SQL databases are still the way to go for storage - more due to uptime and recoverability than anything else. Also, regular programming languages such as Python and Java, when used with DOM bindings are still a more powerful, efficient, and flexible solution than XSLT or XSL-FO.
> A lot of stuff that's in HTML is not legal in XML, like the IMG tag and the OPTION tag:
Sure it is, in well-formed XML documents. Just don't expect it to be understood by any other XML-based-language processor.
Transcoding is a good idea, but the hard work isn't in the transcoding infrastructure, it's in the style sheets. Also, there are several commercial offerings in this space that have been around for a while:
- Spyglass Prism
- OnlineAnywhere (acquired by Yahoo)
- Proxynet (acquired by Puma)
- Argogroup Actigate
MB
My article was simplistic in how I stated that, so I will try to correct myself here.
HTML has recently been slightly altered into the XHTML DTD.
A person can use any XHTML DTD in any XML document.
So saying that HTML is a subset of XML is not far from the truth. I am also willing to bet a person would have moderate success using a regular HTML DTD in an XML document, but it would not be worth it.
E
EverCode
As Mahir would say - I kiss you! - Infoshark's ViewShark is right at home with Oracle (my fave). Thanks again for the link.
Read your history, junior: he was a preacher in New England during the reign of the Puritans.
"Dying tickles!" -- Ralph Wiggum
The decomposition into three system elements (XML content source, Transcoding gateway, and browser) makes a lot of sense. That way the content source can focus on what it does - deliver content - and the transcoding gateway can handle customizing the content for presentation on whatever device is making the request. The IBM Transcoding Technology (see http://www.ibm.com/software/secureway/transcoder/) is an example of a tool for building the transcoding gateway. You can download and try the beta code now. There are additional notes at this web site about other tools that may be useful in developing this kind of application. There is a short write-up on XSL at http://www.ibm.com/software/secureway/transcoder/xsl.html.
As you hinted in your note, it can sometimes be a challenge to select the best stylesheet to apply to a given XML document. The gateway may want to choose a stylesheet based on the source document and the destination browser or device. In addition, different stylesheets may be better suited to specific user preferences or network connections. The IBM transcoding technology includes a way to select the "best" stylesheet to apply in a given situation.
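As a rough illustration of what "choosing a stylesheet" can mean, here is a Python sketch that picks the most specific matching rule for a document type and a User-Agent string. This is not how the IBM transcoding technology does it. The rule table, the scoring, the stylesheet file names and the function name select_stylesheet are all assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    doc_type: Optional[str]    # None means "any document type"
    agent_hint: Optional[str]  # None means "any client"; else a User-Agent substring
    stylesheet: str


# Hypothetical rule table, from most to least specific.
RULES = [
    Rule("quote", "wap", "quote-wml.xsl"),
    Rule("quote", None, "quote-html.xsl"),
    Rule(None, "wap", "generic-wml.xsl"),
    Rule(None, None, "generic-html.xsl"),
]


def select_stylesheet(doc_type, user_agent, rules=RULES):
    """Return the stylesheet of the most specific rule that matches, or None."""
    agent = user_agent.lower()
    best, best_score = None, -1
    for rule in rules:
        if rule.doc_type is not None and rule.doc_type != doc_type:
            continue
        if rule.agent_hint is not None and rule.agent_hint not in agent:
            continue
        # A document-type match counts for more than a client match.
        score = 2 * int(rule.doc_type is not None) + int(rule.agent_hint is not None)
        if score > best_score:
            best, best_score = rule, score
    return best.stylesheet if best else None


phone_quote = select_stylesheet("quote", "Nokia7110 WAP browser")
desktop_quote = select_stylesheet("quote", "Mozilla/4.0 (compatible; MSIE 5.0)")
phone_news = select_stylesheet("news", "Nokia7110 WAP browser")
```

User preferences or network speed could be added as further fields on the rule in the same way, at the cost of a more elaborate scoring function.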
The Transcoding technology can also adapt content other than XML for different clients. HTML requires special processing because you can't apply stylesheets to it directly, since it's not well formed. Images also require special handling to adapt them for the destination device. The whole transcoding gateway may be a separate component, installed as an HTTP proxy, or it may be configured as a servlet on the same server that is the content source.
The jabber project is doing a lot of stuff with XML. I'm not sure if this is similar to the XML server that IBM is doing. Maybe someone wants to contrast them for me?
http://www.jabber.org/
If anyone is interested in integrating XML-delivered content into their application, at Moreover.com we've just given free access to all of our headlines from 1500 sources, in a variety of flavors of XML: moreoverxml; wddx; rss and. See Moreover News Categories. From our own perspective, what is interesting is that some of the more sophisticated XML-based initiatives for syndication of XML content, such as ICE, are overly complex for many applications. Some much simpler definitions such as wddx allow very speedy integration of content and metadata into a database.
Open Content News
MultiMania's site has most of its content stored in XML. The main HTTP servers are Apache+PHP; we have a JVM running the SAXON stylesheet processor, and a MySQL database with "glue" data, telling the system which XSL stylesheet to apply to which XML document to generate which HTML page. Some neat hacks and some smart caching even let us deliver 'semi-dynamic' pages - content stored as XML, interpreted as PHP on delivery.
XML rocks. You don't need to stuff your head full of theoretical debates about namespaces, general entities, etc. All you need is vi (or Notepad) and Saxon. To learn XML syntax, just write XML files by hand and feed them to SAXON until it no longer reports XML errors. To learn XSL, just write XSL files until you get SAXON to actually spit out some HTML. Lots of examples are available to accelerate the trial and error process.
When you are finally ready to integrate the whole shebang into actual applications, there are tons of open-source tools to choose from. Look at the list above again - Apache,PHP,MySQL,SAXON - cost zero - this combo drives one of France's most popular Websites.
Hmm, my first post went up before I read through all the comments; it's encouraging to see other 'big site' posters out there speak up with their own XML success stories.
But after reading all this, I'm getting this odd feeling again - like the one you're supposed to have with a severed finger that still itches even though it's been gone years - not that I would know about that; anyway, there's *something* missing in the picture and it makes it all wrong.
That something is of course a database that "naturally" represents, stores, and serves XML. With a usable XML database you wouldn't need SQL; you can express the same semantics, and a huge superset of them, in XML. You wouldn't need an OODBMS; XML-Data bindings would do the OO part, the data store the dirty persistence work.
Current tools in this area are few and incomplete (see the XML-Server list at eGroups for links and discussions); I hope to see some major new Open Source efforts concentrating in that area in the future.
In the Darwinian environment of the web, the real growth in XML has been using very simple schemas. Why conform to a huge spec when a ten-line DTD will generate output that I can pull into my database with a five-line Perl LWP script? Where is all the ICE syndicated traffic or XML-based EDI? It simply hasn't happened because it has been too complicated.
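To give a feel for how little code a simple schema needs, here is a sketch in Python rather than Perl that loads a small headline feed into a database. The headlines/article/title/url vocabulary, the inline sample feed and the function name load_headlines are made up for illustration; a real script would fetch the feed over HTTP first.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A stand-in for a fetched feed, using an invented minimal vocabulary.
FEED = """<headlines>
  <article><title>XML in practice</title><url>http://example.com/1</url></article>
  <article><title>Transcoding notes</title><url>http://example.com/2</url></article>
</headlines>"""


def load_headlines(feed_xml, conn):
    """Insert every <article> in the feed into a headlines table; return the count."""
    conn.execute("CREATE TABLE IF NOT EXISTS headlines (title TEXT, url TEXT)")
    rows = [
        (article.findtext("title"), article.findtext("url"))
        for article in ET.fromstring(feed_xml).iter("article")
    ]
    conn.executemany("INSERT INTO headlines VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_headlines(FEED, conn)
titles = conn.execute("SELECT title FROM headlines ORDER BY url").fetchall()
```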
I've been following XML-EDI for a couple of years now since working on EDML, and the tendency to re-invent the wheel has slowed development enormously, and yet XML-EDI really should be the killer app. We could all forget having to fill in horrible paper forms without having to spend huge amounts on legacy EDI systems.
90% of all paper transactions are invoices and purchase orders. Wouldn't it be great if someone concentrated on this and developed the killer XML-based system that allowed us to issue these electronically?
Open Content News
You might have a look at OmniMark, a free data-stream programming language that has a built-in XML parser, as well as database and network hooks.
I don't mean to take one position or the other. But I'm a little confused. I just really don't get it. Could someone please help me with this: How is XML better than SQL?
They are both well defined open standards.
SQL doesn't handle variable depth tree structures very well. So XML has a leg up there.
But isn't the next version of the SQL standard going to address that? Recursive structures and all that?
There is a real mathematical rigor and some very important underlying theory behind the relational database model. A lot of that theory has important performance consequences. Has that theory simply been superseded, made irrelevant, by this new model?
What is XML providing besides a natural ability to create outlines? And why not just extend the SQL standard to deal with that deficiency?
Now as far as *presentation* of data goes, that's not really XML. That's XSL, DOM, or some other formatting model. I certainly see the need to have a well-defined open standard for presentation.
Help please?
--Lawrence Lessig for Congress!
Hmmm... E-Speak was designed just for this purpose. I work on the product, so I'm not exactly unbiased, but it is:
Hope to see you there.
-kls
LibBT: BitTorrent for C - small - fast - clean (Now Versio
I thought I needed a 550 MHz Pentium III to enhance my web experience... :D
----
The Zvon XSL Tutorial
Crane Softwrights' Practical Transformation using XSLT and XPath
This isn't free but there are free preview chapters available and all reports I've read of it have been excellent (I have no association with Crane except as a user of the previews)
I recommend using a real programming language that includes an XML parser as a module.
You should look at the new vector graphics XML formats. The W3C's anointed format is SVG (Scalable Vector Graphics) (SVG spec). It is fairly complete and tends to be much smaller, in terms of file size, than gif or jpeg.
Hummingbird just announced an XML portal for Linux.
The Year of 2 Jesuses? You blasphemer... I'm not a Christian and even I support the "single-savior" theory. To suggest that Jesus was twins (hence the re-appearance 3 days postmortem) is quite sacrilegious. You should be ashamed. Also, you posted in all caps. That's lame.