DTD vs. XML Schema
AShocka writes "The W3C XML Schema Working Group has released the first public Working Draft of Requirements for XML
Schema 1.1. Schemas are technology for specifying and constraining
the structure of XML documents. The draft adds functionality and
clarifies the XML Schema Recommendation Part 1 and Part 2. The XML Schema Valid FAQ
highlights development issues and resources using XML Schema. This article at webmasterbase.com addresses the
XML DTDs Vs XML Schema issue.
Also see the W3C Conversion Tool from DTD to XML Schema
and other XML Schema/DTD Editors."
There's no "vs."
XML Schemas are much more flexible and powerful.
They're also about 100 times more difficult and confusing.
Why is PXML useful?
If your files are small and/or your platform isn't brain-dead, use a nice DOM API... or XPath, or XSD-based serialization libs. OTW use SAX...
Why would PXML be of any value, unless you are ripping thru your markup language char* by char*? You shouldn't ever need to see the XML at that level in our modern world... unless you are developing an XML parser!
ralvek
I am a programmer for a commercial company (yes I like to make money, and I program on WinTel). A year ago we had the XML craze and converted all our internal protocols to XML. I discovered that XML was just a lot of hype about nothing. There is nothing self-describing about it. Or maybe there is, just like the section names in an INI file describe the keys in them...
On the other hand, the one thing that I did find XML useful for is easy parsing. If you use XML to develop a lower-level protocol you end up with bloated 10k messages. But for high-level protocols or for configuration files it's great for only one reason: there are lots of ready-made tools. If you want to parse XML in Windows just load the IXMLDocument interface and it works at lightning speed. If you want to parse the messages in a web browser, throw together a quick DOM parser or even use the built-in DOM one! If you want to parse XML in Perl or C/C++ there are great libs. The only reason XML is good is because all the hype got people developing very neat tools. In one of my latest projects, which needs to pass information between two programs written in different languages, I used a home-made SOAP and designed a base class that persists using XML. I developed it in both languages in under an hour!
So although it wastes bandwidth and there really isn't anything neat about it, it is comfortable I'll give it that.
God made the natural numbers; all else is the work of man - Kronecker
XML is a very powerful tool.
One very important use is in creating interfaces between heterogeneous systems. A readable character set and meaningful tags are very handy for developers. The hierarchical structure is extremely powerful. And, of course, the fact that it is a standard with common tools is invaluable.
However, one useful principle of such interfaces is "if you don't understand it, ignore it." In other words, when you get a message, look for what you want in it and use it. Ignore anything that isn't what you want. XML is ideally suited for this approach - especially if you use path based access rather than DOM tree traversal.
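To make the "read what you need" idea concrete, here is a minimal sketch using Python's standard ElementTree module. The message layout, the element names and the read_order helper are all made up for illustration; the point is only that path-based lookups never notice the extra element a newer sender added.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message. The <loyalty> element stands in for a field
# added by a newer sender that this reader has never heard of.
MESSAGE = """
<order version="2">
  <id>1042</id>
  <customer>
    <name>Acme Corp</name>
    <loyalty tier="gold"/>
  </customer>
  <total currency="USD">99.50</total>
</order>
"""

def read_order(xml_text):
    """Pull out only the fields this application cares about."""
    root = ET.fromstring(xml_text)
    # Each lookup names one path; elements outside those paths are ignored.
    return {
        "id": root.findtext("id"),
        "customer": root.findtext("customer/name"),
        "total": float(root.findtext("total", default="0")),
    }

order = read_order(MESSAGE)
print(order)
```

A reader written this way keeps working when the sender adds fields, which is exactly the loose coupling described above. Whether that is a feature or a bug is what the rest of this thread argues about.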
This approach to interfaces allows systems to interchange messages without exact version consistency, and without requiring a tight congruence of the applications. It allows a system to "tell what it knows" and another system to "read what it needs" without further ado.
Unfortunately, the use of schemas goes against this idea. It is IMHO a more old fashioned approach of rigidly constraining the messages to an exact specification. This can make interfaces far less robust and flexible, and increase the amount of work.
Schema processing may also be promoted to "verify" message integrity before processing. However, it only does so in the most primitive ways. Real world messages, especially in the business world, tend to have integrity rules that go far beyond what can be expressed in anything short of a complex computer program or equivalent declarations.
I am sure there are plenty of places where schemas make sense, but in the areas of commercial message interchange, they take a powerful and flexible construct and hobble it.
The only good weather is bad weather.
This approach to interfaces allows systems to interchange messages without exact version consistency, and without requiring a tight congruence of the applications. It allows a system to "tell what it knows" and another system to "read what it needs" without further ado.
Unfortunately, the use of schemas goes against this idea. It is IMHO a more old fashioned approach of rigidly constraining the messages to an exact specification. This can make interfaces far less robust and flexible, and increase the amount of work.
If you're talking about using XML for data messaging, not using schemas is just lazy. XML Schema allows optional elements and attributes and/or default values. So if it isn't required, then just make it optional. If you want multiversion interfaces, you have a different XML Schema for each version. Then each side knows explicitly what the messaging protocol is.
While it's probably true that things mostly kinda work if the versions don't match, you shouldn't be relying on this. There's lots of software out there that does this but that doesn't mean it's the ideal.
If you're using XML for markup of documents, schemas are somewhat less useful since the underlying semantics of the tags is usually more important.
I am not a number! I am a man! And don't you
Trimming bloat like namespaces and comments? Are you nuts?
How do you embed MathML in another document (like XHTML)? Currently it's with namespaces. How do you propose to do that without namespaces? Just the prefixes? What happens when two different markups use the same prefix? Wups! You're screwed!
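A rough sketch of why the prefix alone can't do the job, using Python's standard ElementTree. The sample document is invented; the two namespace URIs are the real XHTML and MathML ones. The same MathML vocabulary appears under two different prefixes, and a namespace-aware parser treats them as the same thing because it matches on the URI, not the prefix.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Invented sample: MathML embedded twice, each time under a different prefix.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Radius: <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:mi>r</m:mi></m:math></p>
    <p>Unknown: <mm:math xmlns:mm="http://www.w3.org/1998/Math/MathML">
      <mm:mi>x</mm:mi></mm:math></p>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# ElementTree expands every prefix to {uri}localname, so both <m:mi> and
# <mm:mi> are found by a single URI-qualified name.
identifiers = [el.text for el in root.iter("{%s}mi" % MATHML)]
paragraphs = root.findall(".//{%s}p" % XHTML)

print(identifiers, len(paragraphs))
```

Drop the URIs and keep only the prefixes, and nothing stops two unrelated vocabularies from both picking "m".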
No comments? This is supposed to make a better alternative to XML? It won't help readability, and it certainly isn't a major bottleneck during parsing.
Don't want the "bloat" of namespaces and comments? Wait for it... Wait for it... Don't use namespaces and comments in your documents! Wow! What a concept!
Maybe no Unicode in PXML, huh? So much for interoperability for any kind of data. You don't ever want your pet project used in East Asia (or Russia or Greece or most other places in the world), do you? Unicode too bloated? Why not just use ISO-8859-15 (basically Latin-1 with a Euro character, which incidentally isn't available in ASCII)? Oh wait! That's right. You don't want to allow processing instructions, and in XML it's the declaration at the top (which looks just like one) that tells you what encoding is used.
What happens if you want to change some of the basic syntax of PXML? Because you've nuked processing instructions, you can't specify a markup version like you can in XML.
Yes, yes. We've all seen your little pet project. I hope it was just a class assignment.
- I don't need to go outside, my CRT tan'll do me just fine.
Gartner calls this the "Trough of Disillusionment" phase of the Hype Cycle. After a massive uptake of new technology with great expectations, you realise that it's not the holy grail and swing fast and hard the other way (anti-hype)...then about six months later you enter the actual "productive" stage of the new technology...not hyped, but understood for its strengths and weaknesses and used accordingly. Most new technology follows this trend (or so Gartner says; massive generalization, but they use it as a market prediction model). The good news is, if you know the psychological states, you can avoid them, and think for yourself (rather than getting caught up in crowd behaviour -- which we all do)...and go straight to the productive stage :).
btw, DTD is dead, long live schema
When you're talking about standards you need to have things specified exactly, and schemas give you a standard way to do that. They also allow you to do things like automatically generate code blocks to represent your data in memory, saving developers of data-processing apps a lot of time. And not only that, they create a simple way to communicate between organizations. What would you have people do, look at the XML themselves and guess its structure? That would work about 95% of the time, but the other 5% will bite you in the ass when you get something unexpected.
And finally, Schemas don't force any of that on you. If you don't need schema support, then don't turn it on in your parser. You can still grab what you need out of the tree. Although you might not be able to throw just anything into it, that's probably a good thing. The last thing the world needs is thousands of tiny, ill-conceived exotic extensions to various Datatypes. It would make achieving universal compatibility a nightmare.
If your app doesn't need schemas, don't use 'em. If you don't need to validate, don't check 'em. If you need to put more data into your tree, maybe you should rethink what you're doing, or maybe rewrite your schema.
autopr0n is like, down and stuff.
The schema team is aware of these problems and has acknowledged that it is a deep problem with schema, which isn't easily fixed. The author even mentioned he used DTD because he didn't want the weight of schema. One of the most annoying things about schema that I've seen is people trying to export database tables as tables for an OO application to use. That makes absolutely no sense, and schema in its current form doesn't discourage that kind of usage. Many people in the Java world prefer Castor for that reason. Perhaps the author needs to research DTD, Schema, marshalling, and unmarshalling a bit further to see how and when they break horribly.
.. was a language only something you can "program"?
If it were called a programming language that would be wrong, but it's certainly a language.
I'm sure their library would give you the data in exactly the format that you need it in, be available for the language that you want it in, the platform that you need it on, and they will continue to update and support every single variety. You would also, therefore, completely trust this closed, 3rd party code that you've now integrated into your product, to not have any bugs or security holes.
Open data formats are a good thing.
Absolutely. All the possible attributes, and kids of any element are there in one (OK, two) place(s) and you can garner the information about any element in a matter of seconds. With XML Schema you have to keep track of the levels of nesting and rifle through a series of name/value pairs to get the same information. It is in its greater expressiveness that the advantage of XSD is seen to lie. And there might be applications where this expressiveness necessitates the use of XSD.
However, XML Schema has, besides this expressiveness, one other great advantage. It is XML. As such it can be processed with the same XML tools one uses elsewhere with an XML application.
As an example, in one application, I take a DTD, translate it into XSD, and then run an XSL stylesheet over the XSD file to generate some base code used in my application. In this way I can ensure that my code will automatically be changed to reflect any minor changes made to my Schema.
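For anyone who hasn't tried it, the "schema is just XML, so generate code from it" trick looks roughly like this. The poster used an XSL stylesheet; the sketch below swaps in plain Python with ElementTree instead. The Person schema and the generate_classes helper are invented for illustration, and it only understands flat complexTypes with at least one element.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented schema: one flat complexType with two child elements.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

def generate_classes(xsd_text):
    """Emit Python source with one class per top-level complexType."""
    root = ET.fromstring(xsd_text)
    lines = []
    for ctype in root.findall(XS + "complexType"):
        # Every xs:element inside the type becomes a constructor argument.
        fields = [e.get("name") for e in ctype.iter(XS + "element")]
        lines.append("class %s:" % ctype.get("name"))
        lines.append("    def __init__(self, %s):" % ", ".join(fields))
        for field in fields:
            lines.append("        self.%s = %s" % (field, field))
    return "\n".join(lines)

source = generate_classes(SCHEMA)
print(source)
```

Regenerate whenever the schema changes and the base classes follow along, which is the benefit described above. You can't do this with a DTD without first writing a DTD parser.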
So while I continue to write DTDs, I look on XML Schema as a way to translate, and bring my DTD into the XML universe, with all its attendant advantages.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
I really like the approach of the REST guys, a much lighter-weight and more intuitive approach to web services than SOAP.
Basically, they are saying: use HTTP as it was intended to be used, rather than abusing it in ways it was never meant for.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I see a lot of replies here that flame XML as an RPC protocol. Using XML as a message format for RPC is just one small part of what you can do with it.
:-)
The real strength you get for free when using XML is that you can use standard parsers and transformers to handle different kinds of data in a uniform manner.
I'll give you an example..
Today I was working on my wxWindows application, and I needed to translate (i18n) a lot of windows, dialogs and menus to a few languages. The resources are specified in XRC which is an XML format.
Within XRC files, labels for buttons and other widgets are written in the language they were created in, usually English.
What I did was write an XML catalog file for each language. I then ran an XSLT processor with the XRC files, using the catalogs as plugins with a simple pattern match rule -- and I didn't have to write one single line of customized code to do it.
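For the curious, the same catalog idea can be sketched without XSLT. This is not the stylesheet described above, just a Python/ElementTree stand-in. The XRC snippet is simplified (no namespace), and the catalog format with entry/source is invented for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free XRC fragment, invented for illustration.
XRC = """
<resource>
  <object class="wxDialog" name="prefs">
    <title>Preferences</title>
    <object class="wxButton" name="ok"><label>OK</label></object>
    <object class="wxButton" name="cancel"><label>Cancel</label></object>
  </object>
</resource>
"""

# Hypothetical catalog format: one <entry> per source string.
CATALOG_DE = """
<catalog lang="de">
  <entry source="Preferences">Einstellungen</entry>
  <entry source="Cancel">Abbrechen</entry>
</catalog>
"""

def translate(xrc_text, catalog_text):
    """Return the XRC tree with label/title text swapped via the catalog."""
    entries = ET.fromstring(catalog_text).iter("entry")
    catalog = {e.get("source"): e.text for e in entries}
    root = ET.fromstring(xrc_text)
    for el in root.iter():
        # Strings missing from the catalog are left in the source language.
        if el.tag in ("label", "title") and el.text in catalog:
            el.text = catalog[el.text]
    return root

translated = translate(XRC, CATALOG_DE)
labels = [el.text for el in translated.iter("label")]
print(labels, translated.find(".//title").text)
```

Either way, the resource file and the catalog go through the same generic XML tooling, and nothing in it knows anything about wxWindows.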
Though for RPC I'll stick to CORBA with IIOP any day
I hate snobby XML evangelists who do nothing but propose that companies change the systems that have been working for them for 50 years, all to XML and the dozens of related technologies.
People should realize that XML adoption is a matter of degrees and it is perfectly acceptable to adopt just the concept of a simple markup language. Adoption of XML should NOT necessitate the adoption of XSLT, XHTML, SOAP, and everything else under the sun that is related to it.
If you have a database system with a protocol for passing messages, then maybe add a new type of XML message with a privately known schema. Don't rip up your system and rewrite it as an XML web service, have it talk only to the SQL Server XML services, only use XSLT to transform data, and only XPath for retrieving data.
I'm so fucking sick to death of overarchitects who never actually seem to get the job done on time. (And I'm no XP fan, either, so that's saying a lot!)
What happens when you want to have an Alpha box, a Pentium box, a handheld device, and an UltraSPARC box talk to one another? Simple, right? After all, an int is always 32 bits...err...umm...and everything is big endian...err...ummm...and all architectures use the same data structure padding... Well, at least your program took care of the padding issue...for that one data structure.
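If that sounds abstract, here is a small sketch with Python's standard struct module. The "<" and ">" format prefixes force little- and big-endian layouts, so this shows what each kind of machine would put on the wire for the same int, and what happens when the receiver assumes the wrong one. The field layout is arbitrary.

```python
import struct

value = 1

# The same 32-bit int as a little-endian and as a big-endian sender writes it.
little = struct.pack("<i", value)
big = struct.pack(">i", value)

# A little-endian receiver reading the big-endian bytes without conversion.
misread = struct.unpack("<i", big)[0]

# Padding: a char followed by an int takes 5 bytes when packed tightly, but
# native alignment ("@") may insert padding, and how much is platform-dependent.
packed_size = struct.calcsize("<ci")
native_size = struct.calcsize("@ci")

print(little, big, misread, packed_size, native_size)
```

On the receiving end, 1 turns into 16777216, and a struct that is 5 bytes on the wire may be a different size in memory depending on who compiled it.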
Wups! We've got a core dump waiting to happen. Okay, so we'll just make sure that everyone is using the same sizes and padding all around for any data structure I may need to pass over the network. Of course, this requires a mapping layer so I don't have to do this for every app and data structure that I write. I know: it's for interfaces and defines the general structure. I'll call it Interface Definition Language or IDL for short. Now I'll make sure that all of this information serializes to the network correctly and decodes on the other end without errors. This will be kind of like a stock broker in that I tell it what I want, and it translates it into something usable but more complex than I need to deal with for each app. I think I'll call it a broker too...an object broker...wait...missing something...messages going back and forth...asking for resources...aha! Object Request Broker! Yeah! Oh wait, but people may have different implementations and I want to be able to work with others. Let's agree on this. We'll call it the Common Object Request Broker...ummm...Architecture! Yeah, that's it!
Hmmm...now I need to make a configuration file for my program. I'll make it plain text. Hmmm...but it needs some kind of structure. I'll make it key/value pairs -- just put in a few equal signs and I'm done. Uh oh. My program is fairly modular, but I want to keep all of the settings in one place. If it's just key/value pairs, everything will get jumbled together. I know! I'll use an INI file. Microsoft used to use those to group items together. Now I can just use those nifty GetPrivateProfileString calls, specify the group and the key, and away I go. But uh oh! I have this subcomponent that requires a group within a group. Let me hack something together... Argh! This data file is getting tougher and tougher to parse. I want to finish writing my program that does something useful, not fiddle away at a dumb configuration file parser. What I need is a standardized, hierarchical format that is still plain text and human readable. Hmmm...what's this "XML" thing? I can have the configuration all in memory or read it in piecemeal? Parsers are already written? If I don't like the parser I'm using, I can just plug in another one? I can read the file from any programming language out there? Sign me up!
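Here is roughly what the "group within a group" case looks like once a ready-made parser does the work. This is a sketch with Python's standard ElementTree; the config layout (module and setting elements) and the get_setting helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented config: a "tls" group nested inside a "network" group.
CONFIG = """
<config>
  <module name="network">
    <setting key="port">8080</setting>
    <module name="tls">
      <setting key="enabled">yes</setting>
    </module>
  </module>
</config>
"""

def get_setting(root, module_path, key):
    """Walk nested <module> groups by name, then read one <setting>."""
    node = root
    for name in module_path:
        node = node.find("module[@name='%s']" % name)
        if node is None:
            return None  # no such group at this level
    return node.findtext("setting[@key='%s']" % key)

root = ET.fromstring(CONFIG)
port = get_setting(root, ["network"], "port")
tls_enabled = get_setting(root, ["network", "tls"], "enabled")
print(port, tls_enabled)
```

The nesting that needed a hack in the INI file is just another level of elements here, and the parser was already written by somebody else.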
FYI: This binary vs. "plain text" tripe needs to go away. All text files are binary files. What is the letter 'B' but a 0x42 (66 in decimal, 01000010 in binary)? It's a piece of translation software that turns that 0x42 into the character 'B' on our screens. It just so happens that <foo/> is clearer to the human eye -- after the preliminary software translation step -- than a serialized C data structure. Clearer to the human eye means that the human fixing bugs can see the error faster. CPUs are hovering around the 3GHz range now, but the human mind seems to be falling further and further behind Moore's Law. Perhaps we should help the human mind out a bit and give a bit more work to the CPU.
Yes...I know...I'm a dick. I'm comfortable with that.
- I don't need to go outside, my CRT tan'll do me just fine.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<text xmlns="urn:iana:xml:ns:cruft-1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:iana:xml:ns:cruft-1.0 my_ass.xsd">
<phrase length="11" language="english" subfamily="us" charset="latin-1">
<word length="5" capitalized="yes" propernoun="no">
<letter capital="yes" type="ASCII" bits="8">H</letter>
<letter capital="no" type="ASCII" bits="8">e</letter>
<letter capital="no" type="ASCII" bits="8">l</letter>
<letter capital="no" type="ASCII" bits="8">l</letter>
<letter capital="no" type="ASCII" bits="8">o</letter>
</word>
<space breaking="no" width="normal" type="ASCII" bits="8"></space>
<word length="5" capitalized="yes" propernoun="no">
<letter capital="yes" type="ASCII" bits="8">W</letter>
<letter capital="no" type="ASCII" bits="8">o</letter>
<letter capital="no" type="ASCII" bits="8">r</letter>
<letter capital="no" type="ASCII" bits="8">l</letter>
<letter capital="no" type="ASCII" bits="8">d</letter>
</word>
</phrase>
</text>