DTD vs. XML Schema
AShocka writes "The W3C XML Schema Working Group has released the first public Working Draft of Requirements for XML
Schema 1.1. Schemas are technology for specifying and constraining
the structure of XML documents. The draft adds functionality and
clarifies the XML Schema Recommendation Part 1 and Part 2. The XML Schema Valid FAQ
highlights development issues and resources using XML Schema. This article at webmasterbase.com addresses the
XML DTDs Vs XML Schema issue.
Also see the W3C Conversion Tool from DTD to XML Schema
and other XML Schema/DTD Editors."
There's no "vs."
XML Schemas are much more flexible and powerful.
There're also about 100 times more difficult and confusing.
PXML is a subset of XML - an alternative to the bloated XML language.
believe me, you won't use XML anymore if you once tried PXML
1. DTD 2. XML Schema 3. CowboyNeal validation (via SOAP over SMTP)
He used an apostrophe correctly.
While the W3 continues to push Schema, they are also forming working groups for RELAX after pressure from XML luminaries such as James Clark.
XML Schema is also kinda whacked. It shows all the signs of being a committee specification.
The big problem with schema is that you actually have two type systems going. Element definitions are types for elements. Type definitions are actually types for types for elements. I saw a hopelessly confused attempt by some UML people to express XML schema in UML; they simply could not understand that there was no way it could ever work. UML has completely different semantics.
There are a bunch of schema proposals that folk have said good things about. Eve keeps telling me I should look at Relax. But for the time being XML schema is going to be the basis for standards in W3C and OASIS.
There might be an opportunity to do a clean up job on XML schema in 4 or 5 years but that will only happen if it is causing real problems.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
dammit, right after I buy a book to finally learn XML in detail, they change the standards. :P
I am a programmer for a commercial company (yes I like to make money, and I program on WinTel). A year ago, during the XML craze, we converted all our internal protocols to XML. I discovered that XML was just a lot of hype about nothing. There is nothing self-describing about it. Or maybe there is, just like the section names in an INI file describe the keys in them...
On the other hand, the one thing that I did find XML useful for is easy parsing. If you use XML to develop a lower level protocol you end up with bloated 10k messages. But for high-level protocols or for configuration files it's great for only one reason: there are lots of ready-made tools. If you want to parse XML in Windows, just load the IXMLDocument interface and it works at lightning speed. If you want to parse the messages in a web browser, throw together a quick DOM parser or even use the built-in DOM one! If you want to parse XML in Perl or C/C++, there are great libs. The only reason XML is good is because all the hype got people developing very neat tools. In one of my latest projects, which needs to pass information between two programs written in different languages, I used a home-made SOAP and designed a base class that persists using XML. I developed it in both languages in under an hour!
So although it wastes bandwidth and there really isn't anything neat about it, it is comfortable I'll give it that.
God made the natural numbers; all else is the work of man - Kronecker
RelaxNG vs. W3C Schema makes for a much more interesting discussion. DTD is obsolete in many ways... and most of the XML parsers support schema now.
XML is a very powerful tool.
One very important use is in creating interfaces between heterogeneous systems. A readable character set and meaningful tags are very handy for developers. The hierarchical structure is extremely powerful. And, of course, the fact that it is a standard with common tools is invaluable.
However, one useful principle of such interfaces is "if you don't understand it, ignore it." In other words, when you get a message, look for what you want in it and use it. Ignore anything that isn't what you want. XML is ideally suited for this approach - especially if you use path based access rather than DOM tree traversal.
This approach to interfaces allows systems to interchange messages without exact version consistency, and without requiring a tight congruence of the applications. It allows a system to "tell what it knows" and another system to "read what it needs" without further ado.
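For what it's worth, the "read what it needs" idea can be sketched in a few lines of Python using path-based lookups from the standard library. This is only an illustration: the message layout and every element name in it (order, customer/name, total, note, loyalty, giftWrap) are made up for the example, not taken from any real interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message. <loyalty> and <giftWrap> stand in for fields
# added by a newer sender that this reader knows nothing about.
MESSAGE = """
<order version="2">
  <id>1001</id>
  <loyalty tier="gold"/>
  <customer>
    <name>Ada</name>
    <giftWrap>true</giftWrap>
  </customer>
  <total currency="USD">19.99</total>
</order>
"""


def read_what_it_needs(xml_text):
    """Pull out only the fields this reader cares about, by path."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("id"),
        "customer": root.findtext("customer/name"),
        "total": root.findtext("total"),
        # A missing element yields the default instead of an error.
        "note": root.findtext("note", default=""),
    }


order = read_what_it_needs(MESSAGE)
print(order)
```

The unknown elements are never touched, so the sender can add more of them without breaking this reader. The flip side is that nothing here would notice a misspelled or misplaced element either.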
Unfortunately, the use of schemas goes against this idea. It is IMHO a more old fashioned approach of rigidly constraining the messages to an exact specification. This can make interfaces far less robust and flexible, and increase the amount of work.
Schema processing may also be promoted to "verify" message integrity before processing. However, it only does so in the most primitive ways. Real world messages, especially in the business world, tend to have integrity rules that go far beyond what can be expressed in anything short of a complex computer program or equivalent declarations.
I am sure there are plenty of places where schemas make sense, but in the areas of commercial message interchange, they take a powerful and flexible construct and hobble it.
The only good weather is bad weather.
Honestly, I believe that xml is just a nice way to replace INI files and CSV files. Seems to be about it really. The odd business may use xml for b2b.
And XHTML really bites. You can tell the w3c doesn't listen.
When you bend over to take the latest XML Schema, don't forget your SOAP.
Any idiot with 1/4 of a brain who speaks this language can decipher what are commonly called "conversational contractions."
doncha
gotta
gonna
know what those mean, or shall I translate for you?
schmuck.
It's occurred to me maybe we are being too diligent in actually validating the schema itself, but I'm wondering what others think?
HTML will continue to dominate the web for a long time.
For PHP I have mysql, cookies, and php to save data into, and for Java I have serialization through Object streams.
Is there any case in which I, as a Java and PHP developer, would want XML? Would there be any advantage to using it over my current options?
You can't judge a book by the way it wears its hair.
One of the greatest things about XML schemas is that they themselves are well-formed XML documents. This makes it a breeze to parse and create XML Schemas. I've just started using XML Schemas in development for the past few months, and they are fantastic. A huge improvement over both DTD and XDR (Microsoft's temporary schema format until XML Schemas came out).
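To illustrate the "schemas are themselves XML" point, here is a rough Python sketch that reads a tiny schema with an ordinary XML parser and lists the elements it declares. The schema content (an "error" element with "code" and "message" children) is invented for the example, and this only inspects the schema; it does not validate anything against it.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema vocabulary, in ElementTree's {uri}name form.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, made up for this example.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="error">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="code" type="xs:int"/>
        <xs:element name="message" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def declared_elements(schema_text):
    """Return (name, type) for every xs:element, in document order."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]


decls = declared_elements(SCHEMA)
for name, type_name in decls:
    print(name, type_name)
```

A DTD would need its own dedicated parser for the same job, which is the practical difference being described.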
Forget the whales - save the babies.
This approach to interfaces allows systems to interchange messages without exact version consistency, and without requiring a tight congruence of the applications. It allows a system to "tell what it knows" and another system to "read what it needs" without further ado.
Unfortunately, the use of schemas goes against this idea. It is IMHO a more old fashioned approach of rigidly constraining the messages to an exact specification. This can make interfaces far less robust and flexible, and increase the amount of work.
If you're talking about using XML for data messaging, not using schemas is just lazy. XML Schema allows optional elements and attributes and/or default values. So if it isn't required, then just make it optional. If you want multiversion interfaces, you have a different XML Schema for each version. Then each side knows explicitly what the messaging protocol is.
While it's probably true that things mostly kinda work if the versions don't match, you shouldn't be relying on this. There's lots of software out there that does this but that doesn't mean it's the ideal.
If you're using XML for markup of documents, schemas are somewhat less useful since the underlying semantics of the tags is usually more important.
I am not a number! I am a man! And don't you
You really have absolutely no idea what you are talking about do you?
I don't even know where to begin to explain all of this to you, so I'll just say this: you are a moron.
Rather than directly having DOM and some xml files etc, what do people think of having applications talk via SDAI?
This is useful for everybody involved, and ensures proper software design.
That said, I do agree that they may only enforce rules in "the most primitive ways", but anything you can't describe in the schema, you can always move higher up in your code.
Some people need to do better jobs thinking up domain names.
autopr0n is like, down and stuff.
Trimming bloat like namespaces and comments? Are you nuts?
How do you embed MathML in another document (like XHTML)? Currently it's with namespaces. How do you propose to do that without namespaces? Just the prefixes? What happens when two different markups use the same prefix? Wups! You're screwed!
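A small Python sketch of why the namespace URI, not the prefix, is what keeps mixed vocabularies apart. The document below is a made-up XHTML page with two MathML fragments that deliberately use different prefixes (m and x); the prefixes h and mml used for lookup are likewise arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page embedding MathML under two different prefixes.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Area of a circle:</p>
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:mi>r</m:mi>
    </m:math>
    <x:math xmlns:x="http://www.w3.org/1998/Math/MathML"><x:mi>A</x:mi></x:math>
  </body>
</html>
"""

# Lookup prefixes are local to this code; only the URIs have to match.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "mml": "http://www.w3.org/1998/Math/MathML",
}

root = ET.fromstring(DOC)
identifiers = [mi.text for mi in root.findall(".//mml:mi", NS)]
paragraphs = [p.text for p in root.findall(".//h:p", NS)]
print(identifiers, paragraphs)
```

Both MathML fragments are found even though their prefixes differ, and the XHTML paragraph is never confused with them. With prefixes alone there would be nothing to tell two colliding markups apart.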
No comments? This is supposed to make a better alternative to XML? It won't help readability, and it certainly isn't a major bottleneck during parsing.
Don't want the "bloat" of namespaces and comments? Wait for it... Wait for it... Don't use namespaces and comments in your documents! Wow! What a concept!
Maybe no Unicode in PXML hunh? So much for interoperability for any kind of data. You don't ever want your pet project used in East Asia (or Russia or Greece or most other places in the world) do you? Unicode too bloated? Why not just use ISO-8859-15 (basically Latin-1 with a Euro character, which incidentally isn't available in ASCII)? Oh wait! That's right. You don't want to allow processing instructions, which in XML tell you what encoding is used.
What happens if you want to change some of the basic syntax of PXML? Because you've nuked processing instructions, you can't specify a markup version like you can in XML.
Yes, yes. We've all seen your little pet project. I hope it was just a class assignment.
- I don't need to go outside, my CRT tan'll do me just fine.
Gartner calls this the "Trough of Disillusionment" phase of the Hype Cycle. After a massive uptake of new technology with great expectations, you realise that it's not the holy grail and swing fast and hard the other way (anti-hype)...then about six months later you enter the actual "productive" stage of the new technology...not hyped, but understood for its strengths and weaknesses and used accordingly. Most new technology follows this trend (or so Gartner says; it's a massive generalization, but they use it as a market prediction model). The good news is, if you know the psychological states, you can avoid them and think for yourself (rather than getting caught up in crowd behaviour -- which we all do)...and go straight to the productive stage :).
btw, DTD is dead, long live schema
I can't believe nobody's mentioned this yet. Microsoft has a tool that will do several things:
This makes writing your XSD almost trivial. The code-generation capabilities are very powerful, as well, as you can generate runtime classes for serialization/deserialization or classes derived from DataSet so you can treat XML files like any other database, etc. It's very useful if you're doing any
I'd be very surprised if there weren't other tools out there doing similar things. I simply mentioned xsd.exe because that's what I'm familiar with.
I've been developing with XML for half a year now, and I found both validation methods to be really bad.
DTD is easy to learn and simple to write, only you cannot really do what it is supposed to do, that is, validate document structure. It is pathetic. For example, you have cardinality operators that allow you to specify one or more, none or more, and optional elements, but to constrain element occurrences to, say, 2 to 5 is just too much; the founding fathers never thought about that level of complexity.
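For comparison, the 2-to-5 constraint is what minOccurs and maxOccurs express in XML Schema, whereas a DTD only has ?, * and + (short of spelling out each repetition by hand). The Python sketch below reads those two attributes from a made-up schema and applies them with a hand-rolled count. It is not a real validator, the element names (team, member) are hypothetical, and it does not handle maxOccurs="unbounded".

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: a <team> must contain between 2 and 5 <member>s.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="team">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="member" type="xs:string" minOccurs="2" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def occurrence_bounds(schema_text, name):
    """Read (minOccurs, maxOccurs) for the named element declaration."""
    root = ET.fromstring(schema_text)
    for el in root.iter(XS + "element"):
        if el.get("name") == name:
            # Both bounds default to 1 when omitted.
            # Sketch only: maxOccurs="unbounded" is not handled here.
            return int(el.get("minOccurs", "1")), int(el.get("maxOccurs", "1"))
    raise KeyError(name)


def count_ok(doc_text, name, bounds):
    """Check that the root has an allowed number of direct children `name`."""
    low, high = bounds
    count = len(ET.fromstring(doc_text).findall(name))
    return low <= count <= high


bounds = occurrence_bounds(SCHEMA, "member")
too_few = "<team><member>a</member></team>"
just_right = "<team><member>a</member><member>b</member><member>c</member></team>"
print(bounds, count_ok(too_few, "member", bounds), count_ok(just_right, "member", bounds))
```

A schema-aware parser does this check for you; the point of the sketch is only that the bound is a plain attribute on the declaration.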
Then you have the silly parsed character data as a type definition, sheesh.
Schemas aren't really better either. Most schemas are utterly incomprehensible to humans. It is like the Perl "there's more than one way to do it" philosophy permeates it: you can write the same schema in so many different ways that it takes a serious mental effort to understand someone else's schemas.
gimme something better folks b/c both of these just suck
I always thought referring to XML as a language was pretty misleading. People have said the same thing about HTML for years, but at least being able to "program" HTML actually has some worth. Knowing XML by itself has no value, it's how you use it with other languages.
When you're talking about standards you need to have things specified exactly, and schemas give you a standard way to do that. They also allow you to do things like automatically generate code blocks to represent your data in memory, saving developers of data-processing apps a lot of time. And not only that, they create a simple way to communicate between organizations. What would you have people do, look at the XML themselves and guess its structure? (That would work about 95% of the time, but that 5% will bite you in the ass when you get something unexpected.)
And finally, Schemas don't force any of that on you. If you don't need schema support, then don't turn it on in your parser. You can still grab what you need out of the tree. Although you might not be able to throw just anything into it, that's probably a good thing. The last thing the world needs is thousands of tiny, ill-conceived exotic extensions to various Datatypes. It would make achieving universal compatibility a nightmare.
If your app doesn't need schemas, don't use 'em. If you don't need to validate, don't check 'em. If you need to put more data into your tree, maybe you should rethink what you're doing, or maybe rewrite your schema.
Is that related to Linear Object Language?
When I was in school, the plural of "schema" was "schemata".
</>
I've already selected "No Karma Bonus". Beyond that I can't mod myself downward.
Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
Hopefully whatever happens it will not be the evil schemas from La Blue Girl anime flic
Modded me down again, huh? You little jackass. I'll get you, punk! I'll GET YOU!
In my experience, many benefits of XML come when dealing with the presentation layers of many application architectures, with the ability to repurpose syndicated data at will. Here are a few examples:
Effective use of XML and XSLT allows you to easily aggregate informational data from one or multiple sources and "repurpose" for an infinite variety of business and technological goals.
One of the main benefits of XML is that it offers an effective, textual representation of "structured data" that can be conveniently accessed and manipulated according to a slew of surrounding standards such as XPath, DOM, XSLT, and namespaces.
Lazy in this circumstance is often good. What you just described is a bunch of work, which translates into *money*. The important question to ask is what is the utility of creating this schema, vs what is the cost of doing so. The answer varies from case to case.
Work does translate into *money*, not doing work doesn't translate into *saving money* except maybe in the extremely short term.
Furthermore, XML messages (with the exception of configuration files where schema may actually be quite useful) are normally generated by computers, not people. The rules to generate those messages are then embedded in code (or tables, which is code by another name). Once it works, it will usually continue to work. So again, the schema has offered no advantage, while adding bureaucracy.
It's true that XMLSchema provides syntactic rather than semantic constraints. But that's *really* useful information. For example, XML Schema allows type checking. Sure, you can just treat everything as a string and ignore the problem. You can also use it to constrain the valid values for something with regular expressions. This allows you to do assertions at the protocol level. Again, I can get away with not using them, but in the long term that's just stupid.
And if your schema is generated by computer doesn't that make it more useful, not less? It's like saying that COM/CORBA interfaces are nice but IDL is just pointless niggling...
As an analogy, consider a schema to be like a syntax checker. It can tell you if the niggling details are right, but it can't tell you whether the program will work. Since in many cases of message exchange the niggling details are not even important, this is often a waste of time!
Yes, you could consider an XMLSchema as a kind of type checking and syntax checking for your XML. It's been my experience that most real problems are niggling details (unless you're doing demoware). Given the broad spectrum of programming tasks out there using XML these days, it would be careless to say that they *all* need Schema (and/or schema validation), which I didn't. But saying that Schemas are always (or for that matter often) a waste of time is IMO a lazy attitude.
This is really offtopic, but I figure there are a lot of w3 savvy people reading this column.
Is there a correct way to put flash on a page and pass validator.w3c.org for valid HTML 4.01?
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
if you don't care about validity, your program will not work correctly. in short, it will "suck". please don't write programs that suck, as there are plenty out there already. thank you.
Oh my, where to begin. Please, can't we get some folks in here that have actually worked on real, professional systems? Only a complete moron would make a statement like the parent post. And when I read it someone marked it Insightful, wow, only a bigger idiot (or maybe a PHB) would do that.
First off, Xml is not hype. In its simplest form, it's a format that has standard parsers on every platform. In its most robust form, it's a terrific data description language that can be used to describe really complex data.
Here's an example of the power of Xml: in .NET there are XmlSerialization and Deserialization engines. Basically, you can take any object and get an Xml representation of it (Serialize), and by the same token you can take an Xml representation of an object and Deserialize it back into the object. Using these techniques allows you to pass data between application layers or between servers without getting all talky (i.e., in one call instead of setting individual properties, etc.).
Now, this is basically in the MS world what COM does but the power here is that you're passing complex data types from one application to another in a standard format.
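The round trip described here (object to XML, XML back to object) can be sketched in Python as well. To be clear, this is not the .NET serialization engine, just a hand-written analogue using dataclasses and the standard library; the Customer class and its fields are invented for the example, and it only handles flat objects whose fields are simple types like str and int.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Hypothetical data object, made up for this example.
    name: str
    city: str
    orders: int


def serialize(obj):
    """Turn a flat dataclass instance into an XML string."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def deserialize(cls, xml_text):
    """Rebuild a dataclass instance from the XML produced by serialize()."""
    root = ET.fromstring(xml_text)
    # f.type is the annotated class (str, int), used to convert the text back.
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})


original = Customer(name="Ada", city="London", orders=3)
wire = serialize(original)
restored = deserialize(Customer, wire)
print(wire)
print(restored == original)
```

The whole object crosses the wire in one message, which is the "not talky" property mentioned above; nested objects, collections and attributes would need more work than this sketch does.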
Here's another example, we wanted to store all error messages for an application in a standard xml file. I created an Xml Schema for the file to make sure that all of our developers entered the error codes in a proper format. At build time, I have a script that validates that the file is correct. Furthermore, to help the developers when they update the Xml file, VS.NET provides IntelliSense to let them know what tags go where (thanks to the schema reference).
To me, that's pretty powerful stuff, considering that now I know at build time that everything concerning that section of the app is set up correctly.
Personally, I think those of you out there who don't understand the value of Xml and Xml Schema also don't have a lick of real world programming experience. Hopefully I'll never have to work with you...
but I think you are totally off base with regard to CDATA sections. If anything, they make life easier for the parser, not harder -- at least when I was writing a parser, CDATA made things faster and easier. In cases where you are including a great deal of symbolic data -- for example, when you want to include a source code segment or ASCII art -- it is both easier to read, faster to parse, and *less* bloated.
'<' takes up less space than '&lt;'. Assuming you have more than three or four of these in your text node, a CDATA section reduces the size of your document. For the parser, after the CDATA section is begun, only the character sequence ']]>' can end it. This means the parser only has to check for ']]>' and not '<', '&', '<?', '<!', etc.
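The size argument is easy to check: the CDATA wrapper costs 12 characters ('<![CDATA[' plus ']]>'), while each '&lt;' costs 3 more than a bare '<' and each '&amp;' costs 4 more than a bare '&'. A quick Python sketch with a made-up code snippet (the snippet and the <code> wrapper element are just for illustration):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up source fragment with several markup-significant characters.
SNIPPET = "if (a < b && b < c) { x = a << 2; }"

# Same text stored two ways: entity-escaped, and inside a CDATA section.
escaped = "<code>" + escape(SNIPPET) + "</code>"
cdata = "<code><![CDATA[" + SNIPPET + "]]></code>"

# Both forms parse back to the identical text node.
text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text

print(len(escaped), len(cdata))
print(text_from_escaped == text_from_cdata == SNIPPET)
```

With only one or two special characters the escaped form is the shorter one, so this is a trade-off that depends on the content, as the "more than three or four" remark suggests.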
And yes, there is such a beast called XInclude, but it's currently only a Candidate Recommendation. It's used like this:
<foo>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="bar.xml" parse="xml">
    <xi:fallback>
      <para>This text goes in if bar.xml cannot be found or has an error</para>
    </xi:fallback>
  </xi:include>
</foo>
Hopefully most entities can go the way of the dodo.
Flame me if you wish, but I prefer DTDs when working with XML. DTDs are MUCH easier to create than schemas. Plus, there's no annoying namespaces to deal with.
I still wonder whether or not a transport based on XML (or even ASN.1 representations of XML) would ever be made available to CORBA. All of the advantages of IDL with the serialization and interoperability of XML.
The key to robust parsing is deferring the decision as to whether a tag has a closing tag until you've seen enough input to know. You have to read the whole document in, build a tree, then work on the tree, but for anything serious you want to do that anyway.
This parser is in Perl. If anyone would like to take it over and put it on CPAN, let me know.
Ok- what if google, amazon, etc were to do the same thing, but translate in binary data, without tunneling over port 80 (which is bad, evil, and vile. Just ask any sys admin), and provide a library that parses the binary data for you?
It would be the exact same thing- except it would be faster, use less bandwidth, be more secure, have session level security (which HTTP lacks). But it wouldn't be buzzword compliant.
I still have more fans than freaks. WTF is wrong with you people?
I don't agree with you that schema validation is useless. In many cases the documents are fully processed for business rules much later, but you want acknowledgement that your document has arrived correctly and that it passes at least the most basic validation (e.g. DTD or schema validation). XML Schema does a wonderful job at that. In our case, we always keep schema validation on for new doc types until the system is stable and bug free, and then remove validation for efficiency (for internal docs). We have discovered many subtle bugs in the system which would have been extremely hard to track down by looking at application errors but were easier to find by looking at parser errors.
It would be somewhat unfortunate if both end up popular, because it will be more work to maintain both sets of tools than either one alone. That's probably what will happen, though, at least in the short term.
DTD is about document validation.
Schema is not only about document validation, but also about layering a rich and extensible type system into XML. DTD can't touch this.
SOAP actually went down the path of adding types to Schema until the actual Schema team did it, thereby simplifying the SOAP system quite a bit.
Listen to what he has to say.
Wasn't that a book by PKD?
Schemas, XML, blah, blah, blah. What a load of shit. I use HTML. If it ain't broke, don't fix it.
Do you even know what XML is for? HTML is pretty much restricted to displaying information. XML goes far beyond that. It is a method of representing information hierarchically using text. Why does that matter? It's powerful because it allows you to express data relationships in a way that CSV data cannot (and that HTML couldn't, and shouldn't, ever hope to).
For example, are you familiar with how Sun RPC works? One of the things that makes it ugly is that you have to do all sorts of nasty serialization of binary data structures to be able to express them across a wire in an architecture-independent way. The data is human-unreadable, ordering and bytesize is critical, and you generally have significant code changes on both ends if you modify data structures. XML does not suffer from these problems. You can describe data structures very easily, and you can extend them at a later time with little problem. It's architecture-independent, since the data is simply a text character stream in flat text blobs/files, so you can read data cross-platform. And a human can read it.
I'm currently working on a project which requires that we transmit huge amounts of relational data between different vendors' databases on machines with different processor architectures. XML makes this exceedingly simple. It would be near impossible to do what we're doing without XML.
Clearly, HTML is useless for this sort of thing. This is not even remotely what it's intended for. Likewise, XML is really not intended as a display language, though it *could* be used that way. In reality, nobody does, though, because HTML works well. (Never mind that XML can pretty much be viewed as a superset of HTML.)
Hows about educating yourself about things you feel obliged to slam? Maybe you'll look like less of an idiot if you do. Or perhaps less of a troll.
Well, that's why you'd use HTTPS with certificates, no? And nothing is wrong with the port. If you meant HTTP, then yeah, it's plaintext.
Mind you, I don't have a choice of OS's at work. We use solaris and linux. Now amazon, being a windows shop (i'm guessing), only gives out dll's. Great, now I'm not supported. So fine, we use java. Did you know java class (binaries) are versioned? I'm stuck with 1.3.1 ATM and a 1.4 jdk is in the works. Problem is, some jdk's use one version of the binary while another uses.. another. I always hoped it was a universal format. Sadly let down.
That's why technologies like JAXB and translets are popping up. With JAXB, you can bind particular classes to particular schemas/DTDs. It speeds up processing. Translets are just compiled XSLT. Really fast, since your XSLT can be compiled/interpreted once, run anywhere. Kind of a chain technology. translet->xslt->java->machine language.
And mind you, nothing is more secure about a binary format. It's just obfuscated. Hell, I hacked Renegade BBS's user database format so I could write a user deletion tool. Were they going for security? Prolly not. Point is, binary is just obfuscated.
As for your session level security, that's not the job of your data format. Your data format and transport layer should be independent. It's why you can do SOAP over HTTP, SMTP/mail and possibly anything else that has a function()-like response format. request->response. It's probably why ssh is so great. All it is, is a way of authentication, communication and encryption. You can create ssh tunnels for http as a proxy.
-
ping -f 255.255.255.255 # if only
"constraining the structure of XML documents"
I'd be happy if we could constrain the use cases for XML. It seems to have solved everything! How about limiting its use to the problem set it solves - and I'm not talking about machine to machine communication. What a bloated pile of bits SOAP is. Imagine wrapping a remote method call in layer-upon-layer of XML tags?
Save the world? Heh.
Bill
Upon seeing the box was too small, Schrodinger's Elephant breathed a sigh of relief.
I know. I did some XML development when I was a computer guy. I was trying to make a point... what in the hell does the W3C have to do with anything non web-related?
And, as far as the usefulness of XML, I just found it to add extra overhead. The parsers are cute and all, but it doesn't generally help since you still have to know the structure of the data. So, between the XML creation, the extra data being transmitted, and the parsing, if I want performance, I'll stick with a binary pipe or a simple comma-delimited stream. XML really isn't a big deal at all. Nothing but hype.
There's nothing that says it *has* to be on port 80, but providing XML rather than binary data reduces both the initial development *and* the maintenance time required to release the data to the public. Also, session level security is unnecessary in a public publishing environment. Finally, with modern compression techniques, bandwidth isn't wasted.
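How much of the tag overhead compression actually claws back depends heavily on the data, so take this Python sketch as a way to measure it rather than as proof either way. The inventory records and element names below are synthetic, invented for the example, and a comma-delimited version of the same data is included for comparison.

```python
import zlib

# Synthetic, highly repetitive records; real data will compress differently.
rows = "".join(
    f"<item><sku>{i:05d}</sku><quantity>{i % 7}</quantity><inStock>true</inStock></item>"
    for i in range(500)
)
xml_bytes = f"<inventory>{rows}</inventory>".encode("utf-8")

# The same records as a comma-delimited stream.
csv_bytes = "".join(f"{i:05d},{i % 7},true\n" for i in range(500)).encode("utf-8")

xml_packed = zlib.compress(xml_bytes, 9)
csv_packed = zlib.compress(csv_bytes, 9)

print("xml:", len(xml_bytes), "->", len(xml_packed))
print("csv:", len(csv_bytes), "->", len(csv_packed))
```

Repeated tag names are exactly the kind of redundancy a general-purpose compressor removes well, which is why the gap between the two formats tends to narrow after compression. The cost is CPU time on both ends.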
"Times have not become more violent. They have just become more televised."
-Marilyn Manson
The schema team is aware of these problems and has acknowledged that it is a deep problem with schema, which isn't easily fixed. The author even mentioned he used DTD because he didn't want the weight of schema. One of the most annoying things about schema that I've seen is people trying to export database tables as tables for an OO application to use. That makes absolutely no sense, and schema in its current form doesn't discourage that kind of usage. Many people in the Java world prefer Castor for that reason. Perhaps the author needs to research DTD, Schema, marshalling, and unmarshalling a bit further to see how and when they break horribly.
The Schema WG decided on "schemas" so as not to add unexpected obscurity to the specification.
See this message.
Expected obscurity is of course just fine.
.. was a language only something you can "program"?
If it were called a programming language, that would be wrong, but it's certainly a language.
I'm sure their library would give you the data in exactly the format that you need it in, be available for the language that you want it in, the platform that you need it on, and they will continue to update and support every single variety. You would also, therefore, completely trust this closed, 3rd party code that you've now integrated into your product, to not have any bugs or security holes.
Open data formats are a good thing.
I know. I did some XML development when I was a computer guy. [...] I'll stick with a binary pipe or a simple comma-delimited stream. XML really isn't a big deal at all. Nothing but hype.
Well, there are "computer guys", and then there are "computer guys". I can see why you're not one any more.
"Will it change the world? Of course not. It's just a markup language. Will any other computing tool change the world? Of course not. The end users have never cared how you got to the solution. They cared only if you got to the solution faster than the other guy."
Maybe that's part of the problem. XML isn't a "markup" language. XML is more a toolkit for creating, shaping, and using markup languages. Or, to put it another way, it's a set of rules for building markup languages, but not a "markup" language itself. No wonder people are complaining about it. People who expect HTML to be a "formatting" language have a similar problem.
that the same applications of XML that drive the keening about bloat and hype seen in these comments are precisely those which are driving the specs to the wrong side of the 80/20 for XML/XSL's original goals: bringing the semantic power of SGML and DSSSL to the Web. Goals for which its purist cousins RelaxNG, REST, et al. remain admirably suited.
The back-end curmudgeons are right, XML stinks for a universal wire format. But for loosely-coupled, message-based, semantically-rich systems it is hard to beat. And document-oriented systems which don't use XML barely deserve notice any longer.
I gently refer s-expression trolls to paul and oleg
illegitimii non ingravare
Ok- what if google, amazon, etc were to do the same thing, but translate in binary data,
OK, what if that technology existed for decades, and Google, Amazon, etc never made use of it?
I can understand people being sceptical about technology snake oil, but you've got to remember that Hype gets people thinking about Applications, and Applications ultimately butter most of our bread.
Your world where we are waiting for a world-wide distributed consumer network running on COBOL and ASN.1 and DCE RPC just never happened.
Absolutely. All the possible attributes, and kids of any element are there in one (OK, two) place(s) and you can garner the information about any element in a matter of seconds. With XML Schema you have to keep track of the levels of nesting and rifle through a series of name/value pairs to get the same information. It is in its greater expressiveness that the advantage of XSD is seen to lie. And there might be applications where this expressiveness necessitates the use of XSD.
However, XML Schema has, besides this expressiveness, one other great advantage: it is XML. As such it can be processed with the same XML tools one uses elsewhere with an XML application.
As an example, in one application, I take a DTD, translate it into XSD, and then run an XSL stylesheet over the XSD file to generate some base code used in my application. In this way I can ensure that my code will automatically be changed to reflect any minor changes made to my Schema.
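A rough sketch of that last step, in Python rather than XSL (the poster's actual stylesheet isn't shown, so this swaps in `xml.etree.ElementTree`). The schema, the `generate_classes` helper and the shape of the generated code are all made up for illustration:

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, standing in for the XSD translated from a DTD.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def generate_classes(xsd_text):
    """Emit Python source with one class per top-level element in the schema."""
    root = ET.fromstring(xsd_text)
    lines = []
    for top in root.findall(XS + "element"):
        # Every nested xs:element becomes a constructor argument.
        fields = [e.get("name") for e in top.iter(XS + "element") if e is not top]
        lines.append(f"class {top.get('name').capitalize()}:")
        lines.append(f"    def __init__(self, {', '.join(fields)}):")
        if not fields:
            lines.append("        pass")
        for field in fields:
            lines.append(f"        self.{field} = {field}")
    return "\n".join(lines)

print(generate_classes(XSD))
```

Because the schema is itself XML, regenerating the code after a schema change is just a matter of running the script again.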
So while I continue to write DTDs, I look on XML Schema as a way to translate, and bring my DTD into the XML universe, with all its attendant advantages.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
The value of XML is not the structure of the data. The tags, nodes, elements and attributes are just another format for parsing data. The power comes with the ability to VALIDATE the format. No other data exchange format has such an integrated approach to assuring the validity AND structure of data. Also, the hierarchical nature of XML makes it ideally suited to most information sets. It also takes the organization of relational data to another level, because node groupings inherently define a relationship between the information that is contained in the document. XML as just XML isn't that special, but the ability to nest information and validate the structure makes XML a more *reliable* data format. In the world of CSV files, INI files, Excel spreadsheets and the like, it is a welcome change. As the tools evolve to take the complexity out of creating things such as schemas, XML's potential as an interchange format will be fully realized. As for its verbosity, it is needed. The less structure, the more the format is left open to interpretation.
I was going to mention RELAX, but since your post is already here I'll just add a few links:
Official?? site of RELAX (RELAX earthling! we come in peace!)
OASIS on Relax-ng (much more dry).
I'm not sure it would be so bad if both standards came to be popular. A few years ago at an XML conference one of the speakers described the XML world being split into three camps - data modelers (who would be backing XML-Schema), Document-centric folks (who would back RELAX), and one other group (whose leanings I forget but I guess they don't care about typed XML documents!!). Having a data-centric and document-centric approach to XML might not be so bad, each having good uses in different scenarios.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
" Modded me down again, huh? You little jackass. I'll get you, punk! I'll GET YOU!"
Don't forget his dog too.
You are really reaching. Gzip is not that processor intensive for a 3k gzip file!! Even older 486's should be OK with that.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I really like the approach of the REST guys, a much lighter weight and more intuitive approach to web services than SOAP.
Basically, they are saying: use HTTP as it was intended to be used, rather than abusing it in ways it was never meant for.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
That is incredibly helpful and well written.
You just gained a fan!
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
I see a lot of replies here that flame XML as an RPC protocol. Using XML as a message format for RPC is just one small part of what you can do with it.
:-)
The real strength you get for free when using XML is that you can use standard parsers and transformers to handle different kinds of data in a uniform manner.
I'll give you an example..
Today I was working on my wxWindows application, and I needed to translate (i18n) a lot of windows, dialogs and menus to a few languages. The resources are specified in XRC which is an XML format.
Within XRC files, labels for buttons and other widgets are written in the language they were created in, usually English.
What I did was write an XML catalog file for each language. I then ran an XSLT processor with the XRC files, using the catalogs as plugins with a simple pattern match rule -- and I didn't have to write one single line of customized code to do it.
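For anyone who wants to see the idea without an XSLT processor at hand, here is a minimal Python sketch of the same pattern-match-and-replace. It is not the stylesheet I used; the catalog layout (`<entry source="...">`), the simplified XRC snippet and the `translate` helper are assumptions made up for the example:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XRC resource (real XRC files carry more detail).
XRC = """\
<resource>
  <object class="wxDialog" name="main">
    <title>Settings</title>
    <object class="wxButton" name="ok"><label>Save</label></object>
    <object class="wxButton" name="cancel"><label>Cancel</label></object>
  </object>
</resource>
"""

# Hypothetical catalog format: one entry per source string.
CATALOG_DE = """\
<catalog lang="de">
  <entry source="Settings">Einstellungen</entry>
  <entry source="Save">Speichern</entry>
  <entry source="Cancel">Abbrechen</entry>
</catalog>
"""

def translate(xrc_text, catalog_text, tags=("label", "title")):
    """Replace the text of translatable elements using the catalog."""
    catalog = {e.get("source"): e.text
               for e in ET.fromstring(catalog_text).iter("entry")}
    root = ET.fromstring(xrc_text)
    for elem in root.iter():
        # Only touch elements that hold user-visible strings.
        if elem.tag in tags and elem.text in catalog:
            elem.text = catalog[elem.text]
    return ET.tostring(root, encoding="unicode")

print(translate(XRC, CATALOG_DE))
```

The point is the same either way: the parser and the tree walk are generic, and only the catalog is specific to the job.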
Though for RPC I'll stick to CORBA with IIOP any day
HTML is a subset of XML - an alternative to the bloated XML language.
believe me, you won't use XML (and those pesky XSLTs) anymore if you once tried HTML
AND (most importantly) in virtually every single web browser that you can find, support for viewing this format over the internet is available and built into the browser itself!
Karma: NaN
I recently decided to go with RNG for my schemas after reading up on W3C XML Schema (WXS) and Relax NG (RNG). RNG is just so much easier to read and understand. The real clincher for me was the inability in WXS 1.0 to describe non-deterministic structures. I mean, give me a break. I can't allow people to put the elements in a different order? That's just lame.
What's more, there's a fantastic tool, dtdinst, that converts DTDs into Relax NG. There are also tools to convert back and forth between WXS and RNG. So if I ever need to provide someone with a WXS schema I can just run it off automatically.
Now I'm working on a system using AxKit to parse out the RNG schema, generate HTML forms for completion, roundtrip the data back to the server, assemble an instance document using DOM and display it using XSLT and CSS. But that's another story. People who don't "get" XML should really check out AxKit.
simon
home page
I hate snobby XML evangelists who do nothing but propose that companies change the systems that have been working for them for 50 years, all to XML and the dozens of related technologies.
People should realize that XML adoption is a matter of degrees and it is perfectly acceptable to adopt just the concept of a simple markup language. Adoption of XML should NOT necessitate the adoption of XSLT, XHTML, SOAP, and every thing else under the sun that is related to it.
If you have a database system with a protocol for passing messages, then maybe add a new type of XML message with a privately known schema. Don't rip up your system and rewrite it as an XML web service, have it talk only to the SQL Server XML services, only use XSLT to transform data, and only XPath for retrieving data.
I'm so fucking sick to death of overarchitects who never actually seem to get the job done on time. (And I'm no XP fan, either, so that's saying a lot!)
I would happily play around with XML Schema if only my Emacs/PSGML mode would accept a schema and treat it in the same way as it treats a DTD.
:-)
And sorry, I have neither the time to write my own Emacs mode nor the money to buy commercial XML tools.
Well, so I keep watching the tools and if they are Schema ready then so am I.
What happens when you want to have an Alpha box, a Pentium box, a handheld device, and an UltraSPARC box talk to one another? Simple, right? After all, an int is always 32 bits...err...umm...and everything is big endian...err...ummm...and all architectures use the same data structure padding... Well, at least your program took care of the padding issue...for that one data structure.
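A quick way to see the problem, sketched with Python's `struct` module (the value is arbitrary and the architecture names in the comments are just examples):

```python
import struct

value = 1

little = struct.pack("<i", value)   # 32-bit int, little-endian (e.g. a Pentium)
big = struct.pack(">i", value)      # 32-bit int, big-endian (e.g. an UltraSPARC)
print(little.hex(), big.hex())

# Reading little-endian bytes as if they were big-endian raises no error;
# it just silently produces a different number.
misread = struct.unpack(">i", little)[0]
print(misread)

# Padding: the "standard" layout packs a char plus an int into 5 bytes,
# while the native layout ("@") may insert padding and depends on the platform.
print(struct.calcsize("<ci"), struct.calcsize("@ci"))
```

Same four bytes on the wire, two different integers, and a struct whose size depends on who compiled it.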
Wups! We've got a core dump waiting to happen. Okay, so we'll just make sure that everyone is using the same sizes and padding all around for any data structure I may need to pass over the network. Of course, this requires a mapping layer so I don't have to do this for every app and data structure that I write. I know: it's for interfaces and defines the general structure. I'll call it Interface Definition Language or IDL for short. Now I'll make sure that all of this information serializes to the network correctly and decodes on the other end without errors. This will be kind of like a stock broker in that I tell it what I want, and it translates it into something usable but more complex than I need to deal with for each app. I think I'll call it a broker too...an object broker...wait...missing something...messages going back and forth...asking for resources...aha! Object Request Broker! Yeah! Oh wait, but people may have different implementations and I want to be able to work with others. Let's agree on this. We'll call it the Common Object Request Broker...ummm...Architecture! Yeah, that's it!
Hmmm...now I need to make a configuration file for my program. I'll make it plain text. Hmmm...but it needs some kind of structure. I'll make it key/value pairs -- just put in a few equal signs and I'm done. Uh oh. My program is fairly modular, but I want to keep all of the settings in one place. If it's just key/value pairs, everything will get jumbled together. I know! I'll use an INI file. Microsoft used to use those to group items together. Now I can just use those nifty GetPrivateProfileString calls, specify the group and the key, and away I go. But uh oh! I have this subcomponent that requires a group within a group. Let me hack something together... Argh! This data file is getting tougher and tougher to parse. I want to finish writing my program that does something useful, not fiddle away at a dumb configuration file parser. What I need is a standardized, hierarchical format that is still plain text and human readable. Hmmm...what's this "XML" thing? I can have the configuration all in memory or read it in piecemeal? Parsers are already written? If I don't like the parser I'm using, I can just plug in another one? I can read the file from any programming language out there? Sign me up!
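Here is roughly what that looks like with an off-the-shelf parser, sketched in Python. The config layout and the element names are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical config with a group nested inside a group,
# the case that gets awkward in a flat INI file.
CONFIG = """\
<config>
  <network>
    <host>example.org</host>
    <proxy>
      <host>proxy.example.org</host>
      <port>8080</port>
    </proxy>
  </network>
  <logging level="debug"/>
</config>
"""

root = ET.fromstring(CONFIG)

# Path-style lookups reach any depth without a hand-written parser.
proxy_port = int(root.findtext("network/proxy/port"))
log_level = root.find("logging").get("level")
print(proxy_port, log_level)
```

No custom parsing code at all, and the same file can be read from any language that ships an XML parser.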
FYI: This binary vs. "plain text" tripe needs to go away. All text files are binary files. What is the letter 'B' but a 0x42 (66 in decimal, 01000010 in binary)? It's a piece of translation software that turns that 0x42 into the character 'B' on our screens. It just so happens that <foo/> is clearer to the human eye -- after the preliminary software translation step -- than a serialized C data structure. Clearer to the human eye means that the human fixing bugs can see the error faster. CPUs are hovering around the 3GHz range now, but the human mind seems to be falling further and further behind Moore's Law. Perhaps we should help the human mind out a bit and give a bit more work to the CPU.
Yes...I know...I'm a dick. I'm comfortable with that.
- I don't need to go outside, my CRT tan'll do me just fine.
At my firm we use XML to store all our data. OK, the data consists of laws and other text. But with XSL we can easily convert the data into HTML files (online/CD-ROM) or LaTeX files (books). One format for everything. So XML/XSL is not only INI/CSV files...
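To give an idea of the "one format for everything" point, here is a small sketch in Python instead of XSL (so not our real stylesheets). The `<law>` markup and both helper functions are made up for the example, and the LaTeX side does no escaping of special characters:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the real markup is not shown here.
LAW = """\
<law>
  <title>Example Act</title>
  <section n="1">First provision.</section>
  <section n="2">Second provision.</section>
</law>
"""

def to_html(root):
    """Render the document as HTML for online or CD-ROM use."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for s in root.iter("section"):
        parts.append(f"<p><b>{escape(s.get('n'))}.</b> {escape(s.text)}</p>")
    return "\n".join(parts)

def to_latex(root):
    """Render the same document as LaTeX for print (no escaping, sketch only)."""
    parts = [f"\\section*{{{root.findtext('title')}}}"]
    for s in root.iter("section"):
        parts.append(f"\\paragraph{{{s.get('n')}.}} {s.text}")
    return "\n".join(parts)

root = ET.fromstring(LAW)
print(to_html(root))
print(to_latex(root))
```

Both outputs come from the same tree, so a correction in the source shows up in every edition.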
The usefulness of XML schemas and the XML language in general comes from the fact that it's a standard.
Sure, you could do the same things with comma-delimited text files. But are there XSLT processors for comma delimited files? Could you easily transform a comma delimited file into:
1) an HTML file
2) an Office 2002 file
3) an OpenOffice file
4) a PDF
5) etc.
6) different versions of all of the above that presented the data in a different manner
You could do it, but it wouldn't be easy. The fact that the entire industry has standardized on XML and XML schemas makes many things much simpler than before. That's it.
There is nothing magical about the language except for the fact that it is a standard.
In fact I use it all the time.
I work on a distributed Solaris based system (major test platform for a MAJOR telco) that uses CORBA for:
- Internal inter-process communications, transparently calling other processes whether on the same host or not
- Interfacing with underlying test systems running on other platforms (some of which are nice enough to use standards based access via CORBA)
- Providing access to our system from clients running on a range of systems (incl. Sun, Windows) in both C++ and Java
CORBA is great if you use it for what it's meant for!
Suggest you (and anyone else finding W3C XML Schema confusing and/or difficult) take a look at RELAX-NG.
RELAX-NG is an alternative schema language that is much more flexible and easy to use than W3C XML Schema. RELAX-NG also defines a non-XML syntax that makes it even easier to work with, since the non-XML syntax can be translated into XML using a tool such as Trang. There are many other tools that make it easy to work with RELAX-NG.
Eric Van der Vlist is working on a RELAX-NG book that you can read at http://books.xmlschemata.org/relaxng/
Take a look at RELAX-NG - you might never want to work with XML Schema again...
We can even describe a subset of lisp sexes that would be more or less isomorphic to XML. Not all lisp sexes work (I believe, but would need to read the XML specification carefully to pontificate): (1 2 3) would need to be something like:
<> 1 2 3 </>
but if all the sexes started with (atom (attribute-list) ... ) the mapping would be pretty complete.
So, if the notations are isomorphic (that is if there is a deterministic mapping from each to the other where composition gives the identity on both sides) there is no problem. Write a program that does the mapping and then work in whichever syntax you prefer.
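A sketch of such a mapping, using Python tuples as stand-ins for s-expressions of the shape (atom (attribute-list) ...). The helper names `to_sexp` and `from_sexp` are invented here, and the sketch ignores comments, processing instructions and namespaces, which is exactly where the subtle cases tend to live:

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Map an element to (tag, attributes, child-or-text, ...)."""
    items = [elem.tag, dict(elem.attrib)]
    if elem.text:
        items.append(elem.text)
    for child in elem:
        items.append(to_sexp(child))
        if child.tail:
            items.append(child.tail)   # text following the child element
    return tuple(items)

def from_sexp(sexp):
    """Inverse mapping: rebuild the element tree from the nested tuples."""
    tag, attrs, *rest = sexp
    elem = ET.Element(tag, attrs)
    last = None
    for item in rest:
        if isinstance(item, str):
            if last is None:
                elem.text = (elem.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            last = from_sexp(item)
            elem.append(last)
    return elem

root = ET.fromstring('<a x="1">hi<b/>there<c y="2">z</c></a>')
sexp = to_sexp(root)
print(sexp)

# Composing the two mappings should give back the same document.
round_trip = ET.tostring(from_sexp(sexp))
print(round_trip == ET.tostring(root))
```

For this restricted subset the composition is the identity; whether that survives the full XML specification is the open question.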
I'd suspect though that such a mapping is tough. I remember watching a debate on an XML mailing list about the fact that it was difficult to produce a canonical form for an XML structure. I don't know if that has been resolved, but I do remember that some of the problems raised were quite subtle and difficult.
Which, if true, would mean that the structures (XML and Lisp) would be more homomorphic (whatever that means here) than isomorphic. Which (at least for those of us for whom "homomorphism" is not a scary word) raises questions of just what is, and is not, preserved.
That said, the parallel with lisp is quite interesting and productive. On a couple of occasions I've found that converting XML to lisp, doing some lisp magic, and then converting back has been productive and far easier than using the XML tools available.
Even better the analogy with lisp raises some very interesting ideas that I keep wanting to get around to exploring - but don't.
When James Clark speaks on the subject of SGML/XML/... I listen. And having looked at RELAX, DTD's and SCHEMA, and having discovered that none of them are easy to do correctly, I tend to favor RELAX. Largely because James Clark does usually know what he's talking about.
The more I look at XML and the whole circus surrounding it, the more I get the impression that these people are doing nothing more than reinventing the whole of computer science in between tags.
For example, XML Schema and RELAX NG both look like nothing more than minor extensions on regular expression matching. It looks like so much new notation but no new content.
What real insights are being gained here, if any?
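To make the regular-expression comparison concrete, here is a sketch that checks one element's children against a DTD-style content model using nothing but Python's `re`. The content model and the `children_match` helper are hypothetical, and no real validator is claimed to work this way; it covers a single level of children and says nothing about nesting or datatypes:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD content model: <!ELEMENT article (title, para+, note?)>
# written as a regex over the space-separated names of the child elements.
CONTENT_MODEL = re.compile(r"title (para )+(note )?")

def children_match(xml_text):
    """Return True if the root's children fit the content model."""
    root = ET.fromstring(xml_text)
    child_tags = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(child_tags) is not None

print(children_match("<article><title/><para/><para/></article>"))
print(children_match("<article><para/><title/></article>"))
```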
Might I also add "storage".
But that word "ALL" up there. That's a lot. Think of all the different data transfer/storage mechanisms people have invented. Unix types should go browse /etc for config files - passwd isn't the same as termcap isn't the same as sudoers isn't the same as sshd config isn't the same as ....
And so on. And on. And on. Think of all the one off formats people have invented. Mail (message format). HTML. HTTP. SMTP. RPM. passwd, printcap, procmail ...
Try to come up with a general markup mechanism for various kinds of data and you'll see how tough it is.
Of course, the difficulty is that to make XML powerful enough to handle these varied kinds of data representations, it gets big and pointy. And there are difficult problems not very far beneath the surface - XML does not solve them all by any means, but does provide at least a starting point.
And there are many more places where XML could at least provide a starting point.
XML is just a way to mark up certain kinds of structured data. That's a lot, and XML is a non-trivial solution (or set of solutions) to a non-trivial problem.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<text xmlns="urn:iana:xml:ns:cruft-1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:iana:xml:ns:cruft my_ass.xsd">
<phrase length="11" language="english" subfamily="us" charset="latin-1">
<word length="5" capitalized="yes" propernoun="no">
<letter capital="yes" type="ASCII" bits="8">H</letter>
<letter capital="no" type="ASCII" bits="8">e</letter>
<letter capital="no" type="ASCII" bits="8">l</letter>
<letter capital="no" type="ASCII" bits="8">l</letter>
<letter capital="no" type="ASCII" bits="8">o</letter>
</word>
<space breaking="no" width="normal" type="ASCII" bits="8"></space>
<word length="5" capitalized="yes" propernoun="no">
<letter capital="yes" type="ASCII" bits="8">W</letter>
<letter capital="no" type="ASCII" bits="8">o</letter>
<letter capital="no" type="ASCII" bits="8">r</letter>
<letter capital="no" type="ASCII" bits="8">l</letter>
<letter capital="no" type="ASCII" bits="8">d</letter>
</word>
</phrase>
</text>
This is my sig.
you're writing about a substitution for XML, and it's intriguing, but the conversation - DTDs vs. XML Schema - is discussing VALIDATING these data schemes.
the point of them is to make sure that (dynamically generated?) XML will fit the criteria of your app.
follow?
WTF? What is wrong with you people?
now here is some random stuff to deal with the "don't use so many caps it's like yelling" lameness filter. wow it sure takes a lot of text to do this. come to think of it, it's a wonder this whole article and all of its comments were not thrown out for too many caps.
So ease of collaboration is the primary benefit?
You can't judge a book by the way it wears its hair.