XML Schema a W3C Recommendation
J1 writes: "The World Wide Web Consortium has officially given its Stamp of Approval to the XML Schema specification. This makes it an official W3C Recommendation. The press release has the details."
XML: a standardized framework for creating
incompatible data formats.
Schemas are much more powerful than DTDs. Not only do they let you specify the structure of the tags in a very flexible way, they also make it possible to do type checking on attributes, make the substructure of a tag dependent on an attribute value, etc.
So schemas are what DTDs never were: a really useful tool to check your XML, not just some simple sanity checks on the coarse structure.
But you really need to read at least the primer (part 3) to appreciate what you can do with Schemas. They are *very* complex.
EagerEyes.org: Visualization and Visual Communication
After following Schema since its introduction a while back, I just briefly looked at it, said "Another HTML Markup Language," and tossed it to the back of my mind. I had worked at a company that built a product exclusively using XML, which had been hacked up to make it useful enough for the company, and I found most of it lacking on the Unix side of things, where the concern was programming rather than interactive webpages; it was strictly lacking as a complete portable solution.
Often I wonder when I hear these news stories about new protocols appearing, just how long will they last, and how much of an impact would they have in reality, ePerl, PHP, etc., and often I hear of one "standard" coming out only to be overshadowed by another one in the making. So not to troll but how many people are actually looking forward to this becoming a standard? Aren't the current available languages enough?
I guess it depends on what someone wants to do, but in all honesty I feel the market is becoming so saturated with so many different variations claiming to be the best thing, yet from what I see many people use the standard norms available just fine.
So how exactly is this beneficial for achieving what you already can using the standards? Things can get so confusing when you're in the midst of nailing "the next best protocol," which was overshadowed just a second ago, and now you have to tweak what you already know to jump on this latest "technology," all because it's been endorsed or recommended. Maybe it's me not being innovative enough to really look at Schema at face value, but all I see is another language. Not a big deal.
Please don't flame this or think I'm being arrogant or trollish; I'm just posting an honest thought to see some insightful replies. Sure, I joke here and there, but I would like some enlightenment.
Want Root?
A DTD defines what is allowable in an XML document. XML Schema is just a different way to do this. DTDs have always been available to describe XML documents.
An XML document is well-formed if it adheres to the XML syntax specification; it is valid if it adheres to a DTD. XML documents do not have to be valid, i.e. they are not required to have a DTD.
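A rough illustration of that distinction, assuming Python's standard-library xml.etree.ElementTree, which only checks well-formedness and does not validate against a DTD. The sample documents and the helper name is_well_formed are made up for this sketch:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML. This says nothing about validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<note><to>Bob</to></note>"
bad = "<note><to>Bob</note>"  # <to> is never closed, so this is not well-formed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity would need a validating parser plus the DTD (or schema) itself, which this sketch deliberately leaves out.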
This doesn't come as a great surprise. In the release, Tim Berners-Lee is the W3C director who gets quoted saying how great XML Schema is. Since his newfangled Semantic Web relies on the mainstream acceptance of XML Schema, what else is he going to say?
I think he meant that XML, XML Schema and XSL are all in XML format, so you can use a single XML parser with them. DTD and CSS files are not XML.
All M$ has to do is add a few characters to the start and end of every file format they currently use. For example, something like this:
Then they can hold a press conference to proudly announce: "Microsoft Office is the only suite that is 100% XML compliant!" The word XML is like the word consultant -- it can hold so many meanings that it's pretty much meaningless.
I'm rather clueless when it comes to XML, but I thought that a DTD did what the schemas seem to do. What exactly is the difference between them?
Suppose you were an idiot. And suppose that you were a member of Congress. But I repeat myself.
Give a man a fire, and he'll be warm for a day, but set him on fire, and he'll be warm for the rest of his life.
Have W3C standards ever meant that I can get solid cross-platform, cross-browser compatibility on my (correctly coded) web pages, six months, a year, two years down the line?
The major fifth-generation web browsers (Mozilla, IE 5.x, Konqueror, Opera, etc.) support most of CSS1 and CSS2. If a page crashes 4.x browsers, that's the fault of the 4.x browser user for not installing a 5.x browser. 5.x browsers don't use that much more resources than 4.x browsers; see also Galeon and K-Meleon.
If people use shitty browsers, that's their problem.
Will I retire or break 10K?
So you don't think M$ can find a way to make 'just their product' compatible with XML?
They won't E&E XML too soon. They're still working on embracing and extending TCP/IP and ZIP codes.
After all, XML, including Schema, is just a way to format your data to make it easy for other machines to parse. It doesn't help you understand what the data means.
What MS is likely to do is to send data while not documenting the meaning of the data, and then claim that they're "Standards Compliant." I can just see the schema, defining fields such as
<xsd:element name="reserved" type="ObfuscatedType"/>
<xsd:complexType name="ObfuscatedType">
<xsd:sequence>
<xsd:element name="undoc1" type="xsd:integer"/>
<xsd:element name="undoc2" type="xsd:boolean"/>
</xsd:sequence>
<xsd:attribute name="BillsSecretCode" type="xsd:string"/>
</xsd:complexType>
It may be standards compliant, but without Microsoft Secret Decoder Ring 2001, it won't do you much good.
Well, not exactly. Kweelt is a development from Quilt, and so is XQuery. That means they probably have a lot in common. But I'm still waiting for a "standard" to evolve; that's what the W3C is for :)
Even if it supports all the requirements, as long as it's not a standard, it's not really useful in the long run.
Probable impossibilities are to be preferred to improbable possibilities.
Aristotele
Well there's always one thing... there is no way to make good use of it yet ;)
XML (at present) doesn't have a query language, which means you don't have that much to use XML for. Sure, you can represent documents, and with stylesheets rewrap them into a design of your choice, but large-scale use are yet to come.
What we are waiting for is XQuery, that will hopefully make a big difference :-)
Probable impossibilities are to be preferred to improbable possibilities.
Aristotele
http://castor.exolab.org
So what is the big deal with XML Schema? XML Schema is important because it provides the widgets to define a "card catalogue" for your library of data, be it air plane parts, phone bill, hotel reservations, or porn.
Now, metadata has been with us since the mud tablet libraries of Mesopotamia (they had indexes of stuff so they could find how many cows were traded in the Xth year of SomeRulerDude); however, the printing press is what made all the difference. You see, before the printing press, books were so expensive and time-consuming to write that there were not that many of them. The general strategy to manage a library was an index of all the books. As long as the book population is not too big, this works. Once it gets big, it doesn't: for example, when you search on Google for "McCain", you get congressmen, porn sites, and damn near everything in between. Search engines today are just really, really big indexes of stuff. Still in the stone ages, aye?
The printing press changed that and forced libraries to find an EXTENSIBLE way to keep up with books. The Dewey Decimal System is a great example. So I pose to you the following question: "When was the last time the DDS was updated?" Well, how long have they been publishing books on computer science, biogenetics, or nanotechnology? The DDS is an extensible system to classify knowledge. So I leave you with the following statement...
HTML was the functional equivalent of the printing press, which is just an electronic version of fast, cheap publication. HTML forced us to follow down the path of XML, just like the printing press forced Mister Dewey to put on his thinking cap. The only difference is that the printing press took a few hundred years to do its thing where HTML only took a few years to do its thing.
Now, all the other XML specs out there (SAX & DOM, RDF, XSLT, XHTML, XPointer, etc.) are just tools to work with your (library of) data. Better to have many specialized tools that can evolve independently than one big honking tool, aye? Use only the tools you need.
So does TBL's dream of a semantic web make more sense now...?
If you want some links, try...
Danny Hillis - The big picture
Roger Costello's XML Schema Tutorials
"You can drive a car by looking in the rear view mirror as long as nothing is ahead of you. Not enough software professionals are engaged in forward thinking." - Bill Joy
While it is somewhat important, I think that people give data validation far too high a priority. People seem to think that "self-describing data" is going to save the world in the same way that XML was supposed to eliminate the need for parsing and interpretation of information by a computer program. I've been involved in using XML to exchange information and make remote invocations of services in a Web environment, and you still have to write programs to interpret the contents of the XML information in pretty much the same way as with data exchanged in any format.
So you can automatically validate it. So frikking what! The rabid theoreticians in the consortium of people that I work with get all hung up on this without realizing that most functioning protocols out there are able to exchange information without the need for a formal validation model. Not that you would really want to use one on either the generation or consumption side of a real system, since it just slows things down. All you need is a clear spec for the protocol.
Another thing that bugs me is the fiercely defended text-only approach used in XML. For some reason, XML fans seem to think that computers cannot exchange and understand binary data, or that editing tools would be unable to allow people to see it.
The text-only approach has three major limitations. First, there's no way to directly include binary data. There are lots of binary-encoded objects out there, like image or sound file formats, but you have to encode them in BASE64 or something. This is pretty strangely limited given that XML data is generally exchanged over an 8-bit clean pipe (i.e., the Web). Something like:
<xml:binary size=10>kjiu õéçäá</xml:binary>
would be quite reasonable, with "size" octets placed directly between the closing '>' of the opening tag and the opening '<' of the close tag. They should have included a mechanism in XML Schema to declare this.
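For comparison, here is a sketch of the BASE64 workaround as it might look with Python's standard library. The element name blob, its size attribute, and the payload bytes are all made up for the illustration:

```python
import base64
import xml.etree.ElementTree as ET

# 10 raw octets, including bytes that are not legal XML text (0x00)
# and the markup characters '<', '>' and '&'.
payload = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F, 0x3C, 0x3E, 0x26, 0x0A, 0x01])

# Sender side: wrap the octets as Base64 text inside a hypothetical <blob> element.
elem = ET.Element("blob", size=str(len(payload)))
elem.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(elem, encoding="unicode")

# Receiver side: parse the XML and turn the text back into the original octets.
decoded = base64.b64decode(ET.fromstring(xml_text).text)

print(len(payload), "octets became", len(elem.text), "characters")
```

The round trip is lossless, but the 10 octets travel as 16 characters, which is the overhead being complained about.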
The second problem with text is the high cost of parsing it. Probably the majority of time spent in a system that processes a large bulk of XML data is spent in the lexical analysis stage of consuming the XML stream. They had their big chance with binary-WAP-XML, or whatever they called it, but that seems to be kind of screwed up and includes patented technology. What is needed is a simple, widely acceptable binary encoding of exactly the information included in XML text, which uses lookup tables to optimize handling tag names.
The third problem with exchanging raw text XML encoded data is that it explodes the information you want to ship over the Web by a factor of about 20 times. It needs to become commonly accepted practice to, at least, exchange this information in a compressed format, such as GZIP. The MIME tags really need to be updated too, to allow a nesting of formats, to say "this is a gzip-compressed stream of Bob's fabulous graphics markup format encoded in XML".
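A minimal sketch of the compression idea, using Python's standard gzip module. The sample document is invented and deliberately repetitive, so the saving it shows is not representative of real data:

```python
import gzip

# A made-up, highly repetitive XML document: 500 copies of one record.
record = '<item><name>widget</name><price currency="USD">9.99</price></item>'
document = ("<items>" + record * 500 + "</items>").encode("utf-8")

compressed = gzip.compress(document)    # what would go over the wire
restored = gzip.decompress(compressed)  # what the receiver reconstructs

print(len(document), "bytes of XML,", len(compressed), "bytes gzipped")
```

Repeated tag names are exactly the kind of redundancy this sort of compressor removes, which is why verbose markup tends to shrink well.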
What we are waiting for is XQuery, that will hopefully make a big difference :-)
You might want to have a look at Kweelt, which claims (and I quote) that it "implements a query language for XML that satisfies all the requirements from the W3C query-language-requirements".
Is this what you're waiting for?
--jw
There's a host of languages you can use to pull subsets of XML data out. Everything from XPath expressions with XSLT to building DOM trees or SAX parsers to manipulate the data with your favorite programming language. That's as powerful as you can get.
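As one small example of that, here is a sketch using the limited XPath support in Python's standard xml.etree.ElementTree. The feed document and its element and attribute names are invented for the illustration:

```python
import xml.etree.ElementTree as ET

# A made-up news feed; element and attribute names are hypothetical.
feed = ET.fromstring("""
<feed>
  <story section="tech"><title>XML Schema approved</title></story>
  <story section="sports"><title>Cup final tonight</title></story>
  <story section="tech"><title>New browser released</title></story>
</feed>
""")

# Select the titles of all stories whose section attribute is "tech".
tech_titles = [t.text for t in feed.findall("./story[@section='tech']/title")]
print(tech_titles)
```

ElementTree implements only a subset of XPath; a full XPath or XSLT processor can express a good deal more.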
large-scale use are yet to come.
Reuters produces all their news in XML format. There's a constant stream that comes in at a few MB an hour. That's massive-scale use if you ask me.
To all you stating that now this is a "standard" organizations will start "breaking" it:
It is not a standard; it is an official W3C Recommendation. Part of the process of making it a standard is for developers to experiment with it to see what works and what doesn't. So whereas some proprietary extensions die out, some survive and become part of the standard.
So now that it's a standard:
Microsoft already supported XSD schemas in the MSXML 4 preview release. Microsoft has been more of a force in pushing the implementation of XML than any other company, so to fault them unjustly seems quite silly.
Just look at how unified HTML has become!
I am not a spork.
XML has needed a truly powerful schema language to enforce data constraints in data-heavy documents. This is very much akin to having database schema for databases. With a declarative language and a common processor enforcing primary constraints on data, you free each application from having to do their own consistency checks.
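To make the "declarative rules plus one common processor" idea concrete, here is a hand-rolled sketch in Python. It is not XML Schema and uses no schema library; the RULES table, the check function, and the element names are all hypothetical stand-ins for what a real schema and validator would provide:

```python
import xml.etree.ElementTree as ET

def parse_bool(text):
    """Accept a few boolean spellings; raise ValueError otherwise."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a boolean: %r" % text)

# Declarative part: required child element name -> converter that raises ValueError.
RULES = {"quantity": int, "price": float, "in_stock": parse_bool}

def check(elem, rules):
    """Common processor: return a list of constraint violations for one element."""
    errors = []
    for name, convert in rules.items():
        child = elem.find(name)
        if child is None:
            errors.append("missing <%s>" % name)
            continue
        try:
            convert(child.text or "")
        except ValueError:
            errors.append("bad value for <%s>: %r" % (name, child.text))
    return errors

ok = ET.fromstring("<item><quantity>3</quantity><price>9.99</price><in_stock>true</in_stock></item>")
broken = ET.fromstring("<item><quantity>three</quantity><in_stock>true</in_stock></item>")

print(check(ok, RULES))      # no violations
print(check(broken, RULES))  # one bad value and one missing element
```

The point of the design is that applications share check and only the RULES table changes per document type, much as a database schema is enforced by the database rather than by each client.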
XML Schema has a lot of powerful features, including the separation of types from structure, two kinds of type inheritance, modularization, default values for attributes and simple elements, and the flexibility to be as strict or as lax as the situation dictates for validation.
Having said that, the big battle brewing is whether XML Schema is going to be shoehorned into all the other XML protocols that need a data model description before a wide base of practical experience has been developed. There's already a divide between data modelers and application developers because of the specialized knowledge that SQL and relational database design impose; I think XML Schema does nothing to narrow that gap, which is unfortunate since class hierarchies and the hierarchical data model of XML seem a natural fit.
If you post it, they will read.