Slashdot Mirror


DTD vs. XML Schema

AShocka writes "The W3C XML Schema Working Group has released the first public Working Draft of Requirements for XML Schema 1.1. Schemas are technology for specifying and constraining the structure of XML documents. The draft adds functionality and clarifies the XML Schema Recommendation Part 1 and Part 2. The XML Schema Valid FAQ highlights development issues and resources using XML Schema. This article at webmasterbase.com addresses the XML DTDs Vs XML Schema issue. Also see the W3C Conversion Tool from DTD to XML Schema and other XML Schema/DTD Editors."

17 of 248 comments

  1. One is derided, one is end-of-life'd by Ars-Fartsica · · Score: 3, Interesting
    DTDs are being deprecated one way or another.

    While the W3 continues to push Schema, they are also forming working groups for RELAX after pressure from XML luminaries such as James Clark.

  2. DTDs are broken by Zeinfeld · · Score: 3, Interesting
    DTDs are a hangover from SGML that will eventually go away. The big problem with DTDs is that they only define syntax; there is no data model. The syntax model isn't all that hot either: SGML was designed by a lawyer who hadn't heard of finite state machines, let alone Chomsky grammars.

    XML Schema is also kinda whacked. It shows all the signs of being a committee specification.

    The big problem with schema is that you actually have two type systems going. Element definitions are types for elements. Type definitions are actually types for types for elements. I saw a hopelessly confused attempt by some UML people to express XML schema in UML; they simply could not understand that there was no way it could ever work. UML has completely different semantics.

    There are a bunch of schema proposals that folk have said good things about. Eve keeps telling me I should look at Relax. But for the time being XML schema is going to be the basis for standards in W3C and OASIS.

    There might be an opportunity to do a clean up job on XML schema in 4 or 5 years but that will only happen if it is causing real problems.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  3. RelaxNG by ine8181 · · Score: 2, Interesting

    RelaxNG vs. W3C Schema makes for a much more interesting discussion. DTD is obsolete in many ways... and most of the XML parsers support schema now.

  4. Re:Who needs XML when you got PXML? by Anonymous Coward · · Score: 5, Interesting

    Better yet, use S-Expressions.
    There are tons of parsers available.
    markup is simple:
    (this_is_the_tag
    this is all data
    (except_this_is_a_nested_tag with still more data))

    Even better still, there are customizable parsers available that can treat these S-Expression as data OR interpret them as program OR a combination of both. One such parser is called "Lisp". Once again, several implementations are available.
    Note that things like S-Expressions and Lisp have only been around for 40 years so you might want to give these technologies some time to mature.
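
    To make the "markup as nested data" idea concrete, here is a minimal sketch in Python (not Lisp, and not any particular parser the poster had in mind). It assumes the simplest possible dialect: parentheses plus whitespace-separated words, with no quoting, escaping, or character-set handling. The function names are illustrative only.

```python
def tokenize(text):
    """Split S-expression text into '(' , ')' and bare-word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and return one parsed node."""
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token  # a bare word is returned as a plain string
    node = []
    while tokens and tokens[0] != ")":
        node.append(parse(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # discard the closing ')'
    return node


def read_sexpr(text):
    """Parse exactly one S-expression into nested Python lists."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("empty input")
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after expression")
    return tree


doc = """(this_is_the_tag
this is all data
(except_this_is_a_nested_tag with still more data))"""
tree = read_sexpr(doc)
print(tree)
```

    The first item of each list plays the role of the tag name. Whitespace inside the data is collapsed by this sketch, which a real document format would have to address.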

  5. Validating with XML Schemas by UpLateDrinkingCoffee · · Score: 3, Interesting
    I'm wondering, who actually validates their XML at runtime using XML schemas? We do, but most of our XML is used for configuration files where the overhead doesn't affect overall app performance too much (read once at the beginning). One issue we run into is the validation chain: the XML document refers to its schema (accessible via URL on the LAN, hopefully) and those schemas refer to the "master" XSD schema. This is where we have had trouble, because we usually point it to the W3C master... if the internet is down, so is our app!

    It's occurred to me maybe we are being too diligent in actually validating the schema itself, but I'm wondering what others think?
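
    One common way around the "internet is down" problem is to resolve schema URLs to local copies before touching the network, which is the idea behind XML catalogs. The sketch below is only an illustration of that lookup step in Python, not the poster's setup: the URLs, the file paths, and the resolve_schema helper are all hypothetical, and the result would still have to be handed to whatever validating parser is in use.

```python
from pathlib import Path

# Hypothetical mapping from public schema URLs to copies shipped with the app.
LOCAL_SCHEMAS = {
    "http://www.w3.org/2001/XMLSchema.xsd": "schemas/XMLSchema.xsd",
    "http://intranet.example/schemas/config.xsd": "schemas/config.xsd",
}


def resolve_schema(url, catalog=LOCAL_SCHEMAS, base_dir="."):
    """Return a local path for url when the catalog lists one and the file
    exists; otherwise return the url unchanged so the caller can fetch it."""
    relative = catalog.get(url)
    if relative is not None:
        candidate = Path(base_dir) / relative
        if candidate.is_file():
            return str(candidate)
    return url


# With no local copy on disk, the original URL comes back untouched.
print(resolve_schema("http://unlisted.example/other.xsd"))
```

    Falling back to the URL keeps the behaviour unchanged for schemas nobody has mirrored, so the catalog can be filled in gradually.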

  6. XML Schemas aren't just for validation by Osty · · Score: 4, Interesting

    I can't believe nobody's mentioned this yet. Microsoft has a tool that will do several things:

    • Generate an XML Schema from an XDR or XML file.
    • Generate source code (C#, VB, or JScript) from an XSD file (XML Schema file).
    • Generate XML Schemas for one or more types in a given assembly file.

    This makes writing your XSD almost trivial. The code-generation capabilities are very powerful, as well, as you can generate runtime classes for serialization/deserialization or classes derived from DataSet so you can treat XML files like any other database, etc. It's very useful if you're doing any .NET framework programming.

    I'd be very surprised if there weren't other tools out there doing similar things. I simply mentioned xsd.exe because that's what I'm familiar with.

  7. only partially agree by u19925 · · Score: 2, Interesting
    The thing that I don't like about DTD as well as schema is that they flag a document as invalid if it contains extra stuff. I guess there should be a validation mode which flags a document as valid as long as it contains at least the stuff that I need (and ignores additional stuff). E.g. a document may contain book name, author and price. I may be interested only in name and price. Why should I consider such a document to be invalid? Also, why should I validate whether the author name is in the correct format? Can I just apply a partial DTD (which contains only name and price) and ask the parser to validate the doc? Not at the moment.

    I don't agree with you that schema validation is useless. In many cases the documents are fully processed for business rules much later, but you want acknowledgement that your document has arrived intact and passes at least the most basic validation (e.g. DTD or schema validation). XML Schema does a wonderful job at that. In our case, we always keep schema validation on for new doc types until the system is stable and bug free and then remove validation for efficiency (for internal docs). We have discovered many subtle bugs in the system which would have been extremely hard to track down by looking at application errors but were easier to find by looking at parser errors.
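
    The "valid as long as it contains at least what I need" mode from the first paragraph can be approximated in application code. This is a rough Python sketch using the standard library's ElementTree, not a DTD or schema feature: it only checks that the required child elements of the root are present and ignores everything else. The helper name and the sample book document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def has_required_children(xml_text, required):
    """Return (ok, missing). ok is True when the root element has a child
    for every tag in required; extra children are simply ignored."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    missing = [tag for tag in required if tag not in present]
    return (not missing, missing)


book = (
    "<book><name>Some Title</name>"
    "<author>A. Writer</author><price>12.50</price></book>"
)
# Only name and price matter to this consumer; author is extra and tolerated.
ok, missing = has_required_children(book, ["name", "price"])
print(ok, missing)
```

    A check like this says nothing about element order, nesting depth, or the format of the values, so it is a much weaker guarantee than full validation.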

  8. Relax-NG is a Draft ISO Standard by Euphonious+Coward · · Score: 2, Interesting
    I gather that Relax-NG is on track to become an ISO Standard. Regardless of what happens with W3C, the ISO's XML schema based on Relax-NG won't go away. Given its natural advantages -- including the enormously greater ease of implementing it -- we might expect to see many more tools built around it.

    It would be somewhat unfortunate if both end up popular, because it will be more work to maintain both sets of tools than either one alone. That's probably what will happen, though, at least in the short term.

  9. They already addressed this issue by ebcdic · · Score: 2, Interesting

    The Schema WG decided on "schemas" so as not to add unexpected obscurity to the specification.

    See this message.

    Expected obscurity is of course just fine.

  10. Re:All this hype about XML by axxackall · · Score: 3, Interesting
    Great thing about XML, is if you need to convert your communications, you can write XSLT against it to convert it while you convert your XML source.. easily.

    Great thing about Lisp, is if you need to convert your communications, you can write Lisp against it to convert it while you convert your Lisp source.. easily.

    I plopped an XSLT processor in front of it. Took minutes to implement. In the mean time, I was able to properly rewrite the XML producing code. So I had some flexibility in terms of patching the protocol quickly, while taking the weeks I needed to fix things right.

    I plopped a Lisp processor in front of it. Took minutes to implement. In the mean time, I was able to properly rewrite the Lisp producing code. So I had some flexibility in terms of patching the protocol quickly, while taking the weeks I needed to fix things right.

    the point is, XML IS descriptive, so long as you use good names.

    the point is, Lisp IS descriptive, so long as you use good names.

    If you use XML to develop a lower level protocol you end up with bloated 10k messages.

    If you use Lisp S-expressions to develop a lower level protocol you don't end up with bloated 10k messages.

    Besides, in Common Lisp you'll really appreciate MOP - Meta-Object Protocol. Much better than SOAP.

    Trust me, I know well, actively use and actually love both Lisp *AND* XML.

    --

    Less is more !
  11. Ironic, no, really... by rodentia · · Score: 3, Interesting

    that the same applications of XML that drive the keening about bloat and hype seen in these comments are precisely those which are driving the specs to the wrong side of the 80/20 for XML/XSL's original goals: bringing the semantic power of SGML and DSSSL to the Web. Goals for which its purist cousins RelaxNG, REST, et al. remain admirably suited.

    The back-end curmudgeons are right, XML stinks for a universal wire format. But for loosely-coupled, message-based, semantically-rich systems it is hard to beat. And document-oriented systems which don't use XML barely deserve notice any longer.

    I gently refer s-expression trolls to paul and oleg

    --
    illegitimii non ingravare
  12. XML and Schemas by Zebra_X · · Score: 2, Interesting

    The value of XML is not the structure of the data. The tags, nodes, elements and attributes are just another format for parsing data. The power comes with the ability to VALIDATE the format. No other data exchange format has such an integrated approach to assuring the validity AND structure of data. Also, the hierarchical nature of XML makes it ideally suited to most information sets. It also takes the organization of relational data to another level, because node groupings inherently define a relationship between the information that is contained in the document. XML as just XML isn't that special, but the ability to nest information and validate the structure makes XML a more *reliable* data format. In the world of CSV files, ini files, Excel spreadsheets and the like, it is a welcome change. As the tools evolve to take the complexity out of creating things such as schemas, XML's potential as an interchange format will be fully realized. As for its verbosity, it is needed. The less structure, the more the format is left open to interpretation.

  13. Re:Who needs XML when you got PXML? by 21mhz · · Score: 3, Interesting
    Better yet, use S-Expressions.
    There are tons of parsers available.


    How does one specify the character set in some, imagined or real, S-Expression markup? Do these "tons of parsers" support Unicode at least? Where to put processing instructions? Character entities? External entities? "Raw data" sections with markup suppressed? How does one specify the document type identifier? Namespaces? All these things fulfill important tasks for XML to be a universal, yet concise, markup language, and all this can make your dreamt-up S-Expression language as contrived as XML is sometimes perceived to be.
    (this_is_the_tag
    this is all data
    (except_this_is_a_nested_tag with still more data))
    Attributes, I presume, are out of our concern? You note that the means for syntactic description of data trees have been around for 40 years. Yet there was yearning for something more... handy, or something. Doesn't it give any hint to you?
    --
    My exception safety is -fno-exceptions.
  14. Tools availability? by Advocadus+Diaboli · · Score: 2, Interesting

    I would happily play around with XML Schema if only my Emacs/PSGML mode would accept a schema and treat it in the same way as it treats a DTD.

    And sorry, I have neither the time to write my own Emacs mode nor the money to buy commercial XML tools.

    Well, so I keep watching the tools and if they are Schema ready then so am I. :-)

  15. Re:All this hype about XML by envelope · · Score: 2, Interesting

    XML IS descriptive, so long as you use good names. Naming elements a, b and c is just developer fault.

    It is not just a matter of using good, descriptive names. Whatever code is reading the XML is going to have to know what the names mean. A program reading XML couldn't care less if the name is "a" or "AVeryMeaningfulName".

    --

    appended to the end of comments you post, 120 chars
  16. Power is in the standard, not the technology by Anonymous Coward · · Score: 1, Interesting

    The usefulness of XML schemas and the XML language in general comes from the fact that it's a standard.

    Sure, you could do the same things with comma-delimited text files. But are there XSLT processors for comma delimited files? Could you easily transform a comma delimited file into:
    1) an HTML file
    2) an office 2002 file
    3) an open office file
    4) a pdf
    5) etc.
    6) different versions of all of the above that presented the data in a different manner

    You could do it, but it wouldn't be easy. The fact that the entire industry has standardized on XML and XML schemas makes many things much simpler than before. That's it.

    There is nothing magical about the language except for the fact that it is a standard.

  17. Re:Power by MarkWatson · · Score: 2, Interesting

    Hello David,

    I also just read through the RELAX NG tutorial and I am now looking at Bali (for generating Java RELAX NG validators).

    Good stuff! I agree with the other poster that W3C should punt on XML Schemas.

    That said, I think that for the foreseeable future, simply
    using DTDs works well because all the hooks are already
    in place for the popular XML parsers.

    I suppose the next step would be to get Xerces and other
    XML parsers to natively support RELAX NG (I have to look
    to see if Clark has such a parser already :-)

    - Mark Watson
    - Free web books: www.markwatson.com