Tim Bray Says RELAX
twofish writes to tell us that Sun's Tim Bray (co-editor of XML and the XML namespace specifications) has posted a blog entry suggesting RELAX NG be used instead of the W3C XML Schema. From the blog: "W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs."
When you want to come.
On the other hand, RELAX NG "just works".
(all IME of course...:)
ant.
Has anyone here ever tried to read an XML schema for anything relatively complex? It's a nightmare. RELAX looks much cleaner and more direct, which I wholeheartedly approve of.
using namespace slashdot;
troll::post();
"W3C XML Schemas (XSD) suck"
Hey Tim, don't hold back, tell us what you really think.
xml is a b**ch to read
Don't forget what we used to use... binary is even worse. XML was designed with people in mind, which is why it's easier for people to read and manipulate than your traditional binary file format.
Helpful hint for understanding the above: Tim Bray, author of TFA, is one of the guys who originally developed and spec'd out XML. Really. His name's on the spec and everything. So if he says that a particular XML tool has problems, it's probably a good idea to take him at his word ;)
And if you can't have a DB connection?
:(
For flat data, sure a flat file is fine...for structured/hierarchical data, a flat file is
I refuse to use XML in any shape, way or form, no matter what anyone says or does with it!!!
Check out YAML.
XML would be great if people validated their XML files before sending them out. And cut the verbosity and redundancy down by 90%. And used English elements instead of numbers. Ahh XML, the ideal most people pay lip service to but up to which they fail to live.
Between this standard and REST, it looks like we have some very lazy web services, RESTing and RELAX NG all the time . . .
Why the hell would you ever have to use flat files or XML for data/hierarchy anyway?
Now, even for little stuff, we use freely available databases and small snippets.
Read radical news here
I can send you a Dataset for your application needs or I can send you All of the data in a series of flat files that you can then manipulate with code/import to relational database.
I reject the XML self documenting data paradigm, it's just not applicable to most business processes. You are relying on the originating XML document to follow your own in house rules.
I want my data clean and neat and then I work my magic with it.
Relax NG has a compact non-XML syntax. But C++/Java is a horrible syntax to use if you want a language to be readable and easy to understand. Since when were 17 levels of operator precedence easy to understand? Of course any good programmer always uses parentheses to avoid ambiguity, so why should a language have 17 levels of built-in ambiguity just to make it that much easier to make hard-to-find mistakes?
-Don
From my blog: Relax NG Compact Syntax: no to operator precedence, yes to annotations!
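The "no to operator precedence" rule, concretely: in the compact syntax `,` (group), `|` (choice) and `&` (interleave) have no precedence relative to each other, so `a, b | c` is a syntax error and you must write `(a, b) | c` or `a, (b | c)`. Below is a minimal Python sketch of a hypothetical helper (not part of any RELAX NG tool) that enforces only that one rule on a pattern string; it is not a parser.

```python
def check_no_mixed_operators(pattern: str) -> None:
    """Raise ValueError if ',', '|' and '&' are mixed at one nesting level.

    Simplified sketch: only () and {} nesting is tracked; string literals,
    comments, annotations and the '|=' / '&=' definition combiners are not
    handled, as a real RNC parser would have to.
    """
    ops = [None]  # operator seen so far at each open nesting level
    for pos, ch in enumerate(pattern):
        if ch in "({":
            ops.append(None)
        elif ch in ")}":
            if len(ops) == 1:
                raise ValueError(f"unbalanced {ch!r} at position {pos}")
            ops.pop()
        elif ch in ",|&":
            if ops[-1] is None:
                ops[-1] = ch
            elif ops[-1] != ch:
                raise ValueError(
                    f"{ops[-1]!r} and {ch!r} mixed at position {pos}; "
                    "add parentheses")
    if len(ops) != 1:
        raise ValueError("unbalanced opening bracket")


check_no_mixed_operators("a, (b | c)")    # accepted: one operator per level
try:
    check_no_mixed_operators("a, b | c")  # rejected, as in the real syntax
except ValueError as err:
    print(err)
```

The design choice being illustrated is that ambiguity is rejected outright rather than resolved by a precedence table, so the reader never has to remember one.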
James Clark is a fucking genius! He's the guy who wrote the Expat XML parser, works on Relax NG, and does tons of other important stuff. Relax NG is an ingeniously designed, elegant XML schema language based on regular expressions, which also has a compact, convenient non-XML syntax.
I totally respect the way he throws down the gauntlet on operator precedence (take that you Perl and C++ weenies!):
You can translate back and forth between Relax NG's XML and compact syntaxes with full fidelity, without losing any important information. Relax NG supports annotating the grammar with standard and custom namespaces, so you can add standard extensions and extra user defined meta-data to the grammar. That's useful for many applications like user interface generators, programming tools, editors, compilers, data binding, serialization, documentation, etc.
Here's an interesting example of a complex Relax NG application: OpenLaszlo is an XML/JavaScript based programming language, which the Laszlo compiler translates into SWF files for the Flash player. The Laszlo compiler and programming tools use this lzx.rnc Relax NG schema for the OpenLaszlo XML language. This schema contains annotations used by the Laszlo compiler to define the syntax and semantics of the XML based programming language.
The schema starts out by defining a few namespaces:
default namespace = "http://www.laszlosystems.com/2003/05/lzx"
namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace lza = "http://www.laszlosystems.com/annotations/1.0"
The a: namespace defines some standard annotations like a:defaultValue, and the lza: namespace defines some custom annotations private to the Laszlo compiler like lza:visibility and lza:modifiers. Thanks to the ability to annotate the grammar, much of the syntax and semantics of the Laszlo programming language are defined directly in the Relax NG schema in the compact syntax, so any other tool can read the exact same definition the compiler is using!
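Since annotations are just attributes and elements in foreign namespaces, a tool reading the XML form of such a schema can pick them up with an ordinary XML parser. A minimal Python sketch using only the standard library; the schema fragment and its default value are invented for illustration (not taken from lzx.rnc), while the `a:` URI is the standard RELAX NG compatibility-annotations namespace declared above.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
A = "http://relaxng.org/ns/compatibility/annotations/1.0"

# Hypothetical schema fragment in the RELAX NG XML syntax.
SCHEMA = f"""
<element name="view" xmlns="{RNG}" xmlns:a="{A}">
  <optional>
    <attribute name="visible" a:defaultValue="true">
      <choice><value>true</value><value>false</value></choice>
    </attribute>
  </optional>
</element>
"""

def default_values(schema_text):
    """Map each attribute name to its a:defaultValue annotation, if any."""
    root = ET.fromstring(schema_text.strip())
    defaults = {}
    for attr in root.iter(f"{{{RNG}}}attribute"):  # Clark notation {ns}local
        value = attr.get(f"{{{A}}}defaultValue")
        if value is not None:
            defaults[attr.get("name")] = value
    return defaults

print(default_values(SCHEMA))  # {'visible': 'true'}
```

A custom namespace such as `lza:` would be read the same way, just with its own URI.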
To show how truly simple and elegant it is, here is the snake eating its tail: The Relax NG XML syntax, written in the Relax NG compact syntax:
# RELAX NG XML syntax specified in compact syntax.
default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace loc
Take a look and feel free: http://www.PieMenu.com
Totally agree.
While XML may have its places (I've yet to encounter one in the commercial world), passing large amounts of data is not one of them. A good flat-file design is a lot more efficient than XML, and short of hardware acceleration I don't see that changing.
I'm currently trying to assist a customer who is changing from one system to another. The current system generates flat files of approx. 2 GB every couple of days (billing data). The new system produces files of approx. 13 GB. The data contained within the files results in exactly the same bill being produced for the customers.
Needless to say, the extra disk space (yes, we do compress them) and the processing time to parse/compress are such a waste.
In my mind, XML trades shorter development time and 'portability' (well, so the theory goes) for greater resource usage (CPU/disk), whereas most customers I've dealt with would rather take a little longer to develop and have far fewer resource-limitation issues on the production systems. The old method of 'just throw more hardware at it' just doesn't work in the real world anymore.
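To make the size argument concrete, here is a small Python sketch that serializes the same made-up billing records as a pipe-delimited flat file and as XML and compares byte counts. Field names and values are invented; the ratio depends heavily on tag-name length and record width, so it says nothing definitive about the 2 GB vs. 13 GB files above.

```python
import xml.etree.ElementTree as ET

# Hypothetical billing records.
records = [
    {"account": "100234", "date": "2006-11-29", "minutes": "42", "amount": "3.15"},
    {"account": "100235", "date": "2006-11-29", "minutes": "7", "amount": "0.52"},
]
FIELDS = ["account", "date", "minutes", "amount"]

def to_flat(rows):
    # One line per record, fields in a fixed order, '|' as the separator.
    return "".join("|".join(r[f] for f in FIELDS) + "\n" for r in rows)

def to_xml(rows):
    # One <record> per row, one child element per field.
    root = ET.Element("billing")
    for r in rows:
        rec = ET.SubElement(root, "record")
        for f in FIELDS:
            ET.SubElement(rec, f).text = r[f]
    return ET.tostring(root, encoding="unicode")

flat_size = len(to_flat(records).encode("utf-8"))
xml_size = len(to_xml(records).encode("utf-8"))
print(flat_size, xml_size, round(xml_size / flat_size, 1))
```

The overhead is mostly the field names spelled out twice per value, which is also why XML tends to compress well.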
I've been picking up Emacs lately, and the XML mode commonly used with it (nxml-mode) uses RELAX rather than XML Schema. I suspect that says a lot about RELAX's parseability. I've had just a little experience playing around with Schemas, and they seem about as navigable as DTDs, which is to say not very. I haven't tried RELAX though.
then why are you using an ASCII encoding in the first place? Those tags just lower the signal to noise ratio. Even Apple's given up and started saving their meta data in a "compiled" version of XML.
Oh, and, "Hi! How you doing? Long time no see!"
Clear, Dark Skies
I'll take XML over a positional format any day, even if it only has to be looked at by human eyes 5% of the time. If you find yourself in a situation requiring eyeball examination of purchase order/shipping data at a large electronic commerce company it is likely an emergency and <ctrl>-f 'ing for a tag name, or using a web browser to check well-formedness can be a lifesaver.
while [ 1 ]; do echo -n -e "\xe2\x95\xb$((($RANDOM&1)+1))"; done
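The well-formedness check mentioned above doesn't need a browser; any conforming XML parser rejects a malformed document and reports where. A minimal Python sketch with the standard library (the purchase-order snippets are invented):

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)  # the message includes line and column
    return None

print(well_formedness_error("<po><item sku='A1'>2</item></po>"))  # None
print(well_formedness_error("<po><item sku='A1'>2</po>"))
```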
Relax NG is a great example of the triumph of Design-by-Inspired-Individuals vs. Design-by-Committee.
In The State of XML, Edd Dumbill explains the secret behind the success of Relax NG:
-Don
Like any other formalism, it's difficult until you get used to it. The more familiar you are with a particular XML tagset and markup conventions, the easier it is to pick out the relevant structures and information. I remember being appalled at the verbosity of XSLT when I first began to use it, but nowadays if I'm working with well-structured XSLT code (and color-coding in the editor) I can scan it pretty efficiently.
That said, a non-XML syntax is almost always going to be more human-friendly. Which is another advantage of RELAX NG, of course, since it has a compact syntax that translates back and forth without loss of information to the XML form of the language.
"I may be old school, working with flat files and all for over 20 years, but I do work with a lot of newer technology."
Well, I'm your counterpart in India and I'm happy to hear you're having problems getting used to newer technologies. Keep up the good work.
With a notation similar to RELAX NG compact syntax. XML has been a killer of readable formats like Windows-style INI files. It tries to be readable by both human and machine and succeeds at neither. It's like programming in assembler, because it can be read by a human better than machine code and compiled faster than C.
XML was designed with people in mind, which is why it's easier for people to read and manipulate than your traditional binary file format.
Err... no.
XML was a step back from SGML's "human-friendly" clever tricks. XML was intended to be easy to PARSE, not easy to read.
Blasphemy is a human right. Blasphemophobia kills.
Tim Bray is right, and he couldn't have put it better: W3C XML Schemas (XSD) suck. The reason Relax NG is so much cleaner and more powerful than committee-designed XML Schemas is that it's based on a sound mathematical foundation (tree regular expressions, or "hedge automata theory"), while XML Schemas suffer from ad-hoc design, committee-burn, lack of focus, and half-baked attempts to solve too many unrelated problems.
Here's some interesting stuff from my blog about the design and development of Relax NG.
-Don
James Clark wrote about maximizing composability:
Clark describes the derivative algorithm's lazy approach to automaton construction:
The Relax NG derivative algorithm is implemented in a few hundred elegant, declarative, functional lines of Haskell, and also in tens of thousands of lines and hundreds of classes of highly abstract, complex Java code.
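For anyone who hasn't seen the derivative idea: the derivative of a pattern with respect to one input item is the pattern that must match whatever follows, and the input is valid if the pattern left at the end can match the empty sequence ("nullable"). The Python sketch below applies that idea only to sequences of child-element names, following the shape of the group, interleave and oneOrMore rules; it ignores attributes, text, datatypes and name classes, and has none of the memoization or laziness Clark describes, so treat it as a toy rather than his algorithm.

```python
from dataclasses import dataclass

# A tiny subset of RELAX NG's pattern constructors, over child element names only.
@dataclass(frozen=True)
class Empty:          # matches the empty sequence
    pass

@dataclass(frozen=True)
class NotAllowed:     # matches nothing
    pass

@dataclass(frozen=True)
class Elem:           # exactly one child element called `name`
    name: str

@dataclass(frozen=True)
class Choice:
    a: object
    b: object

@dataclass(frozen=True)
class Group:          # a followed by b
    a: object
    b: object

@dataclass(frozen=True)
class Interleave:     # a and b in any interleaved order
    a: object
    b: object

@dataclass(frozen=True)
class OneOrMore:
    p: object

def nullable(p):
    """Can p match the empty sequence?"""
    if isinstance(p, Empty):
        return True
    if isinstance(p, (NotAllowed, Elem)):
        return False
    if isinstance(p, Choice):
        return nullable(p.a) or nullable(p.b)
    if isinstance(p, (Group, Interleave)):
        return nullable(p.a) and nullable(p.b)
    return nullable(p.p)  # OneOrMore

# "Smart" constructors simplify away NotAllowed/Empty so patterns stay small.
def choice(a, b):
    if isinstance(a, NotAllowed):
        return b
    return a if isinstance(b, NotAllowed) else Choice(a, b)

def group(a, b, make=Group):
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    if isinstance(a, Empty):
        return b
    return a if isinstance(b, Empty) else make(a, b)

def interleave(a, b):
    return group(a, b, make=Interleave)

def deriv(p, name):
    """The pattern that must match the rest after consuming element `name`."""
    if isinstance(p, (Empty, NotAllowed)):
        return NotAllowed()
    if isinstance(p, Elem):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Choice):
        return choice(deriv(p.a, name), deriv(p.b, name))
    if isinstance(p, Group):
        d = group(deriv(p.a, name), p.b)
        return choice(d, deriv(p.b, name)) if nullable(p.a) else d
    if isinstance(p, Interleave):
        return choice(interleave(deriv(p.a, name), p.b),
                      interleave(p.a, deriv(p.b, name)))
    # OneOrMore: one occurrence consumed, then zero or more further ones
    return group(deriv(p.p, name), choice(p, Empty()))

def matches(p, names):
    for n in names:
        p = deriv(p, n)
    return nullable(p)

card = Group(Elem("name"), Elem("email"))
print(matches(card, ["name", "email"]), matches(card, ["email", "name"]))  # True False
```

No automaton is ever built up front; each step just computes the next pattern, which is what makes the approach short in a functional language.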
Clark's Java implementation of Relax NG is called "jing", which is a Thai word meaning truthful, real, serious, no-nonsense, and ending with "ng".
Comparing the Java and Haskell implementations of Relax NG illustrates what a wicked cool and powerful language Haskell really is. The Java code must explicitly model and simulate many Haskell features like first-class functions, memoization, pattern matching, partial evaluation, lazy evaluation, declarative programming, and functional programming. That requires many abstract interfaces, concrete classes and brittle lines of code.
While the Java code is quite brittle and verbose, the Haskell code is extremely flexible and concise. Haskell is an excellent design language, a vehicle for exploring complex problem spaces, designing and testing ingenious solutions, performing practical experiments, weighin
Speaking of XML, how much smaller would XML files be if they made one minor simple change...
Add to mean "close the matching element".
*sigh* I wish I'd been on the committee when they specified the standard.
Sometimes it's best to just let stupid people be stupid.
Yeah, well I have to look at EDI every day. I'd switch to XML in a heartbeat if it were up to me.
You picked some obvious strawmen to shoot down. XML isn't for building gigabyte databases (regardless of whether some people try to use it for that). It's for easily moving data between applications. If you think writing a flat text parser is easy, then you've never had to deal with nested data or escaped characters. Say what you will about XML, but it's nice to have one set standard that deals with all that, even if suboptimally, because I never want to write another ad-hoc parser for as long as I live. Been there, done that, have no desire to bother again.
Dewey, what part of this looks like authorities should be involved?
Of course ASCII (or UNICODE for that matter) is a binary standard as well. So special tools called text editors were created so that people could read it.
There are more sophisticated binary standards that are more efficient than ASCII and it wouldn't take a lot of effort to create viewers/editors for them as well. Of course most markup documents would be significantly smaller if tags didn't have to be S-P-E-L-L-E-D O-U-T character by character. Each HTML tag could be encoded in just two bytes with lots of room to spare.
It always fascinates me that we have no problem making customers use a new specialized tool like a browser, but it's taboo to use a non-ASCII tool for development. So we continue to structure our data as if it were going to be processed by a VT100.
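A back-of-the-envelope Python sketch of the two-byte claim: a 16-bit code has 65,536 possible values, far more than the number of HTML element names. The tag table is a small hypothetical subset, and the format (one big-endian 16-bit code per start tag, a reserved code 0 meaning "close the current element") is invented for illustration; it is not an existing binary XML standard.

```python
import struct

# Hypothetical code table; a real one would list every HTML element.
TAG_CODES = {"html": 1, "body": 2, "p": 3, "table": 4, "tr": 5, "td": 6}
END = 0  # reserved code meaning "close the current element"

def encode(events):
    """events: ("start", name) / ("end", name) pairs -> bytes, 2 bytes per tag."""
    out = bytearray()
    for kind, name in events:
        code = TAG_CODES[name] if kind == "start" else END
        out += struct.pack(">H", code)  # unsigned 16-bit, big-endian
    return bytes(out)

def text_size(events):
    """Size of the same tags spelled out as <name> and </name> in ASCII."""
    return sum(len(f"<{n}>") if k == "start" else len(f"</{n}>") for k, n in events)

doc = [("start", "table"), ("start", "tr"), ("start", "td"),
       ("end", "td"), ("end", "tr"), ("end", "table")]
print(len(encode(doc)), text_size(doc))  # 12 33
```

Of course you then need the viewer/editor the comment talks about, since nobody can read `\x00\x04` by eye.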
If XML Schema were a work colleague it would be Wally from Dilbert: it's not that things are impossible to do with it, it's just that relatively simple things become hard and complex ones almost impossible. Because almost anything is possible with XML Schema given enough work (weeks, months, years...), people keep at it doggedly instead of just scrapping it, despite the number of times we get bitten. I'd love to see the community move more completely to RELAX NG if it makes my life easier.
Some companies have to interoperate with each other. And by some, I actually mean nearly all of them. Most of the data exchanged comes out of a database at some point, and as such, is naturally able to be put into a hierarchy with reasonable ease. XML, and similar formats, make this much less painful than if you had to flatten it out.
Where I currently work, I get data from several dozen other large companies. Most of it is not in XML. We generally have 2 people full time just on maintaining the parsers. If it were all in XML, the amount of maintenance required would be next to nothing.
From the xml-dev mailing list:
From: Rick Jelliffe
To: xml-dev@lists.xml.org
Date: Wed, 29 Nov 2006 12:46:06 +1100
Robert Koberg wrote:
Maybe a better analogy would be that the people who say that XSD is lovely is Mr Bush's "Mission Accomplished!"
Though of course there are differences between Iraq and XSD. One seems to be about people with their own fiefdom agendas stubbornly miring us in a quagmire, using a grabbag of thin reasons to justify it, denying any evidence that things are not rosy, perpetually promising that things are turning around, and enmeshing all sorts of decent people in a life of horror, difficulty and with no confidence in accomplishing the mission. The other is in the Middle East.
Just joking...
Rick
This is exactly what DTDs and XSDs are there to take care of. Relying on the document to follow your own in-house rules is the exact opposite of what you are supposed to do, in fact. The format document defines exactly what your parser should be doing/expecting, and if your data vendor doesn't respect that contract, it's very easy to show who is in the wrong. With flat files, all I have to do is add an extra column and your parser will die. For one-time imports this doesn't matter, but most business processes are not one-time.
Slashdot tags are officially useless. Who the hell is going to search for "dontdoit" when looking for this article.
Mono has complete support for RelaxNG in the form of the Commons.Xml.Relaxng assembly.
In addition to RelaxNG, it provides NVDL and RNC support.
As someone who has used XML schemas pretty extensively, I was pretty amazed at how I was able to skim through the tutorial in about 10 minutes and understand Relax NG, versus reading an entire XML Schema book and still needing to refer to it whenever I write schemas.
One thing I really like about Relax NG is that it's possible (with very easy syntax) to constrain the XML structure based on an attribute value, something you can't do in schema or a DTD. For example, suppose you want to have an XML element:
true
With Relax NG it's possible to constrain the text in the arg element (e.g. "true" or "false") based on the value of the type attribute. For example, if type="int", you could limit the text in arg to an integer value. This is something you can't do in XML Schemas or DTDs.
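In the compact syntax this kind of constraint is a choice between two (attribute, content) groups, roughly `element arg { (attribute type { "int" }, xsd:integer) | (attribute type { "boolean" }, ("true" | "false")) }`, where the value `"boolean"` is an assumption standing in for whatever the real type name would be. The Python sketch below hand-codes the same check with the standard library; it only approximates the `xsd:integer` lexical rule and is not a RELAX NG validator.

```python
import re
import xml.etree.ElementTree as ET

# Each allowed value of the (hypothetical) "type" attribute selects the rule
# for the element's text, mirroring a choice between two patterns.
RULES = {
    "int": re.compile(r"[+-]?[0-9]+"),      # approximates xsd:integer's lexical form
    "boolean": re.compile(r"true|false"),
}

def valid_arg(xml_text):
    elem = ET.fromstring(xml_text)
    if elem.tag != "arg" or len(elem) != 0:  # must be <arg> with no child elements
        return False
    rule = RULES.get(elem.get("type"))
    # Leading/trailing whitespace is ignored, roughly as RELAX NG's token and
    # xsd:integer comparisons would.
    text = (elem.text or "").strip()
    return rule is not None and rule.fullmatch(text) is not None

print(valid_arg('<arg type="int">42</arg>'), valid_arg('<arg type="int">true</arg>'))  # True False
```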
The Sky is Blue!!!
Since you are simplifying your life by making the schema for web requests simpler, why not go all the way, ditch SOAP, and embrace REST for XML-over-HTTP communications?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I believe James Clark, who co-designed Relax NG, understands and programs in Lisp pretty well (as well as Haskell, Java, C and many other languages). He helped design and implement DSSSL, which is based on Scheme and led to XSLT, which he also designed.
-Don
It's good for transmitting information/energy, but it's not good for storing it.
-Don
Well, if your company had developed a common interface for the 12 companies at the start, then you wouldn't need to have people parsing the data into your database.
I call this the Line of View (as in PoV) or 'Horizon' Problem. The general problem is this: in XML we've got a standard that is universal for displaying n-dimensional structures in a basically 1-dimensional environment. (For the time being, we're ignoring that XML text usually goes from left to right and top to bottom, making it something 2D to look at.) ... At some point you will have to decide how you want to handle your stuff and which way you're going to unravel it. This will undoubtedly influence how much XML clutter you will have to construct. With XML it's the same as with databases: they will always be pathetic crutches for us to latch on to for the real work. Indispensable, but crutches nonetheless.
The question now is: where do you draw the line of view? Along which line do I take my knife to cut open my n-dimensional structure, unravel it and flatten it out into a 1-dimensional string of characters? This is a problem that is impossible to solve satisfactorily for all possible PoVs or, as I say, Lines of View, or better yet, Horizons onto the structure. Will I unravel my DB of books by authors? By issues? By vendors? By publishers, or by weight and size?
What I'm getting at is this: mapping n-dimensional stuff to 1-dimensional structures will always suck one way or another. It's just that with XML we all start agreeing on which way it's supposed to suck. I don't think that changing the Schema standard (or worse, introducing additional standards) will actually attack this hard problem. I have a strong suspicion that Relax NG's relief is illusory and short-term, and re-introduces downsides that XML Schema has already tackled with its pesky and strict nature. For one, there is consistency with the View-Horizon, once chosen, all the way through the given data structure. I don't know for sure (go test and find out), but I do know that universal serialization will always come with downsides, and RelaxNG (or any other schema) won't change that.
We suffer more in our imagination than in reality. - Seneca
This guy claims that this:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
<zeroOrMore>
<element name="card">
<element name="name">
<text/>
</element>
<element name="email">
<text/>
</element>
</element>
</zeroOrMore>
</element>
is easier to read than this:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
WTF ?!
Religion is what happens when nature strikes and groupthink goes wrong.
I have enough experience with Relax NG to say that it is great.
The compact syntax is enjoyable as you can be quite precise (compared to XSD), and there are tools that convert between the compact syntax and the XML Relax NG syntax, allowing you to use whichever syntax suits your needs. In general, JING is quite a bit quicker than a few of the XSD validators for comparably complex schemas.
There are a few disadvantages:
* The available tools are not actively developed on a regular basis. I found a few bugs in the JING source code and had the opportunity to fix them where necessary.
* I feel that RelaxNG is marginalized because of XSD, and along with that goes a lot of additional OSS support. The tools are maintained by individuals instead of teams. I would recommend that the author of JING put his software forward to the Apache Foundation (Jakarta Commons) and see if it can attract a bit more attention.
* Web services are a bit of a sticking point. A Relax NG schema can be embedded into the WSDL; however, the various 3rd-party clients may not necessarily understand the schema and, by extension, would not generate any supporting classes, making integration with a Relax NG-defined web service a little more complex than it needs to be.
Relax NG really is great.
-Tim
I don't see why XML schemas have to exist. BNF notation serves the exact same purpose: it describes a grammar. A BNF-like derivative is more than enough to define XML schemas. The compact syntax of RELAX NG is just that, and a bright idea.
It is really annoying when CS has to be discovered all over again. The problem of validating text against a certain format was solved decades ago; BNF and its variations have been known since the 60s...
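The "solved decades ago" point is easy to see for element content: a DTD content model such as `card*` or `(name, email)` is just a regular expression over child element names. A minimal Python sketch that checks the addressBook example from earlier in the thread this way; encoding each child as its name plus a space is an ad hoc assumption, and attributes and text content are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the addressBook DTD, written as regexes over
# space-terminated child element names (XML names cannot contain spaces).
CONTENT_MODELS = {
    "addressBook": re.compile(r"(card )*"),  # card*
    "card": re.compile(r"name email "),      # (name, email)
    "name": None,                            # #PCDATA: no child elements
    "email": None,
}

def validate(elem):
    """Return True if elem and all descendants match their content models."""
    if elem.tag not in CONTENT_MODELS:
        return False
    model = CONTENT_MODELS[elem.tag]
    children = "".join(child.tag + " " for child in elem)
    ok = children == "" if model is None else model.fullmatch(children) is not None
    return ok and all(validate(child) for child in elem)

DOC = """<addressBook>
  <card><name>John Smith</name><email>js@example.com</email></card>
</addressBook>"""
print(validate(ET.fromstring(DOC)))  # True
```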
In most of the 'xml as hierarchical data storage' usage cases, JSON is what-it-should-have-been since the beginning (basically, ini files with nested structures):
- no element vs. attribute headaches
- no element-with-data-inside vs. element with elements inside headaches
- no way to declare external entities, cdata sections and other obscure features
- freaking easier to parse
- specs out very clearly charset encoding and escape sequences
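To illustrate the nesting and escaping points with Python's standard `json` module (the address-book data is made up): nested objects and arrays come back as dicts and lists, escape sequences are decoded by the parser itself, and there is no element-versus-attribute decision to make.

```python
import json

# Hypothetical address book as JSON; \" and \u00e9 are escapes defined by JSON itself.
TEXT = '{"addressBook": [{"name": "Ren\\u00e9 \\"Rx\\" Smith", "email": "rs@example.com"}]}'

data = json.loads(TEXT)
card = data["addressBook"][0]
print(card["name"])                          # René "Rx" Smith
print(json.dumps(data, ensure_ascii=False))  # re-serialized, quotes re-escaped
```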
maybe we'll have a post from Tim about that in a couple of years...
(damn short subject lines!)
I agree that RelaxNG is much easier to read, and it will much more completely describe a grammar than will the other standard - and MUCH more completely define it than will a DTD.
Unfortunately, as far as I can tell there is no way, within an XML document, to state "Use THIS RelaxNG schema file to validate this document", as you can with a DTD. Thus, even if I have placed my RelaxNG schema on my web server, I cannot set things up such that (for example) libXML2 can automatically fetch that schema when it starts parsing my document. I can map the RelaxNG schema to a DTD (losing information) and allow that to be fetched, but if I want to use a RelaxNG schema with libXML2, I, the programmer, must tell libXML2 where the schema is.
IMHO it would be a Good Thing if the W3C would standardize on some way to associate a RelaxNG schema with a given XML file - say, by some form of XML processing directive within the XML file.
www.eFax.com are spammers
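For what it's worth, the W3C did later publish a Working Group Note, "Associating Schemas with XML documents", defining a processing instruction of this kind: `<?xml-model href="..." schematypens="http://relaxng.org/ns/structure/1.0"?>`. Whether a given validator (libXML2 included) honours it is a separate question, so the Python sketch below only shows the document side: finding such instructions with `xml.dom.minidom`. The pseudo-attribute parsing is a simplified regex and the schema URL is hypothetical.

```python
import re
from xml.dom import minidom

RNG_NS = "http://relaxng.org/ns/structure/1.0"
PSEUDO_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')  # simplified: double quotes only

def relaxng_schemas(xml_text):
    """Return href values of xml-model PIs that point at RELAX NG schemas."""
    doc = minidom.parseString(xml_text)
    hrefs = []
    for node in doc.childNodes:  # prolog PIs are children of the Document node
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-model"):
            attrs = dict(PSEUDO_ATTR.findall(node.data))
            if attrs.get("schematypens") == RNG_NS and "href" in attrs:
                hrefs.append(attrs["href"])
    return hrefs

DOC = """<?xml version="1.0"?>
<?xml-model href="http://example.com/book.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<book/>"""
print(relaxng_schemas(DOC))  # ['http://example.com/book.rng']
```

A validator would still have to fetch the schema from that href itself; nothing here performs validation.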
I have used EDI, and it sucks almost as badly as XML. I have written more flat-text-file parsers than I can count. Nested data and escaped characters are no problem in flat text if the format is well defined.
I use XML every day in various applications. In my opinion it serves no purpose other than bloat. I would be all for a standard text format, but XML is just ridiculous.
XML was intended to be easy to PARSE, not easy to read.
Correct, XML is slightly easier to parse because of explicit end tags but most people disabled short tag support and enforced end tags in SGML anyway (in the syntax declaration and DTD respectively). However, saying that XML is not as easy to read as SGML is stretching things a bit - I find them to have the same legibility, although when using namespaces in XML I find they tend to result in long tag names that obscure things a bit. The things that appeared to be novel or improvements in XML were the discarding of the antiquated syntax declaration (a hangover from early GML days) and the concept of "well formed XML" without explicitly requiring a DTD.
As someone who has used XML Schema a little, it amazes me that no one thought to shoot the designers as soon as they published the first draft. I've learned entire Turing-complete programming languages in less time than it took me to get to even moderate competence with Schema (Lisp, Erlang and Smalltalk, for example, all took less time to learn than Schema; I could write a program in any of them that would validate an arbitrary XML document more easily than I could write a Schema, in spite of spending longer learning Schema).
I am TheRaven on Soylent News
Schema definition by its nature is tedious but necessary at this point. If you're going to take a standard that's already entrenched and suggest everyone stop and polish the edges off it, how about we kill the verbosity of the XML end tag instead?
Do we lose anything, other than bandwidth use, by doing this:
<tagNameThatCanBeLong>Some Text</>
instead of this:
<tagNameThatCanBeLong>Some Text</tagNameThatCanBeLong>
If the next end tag must belong to the last start tag what's the point of naming it?
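The stack argument can be made concrete: if every end tag must close the most recently opened element, a parser can recover the omitted name. The Python sketch below expands a hypothetical `</>` shorthand (SGML's SHORTTAG feature allowed a similar empty end tag; XML does not) into ordinary end tags. It is regex-based and assumes no comments, CDATA sections, or `>` inside attribute values.

```python
import re

# One tag: optional '/', a name (possibly empty), attributes, optional '/'.
TAG = re.compile(r"<(/?)([^\s/>]*)[^>]*?(/?)>")

def expand_empty_end_tags(text):
    """Replace each '</>' with '</name>' for the innermost open element."""
    open_elems, out, pos = [], [], 0
    for m in TAG.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        out.append(text[pos:m.start()])  # character data between tags
        pos = m.end()
        if text[m.start() + 1] in "?!":  # pass declarations and PIs through
            out.append(m.group(0))
        elif closing:
            if not open_elems:
                raise ValueError(f"end tag with no open element at {m.start()}")
            expected = open_elems.pop()
            if name and name != expected:
                raise ValueError(f"</{name}> does not close <{expected}>")
            out.append(f"</{expected}>")
        else:
            if not self_closing:
                open_elems.append(name)
            out.append(m.group(0))
    out.append(text[pos:])
    if open_elems:
        raise ValueError(f"unclosed elements: {open_elems}")
    return "".join(out)

short = "<doc><tagNameThatCanBeLong>Some Text</></>"
full = expand_empty_end_tags(short)
print(full)
print(len(full) - len(short), "bytes saved by the shorthand")
```

What you give up is the redundancy check: with named end tags a misplaced close is caught at that spot, while with `</>` the error only surfaces later, if at all.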
You are checking your backups, aren't you?
Reading XML is like reading compiled code. You might have to do it to debug something or to grok how the code works but XML is intended to be parsed, not read.
It seems to me that emphasis should be placed on features that improve parsing, not human readability. I don't know enough about XML or RELAX NG to opine on which is best for parsing, but it seems that parsing should be the main criterion for which is "best".
I don't read your sig. Why are you reading mine?
Flame: to insult or criticize angrily
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
En tee
I just showed your example to a half dozen people (some programmers, some managers) and they agree that the longer form is vastly more readable and understandable.
Shit, you think people are born knowing what an asterisk postfix means? Terseness != Clarity.
Not Lisp, but S-expressions, which are the basis of Lisp syntax; Lisp is an "application" of S-expressions, the same way XML applications are applications of XML. S-expressions extended with something similar to XML's encoding declarations could substitute for XML and would arguably be cleaner (certainly cleaner to Lispers), though I'm not so sure that:
(foo
(bar (baz (spam: "eggs"))))
is really more readable (rather than just more compact) than:
<foo>
<bar>
<baz spam="eggs"/>
</bar>
</foo>
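To make the comparison concrete, here is a small Python sketch that turns the S-expression form into the XML form with the standard library. The convention it assumes is ad hoc, modelled on the example above rather than any standard: a list's head is the element name, a sublist whose head ends in a colon is an attribute, a quoted string is text, and a bare symbol is an empty child element.

```python
import re
import xml.etree.ElementTree as ET

# One token: "(", ")", a double-quoted string, or a bare symbol.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')

def parse_sexpr(text):
    """Parse one S-expression into nested lists; strings become ("str", value)."""
    stack, pos, text = [[]], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        pos = m.end()
        if m.group(1):
            stack.append([])
        elif m.group(2):
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif m.group(3) is not None:
            stack[-1].append(("str", m.group(3)))
        else:
            stack[-1].append(m.group(4))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one balanced expression")
    return stack[0][0]

def to_element(node):
    """Convert a parsed list to an Element (mixed content is not handled faithfully)."""
    if not node or not isinstance(node[0], str):
        raise ValueError(f"element must start with a name: {node!r}")
    elem = ET.Element(node[0])
    for item in node[1:]:
        if isinstance(item, list) and item and isinstance(item[0], str) and item[0].endswith(":"):
            elem.set(item[0][:-1], item[1][1])  # (spam: "eggs") -> spam="eggs"
        elif isinstance(item, list):
            elem.append(to_element(item))       # nested element
        elif isinstance(item, tuple):
            elem.text = (elem.text or "") + item[1]
        else:
            elem.append(ET.Element(item))       # bare symbol -> empty element
    return elem

tree = to_element(parse_sexpr('(foo (bar (baz (spam: "eggs"))))'))
print(ET.tostring(tree, encoding="unicode"))  # <foo><bar><baz spam="eggs" /></bar></foo>
```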
I've learned entire Turing-complete programming languages in less time than it took me to get to even moderate competence with Schema
What do you expect? Schemas were a Microsoft initiative IIRC.
FreeSpeech.org
That only works if you are the larger company. Most of these guys are huge international brokerages - they aren't changing for us.
The only thing I've found useful from the Schema namespaces is the set of datatypes (int, float, string, etc) which are quite useful for other things.
Could W3C please split these off into their own "standard" namespace family?
Some restriction examples: Is that enough restriction for you?
XSDs might be too complex for their own good, but if you're gonna bash them, at least know what you're talking about first. And btw, who the heck uses DTDs nowadays? I never thought I'd see people mentioning those in 2006! Who in their right mind would use a non-XML-compliant definition file to validate an XML file? Weird...
shana