Tim Bray Says RELAX

Don't do it. by Anonymous Coward · 2006-12-04 15:27 · Score: 4, Funny

When you want to come.

Re:Don't do it. by Loconut1389 · 2006-12-04 15:49 · Score: 2, Funny

I tried to tag the article with frankiegoestohollywood,zoolander,killthemalaysianprimeminister but it didn't all fit ;)
Re:Don't do it. by Keith+Russell · 2006-12-04 16:13 · Score: 1

Did you try "bluesteel, letigre, ferrari"?

--
This sig intentionally left blank.
Re:Don't do it. by Sporkinum · 2006-12-04 16:32 · Score: 1

Frankie goes to Sunnyvale....

--
"He's lost in a 'floyd hole"
Re:Don't do it. by hotdiggitydawg · 2006-12-05 00:35 · Score: 1

I was thinking more along the lines of "Relax, guy! Put your feet up!"

Couldn't agree more by antonyb · 2006-12-04 15:36 · Score: 5, Insightful

My experience with XML Schema is exactly that: hard to write in the first place, hard to maintain, and regular interop problems between different implementations that make the theory of web services a practical nightmare (idrefs are the first example that springs to mind).

On the other hand, RELAX NG "just works".

(all IME of course...:)

ant.

Re:Couldn't agree more by camperdave · 2006-12-04 16:30 · Score: 2, Funny

RELAXiNG works for me too.

--
When our name is on the back of your car, we're behind you all the way!
Re:Couldn't agree more by caluml · 2006-12-05 00:52 · Score: 1

RELAXiNG works for me too.
Your comment is even funnier with your sig: "Wake up, Zeke! The day ain't gonna waste itself."

--
Get your own free personal location tracker

I have to agree. by JanusFury · 2006-12-04 15:37 · Score: 4, Insightful

Has anyone here ever tried to read an XML schema for anything relatively complex? It's a nightmare. RELAX looks much cleaner and more direct, which I wholeheartedly approve of.

--
using namespace slashdot;
troll::post();

Re:I have to agree. by sien · 2006-12-04 15:55 · Score: 4, Interesting

Yes. I've done it using Relax NG and it was easy, simple and readable.
It also works really, really well with the nXML mode for emacs.
Finally, XML schemas written in a way that is not verbose, ugly and unreadable. And if you do need one of the older schema languages there are translators from RelaxNG available.
Re:I have to agree. by radtea · 2006-12-04 16:47 · Score: 4, Interesting

I was at SGML '96 where XML was first announced, and was one of those people who went home and wrote a (non-validating) XML parser over the weekend, based on the draft spec. I've used both DTDs and XML Schemas and can say without question that schemas are actually a bigger pain to work with than DTDs. DTDs were bad enough, but schemas have been a major step backwards, adding complexity without adding the features one actually needs.

Some years ago I wrote a code generator that used DTDs as the data modelling language. I sold it to the company I was working for at the time and someone I had no control over re-wrote it to use schemas because they were "simpler". The result had major bugs and dropped features, not entirely due to schema-related problems, although it is worth noting that the "simplifications" included handling schemas in completely incorrect ways, because if you handled them correctly they could not do the job. I created a new generator from scratch last year and tried to do things "properly" with schemas. It was essentially impossible, and I wound up creating a custom XML-based language to use as input.

At the time there was no Relax NG standards process, so I stayed clear of it. But it has the blessing of James Clark too (author of the SP SGML parser and the expat XML parser). So it is probably worth another very hard look.

--
Blasphemy is a human right. Blasphemophobia kills.
Re:I have to agree. by LizardKing · 2006-12-05 01:18 · Score: 1

I was at SGML '96 where XML was first announced

Was that at some hotel in Swindon, UK? If so then I was there as well, if not then it must have been very shortly after the announcement, as XML (along with XSL) dominated the meeting. When XSL was described by a heavily bearded academic guy, several of the audience members became apoplectic. Apparently they thought DSSSL was a better alternative, something that amused me as all the DSSSL tools I was aware of were either incomplete or as fiddly as fuck to work with.
Re:I have to agree. by Skjellifetti · 2006-12-05 09:14 · Score: 1

Was that at some hotel in Swindon, UK?

No, it was in Boston. The original XML spec was a 20 page booklet. One tidbit: This was at the height of the browser wars and MS was all gung-ho for XML while Netscape wanted nothing to do with it. Many of the SGML gurus were quietly rooting for MS for just that reason.

--
FreeSpeech.org

To the point. by jhd · 2006-12-04 15:40 · Score: 2, Funny

"W3C XML Schemas (XSD) suck"

Hey Tim, don't hold back, tell us what you really think.

Re:it's a rather straightforward observation by GroovinWithMrBloe · 2006-12-04 15:59 · Score: 2, Insightful

if something, anything, is intended to be primarily parsed by machine, use xml

xml is a b**ch to read
Don't forget what we used to use... binary is even worse. XML was designed with people in mind, which is why it's easier for people to read and manipulate than your traditional binary file format.

Re:Just sit back... by ubernostrum · 2006-12-04 16:04 · Score: 4, Informative

What kind of programmer can't use XML effectively anyhow...oh wait... (No, I didn't read TFA!)

Helpful hint for understanding the above: Tim Bray, author of TFA, is one of the guys who originally developed and spec'd out XML. Really. His name's on the spec and everything. So if he says that a particular XML tool has problems, it's probably a good idea to take him at his word ;)

Re:XML Totally Sucks - All of it! by beavis88 · 2006-12-04 16:04 · Score: 2, Insightful

And if you can't have a DB connection?

For flat data, sure a flat file is fine...for structured/hierarchical data, a flat file is :(

I agree! by Maddog787 · 2006-12-04 16:06 · Score: 3, Funny

I refuse to use XML in any shape way or form no matter what anyone says or does with it!!!

Re:it's a rather straightforward observation by Peter+Cooper · 2006-12-04 16:06 · Score: 4, Informative

Check out YAML.

Re:XML Totally Sucks - All of it! by Anonymous Coward · 2006-12-04 16:09 · Score: 2, Insightful

XML would be great if people validated their XML files before sending them out. And cut the verbosity and redundancy down by 90%. And used English elements instead of numbers. Ahh XML, the ideal most people pay lip service to but up to which they fail to live.

Like two peas in a pod by KalElOfJorEl · 2006-12-04 16:12 · Score: 1

Between this standard and REST, it looks like we have some very lazy web services, RESTing and RELAX NG all the time . . .

Re:XML Totally Sucks - All of it! by unity100 · 2006-12-04 16:16 · Score: 1, Flamebait

Why the hell would you ever have to use flat file or xml for data/hierarchy anyway ?

now even for little stuff we use freely available databases and small snippets.

--
Read radical news here

Re:XML Totally Sucks - All of it! by l810c · 2006-12-04 16:17 · Score: 1

You either need a Dataset or All of the data.

I can send you a Dataset for your application needs or I can send you All of the data in a series of flat files that you can then manipulate with code/import to relational database.

I reject the XML self documenting data paradigm, it's just not applicable to most business processes. You are relying on the originating XML document to follow your own in house rules.

I want my data clean and neat and then I work my magic with it.

Relax NG's compact non-XML syntax by SimHacker · 2006-12-04 16:33 · Score: 2, Interesting

Relax NG has a compact non-XML syntax. But C++/Java is a horrible syntax to use if you want a language to be readable and easy to understand. Since when was 17 levels of operator precedence easy to understand? Of course any good programmer always uses parentheses to avoid ambiguity, so why should a language have 17 levels of built-in ambiguity just to make it that much easier to make hard to find mistakes?

-Don

From my blog: Relax NG Compact Syntax: no to operator precedence, yes to annotations!

James Clark is a fucking genius! He's the guy who wrote the Expat XML parser, works on Relax NG, and does tons of other important stuff. Relax NG is an ingeniously designed, elegant XML schema language based on regular expressions, which also has a compact, convenient non-XML syntax.

I totally respect the way he throws down the gauntlet on operator precedence (take that you Perl and C++ weenies!):

There is no notion of operator precedence. It is an error for patterns to combine the |, &, , and - operators without using parentheses to make the grouping explicit. For example, foo | bar, baz is not allowed; instead, either (foo | bar), baz or foo | (bar, baz) must be used. A similar restriction applies to name classes and the use of the | and - operators. These restrictions are not expressed in the above EBNF but they are made explicit in the BNF in Section 1.
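
The rule quoted above can be illustrated with a toy checker. This is only a sketch in Python, not how any real RELAX NG parser works: the function name `mixes_operators` is made up for illustration, and it assumes a simplified pattern string where names contain no hyphens and there are no string literals, braces, or comments.

```python
def mixes_operators(pattern: str) -> bool:
    """Return True if some parenthesis level combines two different
    operators from | & , - without explicit grouping.

    Toy assumptions: names contain no '-', and the pattern has no
    string literals, braces, or comments.
    """
    operators = set("|&,-")
    stack = [set()]  # operators seen so far at each nesting level
    for ch in pattern:
        if ch == "(":
            stack.append(set())
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch in operators:
            stack[-1].add(ch)
            if len(stack[-1]) > 1:
                return True  # two different operators at one level
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return False


print(mixes_operators("foo | bar, baz"))    # the disallowed form
print(mixes_operators("(foo | bar), baz"))  # explicit grouping
print(mixes_operators("foo | (bar, baz)"))  # explicit grouping
```

Repeating the same operator at one level (for example `a | b | c`) is not flagged, since only mixing different operators is ambiguous.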

You can translate back and forth between Relax NG's XML and compact syntaxes with full fidelity, without losing any important information. Relax NG supports annotating the grammar with standard and custom namespaces, so you can add standard extensions and extra user defined meta-data to the grammar. That's useful for many applications like user interface generators, programming tools, editors, compilers, data binding, serialization, documentation, etc.

Here's an interesting example of a complex Relax NG application: OpenLaszlo is an XML/JavaScript based programming language, which the Laszlo compiler translates into SWF files for the Flash player. The Laszlo compiler and programming tools use this lzx.rnc Relax NG schema for the OpenLaszlo XML language. This schema contains annotations used by the Laszlo compiler to define the syntax and semantics of the XML based programming language.

The schema starts out by defining a few namespaces:

default namespace = "http://www.laszlosystems.com/2003/05/lzx"
namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace lza = "http://www.laszlosystems.com/annotations/1.0"

The a: namespace defines some standard annotations like a:defaultValue, and the lza: namespace defines some custom annotations private to the Laszlo compiler like lza:visibility and lza:modifiers. Thanks to the ability to annotate the grammar, much of the syntax and semantics of the Laszlo programming language are defined directly in the Relax NG schema in the compact syntax, so any other tool can read the exact same definition the compiler is using!

To show how truly simple and elegant it is, here is the snake eating its tail: The Relax NG XML syntax, written in the Relax NG compact syntax:

# RELAX NG XML syntax specified in compact syntax.

default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace loc

--
Take a look and feel free: http://www.PieMenu.com

Re:Relax NG's compact non-XML syntax by Anonymous Coward · 2006-12-04 17:44 · Score: 2, Funny

Stop cutting and pasting from your fucking blog already. Make your point without it, or if you need to, then link to it.
Re:Relax NG's compact non-XML syntax by nagora · 2006-12-04 22:14 · Score: 1

That's nice looking; basically BNF with some twiddles. That I can read; XML is just plain bad. I remember when it came out the first thing I ever said about it was "Why the hell didn't they use some type of BNF?". This is exactly the sort of thing I had in mind.
"Twiddles" is a technical term.
TWW

--
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"

Re:XML Totally Sucks - All of it! by pleb1024 · 2006-12-04 16:33 · Score: 2, Insightful

Totally agree.

While XML may have its places (I've yet to encounter one in the commercial world), passing large amounts of data is not one of them. A good flat file design is a lot more efficient than XML, and short of hardware acceleration I don't see that changing.

I'm currently trying to assist a customer who's changing from one system to another. The current system generates flat files of approx 2gig in size every couple of days (billing data). The new system produces files of approx 13gig. The data contained within the files results in the exact same bill being produced for the customers.

Needless to say, the extra diskspace (yes we do compress them), and processing time to parse/compress is such a waste.

In my mind, XML trades shorter development time / 'portability' (well so the theory goes), for greater resource usage (CPU/Disk), whereas most customers I've dealt with would rather take a little longer to develop, and have a lot less resource limitation issues on the production systems. The old methods of 'just throw more hardware at it' just don't work in the real world anymore.

Telltale Sign... by PipianJ · 2006-12-04 16:35 · Score: 1

I've been picking up Emacs lately, and the xml-mode standardly used (nxml-mode) uses RELAX over XML Schema. I suspect that probably says a lot for RELAX's parseability. I've had just a little bit of experience playing around with Schemas and they seem about as navigable as DTDs, which is to say not very. I haven't tried RELAX though.

If it's not meant to be read by humans by porkchop_d_clown · 2006-12-04 16:37 · Score: 1

then why are you using an ASCII encoding in the first place? Those tags just lower the signal to noise ratio. Even Apple's given up and started saving their meta data in a "compiled" version of XML.

Oh, and, "Hi! How you doing? Long time no see!"

--
Clear, Dark Skies

Re:it's a rather straightforward observation by jomama717 · 2006-12-04 16:37 · Score: 1

xml is a b**ch to read
Beats EDI.

I'll take XML over a positional format any day, even if it only has to be looked at by human eyes 5% of the time. If you find yourself in a situation requiring eyeball examination of purchase order/shipping data at a large electronic commerce company it is likely an emergency and <ctrl>-f 'ing for a tag name, or using a web browser to check well-formedness can be a lifesaver.
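
For what it's worth, both checks mentioned above (well-formedness and hunting for a tag) are also easy to script. A minimal sketch using Python's standard library; the function names and the purchase-order snippet are made up for illustration, not taken from any real system:

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


def find_text(text, tag):
    """Return the text of every element named tag, in document order."""
    return [el.text for el in ET.fromstring(text).iter(tag)]


# Hypothetical purchase order, one good copy and one with a missing end tag.
good = "<po><id>1001</id><item>widget</item><item>gadget</item></po>"
bad = "<po><id>1001</id><item>widget</po>"

print(check_well_formed(good))
print(check_well_formed(bad))
print(find_text(good, "item"))
```

A positional format gives you neither for free: there is no parser error to point at the broken record, and no tag name to search for.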

--
while [ 1 ]; do echo -n -e "\xe2\x95\xb$((($RANDOM&1)+1))"; done

Relax NG: Design-by-Inspired-Individuals by SimHacker · 2006-12-04 16:38 · Score: 3, Interesting

Relax NG is a great example of the triumph of Design-by-Inspired-Individuals vs. Design-by-Committee.

In The State of XML, Edd Dumbill explains the secret behind the success of Relax NG:

Incidentally the RELAX NG success can equally well be framed as a case of design-by-inspired-individuals vs. design-by-committee as much as it can be seen as a OASIS vs. W3C thing.

-Don

--
Take a look and feel free: http://www.PieMenu.com

Re:Relax NG: Design-by-Inspired-Individuals by SimHacker · 2006-12-04 19:10 · Score: 1

And you consider XSD to be progress??! You sound like George W Bush trying to put a good spin on Iraq. "Mission Accomplished!"

So how do you counter Tim Bray's arguments against XSD?

-Don

--
Take a look and feel free: http://www.PieMenu.com
Re:Relax NG: Design-by-Inspired-Individuals by arodland · 2006-12-05 02:36 · Score: 1

I didn't really think it was humanly possible to be so stupid that you would think that "hard to read, hard to write, hard to understand" is a subjective claim. It's not, those are perfectly testable metrics, and perfectly relevant in the real world because they affect how much work it takes to use the product.
Re:Relax NG: Design-by-Inspired-Individuals by SimHacker · 2006-12-05 18:12 · Score: 1

Why are you so afraid to post your ridiculous arguments under your real name?

I'm so glad you asked. Have you bothered to read the discussion of the design that went into Relax/NG? It was specifically designed to address many of the shortcomings and limitations and complexities of XSD.

One good metric of simplicity and power is "composability", as defined by James Clark in his description of Relax/NG. (See my other post on the subject, where he defines that term, and goes into a lot of detail about the reasons behind the design.)

James Clark wrote about maximizing composability:

First, a little digression. In general, I have made it a design principle in TREX to maximize "composability". It's a little bit hard to describe. The idea is that a language provides a number of different kinds of atomic thing, and a number of different ways to compose new things out of other things. Maximizing composability means minimizing restrictions on which ways to compose things can be applied to which kinds of thing. Maximizing composability tends to improve the ratio between functionality on the one hand and simplicity/ease of use/ease of learning on the other.

Another good metric of the simplicity, readability and writability is the "syntactic surface area" of the syntax: XSD does not have a non-XML syntax, while Relax/NG has the compact syntax, which is MUCH easier to read and write than anything that uses XML syntax.

Dude, your arguments are coming from total ignorance and stupid anger. You obviously know very little about computer science, language design and human factors. Please tell us your name so we can know who could possibly be such a fool, or shut up and stop spouting such bullshit.

-Don

PS: Fucktard is one word, you anonymous asshat.

--
Take a look and feel free: http://www.PieMenu.com
Re:Relax NG: Design-by-Inspired-Individuals by SimHacker · 2006-12-05 18:20 · Score: 1

Ahem... DTDs are more powerful than XSD's, in that they support interleaving. Relax/NG also supports interleaving, but XSD does not. The XSD designers originally believed that interleaving was too computationally expensive to support efficiently, which is why they left it out. But they were wrong, and James Clark showed them how to do it, by using lazy automata construction. It's fucking brilliant. I've read the Relax/NG Haskell code as well as the Java implementation, and it's really beautiful stuff. You're totally missing out, trying to construct a radio with stone knives and bear skins, if you're still stuck with XSD's.

So just why are you so inflexible, close minded, and incapable of using a better technology than XSD? If there was a good reason to use Relax/NG instead of XSD, and you really wanted to switch, could you? Would you? Why or why not?

It sounds to me like you made a horrible investment in XSD and got screwed by it, and you don't want to hear about how happy everybody is about Relax/NG. Why the long face?

-Don

--
Take a look and feel free: http://www.PieMenu.com

Re:it's a rather straightforward observation by Creosote · 2006-12-04 16:38 · Score: 1

xml is a b**ch to read

Like any other formalism, it's difficult until you get used to it. The more familiar you are with a particular XML tagset and markup conventions, the easier it is to pick out the relevant structures and information. I remember being appalled at the verbosity of XSLT when I first began to use it, but nowadays if I'm working with well structured XSLT code (and color-coding in the editor) I can scan it pretty efficiently.

That said, a non-XML syntax is almost always going to be more human-friendly. Which is another advantage of RELAX NG, of course, since it has a compact syntax that translates back and forth without loss of information to the XML form of the language.

COBOL Totally Rocks - All of it! by Anonymous Coward · 2006-12-04 16:43 · Score: 1, Funny

"I may be old school, working with flat files and all for over 20 years, but I do work with a lot of newer technology."

Well I'm your counterpart in India and I'm happy to hear you're having problems getting used to newer technologies. Keep up the good work.

Re:COBOL Totally Rocks - All of it! by l810c · 2006-12-04 17:13 · Score: 1

Who said COBOL?
Re:COBOL Totally Rocks - All of it! by Gorshkov · 2006-12-04 20:52 · Score: 1

Well I'm your counterpart in India and I'm happy to hear you're having problems getting used to newer technologies. Keep up the good work.
Aren't you the guy that linked in that buggy, unreliable 10 Gb library into our application so you could use that new, just-freshly-developed WAY cool highly optimized parallel recursive garbage-collecting sort routine to deal with an unordered list of 10 items?

Great job, now to clean up XML itself by iamacat · 2006-12-04 16:48 · Score: 2, Insightful

With a notation similar to RELAX NG compact syntax. XML has been a killer of readable formats like windows-style ini files. It tries to be readable by both human and machine and succeeds at neither. It's like programming in assembler, because it can be read by a human better than machine code and compiled faster than C.

Re:Great job, now to clean up XML itself by killjoe · 2006-12-04 17:39 · Score: 3, Insightful

I believe you are looking for lisp. It's XML cleaned up, simplified and hulkified.

--
evil is as evil does
Re:Great job, now to clean up XML itself by iamacat · 2006-12-04 20:43 · Score: 1

LISP -> XML alternative == Postscript -> PDF. You don't always want to execute your data, especially with today's abundance of malware.
Re:Great job, now to clean up XML itself by Fnordulicious · 2006-12-05 04:54 · Score: 1

Nonsense. Just stick to READ and don't call EVAL on your data. Or write your own toy EVAL that is restricted to certain known operations.

You might want to set up your own READ macros as well, to ensure that nobody uses #. maliciously.
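
To make the READ-without-EVAL point concrete outside of Lisp, here is a rough sketch of a reader in Python. It is an assumption-laden toy (the name `read_sexpr` is invented, atoms are plain whitespace-separated strings, and there are no quotes, strings, or reader macros), but it shows that parsing an s-expression into data never has to execute anything:

```python
def read_sexpr(text):
    """Parse one s-expression into nested Python lists of strings.

    Nothing is evaluated: every atom comes back as a plain string.
    Toy assumptions: no string literals, quotes, or reader macros.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # an atom, kept as inert data
        items = []
        while True:
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[pos] == ")":
                pos += 1
                return items
            items.append(read())

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return result


print(read_sexpr("(order (id 42) (item widget))"))
print(read_sexpr("(delete-all-files now)"))  # just a list, nothing runs
```

Whatever interprets the resulting lists decides which operations exist, which is the "toy EVAL restricted to known operations" idea.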
Re:Great job, now to clean up XML itself by I+Like+Pudding · 2006-12-05 05:21 · Score: 1

XML is not a programming language. Lisp is not a markup language. I believe the comparison you were looking for was to s-expressions, which are a lot lighter than XML but don't do nearly as much. That, and nobody outside Lisp/Schemers use them. Hell, the nascent JSON spec already has more traction.
Re:Great job, now to clean up XML itself by DragonWriter · 2006-12-05 08:19 · Score: 1

XML is not a programming language. Lisp is not a markup language. I believe the comparison you were looking for was to s-expressions, which are a lot lighter than XML but don't do nearly as much.

Bare S-expressions don't define enough semantics to do what XML does; Lisp goes too far for what XML is used for in being a full programming language (though, given all the XML-related technologies that are widely used to add more and more programming-like features to XML, it may not be "too far"); somewhere between the two you could construct something that built on S-expressions to do what XML does fairly cleanly. It probably wouldn't be as good as XML as a "markup language" for structured text, but it would be more concise and arguably cleaner for exchanging other kinds of data, and XML is often used for things that aren't really marked-up text.

Re:it's a rather straightforward observation by radtea · 2006-12-04 16:51 · Score: 2, Informative

XML was designed with people in mind, which is why it's easier for people to read and manipulate than your traditional binary file format.

Err... no.

XML was a step back from SGML's "human-friendly" clever tricks. XML was intended to be easy to PARSE, not easy to read.

--
Blasphemy is a human right. Blasphemophobia kills.

Maximizing Composability and Relax NG Trivia by SimHacker · 2006-12-04 16:53 · Score: 4, Informative

Tim Bray is right, and he couldn't have put it better: W3C XML Schemas (XSD) suck. The reason Relax NG is so much cleaner and more powerful than committee-designed XML Schemas is that it's based on a sound mathematical foundation (tree regular expressions, or "hedge automata theory"), while XML Schemas suffer from ad-hoc design, committee-burn, lack of focus, and half-baked attempts to solve too many unrelated problems.

Here's some interesting stuff from my blog about the design and development of Relax NG.

-Don

James Clark wrote about maximizing composability:

First, a little digression. In general, I have made it a design principle in TREX to maximize "composability". It's a little bit hard to describe. The idea is that a language provides a number of different kinds of atomic thing, and a number of different ways to compose new things out of other things. Maximizing composability means minimizing restrictions on which ways to compose things can be applied to which kinds of thing. Maximizing composability tends to improve the ratio between functionality on the one hand and simplicity/ease of use/ease of learning on the other.

Clark describes the derivative algorithm's lazy approach to automaton construction:

I don't agree that <interleave> makes automaton-based implementations impossible; it just means you have to construct automatons lazily. (In fact, you can view the "derivative"-based approach in JTREX as lazily constructing a kind of automaton where states are represented by a canonical representative of the patterns that match the remaining input.)
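
For readers who haven't seen the derivative idea, here is a heavily simplified sketch in Python. It is not Clark's algorithm: it matches a flat sequence of token names rather than XML trees, ignores attributes, text and datatypes, and all class and function names are invented for illustration. What it does show is interleave handled by derivatives, with `lru_cache` standing in for lazy automaton construction (each pattern is a state, and a transition is computed only the first time the input needs it).

```python
from dataclasses import dataclass
from functools import lru_cache


class Pattern:
    """Base class for the toy pattern language."""


@dataclass(frozen=True)
class Empty(Pattern):       # matches the empty sequence
    pass


@dataclass(frozen=True)
class NotAllowed(Pattern):  # matches nothing
    pass


@dataclass(frozen=True)
class Token(Pattern):       # matches exactly one token with this name
    name: str


@dataclass(frozen=True)
class Choice(Pattern):      # left or right
    left: Pattern
    right: Pattern


@dataclass(frozen=True)
class Group(Pattern):       # left followed by right
    left: Pattern
    right: Pattern


@dataclass(frozen=True)
class Interleave(Pattern):  # left and right mixed in any order
    left: Pattern
    right: Pattern


EMPTY, NOT_ALLOWED = Empty(), NotAllowed()


def choice(a, b):
    """Build a Choice, simplifying so equal states stay equal."""
    if a == NOT_ALLOWED:
        return b
    if b == NOT_ALLOWED or a == b:
        return a
    return Choice(a, b)


def combine(cls, a, b):
    """Build a Group or Interleave, simplifying trivial operands."""
    if NOT_ALLOWED in (a, b):
        return NOT_ALLOWED
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return cls(a, b)


def nullable(p):
    """True if p can match the empty sequence."""
    if isinstance(p, Empty):
        return True
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, (Group, Interleave)):
        return nullable(p.left) and nullable(p.right)
    return False


@lru_cache(maxsize=None)  # memoized: transitions are built on demand
def deriv(p, tok):
    """The pattern that must match the rest of the input after tok."""
    if isinstance(p, Token):
        return EMPTY if p.name == tok else NOT_ALLOWED
    if isinstance(p, Choice):
        return choice(deriv(p.left, tok), deriv(p.right, tok))
    if isinstance(p, Group):
        first = combine(Group, deriv(p.left, tok), p.right)
        if nullable(p.left):
            return choice(first, deriv(p.right, tok))
        return first
    if isinstance(p, Interleave):
        return choice(combine(Interleave, deriv(p.left, tok), p.right),
                      combine(Interleave, p.left, deriv(p.right, tok)))
    return NOT_ALLOWED  # Empty and NotAllowed accept no further token


def matches(p, tokens):
    for tok in tokens:
        p = deriv(p, tok)
    return nullable(p)


# "a" interleaved with the sequence "b" then "c".
schema = Interleave(Token("a"), Group(Token("b"), Token("c")))
print(matches(schema, ["b", "a", "c"]))
print(matches(schema, ["c", "b", "a"]))
```

The full set of interleavings is never enumerated up front; only the states reached by actual input get built, which is the point of constructing the automaton lazily.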

The Relax NG derivative algorithm is implemented in a few hundred elegant declarative functional lines of Haskell, and also in tens of thousands of lines and hundreds of classes of highly abstract complex Java code.

Clark's Java implementation of Relax NG is called "jing", which is a Thai word meaning truthful, real, serious, no-nonsense, and ending with "ng".

Comparing the Java and Haskell implementations of Relax NG illustrates what a wicked cool and powerful language Haskell really is. The Java code must explicitly model and simulate many Haskell features like first order functions, memoization, pattern matching, partial evaluation, lazy evaluation, declarative programming, and functional programming. That requires many abstract interfaces, concrete classes and brittle lines of code.

While the Java code is quite brittle and verbose, the Haskell code is extremely flexible and concise. Haskell is an excellent design language, a vehicle for exploring complex problem spaces, designing and testing ingenious solutions, performing practical experiments, weighin

--
Take a look and feel free: http://www.PieMenu.com

Re:Maximizing Composability and Relax NG Trivia by heinousjay · 2006-12-05 03:07 · Score: 2, Funny

Thanks for the Java flame. I was worried that there wouldn't be any offtopic ranting in this story, but you eased my worries just a few comments into it.

--
Slashdot - where whining about luck is the new way to make the world you want.
Re:Maximizing Composability and Relax NG Trivia by Scarblac · 2006-12-05 04:39 · Score: 1

Impressive! I don't think I've ever seen a Slashdot comment that long that was still informative.

--
I believe posters are recognized by their sig. So I made one.
Re:Maximizing Composability and Relax NG Trivia by drew · 2006-12-05 04:50 · Score: 1

That's an awful lot of cutting and pasting just to take a worthless jab at the Java language. While I haven't even looked at the code, and I don't really know all that much about either language, I can guess just from your description that the reason that the Java version is so complex is that the Haskell version was written first, and then somebody tried to write the Java version using exactly the same logic as the Haskell version, and therefore ended up reimplementing half of Haskell in the process. As much as people seem to get that not all programming languages are equal, I'm consistently surprised at the number of people who seem to think that you should be able to implement an algorithm in exactly the same way in every language. I see the same thing in a lot of the new "AJAX frameworks" that take 80k of JavaScript to implement 20k worth of features because in the process they decided to bolt on 60k of syntactic sugar to make JavaScript look like Java/C#/whatever.

--
If I don't put anything here, will anyone recognize me anymore?
Re:Maximizing Composability and Relax NG Trivia by Erixxxxx · 2006-12-05 05:30 · Score: 2, Insightful

From the Haskell implementation:

"This document does not describe any algorithms for transforming a RELAX NG schema into simplified form, nor for determining whether a RELAX NG schema is correct."

From the Jing implementation:

"This version of Jing implements:

* RELAX NG 1.0 Specification,
* RELAX NG Compact Syntax, and
* parts of RELAX NG DTD Compatibility, specifically checking of ID/IDREF/IDREFS."

also from the Jing implementation:

"Jing also has experimental support for schema languages other than RELAX NG; specifically

* W3C XML Schema (based on Xerces-J);
* Schematron;
* Namespace Routing Language."

Implement the same level of functionality in Haskell as is being implemented in Jing, then come back and compare.

Also, number of lines of code is only one standard, how does the Haskell implementation hold up under heavy loads? How well does it scale?

I personally think Jing tries to do too much, and I think there is definitely a need for a better Java implementation of a RelaxNG validator, but your post (largely dealing with a nonsensical argument about semantics) is rather lazy.
Re:Maximizing Composability and Relax NG Trivia by Thuktun · 2006-12-05 05:51 · Score: 1

Comparing the Java and Haskell implementations of Relax NG illustrates what a wicked cool and powerful language Haskell really is. The Java code must explicitly model and simulate many Haskell features [...]
To be fair, this may be due to someone more familiar with Haskell trying to port their implementation to Java, rather than a native Java implementation being required to simulate those features. I don't know the history of those two implementations, but the fact that the Java implementation tries to simulate those features does not imply that it must.
While the Java code is quite brittle and verbose, the Haskell code is extremely flexible and concise.
A poor implementation in a particular language does not imply the language is bad. It may imply that the developer was not sufficiently versed in that language.
Re:Maximizing Composability and Relax NG Trivia by John+Whitley · 2006-12-05 06:19 · Score: 2, Informative

That's an awful lot of cutting and pasting just to take a worthless jab at the Java language.

For many problem domains, it often doesn't matter what language you throw up against Haskell -- the Haskell program will often be smaller by one or more orders of magnitude (for a sufficiently rich/interesting program, anyways). The grandparent poster didn't even craft the example in question; Java was just the victim-elect of this particular case. I'll observe that even if the Java program there could be made shorter by an order of magnitude (!!), it would still be an order of magnitude larger than the Haskell implementation.

Although it's a bit long in the tooth now, Paul Hudak and Mark Jones wrote a paper that surveys the results of a Naval Surface Warfare Center prototyping study comparing a number of different programming languages. See Haskell vs. Ada vs. C++ vs. Awk vs. ... An Experiment in Software Prototyping Productivity. It's a fascinating read if you aren't already familiar with how different programming in Haskell is from many currently popular languages. I highly recommend delving into Haskell for any dedicated developer. Even if you don't find yourself developing in Haskell on a daily basis, the experience will positively impact how you think about code, and bring new conceptual models and patterns into your toolbox.
Re:Maximizing Composability and Relax NG Trivia by drew · 2006-12-05 12:29 · Score: 1

I don't doubt that Haskell is a very cool language, although I haven't had an opportunity to try it out yet- as far as I've been able to tell, it wouldn't be very useful for my primary areas of work.

My comment regarding his post was directed at this:
The Java code must explicitly model and simulate many Haskell features like first order functions, memoization, pattern matching, partial evaluation, lazy evaluation, declarative programming, and functional programming. That requires many abstract interfaces, concrete classes and brittle lines of code.

While the Java code is quite brittle and verbose, the Haskell code is extremely flexible and concise.

I was not claiming that the Java code could be made as small as the Haskell code. I was merely arguing that it was probably made needlessly "brittle and verbose" by trying to emulate the Haskell implementation in a very dissimilar language, and that saying that the Java code must emulate those Haskell features (and is therefore necessarily complex and brittle) is more likely the mark of a poor (or stubborn) programmer than of a poor programming language.

--
If I don't put anything here, will anyone recognize me anymore?
Re:Maximizing Composability and Relax NG Trivia by SimHacker · 2006-12-05 18:23 · Score: 1

Not so much of a Java flame as a Haskell plug. Everything I said about Haskell and Java is true. I even counted the number of lines of code. How is that a flame?

-Don

--
Take a look and feel free: http://www.PieMenu.com
Re:Maximizing Composability and Relax NG Trivia by SimHacker · 2006-12-05 18:29 · Score: 1

You have to remember we're talking about Java code written by James Clark here, not John Q. Random programmer. Clark is an excellent Java programmer, and has written many lines of hard core Java and C code (and many other languages like Scheme and Haskell), implementing a long line of SGML and XML standards, including DSSSL, XSLT, XPath, Expat, etc. And it's all open source code, so you can go look at it and determine how good a programmer he is for yourself.

Anyway, Java IS a bad language, especially compared to Haskell. Sheez, what planet are you from?

-Don

--
Take a look and feel free: http://www.PieMenu.com
Re:Maximizing Composability and Relax NG Trivia by SimHacker · 2006-12-05 18:46 · Score: 1

You can't get around the fact that Java simply does not have those many important features I listed (and linked to their definitions on Wikipedia), which are all extremely useful for implementing things like Relax/NG validators.
James Clark, the guy who wrote the Haskell code, is the SAME guy who wrote the Java code, and he's written a whole lot of other complex Java code, as well as many other languages, and also designed and implemented many XML standards. FYI, he served as the technical lead of the original W3C XML Working Group and as the editor of the XSLT and XPath recommendations.
Kiddo, you have no idea who you're calling a "poor (or stubborn) programmer". James Clark is one of the best programmers on the planet, who has written some of the most important code that's run by millions of people every day. Have you ever heard of Expat, the XML parser? Or XSLT? And no, James Clark is NOT the guy who founded Netscape. That Jim Clark just made millions of dollars off of the open source code generously designed, written and shared by James Clark.
Here is a brilliant interview with James Clark from Dr. Dobb's Journal. I've included some of my favorite parts, but the entire interview is fascinating and well worth reading. A Triumph of Simplicity: James Clark on Markup Languages and XML:

If you peek under the hood of high-profile open-source projects such as Mozilla, Apache, Perl, and Python, you'll find a little program called "expat" handling the XML parsing. If you've ever used the man command on your GNU/Linux distribution, then you've also used groff, the GNU version of the UNIX text formatting application, troff. If you've ever done any work with SGML, from generating documentation from DocBook to building your own SGML applications, you've undoubtedly come across sgmls, SP, and Jade.
Whether you've heard of him or not (and most likely, you haven't), James Clark (below right) has made your life easier. In addition to authoring these and other widely used open-source tools (see http://www.jclark.com/ for a complete list), Clark served as the technical lead of the original W3C XML Working Group and as the editor of the XSLT and XPath recommendations. He recently founded Thai Open Source Software Center (http://www.thaiopensource.com/). His latest project is TREX, an XML schema language. Clark sat down with Eugene Eric Kim to discuss markup languages, the standardization process, and the importance of simplicity.
DDJ: How did you get involved with SGML?
JC: I was interested in using SGML as a replacement for one part of what groff was doing. Then I got Charles Goldfarb's book, The SGML Handbook, and I thought, "Hmm, this is an interesting thing. Let's see if I can write a program for it." Then Charles Goldfarb released his ARCSGML SGML parser, and I started working with that. The more I worked with it, the more I felt it needed improvements and bug fixes, and nobody else seemed to be doing that. There seemed to be a real need for turning a research-worthy tool into more of a production-quality tool, and that turned into sgmls. Working with sgmls, I got more and more dissatisfied with its basic internal structure. There were some things in SGML that would have been very hard to implement within sgmls, and I felt that I really understood how SGML parsing worked, and so I produced a completely new SGML parser, SP.
DDJ: Did you feel like there were any major itches that you got to scratch with the specification of XML?
JC: I knew how insanely complex writing an SGML parser was. SGML is really doing something very simple. It's providing a standard way to represent a tree, and your nodes have a label with names and they can have attributes. That's all it's doing. It's not a complicated concept. Yet SGML manages to make writing something that implements it into a several-man-year project.
A lot of the features do have a reasonable mo

--
Take a look and feel free: http://www.PieMenu.com
Re:Maximizing Composability and Relax NG Trivia by Thuktun · 2006-12-08 07:41 · Score: 1

You have to remember we're talking about Java code written by James Clark here, not John Q. Random programmer. Clark is an excellent Java programmer [...]
I'm taking issue with a logical fallacy--"Program A written in Language B is ugly compared to the version in Language C, therefore Language B is worse than Language C"--not the developer in question.

What I was trying to say was, I doubt a Java master that didn't know Haskell would write the program in such a way that he was forced to implement emulations of Haskell features before he could complete the Java version.

In this case, the author probably wrote the Haskell version and, rather than re-implement the entire thing when he had an already-working version, ported the code over, providing shims where the language didn't provide enough support.

Anyway, Java IS a bad language, especially compared to Haskell.
Java is an object-oriented, imperative language; Haskell is a functional language. They're not directly comparable, as they're meant to approach problems from a different direction.

You're comparing an auto to an airplane, saying that the plane is better than the car because it can travel around the world faster and in a straighter line. Well of course, it's designed for that type of task. Try commuting 30 miles to work every day, and you'll find that the plane simply isn't well-suited for that kind of task.

Thanks for the ad hominem, though, that was thoughtful.
Re:Maximizing Composability and Relax NG Trivia by drew · 2006-12-09 05:52 · Score: 1

Actually, I do know who he is, and I have a great deal of respect for the work that he has done. I wasn't addressing his statements, I was addressing yours.

And I never actually called anyone a poor programmer. My statement was "Anyone who says 'language X is a poor language because in order to implement Y with language X, you first have to implement all of these features from language Z' is probably saying more about themself than about language X." As far as I know, he's never said that, and for that matter, you never actually said it either, although you certainly very heavily implied it. Anyway my real point was that you spent far more of that post criticizing Java (which was IMO misplaced) than providing any useful discussion.

--
If I don't put anything here, will anyone recognize me anymore?

One fix to XML I'd like to have... by Reality+Master+101 · 2006-12-04 16:56 · Score: 1

Speaking of XML, how much smaller would XML files be if they made one minor simple change...

Add to mean "close the matching element".

*sigh* I wish I'd been on the committee when they specified the standard.

--
Sometimes it's best to just let stupid people be stupid.

Re:One fix to XML I'd like to have... by Reality+Master+101 · 2006-12-04 17:03 · Score: 2, Interesting

Damn! I mean, add </>...

(Argh, the "wait between comments" thing is infuriating...)

--
Sometimes it's best to just let stupid people be stupid.
Re:One fix to XML I'd like to have... by horster · 2006-12-04 17:13 · Score: 1

totally agree...
Re:One fix to XML I'd like to have... by nuzak · 2006-12-04 17:21 · Score: 4, Insightful

That feature is in SGML. In fact it can be even shorter than that, you can express an entire tag and its content with is optional). SGML even lets you change the angle brackets to anything else you want. You can make any SGML doc look like nothing you or anyone else has ever seen ... all part of the feature set.

SGML is full of fun little hacks like that, and it was a pain in the ass to read. Omitting the tag name from the end tag makes it impossible to know you have an improperly closed tag til the end of the document, and then you have no idea which tag wasn't closed. And no, that wasn't a theoretical problem either, this became a real problem with giant SGML docs that used all the shortcuts.

If you really hate XML's verbosity so much, realize that it was designed for easy reading, not easy writing. I whipped up my own xml mode in emacs and made '</' trigger an "electric-slash" behavior that closes the tag automatically. Not rocket science.
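
For anyone curious what an "electric slash" amounts to, here is a rough sketch in Python rather than Emacs Lisp. It is not the mode described above; the function names and the regex are made up for illustration, and comments, CDATA sections and processing instructions are ignored. The idea is to keep a stack of open tags and, when the text ends in '</', append the name on top of the stack.

```python
import re

# Matches start tags, end tags and self-closing tags (no comments or CDATA).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")

def innermost_open_tag(text):
    """Return the name of the innermost element still open in `text`, or None."""
    stack = []
    for closing, name, self_closing in TAG.findall(text):
        if self_closing:
            continue                      # <br/> opens nothing
        if closing:
            if stack and stack[-1] == name:
                stack.pop()               # properly nested end tag
        else:
            stack.append(name)
    return stack[-1] if stack else None

def electric_slash(text):
    """If `text` ends with '</', complete it with the matching tag name."""
    if text.endswith("</"):
        name = innermost_open_tag(text[:-2])
        if name:
            return text + name + ">"
    return text

print(electric_slash("<copy><todir>../new/dir</"))
# prints <copy><todir>../new/dir</todir>
```

The editor does the typing, and the file on disk still carries the full end tag, so a reader (or a parser) can see exactly which element failed to close.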

--
Done with slashdot, done with nerds, getting a life.
Re:One fix to XML I'd like to have... by Electrum · 2006-12-04 18:06 · Score: 1

Speaking of XML, how much smaller would XML files be if they made one minor simple change...

Add to mean "close the matching element".

You mean like Lisp S-expressions?
<copy>
  <todir>../new/dir</todir>
  <fileset>
    <dir>src_dir</dir>
  </fileset>
</copy>

(copy
  (todir "../new/dir")
  (fileset (dir "src_dir")))
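
The mapping between the two notations is mechanical, at least for element-only documents like this one. A minimal Python sketch, assuming attributes and mixed content (text after a child element) can be ignored; `to_sexpr` is a made-up helper name:

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as (tag "text" child...), ignoring attributes and tail text."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        parts.append('"' + escaped + '"')
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

xml_doc = (
    "<copy><todir>../new/dir</todir>"
    "<fileset><dir>src_dir</dir></fileset></copy>"
)
sexpr = to_sexpr(ET.fromstring(xml_doc))
print(sexpr)
# prints (copy (todir "../new/dir") (fileset (dir "src_dir")))
```

Attributes and mixed content are where the two notations stop lining up so neatly, which is why the sketch leaves them out.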
Re:One fix to XML I'd like to have... by Nasarius · 2006-12-04 18:12 · Score: 1

If file size is a concern, XML compresses easily. The OpenOffice file formats are zipped XML.

--
LOAD "SIG",8,1
Re:One fix to XML I'd like to have... by tbray · 2006-12-04 19:51 · Score: 1

Heh, my own tagged-text mode uses just '/' to mean "close whatever needs closing". Works great. (control-/ if you want a real /).
Re:One fix to XML I'd like to have... by TheRaven64 · 2006-12-05 01:44 · Score: 1

XML is slow to parse. Adding zip into the mix does nothing to help this.
XML is basically a bloated way of expressing S-expressions. More compact (i.e. easy to parse, and small to store) versions already exist. What is really needed is a storage format that allows branches to be parsed in parallel. XML is inherently sequential; I have to parse an entire branch to know where the next one starts. It would be nice if I could scan ahead quickly to the next branch at the same depth and parse this at the same time (after all, more parallelism seems to be the trend at the moment in CPUs).
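
One way to get that scan-ahead property is to prefix every branch with its byte length, so a reader can hop from sibling to sibling without parsing anything in between. The sketch below is only an illustration of the idea, not an existing format: the 4-byte big-endian prefix and the helper names are assumptions, and each branch is still plain XML parsed with the standard library. Note that CPython threads will not give real CPU parallelism here; the point is only that the branch boundaries are known up front.

```python
import struct
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def pack_branches(branches):
    """Concatenate byte strings, each preceded by a 4-byte big-endian length."""
    return b"".join(struct.pack(">I", len(b)) + b for b in branches)

def branch_offsets(blob):
    """Find (start, end) of every branch by reading length prefixes only."""
    spans, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        start = pos + 4
        spans.append((start, start + length))
        pos = start + length          # jump straight to the next sibling
    return spans

branches = [b"<a><x>1</x></a>", b"<b><y>2</y><y>3</y></b>", b"<c/>"]
blob = pack_branches(branches)
spans = branch_offsets(blob)

# Each branch can now be handed to a separate worker.
with ThreadPoolExecutor() as pool:
    roots = list(pool.map(lambda s: ET.fromstring(blob[s[0]:s[1]]), spans))

tags = [root.tag for root in roots]
print(tags)
```

The cost is that the writer has to know each branch's length before emitting it, which rules out simple streaming output.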

--
I am TheRaven on Soylent News

Re:MyXML scheme (sucks too) by Pulse_Instance · 2006-12-04 16:56 · Score: 1, Funny

it would work but only if you express your idea like this

Post ingenious scheme on Wikipedia
Proclaim international standard
...
Profit!!!

Re:XML Totally Sucks - All of it! by Just+Some+Guy · 2006-12-04 17:00 · Score: 4, Insightful

While XML may have its places (I've yet to encounter one in the commercial world), passing large amounts of data is not one of them.

Yeah, well I have to look at EDI every day. I'd switch to XML in a heartbeat if it were up to me.

You picked some obvious strawmen to shoot down. XML isn't for building gigabyte databases (regardless of whether some people try to use it for that). It's for easily moving data between applications. If you think writing a flat text parser is easy, then you've never had to deal with nested data or escaped characters. Say what you will about XML, but it's nice to have one set standard that deals with all that, even if suboptimally, because I never want to write another ad-hoc parser for as long as I live. Been there, done that, have no desire to bother again.
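
To make the escaped-characters point concrete, here is a small Python sketch with a made-up record. A naive split on commas miscounts the fields as soon as one of them contains a quoted comma; a parser that knows the quoting rules (here the standard csv module) gets it right.

```python
import csv
import io

# Hypothetical record: the second field contains a comma, the third contains quotes.
line = 'ACME,"Smith, John","He said ""hi"""'

naive = line.split(",")                        # ad-hoc "parser"
proper = next(csv.reader(io.StringIO(line)))   # parser that knows the quoting rules

print(len(naive), naive)
print(len(proper), proper)
```

The naive version sees four fields instead of three, and that is before nested data enters the picture.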

--
Dewey, what part of this looks like authorities should be involved?

XML uses a binary format by ClosedSource · 2006-12-04 17:12 · Score: 4, Insightful

Of course ASCII (or UNICODE for that matter) is a binary standard as well. So special tools called text editors were created so that people could read it.

There are more sophisticated binary standards that are more efficient than ASCII and it wouldn't take a lot of effort to create viewers/editors for them as well. Of course most markup documents would be significantly smaller if tags didn't have to be S-P-E-L-L-E-D O-U-T character by character. Each HTML tag could be encoded in just two bytes with lots of room to spare.
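
A rough Python sketch of what such a two-byte encoding could look like. Everything here is assumed for illustration: the tag table, the use of the high bit for end tags, and the length-prefixed text runs. Attributes, entities and text runs longer than 65535 bytes are not handled.

```python
import re
import struct

# Hypothetical tag table; a real format would need an agreed, versioned registry.
TAG_CODES = {"html": 1, "body": 2, "p": 3, "b": 4}
CODE_TAGS = {code: name for name, code in TAG_CODES.items()}
CLOSE_BIT = 0x8000   # high bit set marks an end tag
TEXT_CODE = 0x0000   # followed by a 2-byte length and UTF-8 text

TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")

def encode(markup):
    out = bytearray()
    for closing, name, text in TOKEN.findall(markup):
        if name:
            code = TAG_CODES[name] | (CLOSE_BIT if closing else 0)
            out += struct.pack(">H", code)          # one tag = two bytes
        else:
            data = text.encode("utf-8")
            out += struct.pack(">HH", TEXT_CODE, len(data)) + data
    return bytes(out)

def decode(blob):
    out, pos = [], 0
    while pos < len(blob):
        (code,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        if code == TEXT_CODE:
            (length,) = struct.unpack_from(">H", blob, pos)
            pos += 2
            out.append(blob[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            slash = "/" if code & CLOSE_BIT else ""
            out.append("<" + slash + CODE_TAGS[code & ~CLOSE_BIT] + ">")
    return "".join(out)

page = "<html><body><p>Hello <b>world</b></p></body></html>"
packed = encode(page)
print(len(page), "characters ->", len(packed), "bytes")
```

The saving depends entirely on both ends sharing the same tag table, which is the standardization problem in a nutshell.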

It always fascinates me that we have no problem making customers use a new specialized tool like a browser, but it's taboo to use a non-ASCII tool for development. So we continue to structure our data as if it were going to be processed by a VT100.

Re:XML uses a binary format by Al+Dimond · 2006-12-04 17:34 · Score: 1

I wish I hadn't used all my mod points earlier today, because that's an interesting post... it would be interesting to set up a programming environment with a binary format and specialized editor, though on second thought it might not work so well. ASCII text is very flexible and almost universally understood across different platforms. It's hard to imagine a non-text based paradigm for developing full programs that's as flexible, though there are certainly examples such as resource editors for GUI creation where graphical tools make life easier. Maybe non-text based programming would be good for rapid development. HyperCard largely used non-text interfaces for development, and the overall organization was certainly not based on text.

A "compiled" or specially crafted binary version of HTML/XML might work nicely, and could potentially save bandwidth; lots of web pages are sent gzipped, but a specialized binary format should be able to get the size down even more.
Re:XML uses a binary format by 2short · 2006-12-04 19:11 · Score: 3, Interesting

You could certainly make XML vastly more compact if you had some table of tags mapped to 2-byte codes. You're not the first to have such an idea, and I and others will be happy to use it... as soon as you've got it standardized, implemented, and as widely accepted as ASCII. Point being, I, and everyone I've never even met who will ever touch some particular XML file, already has a text editor.

We also all have some way of decompressing files in several standard compression formats, which will squash the XML down to the same size as your custom scheme, if storage space is an issue, which it generally isn't. There's all manner of custom schemes one can use to do various things better when one defines the platform. When you want to interoperate well, you need to use the capabilities that already exist on only semi-known systems.

Generally we don't actually make customers use new specialized tools. We take advantage of the new specialized tools they already have. I'm pretty sure not one of my customers ever got a browser to read my documentation; I wrote it in HTML because they've all got browsers already.
Re:XML uses a binary format by quigonn · 2006-12-04 20:48 · Score: 1

You could certainly make XML vastly more compact if you had some table of tags mapped to 2-byte codes. You're not the first to have such an idea, and I and others will be happy to use it... as soon as you've got it standardized, implemented, and as widely accepted as ASCII.

Not as widely accepted as ASCII, but standardized and implemented: WBXML

--
A monkey is doing the real work for me.
Re:XML uses a binary format by Tei · 2006-12-04 21:19 · Score: 1

Well.. some people tried binary code. And failed. Maybe because you are wrong and non-text code is a horrible painful idea.

--
-Woof woof woof!
Re:XML uses a binary format by dkf · 2006-12-04 22:20 · Score: 1

It always fascinates me that we have no problem making customers use a new specialized tool like a browser, but it's taboo to use a non-ASCII tool for development. So we continue to structure our data as if it were going to be processed by a VT100.
Having tried to do real programming with a graphical programming language (no, not just GUI layout!) and having tried to actually write said graphical programming language, I have come to the conclusion that graphical programming is really really hard. Any time you have a really tricky problem to do, there is nothing better than a textual language for doing it (there are some things you can do with mixed models, but they never quite manage to go to being fully graphical).

There is no reason to stick to ASCII though (full UNICODE works fine in practice) and the use of variable-width fonts can work too, or it would if people didn't assume fixed-width fonts in a massive amount of existing code. It's probably a bad idea to convey real information in the font though; approaches more like the auto-colorizing of most programmers' editors works better.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:XML uses a binary format by lahi · 2006-12-04 23:09 · Score: 1

I agree completely: ASN.1 rules!

For those who don't know: ASN.1 (Abstract Syntax Notation 1) is an ISO syntax notation which I believe was developed for defining packets and other stuff used in the OSI standards suite. It is also used in many popular Internet protocols and standards, such as Z39.50 (WAIS), SNMP, LDAP and PKI. (Just to mention a few.)

There are a few different encoding rules (mappings to binary), and I suppose a readable encoding - even an XML-like encoding - would be possible, although I don't know if one already exists.

-Lasse
Re:XML uses a binary format by radarsat1 · 2006-12-05 02:59 · Score: 1

You could certainly make XML vastly more compact if you had some table of tags mapped to 2-byte codes.

I used to entertain this idea as well. But then it occurred to me, that it is almost the same thing if you simply zip the XML code, using a compression program such as zip, gzip, or bzip2.
If you read up on how these lossless compression algorithms work, essentially they go through the input and build up an "index" of repeated strings. That is, any sequence of characters found to be repeated is placed in the index, and the instances of this sequence are replaced by the index number.
(Essentially, as you probably know, this is the Lempel-Ziv algorithm.)

So basically, using standard compression techniques, you can build a "binary-encoded XML" file by simply compressing the XML. Then you have standard programs and libraries that can decompress it, and parse it with a standard XML parser. Magic!

I've since concluded that this way you can get the advantages of a binary encoding AND the advantages of a simple and standard markup at the same time.
(Note however that it is definitely not as CPU-friendly as a well-defined binary format like RIFF.)
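
A quick way to see the effect is to compress some repetitive XML with Python's zlib module (DEFLATE, i.e. LZ77 plus Huffman coding) and parse it again after decompression. The sample records are made up, and the exact ratio will vary with the data, so no numbers are promised here.

```python
import zlib
import xml.etree.ElementTree as ET

# Made-up sample data: many records sharing the same tag names.
records = "".join(
    f"<record><id>{i}</id><status>active</status></record>" for i in range(500)
)
xml_text = f"<records>{records}</records>".encode("utf-8")

compressed = zlib.compress(xml_text, 9)
restored = zlib.decompress(compressed)

# The decompressed bytes go straight into an ordinary XML parser.
root = ET.fromstring(restored)
print(len(xml_text), "bytes raw,", len(compressed), "bytes compressed")
print(len(root), "records parsed after decompression")
```

The repeated tag names are exactly the kind of redundancy the compressor's back-references soak up, so the spelled-out tags cost very little once compressed.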
Re:XML uses a binary format by mikael · 2006-12-05 03:46 · Score: 1

I wish I hadn't used all my mod points earlier today, because that's an interesting post... it would be interesting to set up a programming environment with a binary format and specialized editor, though on second thought it might not work so well.

There's always the APL programming language - just about every mathematical function call is mapped to a single unicode character.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:XML uses a binary format by ClosedSource · 2006-12-05 04:44 · Score: 1

"I'm pretty sure not one of my customers ever got a browser to read my documentation; I wrote it in HTML because they've all got browsers already."

You're missing the point. Until about a decade ago almost nobody had a browser.
Re:XML uses a binary format by ClosedSource · 2006-12-05 04:58 · Score: 1

1. Representation schemes that are designed with specific knowledge of the type of data that is going to be represented compress better than a general purpose compression algorithm.

2. Browsers don't accept zipped pages, so the file would have to be manually unzipped before presentation to the browser.

3. Browsers could be modified for either a binary HTML or to accept a zipped page, but there would be more run-time processing involved to unzip than there would be to natively support a binary HTML.
Re:XML uses a binary format by ClosedSource · 2006-12-05 05:06 · Score: 1

I didn't really mean to suggest that source code should be represented in a different binary format (i.e. non-ASCII), but that data be represented in a more efficient form.

How fast browsers were adopted on almost all platforms suggests to me that resisting binary formats is primarily a cultural issue among programmers rather than a technical or practical one. I come from an embedded background, so it's not a problem for me.
Re:XML uses a binary format by ClosedSource · 2006-12-05 05:18 · Score: 1

I guess my post wasn't that clear, since several people thought I was suggesting that source code should be written without ASCII. I meant data (but alas, I didn't use that word). Nevertheless, I found your post interesting.
Re:XML uses a binary format by 2short · 2006-12-05 05:27 · Score: 1

I'm not missing the point. When most people didn't have browsers, I didn't write my documentation in HTML, and it would have been silly to ask me to, because people couldn't read it.

Today, it is silly to ask me to encode my data in a custom format that hasn't been widely adopted if I want others to be able to manipulate it. Maybe one such custom XML compression scheme will get widely adopted; and then I'll start using it. But today I'm going to go with zip-compressed XML, because people can read it.
It's the usual chicken-and-egg adoption-rate problem to be sure. But ASCII, html, XML, and Zip compression made the jump. If this custom scheme has something compelling to offer, it will too. I do find that hard to imagine, since I don't see what it offers over zip-compressed XML, which is already there.
Re:XML uses a binary format by 2short · 2006-12-05 05:38 · Score: 1

Which is basically what I was trying to say with my second paragraph. Everyone can handle zip-compressed XML right now, so a custom scheme would have to offer a big advantage over that to get anywhere, and it won't. As for CPU-friendliness; it's a non-issue. The average PC out there has a CPU that can decompress a zip file faster than it can be read off that same PC's disk, let alone downloaded over a net connection; and that gap is widening. In the coding I do with very large amounts of data, we regularly compress stuff before writing to disk, not to save space, but to save time by using CPU power to reduce disk load.
Re:XML uses a binary format by radarsat1 · 2006-12-05 06:05 · Score: 1

I agree. The CPU issue only becomes important when talking about servers accepting thousands of connections at a time, where all data being passed around is gzipped XML. Believe it or not, I have read that there are situations where CPU time can exceed the I/O bottleneck. Not to mention the power consumption associated with it. It all depends on what kind of scale you are talking about.
Certainly, for most applications, the CPU thing is a non-issue.
Re:XML uses a binary format by ClosedSource · 2006-12-05 06:54 · Score: 1

I was just suggesting that a non-ASCII XML would be better. I didn't expect you to start using a hypothetical standard immediately. I have no expectation that such a standard will be widely used because there are strong cultural taboos against it in the programming world, particularly in the UNIX community that created the web standards.
Re:XML uses a binary format by RobbieGee · 2006-12-05 07:04 · Score: 1

2. Browsers don't accept zipped pages, so the file would have to be manually unzipped before presentation to the browser.

I assume you meant Internet Explorer doesn't accept compressed pages, because both Opera and Firefox do. I don't know about Safari, but I would assume so.

--
If you get this, we're 10 of a kind.
Re:XML uses a binary format by ClosedSource · 2006-12-05 08:37 · Score: 1

I stand corrected.
Re:XML uses a binary format by Magnus+Reftel · 2006-12-05 10:32 · Score: 1

(Back home so I can reply logged in)

1. Representation schemes that are designed with specific knowledge of the type of data that is going to be represented compress better than a general purpose compression algorithm.

Yes, but not by much, unless you count destructive compression. HTML and other textual formats compress very well using general-purpose compressors (as one would expect). See for instance Keith Packard and James Gettys' LBX postmortem - ssh's built-in gzip compression is close enough to LBX's specialized X11 compression for it not to matter.

2. Browsers don't accept zipped pages, so the file would have to be manually unzipped before presentation to the browser.

They do. Mozilla, Opera, and even Internet Explorer.

3. Browsers could be modified for either a binary HTML or to accept a zipped page, but there would be more run-time processing involved to unzip than there would be to natively support a binary HTML.

The performance impact is small enough for most people not to be aware that they are already using it. Unless you're on an ancient browser, you already are using it yourself - Slashdot has used gzip encoding for ages.

But perhaps I should expand on my original point ("man gzip" might have been a bit terse ;-) ): textual formats (roughly meaning files in some ascii-like encoding that use letters and punctuation for markup and text for data) are easy to work with, as text files have an astounding tool support. Specialized binary formats usually lack tools for anything beyond basic editing and viewing (wot - no AWK?).

The benefit of specialized binary formats is in parsing speed and file size, but the size part is not that important, since there are lots of good text compressors that let you get almost all the size advantage at almost none of the cost. The only thing left for specialized binary formats is parsing speed, but processing power stopped being the bottle-neck for most systems years ago (now it's about bandwidth, repeating the previous point).

--
print "Yet another p{erl,ython} hacker\n",

XML nightmare by rgaginol · 2006-12-04 17:25 · Score: 4, Insightful

If XML Schema was a work colleague they would be Wally from Dilbert - it's not that things are impossible to do with it, it's just that the relatively simple things become hard and the complex almost impossible. Due to the fact that almost anything is possible with XML schema with enough work (weeks, months, years...) instead of just scrapping it, people keep at it doggedly despite the number of times we get bitten. I'd love to see the community move more completely to RELAX NG if it makes my life easier.

Re:XML Totally Sucks - All of it! by Nataku564 · 2006-12-04 17:32 · Score: 1

Some companies have to interoperate with each other. And by some, I actually mean nearly all of them. Most of the data exchanged comes out of a database at some point, and as such, is naturally able to be put into a hierarchy with reasonable ease. XML, and similar formats, make this much less painful than if you had to flatten it out.

Where I currently work, I get data from several dozen other large companies. Most of it is not in XML. We generally have 2 people full time just on maintaining the parsers. If they were all in XML, the amount of maintenance required would be next to nothing.

XSD: "Mission Accomplished!" by SimHacker · 2006-12-04 17:33 · Score: 3, Funny

From the xml-dev mailing list:

From: Rick Jelliffe
To: xml-dev@lists.xml.org
Date: Wed, 29 Nov 2006 12:46:06 +1100

Robert Koberg wrote:

I wonder if the people who think RNG won have "Re-elect Gore" bumper stickers...

Maybe a better analogy would be that the people who say that XSD is lovely is Mr Bush's "Mission Accomplished!"

Though of course there are differences between Iraq and XSD. One seems to be about people with their own fiefdom agendas stubbornly miring us in a quagmire, using a grabbag of thin reasons to justify it, denying any evidence that things are not rosy, perpetually promising that things are turning around, and enmeshing all sorts of decent people in a life of horror, difficulty and with no confidence in accomplishing the mission. The other is in the Middle East.

Just joking...
Rick

--
Take a look and feel free: http://www.PieMenu.com

Re:XML Totally Sucks - All of it! by Nataku564 · 2006-12-04 17:36 · Score: 1

This is exactly what DTDs and XSDs are there to take care of. Relying on the document to follow your own in-house rules is the exact opposite of what you are supposed to do, in fact. The format document defines exactly what your parser should be doing/expecting, and if your data vendor doesn't respect that contract, it's very easy to show who is in the wrong. With flat files, all I have to do is add an extra column and your parser will die. For one time imports this doesn't matter, but most business processes are not one time.
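
The extra-column failure is easy to reproduce. In the Python sketch below (field names and data are made up), a positional reader breaks when a vendor inserts a column, while a reader that looks fields up by element name keeps working. A validating parser would go one step further and report the unexpected element against the DTD or XSD, which is the "who is in the wrong" part.

```python
import csv
import io
import xml.etree.ElementTree as ET

def price_from_csv(line):
    # Positional contract: exactly three fields, price last.
    sku, name, price = next(csv.reader(io.StringIO(line)))
    return float(price)

def price_from_xml(doc):
    # Named contract: look the field up by element name.
    return float(ET.fromstring(doc).findtext("price"))

old_csv = "A1,Widget,9.99"
new_csv = "A1,Widget,blue,9.99"   # vendor inserted a colour column
old_xml = "<item><sku>A1</sku><name>Widget</name><price>9.99</price></item>"
new_xml = ("<item><sku>A1</sku><name>Widget</name>"
           "<colour>blue</colour><price>9.99</price></item>")

print(price_from_csv(old_csv), price_from_xml(old_xml), price_from_xml(new_xml))

try:
    price_from_csv(new_csv)
    csv_survived = True
except ValueError:
    csv_survived = False          # unpacking four fields into three names fails
print("flat-file reader survived the new column:", csv_survived)
```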

Slashdot Tags by sc0p3 · 2006-12-04 17:51 · Score: 1

Slashdot tags are officially useless. Who the hell is going to search for "dontdoit" when looking for this article?

Re:Slashdot Tags by Mr2001 · 2006-12-04 18:04 · Score: 1

Or "sharks" when looking for any article about lasers. Or "itsatrap"/"fud" when looking for any article about anything. We need a way to moderate tags.

--
Visual IRC: Fast. Powerful. Free.
Re:Slashdot Tags by An+ominous+Cow+art · 2006-12-04 18:11 · Score: 1

Relax. :-)
Re:Slashdot Tags by jZnat · 2006-12-04 20:39 · Score: 1

Well, you can be sure that most articles tagged with "sharks" are actually about lasers, and articles tagged with "itsatrap" are probably about Microsoft.

--
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
Re:Slashdot Tags by jpkunst · 2006-12-05 04:41 · Score: 1

Indeed. I turned tags off a long time ago and haven't missed them one bit. It seems that the tagging system quickly devolved into a playground for trolling.
Re:Slashdot Tags by Mr2001 · 2006-12-05 07:33 · Score: 1

Except for that day when every single story was tagged "itsatrap", and only one of them was about MS.

--
Visual IRC: Fast. Powerful. Free.

Relax NG. by miguel · 2006-12-04 18:07 · Score: 1

Mono has complete support for RelaxNG in the form of the Commons.Xml.Relaxng assembly.

In addition to RelaxNG, it provides NVDL and RNC support.

Re:Relax NG. by SeaFox · 2006-12-04 18:16 · Score: 2, Funny

Mono has complete support for RelaxNG in the form of the Commons.Xml.Relaxng assembly.

So should the lesson here be to "RELAX if you have MONO"?

Relax NG - constraining based on attribute values by SashaMan · 2006-12-04 18:13 · Score: 1

As someone who has used XML schemas pretty extensively, I was pretty amazed at how I was able to skim through the tutorial in about 10 minutes and understand Relax NG, versus reading an entire XML Schema book and still needing to refer to it whenever I write schemas.

One thing I really like about Relax NG is that it's possible (with very easy syntax) to constrain the XML structure based on an attribute value, something you can't do in schema or a DTD. For example, suppose you want to have an XML element:

<arg type="boolean">true</arg>
With Relax NG it's possible to constrain the text in the arg element (e.g. "true" or "false") based on the value of the type attribute. For example, if type="int", you could limit the text in arg to an integer value. This is something you can't do in schemas or dtds.
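
A rough sketch of what that kind of constraint means, written as plain Python rather than as a real RELAX NG validator. The attribute values "boolean" and "int" and the helper name arg_is_valid are assumptions made for illustration; the RELAX NG pattern in the comment is my guess at how this might be spelled in the compact syntax, not something taken from the tutorial.

```python
import re
import xml.etree.ElementTree as ET

# Assumed RELAX NG compact pattern this check imitates (illustrative only):
#
#   element arg {
#     (attribute type { "boolean" }, ("true" | "false"))
#     | (attribute type { "int" }, xsd:integer)
#   }

# One text check per allowed value of the type attribute.
CHECKS = {
    "boolean": lambda text: text in ("true", "false"),
    "int": lambda text: re.fullmatch(r"[+-]?[0-9]+", text) is not None,
}


def arg_is_valid(xml_text):
    """Return True if the <arg> text matches what its type attribute allows."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "arg" or len(elem) != 0:
        return False  # wrong element, or unexpected child elements
    check = CHECKS.get(elem.get("type"))
    if check is None:
        return False  # missing or unknown type attribute
    return check((elem.text or "").strip())
```

The point is only that the allowed content is chosen by the attribute value: the same text "true" is accepted under one type and rejected under the other.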

Re:XML Totally Sucks - All of it! by l810c · 2006-12-04 18:39 · Score: 1

The Sky is Blue!!!

Why just RELAX when you can REST too? by SuperKendall · 2006-12-04 19:06 · Score: 1

Since you are simplifying your life by making the schema for web requests simpler, why not go all the way, ditch SOAP, and embrace REST for XML-over-HTTP communications?

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

Re:20-year prediction by SimHacker · 2006-12-04 19:24 · Score: 1

I believe James Clark, who co-designed Relax/NG, understands and programs in Lisp pretty well (as well as Haskell, Java, C and many other languages). He helped design and implement DSSSL (wikipedia article), which is based on Scheme, and led to XSLT, which he also designed.

-Don

--
Take a look and feel free: http://www.PieMenu.com

XML is like Electricity by SimHacker · 2006-12-04 19:27 · Score: 4, Insightful

It's good for transmitting information/energy, but it's not good for storing it.

-Don

--
Take a look and feel free: http://www.PieMenu.com

Re:XML Totally Sucks - All of it! by unity100 · 2006-12-04 19:54 · Score: 1

Well, if your company had developed a shared interface for all 12 companies at the start, you wouldn't need people parsing the data into your database.

--
Read radical news here

I call this the LineOfView (as in PoV) Problem by Qbertino · 2006-12-04 19:57 · Score: 4, Insightful

I call this the Line of View (as in PoV) or 'Horizon' Problem. The general problem is this: In XML we've got a standard that is universal for displaying n-dimensional structures in a basically 1-dimensional environment. (For the time being, we're ignoring that XML text usually goes from left to right and top to bottom, making it something 2D to look at.)
The question now is: where do you draw the line of view? Along which line do I take my knife to cut open my n-dimensional structure, unravel it and flatten it out into a 1-dimensional string of characters? This problem is impossible to solve satisfactorily for all possible PoVs or - as I say - Lines of View, or better yet, Horizons to the structure. Will I unravel my DB of books by authors? By issues? By vendors? By publishers or by weight and size? ... At some point you will have to decide how you want to handle your stuff and which way you're going to unravel it. This will undoubtedly influence how much XML clutter you will have to construct. With XML it's the same as with databases: It/they will always be pathetic crutches for us to latch on to the real work. Indispensable, but crutches nonetheless.

What I'm getting at is this: mapping n-dimensional stuff to 1-dimensional structures will always suck one way or the other. It's just that with XML we all start agreeing on the way in which it's supposed to suck. I don't think that changing the Schema standard (or worse: introducing additional standards) will actually attack this hard problem. I have a strong suspicion that Relax NG's relief is illusory, short-term, and re-introduces downsides that XML Schema has already tackled with its pesky and strict nature. One of those would be consistency with the View-Horizon, once chosen, all the way through the given data structure. I don't know for sure - go test and find out - but I do know that universal serialization will always come with downsides, and RelaxNG (or any other schema) won't change that.
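
To make the "which way do you cut it" question concrete, here is a small Python sketch that flattens the same records into two different XML hierarchies. The records, the element names and the serialize_by helper are all made up for illustration; nothing here comes from a real schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat records; titles and names are invented for the example.
BOOKS = [
    {"title": "Book A", "author": "Author 1", "publisher": "Pub X"},
    {"title": "Book B", "author": "Author 1", "publisher": "Pub Y"},
    {"title": "Book C", "author": "Author 2", "publisher": "Pub X"},
]


def serialize_by(records, key):
    """Flatten the records into XML, using `key` as the top-level grouping."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    root = ET.Element("books", {"groupedBy": key})
    for value, members in groups.items():
        group = ET.SubElement(root, key, {"name": value})
        for rec in members:
            book = ET.SubElement(group, "book", {"title": rec["title"]})
            # Whatever was not chosen as the grouping ends up as attributes.
            for other, other_value in rec.items():
                if other not in (key, "title"):
                    book.set(other, other_value)
    return ET.tostring(root, encoding="unicode")


by_author = serialize_by(BOOKS, "author")
by_publisher = serialize_by(BOOKS, "publisher")
```

Both strings carry the same three books, but a consumer written against one layout cannot read the other without reshuffling, which is the downside described above regardless of which schema language validates the result.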

--
We suffer more in our imagination than in reality. - Seneca

Re:I call this the LineOfView (as in PoV) Problem by julesh · 2006-12-04 22:00 · Score: 1

I think your problem is that you're using XML to perform the job of a relational database.

Not all tasks can be solved with the same tools.
Re:I call this the LineOfView (as in PoV) Problem by firewrought · 2006-12-05 06:42 · Score: 1

I don't think that changing the Schema standard (or worse: introducing additional standards) will actually attack this hard problem.

A schema language is supposed to give the developer a tool for validating instance documents. Relax NG approaches this from the same Line of View (to use your terminology) that's being taught in thousands of compiler/information theory courses and that's been deeply baked into existing programming platforms (in the form of regular expressions, context-free grammars, etc.).
XML Schema, on the other hand, takes a schizophrenic's Line of View on the issue. All that extra verbiage/abstraction/indirection is a castle in the cloud which obscures (instead of facilitating) meaning; it makes XSD a write-only language.

--
-1, Too Many Layers Of Abstraction
Re:I call this the LineOfView (as in PoV) Problem by blafasel · 2006-12-05 20:33 · Score: 1

Put differently: you want to be able to describe any graph, while XML is built on the tree structure. So obviously, you have to do something clever about the circles in your graph... that's why s-expression-based programming languages allow for recursion and non-local jumps. Since s-expressions and xml are (largely) equivalent, you're of course free to do so, though it's hairy to get right.

--

check your speling

Wait wait wait by bytesex · 2006-12-04 20:53 · Score: 1

This guy claims that this:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is easier to read than this:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>

WTF ?!

--
Religion is what happens when nature strikes and groupthink goes wrong.

Re:Wait wait wait by Anonymous Coward · 2006-12-05 00:52 · Score: 1, Informative

No, this is:

start =
  element addressBook {
    element card {
      element name { text },
      element email { text }
    }*
  }
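
For anyone who wants to see what that grammar accepts, a hand-rolled Python check of the same structure (zero or more card elements, each holding a name followed by an email). This is only a sketch with the standard library, not a RELAX NG validator; the function name is_address_book is made up, and it ignores stray text and attributes.

```python
import xml.etree.ElementTree as ET


def is_address_book(xml_text):
    """Check the addressBook / card* / (name, email) structure by hand."""
    root = ET.fromstring(xml_text)
    if root.tag != "addressBook":
        return False
    for card in root:  # zero or more cards are allowed
        if card.tag != "card":
            return False
        # Each card must contain exactly name then email, in that order.
        if [child.tag for child in card] != ["name", "email"]:
            return False
        # name and email hold text only, no nested elements.
        if any(len(child) for child in card):
            return False
    return True
```

Writing it out like this mostly shows how much the seven-line grammar is saying for free.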
Re:Wait wait wait by Skjellifetti · 2006-12-05 09:42 · Score: 1

You've missed the point. DTDs don't allow for many of the common restrictions people want to place on their data values. Schemas are a poor and incomplete attempt to solve this problem.

--
FreeSpeech.org

Relax NG experiences by Tim12s · 2006-12-04 21:23 · Score: 1

I have enough experience with Relax NG to say that it is great.

The compact syntax is enjoyable as you can be quite precise (compared to XSD), and there are tools that convert between the compact syntax and the XML Relax NG syntax, allowing you to use the syntax that suits your needs. In general, JING is quite a bit quicker than a few of the XSD validators for comparably complex schemas.

There are a few disadvantages:

* The full range of tools that are available are not advanced on a regular basis. I found a few bugs in the JING source code and had the opportunity to fix them where necessary.

* I feel that RelaxNG is marginalized because of XSD, and a lot of additional OSS support goes along with that. The tools are maintained by individuals instead of teams. I would recommend that the author of JING put his software forward to the Apache foundation (Jakarta Commons) and see if it can attract a bit more attention.

* Web services are a bit of a sticking point. The use of a Relax NG schema can be embedded into the WSDL, however, the various 3rd party clients may not necessarily understand the schema, and by extension, they would not generate any supporting classes making integration with a relax NG defined webservice a little more complex than it needs to be.

Relax NG really is great.

-Tim

RELAX compact syntax = BNF notation by master_p · 2006-12-04 22:48 · Score: 1

I don't see why XML schemas have to exist. BNF notation serves the exact same purpose: it describes a grammar. A BNF-like derivative is more than enough to define XML schemas. The compact syntax of RELAX NG is just that, and a bright idea.

It is really annoying when CS has to be discovered all over again. The problem of validating text against a certain format was solved many decades ago, and BNF and its variations have been known since the 60s...

Re:RELAX compact syntax = BNF notation by portnoy · 2006-12-05 06:11 · Score: 1

It's because they don't just describe a grammar. They also define a conceptual arrangement for the data, and can be used to express what the common types of the document are. If all they cared about was syntactic grammar, BNF forms would be absolutely fine -- and indeed you can find references to arguments about whether to use DTDs or BNFs for some internet XML structures where syntax was the main concern.

But BNFs by their nature don't have a formal means to differentiate between syntactic rules that define a major structural component of the document, and rules which are present simply to encode a restriction on the data format. Schema structures were designed to formally encode the structural information as well.
Re:RELAX compact syntax = BNF notation by SimHacker · 2006-12-05 19:17 · Score: 1

What's annoying is how the politically-oriented industry-driven XSD committee ignored years of computer science and language theory, and instead came up with an ugly mish-mash of inconsistent hacks and kludges held together with scotch tape and baling wire.

What's delightful is how the maverick geniuses who came up with Relax/NG solidly based it on "hedge automata theory", which is an extension of "tree automata theory" that applies to XML data.

-Don

--
Take a look and feel free: http://www.PieMenu.com
Re:RELAX compact syntax = BNF notation by master_p · 2006-12-05 22:24 · Score: 1

It's because they don't just describe a grammar.

Oh they do. Please read on.

They also define a conceptual arrangement for the data

BNF grammars do just that. For example:

foo1 = foo2 bar1 | foo3
foo2 = bar2 bar3+ | A
bar1 = A | B | digit

, and can be used to express what the common types of the document are.

So can BNF. Grammar rules are types which can be reused in different parts of a document.

Please remember that any program (and data is a program!) can be expressed as a grammar! That is a proven theorem in computer science. If there was a computer with infinite power, programs would be checked by writing a grammar for each program...

If all they cared about was syntactic grammar, BNF forms would be absolutely fine -- and indeed you can find references to arguments about whether to use DTDs or BNFs for some internet XML structures where syntax was the main concern.

The document you present has no argument against BNF. All it says is that some people don't know or understand BNF.

But BNFs by their nature don't have a formal means to differentiate between syntactic rules that define a major structural component of the document,

So what was the example I gave above, if not what you describe?

and rules which are present simply to encode a restriction on the data format.

With BNF, you can describe any restriction you like.

Schema structures were designed to formally encode the structural information as well.

That's exactly what BNF does.

Re:XML Totally Sucks - All of it! by giuntag · 2006-12-04 23:44 · Score: 1

In most of the 'xml as hierarchical data storage' usage cases, JSON is what-it-should-have-been since the beginning (basically, ini files with nested structures):
- no element vs. attribute headaches
- no element-with-data-inside vs. element with elements inside headaches
- no way to declare external entities, cdata sections and other obscure features
- freaking easier to parse
- specs out very clearly charset encoding and escape sequences
maybe we'll have a post from Tim about that in a couple of years...

Unfortunately, there is no automatic fetching.... by wowbagger · 2006-12-05 01:04 · Score: 1

(damn short subject lines!)

I agree that RelaxNG is much easier to read, and it will much more completely describe a grammar than will the other standard - and MUCH more completely define it than will a DTD.

Unfortunately, as far as I can tell there is no way to, within an XML document, state "Use THIS RelaxNG schema file to validate this document", as you can with a DTD. Thus, even if I have placed my RelaxNG schema on my web server, I cannot set things up such that (for example) libXML2 can automatically fetch that schema when it starts parsing my document. I can map the RelaxNG schema to a DTD (losing information) and allow that to be fetched, but if I want to use a RelaxNG schema with libXML2 I the programmer must tell libXML2 where the schema is.

IMHO it would be a Good Thing if the W3C would standardize on some way to associate a RelaxNG schema with a given XML file - say, by some form of XML processing directive within the XML file.
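
Something like the following Python sketch is what I have in mind. The processing instruction name relaxng-schema, its href pseudo-attribute and the example.org URL are all invented for illustration; this is not an existing convention, and the code only extracts the location, it does not fetch or validate anything.

```python
import re
from xml.dom import minidom

# A document carrying a hypothetical schema-association instruction.
DOC = '''<?xml version="1.0"?>
<?relaxng-schema href="http://example.org/addressBook.rng"?>
<addressBook/>'''


def find_schema_href(xml_text, target="relaxng-schema"):
    """Return the href of the first matching processing instruction, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == target):
            # The instruction's data is free text; pick out href="..." by hand.
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                return match.group(1)
    return None
```

With an agreed-upon instruction like this, a parser could do the lookup itself instead of the programmer passing the schema location in by hand.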

--
www.eFax.com are spammers

Re:XML Totally Sucks - All of it! by jmyers · 2006-12-05 01:21 · Score: 1

I have used EDI and it sucks almost as bad as XML. I have written more flat text file parsers than I can count. Nested data and escaped characters are no problem in flat text if the format is well defined.

I use XML every day in various applications. In my opinion it serves no purpose other than bloat. I would be all for a standard text format but XML is just ridiculous.

Re:it's a rather straightforward observation by LizardKing · 2006-12-05 01:28 · Score: 1

XML was intended to be easy to PARSE, not easy to read.

Correct, XML is slightly easier to parse because of explicit end tags but most people disabled short tag support and enforced end tags in SGML anyway (in the syntax declaration and DTD respectively). However, saying that XML is not as easy to read as SGML is stretching things a bit - I find them to have the same legibility, although when using namespaces in XML I find they tend to result in long tag names that obscure things a bit. The things that appeared to be novel or improvements in XML were the discarding of the antiquated syntax declaration (a hangover from early GML days) and the concept of "well formed XML" without explicitly requiring a DTD.

Re:Relax NG - constraining based on attribute valu by TheRaven64 · 2006-12-05 01:48 · Score: 1

As someone who has used XML Schema a little, it amazes me that no one thought to shoot the designers as soon as they published the first draft. I've learned entire Turing-complete programming languages in less time than it took me to get to even moderate competence with Schema (Lisp, Erlang and Smalltalk, for example, all took less long to learn than Schema; I could write a program in any of them that would validate an arbitrary XML document more easily than I could write a Schema, in spite of spending longer learning Schema).

--
I am TheRaven on Soylent News

Polishing a turd by 955301 · 2006-12-05 03:41 · Score: 1

Schema definition by its nature is tedious but necessary at this point. If you're going to take a standard that's already entrenched and suggest everyone stop and polish the edges from it, how about we kill the verbosity of the XML end-tag instead?

Do we lose anything other than bandwidth use by doing this,

<tagNameThatCanBeLong>Some Text</>

instead of this:

<tagNameThatCanBeLong>Some Text</tagNameThatCanBeLong>

If the next end tag must belong to the last start tag what's the point of naming it?
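
A quick Python sketch of why nothing is lost: a stack of open tag names is enough to turn every </> back into the full end tag. The function name expand_short_end_tags is made up, and the regex is deliberately naive (it assumes no comments, CDATA, processing instructions, or '<' and '>' inside attribute values).

```python
import re

# Matches a start tag, an empty-element tag, a named end tag, or a bare </>.
TAG = re.compile(r'<(/?)([A-Za-z_][\w.\-]*)?([^<>]*?)(/?)>')


def expand_short_end_tags(text):
    """Rewrite every end tag (including </>) using the name of the open tag."""
    stack = []   # names of the currently open elements
    out = []
    pos = 0
    for match in TAG.finditer(text):
        out.append(text[pos:match.start()])   # copy the text between tags
        pos = match.end()
        closing, name, _rest, self_closing = match.groups()
        if closing:
            if not stack:
                raise ValueError("end tag without a matching start tag")
            out.append("</%s>" % stack.pop())
        else:
            if not self_closing:
                stack.append(name)
            out.append(match.group(0))
    out.append(text[pos:])
    return "".join(out)
```

So the short form round-trips mechanically; what the named end tag buys is only a redundancy check for humans and parsers.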

--
You are checking your backups, aren't you?

Re:Polishing a turd by multipartmixed · 2006-12-05 04:23 · Score: 1

Let me also suggest that we replace the <tag>data</tag> construct with tag=data[newline]

I've been writing configuration files like that for years, and it works great. The only time I really want tags anyhow is when nesting stuff... and if you live in corporate IT land, you'll realize that doesn't actually happen because people who specify XML configuration files don't usually understand that a container is something that could hold things other than whisky and so forth.

--

Do daemons dream of electric sleep()?
Re:Polishing a turd by 955301 · 2006-12-06 03:13 · Score: 1

Perhaps, but you can make *any* text file messy.
Your example is actually this:

</>
</>
</>
</>
<someTag>Hello</>
</>
</>
</>

Besides, any editor worth its salt will highlight the bracket association. I guess my gripe is that XML is passed over the network far more than it is read by humans, and yet we sacrificed size for human readability. I mean, people suggest using it for RPC!

--
You are checking your backups, aren't you?
Re:Polishing a turd by 955301 · 2006-12-06 03:16 · Score: 1

Editors do do that; however, XML files are passed between computers waaay more often than a human opens the file to read it. Why sacrifice network bandwidth and parsing time for something an editor can also solve?

I mean, we don't name our end brackets to if statements in C except with comments - the machine doesn't have to chew on those during runtime.

--
You are checking your backups, aren't you?

Re:it's a rather straightforward observation by mspohr · 2006-12-05 03:55 · Score: 1

Why do people read XML? It's intended to be parsed!

Reading XML is like reading compiled code. You might have to do it to debug something or to grok how the code works but XML is intended to be parsed, not read.

It seems to me that emphasis should be placed on features that improve parsing, not human readability. I don't know enough about XML or RELAX NG to opine on which is best for parsing, but it seems that parsing should be the main criterion for which is "best".

--
I don't read your sig. Why are you reading mine?

Someone needs a dictionary.com link by p3d0 · 2006-12-05 04:29 · Score: 1

Flame: to insult or criticize angrily

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

It's a bona fide Kuro5hin reunion by p3d0 · 2006-12-05 04:53 · Score: 1

En tee

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

YES YES YES! by Anonymous Coward · 2006-12-05 07:33 · Score: 1, Insightful

I just showed your example to a half dozen people (some programmers, some managers) and they agree that the longer form is vastly more readable and understandable.

Shit, you think people are born knowing what an asterisk postfix means? Terseness != Clarity.

XML by DragonWriter · 2006-12-05 08:03 · Score: 1

I believe you are looking for lisp. It's XML cleaned up, simplified and hulkified.

Not Lisp, but S-expressions, which are the basis of Lisp syntax; Lisp is an "application" of S-expressions, the same as XML applications are applications of XML. S-expressions extended with something similar to XML's encoding declarations could substitute for XML and would be arguably cleaner (certainly cleaner to Lispers), though I'm not so sure that:

(foo (bar (baz (spam: "eggs"))))
is really more readable (rather than just more compact) than:

<foo> <bar> <baz spam="eggs"/> </bar> </foo>
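
For comparison, a small Python sketch of one possible XML-to-S-expression mapping. The (name: "value") spelling for attributes just follows the example above and is an assumption, as is the to_sexpr helper; text that follows a child element (tail text) is ignored to keep it short.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an element as (tag (attr: "value") "text" children...)."""
    parts = [elem.tag]
    for key, value in elem.attrib.items():
        parts.append('(%s: "%s")' % (key, value))
    text = (elem.text or "").strip()
    if text:
        parts.append('"%s"' % text)
    for child in elem:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"


xml_text = '<foo> <bar> <baz spam="eggs"/> </bar> </foo>'
print(to_sexpr(ET.fromstring(xml_text)))
# prints: (foo (bar (baz (spam: "eggs"))))
```

The two forms carry the same tree; whether the parenthesized one reads better is exactly the open question.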

Re:XML by DragonWriter · 2006-12-06 12:36 · Score: 1

Uh, yeah, I've read that exact argument before. I took it into account. I never argued that S-expressions should or were likely to replace XML; I said that an S-expression-based syntax that was representationally equivalent to XML and more concise and cleaner for some uses (particularly, applications where "marked-up text" is not a good description of what is being transmitted, which are not all that uncommon for XML now that it has become a common generic data interchange format as much as a "markup" language) could be developed.

Also, that the Common Lisp and Scheme languages "predate" Unicode says absolutely nothing about the utility of S-expressions as a basis for a markup/data-interchange language, though certainly it is one of many arguments against the position (which I've specifically stated is wrong previously in this thread) that Lisp-as-is is suitable as a drop-in replacement for XML (a bigger argument is that Lisp is the wrong "level" of solution anyway, being a too-specific application of S-expressions).

But, anyhow, thanks for cut-and-pasting, without comment or application to the discussion at hand, an argument against a position that wasn't even the one being discussed, but only tangentially relevant.

Re:Relax NG - constraining based on attribute valu by Skjellifetti · 2006-12-05 09:28 · Score: 1

I've learned entire Turing-complete programming languages in less time than it took me to get to even moderate competence with Schema

What do you expect? Schemas were a Microsoft initiative IIRC.

--
FreeSpeech.org

Re:it's a rather straightforward observation by GroovinWithMrBloe · 2006-12-05 10:29 · Score: 1

XML was intended to be easy to PARSE, not easy to read.

From the Origin and Goals of XML: XML documents should be human-legible and reasonably clear.

True, their primary purpose isn't to be read by human beings, but comparatively it is superior to non-ASCII binary formats by what I would call a significant amount.

Re:XML Totally Sucks - All of it! by Nataku564 · 2006-12-05 14:52 · Score: 1

That only works if you are the larger company. Most of these guys are huge international brokerages - they aren't changing for us.

Just keep the data types by ishmalius · 2006-12-08 23:24 · Score: 1

The only thing I've found useful from the Schema namespaces is the set of datatypes (int, float, string, etc) which are quite useful for other things.

Could W3C please split these off into their own "standard" namespace family?

Re:Relax NG - constraining based on attribute valu by sh4na · 2006-12-09 14:11 · Score: 1

With Relax NG it's possible to constrain the text in the arg element (e.g. "true" or "false") based on the value of the type attribute. For example, if type="int", you could limit the text in arg to an integer value. This is something you can't do in schemas or dtds.

Uh? WTF are you talking about? Of course you can restrict text in a schema. I ditched DTD years ago for XSD exactly for that reason, you can restrict pretty much anything in a xsd so you can validate your xml structure *and* data with it. You can even build your custom types and extend them if you need...

Some restriction examples:

<xs:element name="date" type="xs:dateTime"/>

<xs:attribute name="noMoreNoLess">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:minInclusive value="0" />
      <xs:maxInclusive value="999" />
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

<xs:attribute name="pattern">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="^([a-zA-Z0-9_\-])+(\.([a-zA-Z0-9_\-])+)$"></xs:pattern>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

<xs:attribute name="Status">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration id="Single" value="S" />
      <xs:enumeration id="Married" value="M" />
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

Is that enough restriction for you?

XSDs might be too complex for their own good, but if you're gonna bash them, at least know what you're talking about first. And btw, who the heck uses DTD nowadays? I never thought I'd see people mentioning those in 2006! Who in their right mind would use a non-xml-compliant definition file to validate a xml file? Weird...

--
shana
......gone crazy, back soon, leave message
