Tim Bray on the Birth of XML, 10 Years Later

Is this paper available in OOXML format? by Anonymous Coward · 2008-02-18 04:42 · Score: 1, Funny

Just wondering as I'd love to read it! ;)

Thanks BillG

Classic by Gothmolly · 2008-02-18 04:45 · Score: 5, Funny

Young Buck: Hey, we have a data exchange problem between two systems, let's use XML!
Greybeard: Ok, but now you have 2 problems.

--
I want to delete my account but Slashdot doesn't allow it.

Re:Classic by smittyoneeach · 2008-02-18 05:07 · Score: 5, Insightful

In defense of XML, the parsing problem is handled.
Best wishes on solving the semantic snarls.
XML, like all good approaches, handles mechanism, not policy.

--
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
Re:Classic by fireboy1919 · 2008-02-18 05:29 · Score: 3, Interesting

In defense of XML, the parsing problem is handled.

To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...

I don't really care about the XML format. Personally, I'd be happier if it were stored in binary. The thing I like is the DOM tree as a data construct, XPath as a means of addressing, and XQuery as a means of getting parts out of it. (XSLT is okay, but from my experience, it's a lot clearer to represent a transformation as a series of productions than it is to use XSLT...perhaps a production-oriented approach that used XPath addressing?)

With those, you've got a good mechanism for serializing, reading, and deserializing objects, classes, and all manner of other things.

There are only a few problems with this:
1) Non-ancestor relationships and references (i.e., having the same node as multiple locations in the XML document) are not covered by XML, but are possible with objects.

2) Attributes in XML have no obvious mapping to objects...so what do you do with them?

I wish we could use something like XML (in that it could use DTDs as schemas, and had support for DOM methods along with XQuery and XPath), but with a more efficient format (binary), and with the ability to encode references.

That would be just about perfect.
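
A minimal sketch of the "DOM tree plus XPath addressing" idea, using Python's standard library. The document and its element names are invented for illustration, and ElementTree supports only a limited XPath subset, so this stands in for full XPath or XQuery rather than demonstrating them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
DOC = """
<library>
  <book id="b1" lang="en"><title>First</title></book>
  <book id="b2" lang="de"><title>Zweites</title></book>
</library>
"""

root = ET.fromstring(DOC)

# ElementTree supports only a limited XPath subset; full XPath 1.0 and
# XQuery need a third-party library.
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]
all_ids = [b.get("id") for b in root.iter("book")]

print(english_titles)
print(all_ids)
```

The path expression addresses nodes by structure and attribute value, which is the part of the toolchain the comment says it values independently of the on-disk syntax.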

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:Classic by oyenstikker · 2008-02-18 05:45 · Score: 4, Insightful

There are only a few problems with this:
1) Non-ancestor relationships and references (i.e., having the same node as multiple locations in the XML document) are not covered by XML, but are possible with objects.
You can with refids and keys.
but with a more efficient format (binary)
It is wonderful to be able to easily read and edit the data in a text editor. If you want it more compact for storage and transmission, compress it. I understand that a binary format could lead to more efficient processing and parsing, but I think the benefits of readable text outweigh the efficiency.
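
The "if you want it more compact, compress it" suggestion can be sketched with Python's standard gzip module. The record structure here is hypothetical, and the actual size reduction depends entirely on the data, so no particular ratio is claimed.

```python
import gzip
import xml.etree.ElementTree as ET

# Build a repetitive, hypothetical document; the element names are made up.
root = ET.Element("records")
for i in range(1000):
    rec = ET.SubElement(root, "record", id=str(i))
    ET.SubElement(rec, "name").text = f"item-{i}"

raw = ET.tostring(root, encoding="utf-8")
packed = gzip.compress(raw)

# The compressed form is for storage or transmission only; decompressing
# returns exactly the same editable text.
assert gzip.decompress(packed) == raw
print(len(raw), len(packed))
```

This addresses size only. It does nothing for parsing cost, which is the other half of the binary-format argument.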

--
The masses are the crack whores of religion.
Re:Classic by Flambergius · 2008-02-18 06:08 · Score: 3, Insightful

To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...

XML doesn't handle parsing. XML makes parsing easier; in fact so easy that parsing XML isn't a problem anymore.

For an expert, I think XML and regex are complementary techniques. For anyone other than an expert regex are way too brittle. Ordinary people need to be able to operate on their data, it can't require voodoo. (Not that XML in all its arcane application is anything close to plain English, but it's much better than custom data formats and regex.)

--
Computers are useless. They can only give you answers - Pablo Picasso
Re:Classic by Anonymous Coward · 2008-02-18 06:22 · Score: 0

Except that, normally, you don't read XML. And with a simple binary format, an "xml-editor" is a piece of cake and you retain the 'easily editable' advantage.
Re:Classic by shutdown+-p+now · 2008-02-18 06:34 · Score: 2, Informative

To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...
God, no... another Perl hacker...
Regex are not a solution to everything, and most certainly not to writing fast parsers!
(Not that XML is easy to parse fast, but that's another story. You still don't write a JSON parser using regex.)
Re:Classic by Etrigoth · 2008-02-18 06:48 · Score: 1

Parsing problem for pretty much everything is almost universally solved by regex...
As a moderate to heavy user of both regex and XML, I can't agree with this statement: in its proper form, a regex is not a pushdown automaton and doesn't allow you to test for balanced groups or nested constructs.
If you're parsing any kind of ML with Regex, this becomes IMHO a show-stopper.

Ok ok, I know some Regex implementations implement a stack, the .Net one for example, but I find it about as clear as Klingon Algebra :)
I believe you should strive for legibility and clear layout when coding, and especially in these kinds of scenarios - You'll thank yourself when you have to try and debug it 4 months down the line!
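
To make the nesting point concrete, here is a small Python sketch comparing a plain regular expression with a real parser on one nested input. The input string is invented, and the pattern is just one typical attempt, not the only way someone might write it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input with an element nested inside one of the same name.
DOC = "<div>outer <div>inner</div> tail</div>"

# A non-greedy pattern stops at the first closing tag, so it pairs the outer
# opening tag with the inner closing tag.
regex_match = re.search(r"<div>(.*?)</div>", DOC).group(1)

# A real parser tracks nesting depth and recovers the actual structure.
root = ET.fromstring(DOC)
inner = root.find("div")

print(repr(regex_match))
print(repr(root.text), repr(inner.text), repr(inner.tail))
```

Making the pattern greedy only moves the failure to inputs with sibling elements, which is why balanced constructs need a stack somewhere.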

--
When we remember we are all mad, the mysteries disappear and life stands explained.
Re:Classic by typicallyterrific · 2008-02-18 07:24 · Score: 1

Honestly, it seems to me that parsing XML in C++/Java feels a lot like voodoo, at least based on what I had to write this summer.
I'm open to the idea that I failed catastrophically at researching the problem, but the best I could find in Java was an event-based SAX parser which is frankly a pain in the ass to use compared with say, Ruby's rexml (I'm quite aware which one is likely to be more efficient, mind you).

Maybe things have changed in the Java API since 1.4.2 (what my company standardized on), but from what I can tell, things are much more complicated than just providing xpaths.

Seeing people recommending regexes to parse XML leads me to believe that most other people haven't found any good solutions, either, or at least aren't terribly worried about parsing correctness.
Re:Classic by xero314 · 2008-02-18 09:37 · Score: 1

1) Non-ancestor relationships and references (i.e., having the same node as multiple locations in the XML document) are not covered by XML, but are possible with objects.
XML is fully extensible, so creating an element that references other elements is very easy. There are plenty of examples of XML subsets that do just that.
2) Attributes in XML have no obvious mapping to objects...so what do you do with them?
XML Nodes map to properties. XML Attributes are XML Nodes, as are XML Elements, so they both map to properties. The only difference is that an Attribute cannot contain other nodes while an Element can. There is also nothing in the XML spec that states you have to use Attributes in your XML subset.
I wish we could use something like XML (in that it could use DTDs as schemas, and had support for DOM methods along with XQuery and XPath), but with a more efficient format (binary), and with the ability to encode references.
Might I suggest checking out Binary XML
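
One way to sketch the "element that references other elements" idea from point 1 is with Python's ElementTree. The `id` and `ref` attribute names and the whole vocabulary are hypothetical. DTD-declared ID/IDREF attributes are the standard mechanism, but ElementTree does not validate against a DTD, so here the application resolves the references itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <person id="..."> defines a node once and
# <member ref="..."/> points at it from anywhere else in the document.
DOC = """
<org>
  <people>
    <person id="p1"><name>Ada</name></person>
    <person id="p2"><name>Grace</name></person>
  </people>
  <team name="compilers"><member ref="p2"/></team>
  <team name="engines"><member ref="p1"/><member ref="p2"/></team>
</org>
"""

root = ET.fromstring(DOC)

# Index every element that carries an id, then resolve refs through the index.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

teams = {
    team.get("name"): [by_id[m.get("ref")].findtext("name") for m in team.findall("member")]
    for team in root.findall("team")
}
print(teams)
```

Both teams end up pointing at the same in-memory node for `p2`, which is the shared-node relationship the original complaint was about.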
Re:Classic by xero314 · 2008-02-18 09:45 · Score: 1

but the best I could find in Java was an event-based SAX parser
There are plenty of other options for parsing XML in Java. SAX is only one API, and it was designed to be flexible without being complex or including a large API, which it succeeds at. Other options include things such as DOM, StAX and Digester, as well as plenty of others. There are even abstractions on top of SAX that make its use easier, but increase the API size significantly.
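
The difference between the event-based and tree-based styles can be sketched as follows. This uses Python's standard library rather than the Java APIs named above, on the assumption that the same split applies, and the `orders` document is invented for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
DOC = b"<orders><order><sku>A1</sku></order><order><sku>B2</sku></order></orders>"


class SkuHandler(xml.sax.ContentHandler):
    """Event-based: the application tracks its own position in the document."""

    def __init__(self):
        super().__init__()
        self.skus = []
        self._in_sku = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "sku":
            self._in_sku = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it has to be accumulated.
        if self._in_sku:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "sku":
            self.skus.append("".join(self._buffer))
            self._in_sku = False


handler = SkuHandler()
xml.sax.parseString(DOC, handler)

# Tree-based: the whole document is loaded, then addressed by path.
tree_skus = [el.text for el in ET.fromstring(DOC).findall("./order/sku")]

print(handler.skus, tree_skus)
```

The SAX version never holds more than one text buffer in memory, while the tree version is one line but loads everything first. That trade-off is the reason both styles exist.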
Re:Classic by Anonymous Coward · 2008-02-18 11:58 · Score: 0

In defense of XML, the parsing problem is handled.

Let's all pat ourselves on the back. We've solved the non-problem of parsers, and it only costs us about 2x in size. So now when people screw up parsing, we can at least say it's because they misused DOM, or SAX, or weren't using an XML API with the right features, or their XPath query isn't quite right (even though a trivial loop would have worked fine), or the DOM/SAX API they were using (despite being quite popular) had a bug they happened to run into (this happened to me just last month). But at least it wasn't *technically* a parsing problem! Yay us!
XML, like all good approaches, handles mechanism, not policy.

Yeah, like that Macintosh, that tried to handle user-interface policy? Users hate it when things have consistent policies like that. X11 is so much easier to use, and X11 toolkits that try to suggest policy, like Qt and GTK+, are largely ignored. And Ruby on Rails, which tries to define policy? Look at how few people like that! People hate it so much they're porting the same thing to other languages. When will system architects learn that suggesting policy is a bad thing, and it's best to leave every aspect of policy up to individual programmers?
Re:Classic by martin-boundary · 2008-02-18 13:43 · Score: 1

Hey, you can build Peano arithmetic out of this!
"0" = data exchange problem.
"1" = XML(data exchange problem)
"2" = XML(XML(data exchange problem))
"3" = XML(XML(XML(data exchange problem)))
etc.
"x" + data exchange problem = "x"
"x" + XML("y") = XML("x" + "y")
Re:Classic by Zaiff+Urgulbunger · 2008-02-19 03:45 · Score: 1

There's nothing preventing anyone from doing exactly that, but I think it makes sense for the default format to be text based. If you have an application that would particularly benefit from a more compact binary format (e.g. limited bandwidth and/or cpu, e.g. mobile applications) then you can do that. But if XML had been binary by default then I think it would've made it less accessible and therefore less likely to be widely adopted.
Re:Classic by poot_rootbeer · 2008-02-19 04:00 · Score: 1

In defense of XML, the parsing problem is handled.

Mostly -- the debate over object-oriented parsing vs. stream-oriented parsing continues on, and always will until hardware resources become infinite.

XML, like all good approaches, handles mechanism, not policy.

A Truly Good approach will not allow policy to go un-handled.
Re:Classic by poot_rootbeer · 2008-02-19 04:10 · Score: 1

Parsing problem for pretty much everything is almost universally solved by regex...

1. You begin with a problem.
2. You apply a solution that uses regular expressions.
3. Now you have \d+ problems.

I love the power of PCRE, but it's a write-only language. As a pattern's complexity increases, its maintainability approaches zero. I'd rather leave complex data structures to a parser specifically designed to handle them.

I don't really care about the XML format. Personally, I'd be happier if it were stored in binary.

What kind of binary? 8-, 16-, 32-, 64-, or 128-bit word size? Signed or unsigned? Big-, little-, or middle-endian?

Sure it's not the most efficient encoding, but there isn't a rational piece of data that can't be expressed somehow as a sequence of UTF-8 characters.
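
A small Python sketch of the byte-order question, using the standard struct module. The value and the 32-bit width are arbitrary choices for the example.

```python
import struct

value = 1027  # 0x0403, an arbitrary example number

big = struct.pack(">I", value)      # 32-bit unsigned, big-endian
little = struct.pack("<I", value)   # 32-bit unsigned, little-endian
text = str(value).encode("utf-8")   # decimal digits as UTF-8 text

print(big.hex(), little.hex(), text)

# Reading bytes with the wrong byte order silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(misread)
```

The text form needs no agreement on word size or byte order, only on the character encoding, which is the trade being defended here.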

2) Attributes in XML have no obvious mapping to objects...so what do you do with them?

Attributes are object properties, just like object children. There's just additional context about the property implied by its attribute-ness.

XML and Interfaces by PIPBoy3000 · 2008-02-18 04:46 · Score: 2, Insightful

I realize XML is used for a lot of things, but whenever my fellow developers learn that the vendor is shipping us some interface in XML, the groans are audible. About half the time, their XML format isn't quite standard, and we've got to dig around for utilities to try and work with it (or write something custom). I'd say the vast majority of our interfaces are good ol' delimited text files.

For other purposes, XML is great and very readable, but I'm not sure it makes sense to use it everywhere.

Re:XML and Interfaces by MBCook · 2008-02-18 05:02 · Score: 4, Informative
Here are some of the "fun" things I have run across in other people's (almost certainly custom) XML interpreters/producers:
- Tags must be upper case
- Tags can't be upper case
- You must put line breaks between elements
- There can't be any whitespace between elements
- It's important to URL encode the XML before it gets sent from them to me
- You don't need CDATA blocks, just put the ampersands and >s right in there, it'll be OK
- Your XML should all be inside a CDATA block in container XML
- No tags can self-close
- Self closed tags need a space between the slash and bracket
- Self closed tags can't have a space between the slash and bracket
That's just what I can think of off the top of my head. We've seen quite a bit of crazy stuff. If everyone would just use one of the already written XML producers or parsers (the big ones, the ones that work) life would be much easier around here from time to time.
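
As a rough illustration of why an existing producer helps with the ampersand item in the list above, here is a Python sketch using the standard ElementTree serializer. The element name, attribute, and text are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload containing characters that are special in XML.
note = ET.Element("note", author="Smith & Sons")
note.text = "if a < b && b > c"

serialized = ET.tostring(note, encoding="unicode")
print(serialized)

# The text round-trips unchanged through any conforming parser.
parsed = ET.fromstring(serialized)
assert parsed.text == "if a < b && b > c"
assert parsed.get("author") == "Smith & Sons"
```

The caller never writes an entity or a CDATA block by hand. Escaping is the serializer's job, and string concatenation is where hand-rolled producers tend to go wrong.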
--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:XML and Interfaces by Aladrin · 2008-02-18 05:08 · Score: 1

It never makes sense to use any 1 thing 'everywhere', but if people would actually stick to the standard and use it intelligently, XML could be very beneficial.

Unfortunately, as you point out, very few do. I'm sick of not-quite-standard crap as well. It's a nightmare to work with... Even moreso than no standard would have been.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Re:XML and Interfaces by Anonymous Coward · 2008-02-18 06:54 · Score: 0

# Tags must be upper case
# Tags can't be upper case

XML is case sensitive, so if a particular XML-based format is defined to use lower case, or upper case, or camel case, or whatever, that's what you have to use.
# You must put line breaks between elements
# There can't be any whitespace between elements

XML parsers generally return whitespace to the application as it appears in the document, so applications with those requirements may be somewhat sloppy, but not completely insane (having said that, the latter is probably saner than the former).
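
Both points can be checked with a short Python sketch against the standard ElementTree parser. The tag names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Tag names are case sensitive: <Item> cannot be closed by </item>.
try:
    ET.fromstring("<Item>x</item>")
    case_mismatch_ok = True
except ET.ParseError:
    case_mismatch_ok = False

# Whitespace between elements is passed through to the application.
compact = ET.fromstring("<list><a/><b/></list>")
spaced = ET.fromstring("<list>\n  <a/>\n  <b/>\n</list>")

print(case_mismatch_ok)
print(repr(compact.text), repr(compact[0].tail))
print(repr(spaced.text), repr(spaced[0].tail))
```

An application that compares that text naively will treat the two `list` documents differently, which is how a "no whitespace" or "must have line breaks" rule can creep into a sloppy consumer.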
Re:XML and Interfaces by netpixie · 2008-02-18 06:59 · Score: 1

>> * Self closed tags can't have a space between the slash and bracket

My reading of this http://www.w3.org/TR/2006/REC-xml-20060816/ (especially production 44) was that the slash and the bracket *MUST* be together.
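
That reading can be checked against a conforming parser. A minimal Python sketch with the standard library, where the three candidate strings are the only inputs:

```python
import xml.etree.ElementTree as ET

# Record whether the parser accepts each spelling of an empty element.
results = {}
for candidate in ("<a/>", "<a />", "<a / >"):
    try:
        ET.fromstring(candidate)
        results[candidate] = True
    except ET.ParseError:
        results[candidate] = False

print(results)
```

Whitespace before the `/>` is optional, but nothing may come between the slash and the bracket, so a consumer that demands a space there is rejecting well-formed input.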
Re:XML and Interfaces by Frans+Faase · 2008-02-18 07:18 · Score: 1

My experience with some generic XML parsers is not very good. (Technically speaking we are dealing here with lexical scanners, not parsers.) Especially the SAX interface is not a pretty one. It is a typical interface that was designed from the inside (the parser) point of view, but not from the user point of view. Also, because XML is very rich, and you hardly ever use all of this richness, there is always a performance penalty. If you have to parse megabyte-sized XML files that you know only make use of a very limited subset of XML, there is no reason not to use a custom-made XML parser. For our application, I developed an XML parser making use of C++ templates in just 9Kbyte. This parser implements an XML iterator with which it is easy to write clear parsing methods that follow the structure of the XML you want to parse. Also, because of the use of templates and inline methods, a lot of overhead for method calling is done away with.
Re:XML and Interfaces by MBCook · 2008-02-18 07:20 · Score: 1

You'd think. Proper XML isn't a problem. What we run into is people whose XML parsers (which we usually suspect to be custom, often in the form of simple string extraction and not even real parsing) have these weird little desires that are contrary to the XML spec. Some of it seems sane, some of it is way off (I've seen XML, that gets URL encoded, put in a CDATA block, in XML... and that was how they sent everything).
XML, done right is just fine. But some people just get it very very WRONG.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:XML and Interfaces by MBCook · 2008-02-18 07:22 · Score: 1

That's my understanding too. I listed that to make the distinction between it and the "you must have a space between the slash and bracket" set. Both upper and lowercase tags are allowed, but some people choose a side (like a bad parser that requires the space) and then require it like it's the law.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

XML was formalized? by damn_registrars · 2008-02-18 04:49 · Score: 1

Considering all the (internet, and elsewhere) crapola that gets passed around as XML, with pretty much anything-you-want included, I don't really understand how we can call it "formalized".

Add to that the fact that then the ability to "display" XML comes down to the whatever-you-want-to-write manner, and I think there are plenty of people who would be hard to convince that there really is a "formal standard" for XML.

Perhaps Duke Nukem Forever will be written with this fantastic standard?

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.

Re:XML was formalized? by tilted · 2008-02-18 05:05 · Score: 1

"Perhaps Duke Nukem Forever will be written with this fantastic standard?"

done.
Re:XML was formalized? by Jerf · 2008-02-18 05:43 · Score: 4, Insightful

Yes. XML was formalized. It is strictly defined and easy to check for compliance (with the right tools). Only a little bit of the definition has passed out of common usage, mostly focused around DTDs.

If you encounter a file that claims to be XML, but does not meet the XML standard, then it is not the XML standard that is to blame. The claim is wrong and the file is not XML.

XML is not a fuzzy-wuzzy adjective that can be applied willy-nilly to anything and magically turn it into "XML". It is not a marketing term or English Professor term. It is a rigidly specified engineer term for a document format, and a given document is XML if and only if it meets that format.

If someone wants to hack together a half-assed parser or emitter of any language, they will. I've seen half-assed XML parsers, I've seen half-assed JSON parsers, I've seen half-assed HTML parsers, I've seen half-assed YAML parsers, I've seen ... you get the idea. If a standard can't solve the problem, you can't count the lack of solution against it.
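
The "easy to check for compliance (with the right tools)" claim can be sketched in a few lines of Python. This tests well-formedness only, not validity against a DTD or schema, and the sample strings are invented.

```python
import xml.etree.ElementTree as ET


def check_xml(text):
    """Return None if text is well-formed XML, else (line, column) of the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


samples = {
    "good": "<feed><entry>ok</entry></feed>",
    "unclosed": "<feed><entry>oops</feed>",
    "bare ampersand": "<feed>AT&T</feed>",
}
for label, text in samples.items():
    print(label, check_xml(text))
```

A file that fails this check is simply not XML, whatever its extension or content type says.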
Re:XML was formalized? by operagost · 2008-02-18 05:48 · Score: 1

The fact that incompetents who don't know how to use XML are unable to implement it properly does not invalidate the fact that XML is a well documented standard. It is also not surprising (e.g.: HTML).

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:XML was formalized? by damn_registrars · 2008-02-18 05:52 · Score: 1

If a standard can't solve the problem, you can't count the lack of solution against it.
Forgive my lack of knowledge on XML - I primarily just see bad implementations of it.

But what problem was XML supposed to solve? Exactly who/what/where was in need of an extensible markup language, anyways?

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Re:XML was formalized? by msuarezalvarez · 2008-02-18 07:08 · Score: 1

Anyone wanting to combine different markups, like html+mathml+svg+rdf+etc.
Re:XML was formalized? by Jerf · 2008-02-20 06:39 · Score: 1

XHTML makes heavy use of it, at least in theory. XMPP makes heavy use of it, in practice.

XML really is one of the best solutions for heterogeneous documents or streams that consist of several standardized components stuck together in a standard way. Few people may use it that way, but it does it reasonably well, and there's virtually no competition in that space. (Yeah, in theory you could encode it in a number of other formats, but not in a standard way.)
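
A minimal sketch of what "several standardized components stuck together in a standard way" looks like in practice, using XML namespaces and Python's ElementTree. The namespace URIs are the real XHTML and SVG ones. The document itself is invented.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical compound document: an SVG drawing embedded in XHTML.
DOC = f"""
<html xmlns="{XHTML}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Each element's tag carries its namespace, so a consumer can pick out the
# vocabulary it understands and skip the rest.
svg_tags = [el.tag for el in root.iter() if el.tag.startswith(f"{{{SVG}}}")]
circles = root.findall(f".//{{{SVG}}}circle")

print(svg_tags)
print(len(circles))
```

One parser handles the whole file, and each vocabulary stays identifiable by namespace without either spec needing to know about the other.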

Re:10 Years and still waiting by CRCulver · 2008-02-18 04:50 · Score: 4, Informative

Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side. By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output. See e.g. O'Reilly's XSLT Cookbook for dozens of very real-world examples (it's probably in your library).

That's just one example of how XML technology has made coding easier. Others I'm sure will point out others.

If you aren't a developer, then I'm not sure XML was supposed to directly revolutionize your end-user experience.

When in Rome by Anonymous Coward · 2008-02-18 04:53 · Score: 0

<Greetings> <Birthday>Happy</Birthday> <Who>XML</Who> </Greetings>

Dupe by Anonymous Coward · 2008-02-18 04:54 · Score: 0

Wasn't there a story about this 10 years ago?

Java and XML, bad tastes that are worse together by Omnifarious · 2008-02-18 04:58 · Score: 4, Insightful

I've recently taken a job at a primarily Java shop. After seeing XML used and abused for ant, maven and various other things I've grown even more disenchanted with it. And now I've also gotten the chance to see that not only does Java represent a poor trade off between the annoyances of a strongly typed language and the speed of a dynamic interpreted one, it has a horrible mess of dependency issues that nobody really solves besides.

I'm much more hopeful about technologies like Thrift and/or D-Bus than I ever was about such abysmal abominations as SOAP, or the only slightly better XML-RPC.

The Java XML world seems like this little closed ecology of mutual masturbators who all come up with more Java and XML 'solutions' to problems that never existed before they started using Java and XML.

I see the value of XML for long-lived documents that don't spend a lot of their life on the wire. And possibly for config files, though IMHO it is too ugly and unreadable for those. But as a general tool for Internet plumbing it's awful.

--
Need a Python, C++, Unix, Linux develop

Re:IVE BEEN WAITING SINCE 1998 by halivar · 2008-02-18 05:00 · Score: 3, Funny

Looks like you're going to have to wait a little longer. Try holding your breath, this time.

Oblig by mariuszbi · 2008-02-18 05:00 · Score: 5, Funny

XML is like violence.. when it doesn't work, use some more!

Re:Oblig by Omnifarious · 2008-02-18 05:07 · Score: 1

*laugh* It's so funny because it's true!

--
Need a Python, C++, Unix, Linux develop

It's still treated a lot like HTML sometimes by MikeRT · 2008-02-18 05:01 · Score: 1

I recently began testing some RSS and Atom parsing modules for Movable Type that I wrote, and noticed that they were breaking on different feeds that Google Reader handles easily. When I looked at the RSS and Atom markup, I noticed that the reason was that the various generators that were causing problems weren't always generating RSS and Atom in the way that I expected. WordPress.com, for example, was using content:encoded tags for some of the content for blog posts, and had an empty description tag in that item block.

It's XML, not HTML, so it's not going to be as hard to get working if it's done as properly formatted XML, but one problem I have is with the ad-hoc mixing of tags. If you are going to provide a syndication feed or something to that effect, use a standard and stick to it, even if there are limitations to the standard.
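
For the specific case described above (an empty description with the body in content:encoded), a tolerant reader could be sketched like this in Python. The feed is invented, and preferring description over content:encoded is an assumption made for the example, not something either spec requires.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines content:encoded.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Hypothetical feed with one empty description and one populated one.
FEED = """
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Post one</title>
      <description></description>
      <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
    </item>
    <item>
      <title>Post two</title>
      <description>Summary only</description>
    </item>
  </channel>
</rss>
"""


def item_body(item):
    """Prefer a non-empty description, else fall back to content:encoded."""
    description = (item.findtext("description") or "").strip()
    if description:
        return description
    return (item.findtext("content:encoded", default="", namespaces=NS) or "").strip()


bodies = [item_body(i) for i in ET.fromstring(FEED).iter("item")]
print(bodies)
```

The parsing is the easy part. Deciding which element holds "the post" is the semantic question the feed formats leave open.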

Re:It's still treated a lot like HTML sometimes by ianalis · 2008-02-18 05:18 · Score: 1

Welcome to the real world! :)
Re:It's still treated a lot like HTML sometimes by EMN13 · 2008-02-18 05:40 · Score: 1

Semantics are difficult. XML does not solve semantic issues like what tags mean. Be happy if your RSS provider provides syntactically valid XML - at least you can unambiguously interpret the structure of the document now!

As to semantics, if you're trying to interpret such a home-grown format as atom/rss, without reference implementation (and most specifically without a good test), your problems lie not with XML, but with that spec.

And indeed, RSS and atom aren't very good in that sense. It may be hard to make a spec and reference implementation, but it's worth its weight in gold ten times over if you've got them.

Java and XML - Addendum by Omnifarious · 2008-02-18 05:03 · Score: 3, Insightful

And, of course, my post is incomplete without a reference to my little rant on why CORBA and other forms of RPC are bad. Both Thrift and D-BUS are pretty close to the ideal solution I describe later. They focus on message content over semantics and are extremely easy to parse. SOAP and XML-RPC fail on both of those counts. They are about semantics (you are making a remote function call that does some specific thing, not sending a hunk of data that has some particular content) over content and they are a huge pain to parse.

--
Need a Python, C++, Unix, Linux develop

Re:Java and XML - Addendum by cjonslashdot · 2008-02-18 05:22 · Score: 3, Interesting

CORBA uses IDL for interface definition. Therefore, you don't even have to write code to parse it: the parsing code is generated automatically. So the arguments about parsing are non issues. With regard to content, one can define content in IDL very easily. I have not used the APIs you refer to (e.g., Thrift), so I cannot comment on those. I will say this though: when I used to write apps 10 years ago using CORBA, it took me so little time to throw a system-to-system interface together that I almost didn't even think about it. The same with EJB, except that persistent EJBs were flawed and so EJBs lost credibility even though the API model (similar to CORBA) was (and is) extremely easy to use. Then people started wanting to communicate across firewalls, and OMG didn't get its act together and make IIOP capable of traversing firewalls before people got hooked on hand-coding HTTP messaging, which then led to XML messages and SOAP and Web services. The right answer was to fix the OMG spec for pushing IIOP through firewalls in a standard way. Nowadays, whenever I have to create an inter-system interface and the options involve SOAP or Web services or some other XML-based interface, I groan and it takes me ten times as long to get the interface built and reliable. That is not progress. I will look at Thrift and the other API that you mention, and we may disagree on some things (e.g., the value of type safety), but I agree with you that XML-based messaging has been a huge, huge step backwards.
Re:Java and XML - Addendum by Omnifarious · 2008-02-18 05:31 · Score: 4, Insightful

CORBA is a minor pain to parse. From what I could tell you could just sit down with a spec and code up your own parser for ye-old random language in a day or two. But that's not my major issue with it.

My major issue with it was that it promotes designing distributed systems that focus on the semantic roles of the participants instead of the data moving around. In fact it discourages programmers using it from even thinking of what they're doing as sending messages to some system many milliseconds away. Among other evils this leads to all kinds of interesting issues with threading and concurrency that didn't even have to exist.

--
Need a Python, C++, Unix, Linux develop
Re:Java and XML - Addendum by cjonslashdot · 2008-02-18 06:36 · Score: 1

Interesting thoughts. Could you please explain how some of the technologies that you mention improve on this? Thanks.
Re:Java and XML - Addendum by shutdown+-p+now · 2008-02-18 06:37 · Score: 0

SOAP and XML-RPC fail on both of those counts. They are about semantics (you are making a remote function call that does some specific thing, not sending a hunk of data that has some particular content) over content and they are a huge pain to parse.
The difference is purely imaginary. Analogously, in OOP world, calling methods and passing messages to objects is considered to be two equivalent ways to say the same thing. For XML-RPC and SOAP in particular, they are in fact more geared towards sending messages - you just get a transport for a two-way trip with any XML payload you wish - anything from a simple argument tuple to a tree or a serialized object graph.
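
The "a call is just a message" view can be illustrated with Python's standard xmlrpc.client module, which exposes the marshalling step on its own. The method name `add` and its arguments are hypothetical, and nothing is sent over a network here.

```python
import xmlrpc.client

# "add" is a hypothetical method name; nothing is sent over the network here.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# The receiving side sees a document and decodes it back into data.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Whether a library then hides that document behind a proxy object that looks like a local function is a separate design decision from the wire format.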
Re:Java and XML - Addendum by Omnifarious · 2008-02-18 08:06 · Score: 1

That is like calling overloaded operators syntactic sugar. Sure they are, but your language is still very different if you have them.

Libraries that make messages look like function calls obscure some very important details about messages that make them rather different animals than function calls. The message may be going to an entity outside your 'state horizon'. Basically it may be going to an entity that has goals and motives that are completely opposed to yours, which is very different from a function call. Also the message will take at least milliseconds to get there, and a reply will take at least milliseconds to get back. This is in great contrast to a function call in which those things are measured in units 6 orders of magnitude (base 10) smaller.

And since these things are so different, messages sent over a network to a remote system should not look like function calls or be handled in the same way.

--
Need a Python, C++, Unix, Linux develop
Re:Java and XML - Addendum by shutdown+-p+now · 2008-02-18 19:51 · Score: 1

And since these things are so different, messages sent over a network to a remote system should not look like function calls or be handled in the same way.
And SOAP does not prescribe them to be anything like that. It's true that many commonly used implementations (such as the .NET one) represent SOAP requests as method calls (yet others don't); but this is not a flaw in the protocol itself.
Re:Java and XML - Addendum by Stu+Charlton · 2008-02-19 01:24 · Score: 1

Though this is different from the OP's areas of excitement, I'd suggest taking a look at the architecture of the web, and Roy Fielding's (quite readable, IMO) thesis on the development of that architecture while he was working on HTTP 1.1. A major factor behind why the Web has become successful is that it focuses on the data, not the roles of the participants. Similarly, this is why systems like UNIX pipes are so useful -- a uniform interface provides for many benefits (at the cost of some tradeoffs, such as latency). Unfortunately the mainstream seems to have missed this, though times are 'a changin'

--
-Stu
Re:Java and XML - Addendum by Omnifarious · 2008-02-19 02:23 · Score: 1

I'm still thinking about how to answer this question. It may take me awhile and result in another small paper on my site. I think I'll also have to learn more about Thrift to see if I really feel that way about it or not. I know that D-BUS got a whole bunch of things right. It still, annoyingly enough, represents network messages as function calls, but for the domain of communication between processes on the same machine that's not nearly so evil as over-the-Internet RPC is.

--
Need a Python, C++, Unix, Linux develop
Re:Java and XML - Addendum by Omnifarious · 2008-02-22 16:36 · Score: 1

I have written up an answer to this question in my journal here: Thoughts on Thrift and D-BUS

In short, there are several features of D-BUS that combined with its limited area of application make it really useful and not nearly as evil as most RPC-based technologies. But Thrift is both missing these features and operates in a much more demanding environment, so it's not nearly so nice. OTOH, a significant portion of Thrift is devoted to a language and architecture agnostic data description language, and perhaps that feature alone can be leveraged along with generous helpings of other stuff to make it useful.

--
Need a Python, C++, Unix, Linux develop
Re:Java and XML - Addendum by Omnifarious · 2008-02-22 16:40 · Score: 1

I've written up a little summary of my thoughts on Thrift and D-BUS that may interest you because I address exactly this issue. :-)

--
Need a Python, C++, Unix, Linux develop

Re:10 Years and still waiting by mini+me · 2008-02-18 05:11 · Score: 1

It already did. They called the revolution Web 2.0. Sorry you missed it.

YAML and JSON by goombah99 · 2008-02-18 05:13 · Score: 2, Insightful

I'm perpetually surprised every time I see a new implementation of XML. For example, Macintosh plists, many of which replace older ad hoc Unix configs, are in XML. Why oh why do people use XML for data-centric, quasi-human-readable configuration files when YAML is the ideal solution for this? And for web usage, where Perl, Python, and Ruby abound, why would people not use YAML, since it's so easy to parse with just regular expressions, and because you don't have to instantiate the entire multi-megabyte structured data file just to grep out one record? And in the day of JavaScript and Web 2.0, there's JSON. So why does this ponderous, obsolete dinosaur XML persist?

Perhaps I'm being too negative here. I sound like a troll. But really folks, do yourself and the rest of us a favor and read up on JSON and YAML. You''ll see I'm being only too kind and generous to YAML.

--
Some drink at the fountain of knowledge. Others just gargle.

Re:YAML and JSON by ral8158 · 2008-02-18 05:31 · Score: 2, Insightful

Actually, OS X uses plists because XML, which is more widely known than YAML and much easier to learn, is built directly into the Cocoa API.

Using an XML file basically consists of the following code:
NSError *xmlError = nil;
NSURL *url = [NSURL URLWithString:@"Put your URL here"];
NSXMLDocument *doc = [[NSXMLDocument alloc] initWithContentsOfURL:url options:NSXMLDocumentTidyXML error:&xmlError]; // Handle errors around here

Then you can basically do anything with your doc object. You can insert a child at a certain index, you can ask it for the root element, you can set the DTD to something else, you can apply XSLT to get transformed XML or HTML markup back, you can validate it against its DTD, you can delete children at a certain index, etc. All of these actions take one line. It's a really beautiful, simple system. YAML is... not.
Re:YAML and JSON by tjansen · 2008-02-18 05:40 · Score: 2, Informative

As you say, YAML is a specialized markup-language (data-centric, almost human-readable) and not a good choice for many use-cases (document-centric languages like XHTML and DocBook, combining languages with XML namespaces). In other words, it can not replace XML, it's just another syntax to learn. It needs a completely new infrastructure: new parsers, new editors, new schema description language, new translation languages and so on. Is that really worth it, only to make editing files with a simple text editor easier?
Re:YAML and JSON by Anonymous Coward · 2008-02-18 05:40 · Score: 0

"[XML is] much easier to learn"
You're being sarcastic, right? Oh my gosh, no. You really think that's true. Ha ha ha ha ha.
As for the rest of your digression on XML being preferred because the interface in Cocoa takes so few lines: you realize this begs the question, right? There's no reason YAML could not be built in too. Heck, it's built in to Ruby, and it's one import away from almost every language, including the Objective-C that Cocoa runs on. Then it'd have the same interface. If they had implemented it in DBASE instead, you'd be telling me the same thing: oh, oh, the API is so simple. You want even simpler? How about using Perl ties, then? There's no simpler interface to a database than that. Come on... think!
Re:YAML and JSON by goombah99 · 2008-02-18 05:49 · Score: 1

Is that really worth it, only to make editing files with a simple text editor easier?

I think you answered the question. Yes. For many uses, being able to use a simple text editor is great.

Have a look sometime at the YAML documentation quick reference card: it's written in YAML and fits on one page. That's how compact yet human-readable it is; try that in XML.

These days everything is one library away from being immediately available for use. Since YAML can do everything XML can do, and because it's trivial to insert XML into YAML (but not the reverse), you really could replace XML with it. YAML even handles relational data, something XML has only begun to do.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:YAML and JSON by fireboy1919 · 2008-02-18 05:54 · Score: 2, Insightful

In other words, it can not replace XML

That's pretty much completely wrong. YAML's functionality is a superset of XMLs while being easier to read & understand (because the *basic* usage of it is exactly the same as XML's, but with a simpler syntax). It just hasn't been adopted anywhere except configuration because that's the easiest niche to move into.

it's just another syntax to learn.

That's a stupid thing to say. Anybody that can't learn the syntax of either XML or YAML in less than five minutes shouldn't be working with either of them. They're both ridiculously simple to understand.

It needs a completely new infrastructure: new parsers, new editors, new schema description language, new translation languages and so on.

That is true, and probably the reason we won't be moving to YAML for quite a while.

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:YAML and JSON by tjansen · 2008-02-18 05:56 · Score: 1

Actually, after looking at that reference card, YAML is much more complex than I thought it was.. compared to that, XML is simple (provided you ignore all that outdated crap like DTD/Doctype, processing instructions) and just use elements, attributes and built-in entities.
Re:YAML and JSON by cliveholloway · 2008-02-18 05:57 · Score: 5, Funny

<reply xmlns="Slashdot:Comment"> <paragraph> <sentence>What?</sentence> <sentence>Are you telling me that this isn't the preferred way of presenting data?</sentence> <sentence>Honestly, this & SOAP are two technologies that have made my life so much more "interesting" as a developer.</sentence> <sentence>Fucking XML...</sentence> </paragraph> </reply>

--
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
Re:YAML and JSON by goombah99 · 2008-02-18 06:03 · Score: 1

Beautifully said.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:YAML and JSON by goombah99 · 2008-02-18 06:15 · Score: 2, Interesting

Actually, after looking at that reference card, YAML is much more complex than I thought it was.. compared to that, XML is simple (provided you ignore all that outdated crap like DTD/Doctype, processing instructions) and just use elements, attributes and built-in entities.
Well, good for you for actually looking. But as you say about XML, most of the time you only use the base elements in YAML too. In YAML those are "-" for arrays, ":" for hashes, and "|" for block quotes. YAML streamlines things even further by getting rid of close tags, and it mostly dispenses with attributes being special data that has to live in the tag, merging them into the payload area and putting all data and attributes on equal footing.

Here's another document to look at that's a great 1-page introduction to YAML in action.

But sure, I agree YAML does not have a lot of pre-written stuff out there for exploiting it. My original lament was that XML is the default choice when it's a poor choice. For configuration files, document headers, and simple output from most programs, YAML makes a far superior choice both in human readability and for fast parsing.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:YAML and JSON by msuarezalvarez · 2008-02-18 06:40 · Score: 2, Insightful

I've never understood why people complain about XML as you do.

Are you generating XML by hand in your applications? Are you not parsing it using some standard library into an abstract tree or using a standard library to transform XML documents into sequences of events, in exactly the same way lex tokenizes a string of characters? Are you generating it by concatenating strings?

SOAP is complicated, but that has nothing to do with XML.

XML does exactly one thing: it allows you to pretend that data is provided to you in the form of an abstract data structure instead of as a sequence of bytes, taking care of encoding issues, namespacing, and what not---assuming, of course, that you are using proper tools. How is that bad?
Re:YAML and JSON by TheRaven64 · 2008-02-18 06:54 · Score: 1

For example, macintosh plists, many of which replace older ad hoc Unix configs, are in XML.
Property lists are part of the OpenStep specification (circa 1993). They have one canonical representation, which is similar to JSON (which postdates it by some years). OS X also supports two other representations: one is XML, the other binary. The XML form is commonly used on OS X, presumably so that they can be modified using XSLT or XPath type things.
I agree, it doesn't make a huge amount of sense. Both the old format and the binary format are easier to parse than the XML format, and the old format is much easier for humans to read.

--
I am TheRaven on Soylent News
Re:YAML and JSON by Just+Some+Guy · 2008-02-18 07:33 · Score: 1

Why oh why do people use XML for data centric, quasi-human readable configuration files when YAML is the ideal solution for this.
I'm looking at the YAML 1.1 specs and don't see anything about schemas or data validation. Am I overlooking something?

because you don't have to instantiate the multi-megabyte structured data entire file just to grep out one record.
You don't have to do that with XML, either. You can, but you don't have to.
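
As a rough illustration of that point, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`, which hands back elements as they are completed instead of building the whole tree first. The `<records>`/`<record>` layout and the `find_records` helper are hypothetical, made up for this example.

```python
import io
import xml.etree.ElementTree as ET

def find_records(stream, tag, wanted_id):
    """Yield (id, text) for matching elements while parsing incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            if elem.get("id") == wanted_id:
                yield elem.get("id"), (elem.text or "").strip()
            # Release this record's text, attributes and children once inspected.
            elem.clear()

# Hypothetical sample data; a real file would be opened in binary mode instead.
SAMPLE = b"<records><record id='1'>alpha</record><record id='2'>beta</record></records>"
matches = list(find_records(io.BytesIO(SAMPLE), "record", "2"))
```

Note that cleared elements still leave empty stubs attached to the root, so very large files may need extra cleanup beyond this sketch.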

But really folks, do yourself and the rest of us a favor and read up on JSON and YAML.
(Un)?fortunately, "the rest of us" seems to make up about 5% of the programming population. The rest of the rest of us are using XML.

--
Dewey, what part of this looks like authorities should be involved?
Re:YAML and JSON by tjansen · 2008-02-18 07:42 · Score: 1

Given XML's ubiquity, I am much faster reading XML than YAML (actually I don't understand a lot of the stuff on the 1-page-introduction). And I know the APIs very well. And my favorite IDE supports XML, but not YAML. So YAML would be a poor choice for me...
Re:YAML and JSON by bytesex · 2008-02-18 08:10 · Score: 1

XML is for people who do not know about perl TIEs. Who do not know about tree-based functional declaration formats or languages. Who do not know about parser syntax or regular expressions. There is a divide in the realm of programmers. You either close the divide (through education), or provide them with XML. That's how it works.

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:YAML and JSON by bytesex · 2008-02-18 08:19 · Score: 2, Interesting

It's bad because people ARE generating XML by hand, which, according to the spec, they should be able to do, making a lot of syntactical mistakes in the process (to which it is prone). Plus, it's terrible to read. It's also bad because on the machine side, it takes a lot of effort (CPU cycles, parser-programmer effort) to decipher. In other words, it's the worst of both worlds. It's the Visual Basic of formats: you can really only use it with GUI tools, but you can't really do what you really want to do with it in the way you want to do it.

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:YAML and JSON by goombah99 · 2008-02-18 09:00 · Score: 1

Why oh why do people use XML for data centric, quasi-human readable configuration files when YAML is the ideal solution for this.
I'm looking at the YAML 1.1 specs and don't see anything about schemas or data validation. Am I overlooking something?

Yes, you missed a couple of things. As you may know, XML has had multiple ways to define schemas and multiple validators over the years. YAML has validation at several different levels. First, in syntax: since there are no close tags, one never has a missing close tag to deal with. Simple data types can be declared with !!tags, and more complex structures (i.e. classes) can be declared by the user with local !tags. And of course built-in data structures like hashes, arrays, and references/pointers are part of the syntax. In all these cases the document description is really part of the document itself; that is, the thing telling you that the next statement is an integer or string or float is in the document itself.

Since YAML has many built-in URIs (binary, hex, float, hash, ordered hash, array, etc.), plus extensible type tags for locally declared types (i.e. classes) that can validate their own attributes, there is not as much desire for a separate schema declaration or validator as there is with XML.

However, sometimes you still want a schema and validator. For higher-order abstraction of the document's expected structure, one can use a different schema validator such as Kwalify, which also works for JSON. Finally, since one can place any XML document into a YAML document simply by indenting it one space--and no other changes at all--you can always put XML document schema language into a YAML document and use that.

because you don't have to instantiate the multi-megabyte structured data entire file just to grep out one record.
You don't have to do that with XML, either. You can, but you don't have to.
Grepping anything out of even a modestly sophisticated XML document is pretty much unreliable and generally slow. You have to wade through the nested close and open tag, balance all nested quotations, and in many cases worry about the utf encoding. Because XML does not guarantee the white space, things get even weirder.

let's take a for instance. Suppose I was looking for all the hash entries that has the key "highschool". In XML you'd have to worry about all the nested quotations and then find every key open and close tag, then parse this and find its content and see if it is "highschool". You could not go looking for the word "highschool" because that might be simply text found elsewhere. And you can't look just for tags like "key" because those might be part of some quotations.

In yaml you would grep like this:

grep "^\s*highschool\s*:"

to yank out all the highschool key value pairs. (If the value might be a multiple line entry a few more items in the regex are needed)

the latter is blazingly fast.
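
For readers who want to try the line-oriented approach outside of grep, the following is a small Python sketch of the same idea using the standard `re` module. The sample YAML text is invented for illustration, and the pattern uses `[ \t]*` rather than `\s*` so that a match cannot run across line breaks; it only handles the simple single-line `key: value` case mentioned above.

```python
import re

# Hypothetical YAML fragment, not taken from any real document.
YAML_TEXT = """\
people:
  - name: Ann
    highschool: Central High
  - name: Bob
    note: "highschool: not a real key"
    highschool: Westside
"""

# Match lines whose first non-blank token is the key "highschool".
KEY_LINE = re.compile(r"^[ \t]*highschool[ \t]*:[ \t]*(.*)$", re.MULTILINE)
values = KEY_LINE.findall(YAML_TEXT)
```

The quoted string on the `note` line is skipped because the key has to be the first thing on the line. A key written directly after a list dash (`- highschool: ...`) would be missed by this pattern, which is one of the extra cases a fuller regex would have to cover.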

But really folks, do yourself and the rest of us a favor and read up on JSON and YAML.
(Un)?fortunately, "the rest of us" seems to make up about 5% of the programming population. The rest of the rest of us are using XML.
Right. That's the problem. Need to spread the word. Now you know and you can spread it too.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:YAML and JSON by Tsagadai · 2008-02-18 09:38 · Score: 1

I'm sorry did you just say that XML "takes a lot of effort (CPU cycles, parser-programmer effort) to decipher." Have you ever written a computer program at all? If the parser is a big chunk of your code why are you even generating code at all? Maybe you should just have a nap or find a De Lorean to take you back to your own time.

Just the way you wrote GUI tools like it's some sort of acid in your mouth astounds me.
Re:YAML and JSON by kyz · 2008-02-18 10:53 · Score: 1

Grepping anything out of even a modestly sophisticated XML document is pretty much unreliable and generally slow.

If by "grepping" you mean "searching for", then that's not the case. Just use an xquery. For example, "xml sel -t -c //title *.rss" will print all the title tags in all the *.rss files. That's using the xmlstarlet package. If you don't like the syntax of the command, use another package.

If by "grepping" you mean "use the line-oriented grep(1) regexp matching tool", then yes you're going to run into difficulties working with data that isn't just line oriented. That's why UNIX admins contort everything into line-oriented data, so they can run their precious sed, grep, sort and uniq on it. They take 2D data, munge it into 1D line data, grep it, then unmunge it. They could just act on the data directly (XML or not), but then they'd be programmers.

You have to wade through the nested close and open tag, balance all nested quotations

Just use xpath. That's what it was invented for. You get the document parsed for free, and you can match against any element of its structure or content.

let's take a for instance. Suppose I was looking for all the hash entries that has the key "highschool".

//hash[@key='highschool']

would match any <hash key="highschool"> tags (and their contents), but not key="your highschool" or anything else. Isn't that easier than pissing about trying to come up with a regexp that parses the data format itself, instead of just searching the data and letting the tool do the work of parsing the format?
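
The same expression can be tried without xmlstarlet. This is a minimal sketch using the limited XPath subset in Python's standard `xml.etree.ElementTree`; the `<people>` document is a made-up example, assuming the data stores its keys in a `key` attribute on `<hash>` elements as the expression implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one exact match, one near miss, one decoy in plain text.
DOC = """
<people>
  <hash key="highschool">Central High</hash>
  <hash key="your highschool">Westside</hash>
  <note>key="highschool" appears here only as text</note>
</people>
"""

root = ET.fromstring(DOC)
# ".//" searches all descendants; the predicate compares the attribute exactly.
hits = [el.text for el in root.findall(".//hash[@key='highschool']")]
```

Only the first element is selected: the near-miss attribute value and the text inside `<note>` are ignored because the match is made against parsed structure, not raw characters.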

In yaml you would grep like this: grep "^\s*highschool\s*:"

I don't think you would. Vanilla grep doesn't have Perl's "\s" whitespace markers. Perhaps you mean "[\n\r\t ]"

--
Does my bum look big in this?
Re:YAML and JSON by Mneme · 2008-02-18 11:27 · Score: 1
The use of XML at Apple is both interesting and sad, but also I think reflects the broader problems with how XML is (mis)understood and (mis)used.
Apple took the really clean OpenStep Property List format, which is very similar to the JSON format people are adopting today, and for some reason decided it needed to be "modernized" by using an XML-based format. To understand why this is so bad, let's look at the before and after versions, using an excerpt of real-world data from the Mac application OmniGraffle. Before:

{ Class = LineGraphic; ID = 4; Head = { ID = 3; }; Tail = { ID = 2; }; Points = ((225.5, 94.5), (296.5, 94.5)); Style = { stroke = { CornerRadius = 5; HeadArrow = FilledArrow; TailArrow = "0"; }; }; }

And this is the same data in Apple's XML property-list format:

<dict> <key>Class</key> <string>LineGraphic</string> <key>ID</key> <integer>4</integer> <key>Head</key> <dict> <key>ID</key> <integer>3</integer> </dict> <key>Tail</key> <dict> <key>ID</key> <integer>2</integer> </dict> <key>Points</key> <array> <array><real>225.5</real><real>94.5</real></array> <array><real>296.5</real><real>94.5</real></array> </array> <key>Style</key> <dict> <key>stroke</key> <dict> <key>CornerRadius</key> <real>5</real> <key>HeadArrow</key> <string>FilledArrow</string> <key>TailArrow</key> <string>0</string> </dict> </dict> </dict>

(Sadly Slashdot ate all my nice formatting spaces which made both versions prettier, but was much more necessary for the XML version than the OpenStep/JSON-ish version.)
The XML version is less easy to read and considerably more bloated. What was gained over the original format? Very little. Apple's Property Lists are an example of really poor use of XML.
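
The size difference is easy to measure. As a rough sketch, Python's standard `plistlib` can write the same dictionary as an XML or binary property list, and `json` gives a JSON rendering as a stand-in for the older brace syntax (it is not the OpenStep format itself). The dictionary below is transcribed from the excerpt above.

```python
import json
import plistlib

# The OmniGraffle excerpt from above, as a plain Python dictionary.
graphic = {
    "Class": "LineGraphic",
    "ID": 4,
    "Head": {"ID": 3},
    "Tail": {"ID": 2},
    "Points": [[225.5, 94.5], [296.5, 94.5]],
    "Style": {
        "stroke": {"CornerRadius": 5, "HeadArrow": "FilledArrow", "TailArrow": "0"}
    },
}

as_xml = plistlib.dumps(graphic, fmt=plistlib.FMT_XML)
as_binary = plistlib.dumps(graphic, fmt=plistlib.FMT_BINARY)
as_json = json.dumps(graphic).encode("utf-8")

# Byte counts for each serialization of the same data.
sizes = {"xml": len(as_xml), "binary": len(as_binary), "json": len(as_json)}
```

Printing `sizes` shows how the three encodings compare for this record; both plist forms load back to the original dictionary with `plistlib.loads`.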
But at least XML is extensible, you say! Not for property lists. Apple has a special custom parser in Core Foundation that just parses XML property lists, not general XML. Part of the reasoning for that might be that XML parsing libraries are fairly heavyweight and things like the kernel need to read property lists. But no one stopped and said “Wait a moment, we're writing a custom parser, it's not extensible, this usage isn't markup, it's data serialization and/or human-readable configuration files, so is XML the answer?”
But Apple's “not getting” XML doesn't stop with property lists. Anyone who thinks property lists were an aberration or thinks XML automatically equates with human readability should take a look at the XML produced by Apple's iWork applications (Keynote, Pages, etc.). It's barely readable—you need to be a masochist to wade your way through the grotesquely tangled and verbose mess. It is, in essence, a dump of the internal data structures used by their applications. And it isn't (easily) writable by anyone except Apple, because only they know exactly what the rules are for the composition of their XML format. Want a DTD? Dream on. Want to use XSLT and friends? Good luck!
XML does have its place, but before choosing XML as a format, you should understand the trade-offs involved in that choice. Know when you should use XML, how you should use it, and what the alternatives are. In particular, be aware of
- The reasons XML wins over the alternatives. Can you give an informed argument as to why you shouldn't jus
Re:YAML and JSON by Anonymous Coward · 2008-02-18 12:20 · Score: 0

Take a class in algorithms and learn the difference between O(N) and O(N^2). Then you can call yourself a programmer. Your XML command would not even work on a data structure larger than the memory of the computer. It would be slow as a dog to boot.
Re:YAML and JSON by Anonymous Coward · 2008-02-18 13:42 · Score: 0

As I did that about 10 years ago now, I can quite happily tell you that XML parsers can search through documents in O(N) time, and the memory required is a function of the tag nesting depth. You only need to keep the state of open tags on the way to your current position in the document. That's how xpath was designed to work. If you need to go back, you know where the previous tag started, for example.
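
To make the "state of open tags" idea concrete, here is a small sketch with Python's standard `xml.sax` event parser. The `TitleCollector` handler and the sample feed are hypothetical; the handler keeps only a stack of currently open tag names plus whatever results it collects, and reads the input once from start to end. It covers a simple `//title`-style search, not full XPath.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title>, tracking only the open-tag stack."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # parse state: grows with nesting depth only
        self.max_depth = 0
        self.titles = []
        self._buffer = None   # text pieces of the <title> being read, if any

    def startElement(self, name, attrs):
        self.open_tags.append(name)
        self.max_depth = max(self.max_depth, len(self.open_tags))
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None
        self.open_tags.pop()

# Hypothetical feed used only for this example.
FEED = b"<rss><channel><title>News</title><item><title>First</title></item></channel></rss>"
collector = TitleCollector()
xml.sax.parseString(FEED, collector)
```

After parsing, `collector.titles` holds the two titles and `collector.max_depth` records the deepest nesting seen, which is the quantity the parse state scales with.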
Re:YAML and JSON by cliveholloway · 2008-02-18 15:25 · Score: 1

I would argue that if it was a binary format that nobody could fuck with, it would be fine. But it isn't, and people *do* edit it by hand.

I have to work with a bunch of conf files that are in XML format - for no real reason that I can ascertain (except that they, heh, want them to be human readable). I have been trying to abstract the interface to this data away (possibly into an sqlite DB with a perl module front end), but I get resistance from people who "just want to be able to edit the data by hand", or to scan it to see what it contains. And that is so, so wrong if you're using XML. Aside from allowing human error to creep in, it's really, really bad to lose control of the interface to the data - just begging for errors to creep in.

If you really want to edit by hand though, you want YAML instead - and XML should have avoided all this geek hate by *never* being designed to be browseable as a plain text format in the first place.

--
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
Re:YAML and JSON by Anonymous Coward · 2008-02-18 17:37 · Score: 0

If XPath can call for unbounded backtracking (via complex comparisons against position(), or preceding descendants of a node's ancestors), you need random access to the document (ruling out a lot of pipe-based toolchains) and will have to reparse sections you've already passed. That pretty much rules out a O(n) guarantee, and will eventually be more expensive than just constructing a DOM in virtual memory.

A restricted XPath that a SAX parser could execute in one pass using bounded space (depending on how much lookbehind an expression may do, and how large an element or attribute value may be accessed) would be useful, but a quick STFW didn't turn one up for me.
Re:YAML and JSON by IntlHarvester · 2008-02-18 21:18 · Score: 1

Mac plists are one of the classic examples of XML for marketing reasons only. They certainly didn't come up with

<key></key>
<string></string>
<key></key>
<string></string>

to make data exchange easier.

--
Business. Numbers. Money. People. Computer World.
Re:YAML and JSON by bytesex · 2008-02-18 22:40 · Score: 1

I was writing XML parsers in C in 1998, because my boss thought it had to be done; it was the next big thing. (He had no notion whatsoever that it was just a friggin' data format, but that's an aside). So I like to think that I know what I'm talking about; the format is bloated and difficult (whitespace rules ! escape sequences ! CDATA sections ! Unicode support in programming languages in 1998 !) and expensive to parse. My advice is: if you can avoid it, do so. If you can't, use perl. GUI tools are fine. Tech that can only live thanks to GUI tools is bad. Automating through shell scripts must always be possible and GUI tools are not particularly good with that.

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:YAML and JSON by vhogemann · 2008-02-19 00:32 · Score: 1

Well,

I know there are times when you just have to deal with badly formatted data, but that's true for every data format out there, not just XML.

Usually when I'm dealing with XML generated by other people, I demand a Schema to validate the document; this way, if they manage to garble the document, it's their fault and I can prove it! And when someone has to consume XML generated by MY applications, I also provide a Schema file; this way I'll be able to prove that my application's data is valid.

XML is nice because when it's properly used it works just like a contract.

--
---- You know how some doctors have the Messiah complex - they need to save the world? You've got the "Rubik's" complex
Re:YAML and JSON by goombah99 · 2008-02-19 03:23 · Score: 1

Unless you assume the depth of recursion is not proportional to document size, which you should, this is O(N^2). Additionally you have a high proportionality factor, since for every tag you need to test its kind and decode its attributes. Additionally one must recursively unwind every single nested quotation before proceeding to the next element. If the largest substructure is too large to fit in memory this is going to fall over. If you have relational IDREFs you may have to backtrack for multiple passes. In short you are paying the majority of the penalty of reconstituting the data structure exactly like the OP said.

Now as for "working on the data", not the storage format. Well, YAML is structured data too. One is free to traverse its tree as well in exactly the same manner. But because its storage format is efficiently decodable, one does not actually have to instantiate the tree to discover many of its elements.

Seriously dude, don't disparage efficient coding.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:YAML and JSON by bytesex · 2008-02-19 04:10 · Score: 1

Schemas are really only one step away from a parser definition. You might as well tell them to provide you with a yacc-file (or what's it called today in the java world - antlr ?). Integrates much better into your project anyway. No, I'm trolling - sorry. It's fine that you can demand such things from your other parties. Some people can't, or when they request a DTD (remember those ?), they overload a server at the W3C.

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:YAML and JSON by goombah99 · 2008-02-19 08:26 · Score: 1

A lot of truth in what you say. But then if you did do this, XML itself is just a concrete realization of a database. So why choose something as inefficient as XML? The only time XML has something resembling efficiency that I can see is when the best method for the job is specifically markup in nature (i.e. text formatting) rather than data serialization.

So I think that answers the question of why XML isn't pure binary. People want the visual attraction of seeing how the DB structure is formed by inspecting the file by hand.

--
Some drink at the fountain of knowledge. Others just gargle.

Re:10 Years and still waiting by Klaus_1250 · 2008-02-18 05:14 · Score: 1

If everyone had jumped on the boat 10 years ago, it might have. But that didn't happen.

XML is too difficult, and allows abuse/over-use too easily. Personally, I love it, but I'm a minority. The other key factor is that there is simply no short-term need for it in many places. Or better, the need for it isn't recognized by the majority. Pragmatic solutions have a tendency to win over new revolutionary ones.

--
It only takes one man to change the Wisdom of the Crowd to Tyranny of the Masses.

Re:10 Years and still waiting by TechyImmigrant · 2008-02-18 05:15 · Score: 1

>By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output.

Just like LaTeX! Reinvention is a wonderful thing.

--
Evil people are out to get you.

Re:Java and XML, bad tastes that are worse togethe by MBCook · 2008-02-18 05:16 · Score: 2, Interesting

I do a lot of Java and XML. I don't know what you're using for a library, but I'd suggest JDOM.

As for the abuses for Maven and Ant... yeah. I'll agree. There are a lot of things that seem to use XML just because they can. I know there is some theory behind why they use them (machine readable, blah blah blah) but for most things it's just a giant pain for the complexity you get. Maybe if you were trying to build Windows with Ant.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Re:10 Years and still waiting by Coelacanth · 2008-02-18 05:17 · Score: 2, Interesting

Excellent point, and I'll take it one step further. When coupled with XSLT and other WS-* standards, you have an extremely flexible way to connect otherwise absurdly different applications (See Sun's OpenESB and JBI standard).

The hatred for XML, I think, stems from frequent, ugly misuse. Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong. Just because it's ASCII doesn't mean it's human-compatible.

Re:10 Years and still waiting by CRCulver · 2008-02-18 05:22 · Score: 3, Informative

Just like LaTeX! Reinvention is a wonderful thing.

LaTeX is restricted to certain types of print output. It emphatically cannot output HTML easily. Just look at the umpteen thousand threads on comp.text.tex where someone complains that latex2html can't handle anything more than a handful of quasi-default LaTeX packages. Plus, Unicode support in LaTeX has been shoehorned in and is still incomplete (though xetex is making strides), while at least XML was designed around Unicode. And then there is the fact that XML encourages semantic markup, while LaTeX contains non-semantic tags like \textit.

Re:Java and XML, bad tastes that are worse togethe by GodfatherofSoul · 2008-02-18 05:24 · Score: 5, Interesting

Yay! Nothing like the combination of XML and Java to bring out the haters. Incompetent use of a language/API doesn't equate to a bad language/API. I can show you plenty of crappy C/C++ code freely browsable in some open source libraries. Does that mean C++ sucks? Hell no.

My experience with Java+XML you ask? OFX servers for financial institutions. Without name dropping, check out the list of banks, brokerages, tax services, and credit card providers (Quicken) out there successfully serving up client data. I guess we're all circle jerking while you're downloading your account information into Quicken or Money.

Some good uses for XML:

Ephemeral representations of atomic, structured data; usually for transport.
Config files. More verbose and the syntax is far better at keeping you from fat fingering a setting and blowing up your app. If you can't clearly read XML, you need glasses.

Some bad uses for XML:

High volume, rapid response data streams; like say an on-line multiplayer game (though I've never benchmarked this)
Unbounded data streams; e.g. streaming media
Databases

I have to admit, I'm clueless about your Java dependency issues. The only way I can see that ever happening is if you're dumping all of your classes into the default top-level package; and that's major user error if you are.

--
I swear to God...I swear to God! That is NOT how you treat your human!

Re:10 Years and still waiting by jrumney · 2008-02-18 05:28 · Score: 0

Just because it's ASCII doesn't mean it's human-compatible.

That should read: Just because it's UTF-8 doesn't mean it's human-compatible.

XML: A Capsule Review by smcdow · 2008-02-18 05:28 · Score: 0

As compression schemes go, XML is probably the worst I've encountered.

How can it be that XML is 10 years old, and there's STILL no industry-standard way to embed binary data into an XML document without base64 encoding? I want bits and bytes. I want small.

--
In the course of every project, it will become necessary to shoot the scientists and begin production.

Re:XML: A Capsule Review by bckrispi · 2008-02-18 05:39 · Score: 1

You *do* realize that most systems that read/write xml also read/write gzipped xml, dont'cha?

--
Xenon, where's my money? -Borno
Re:XML: A Capsule Review by tjansen · 2008-02-18 05:43 · Score: 1

Actually XOP has W3C technical recommendation status since October 2005: http://www.w3.org/TR/xop10/
Re:XML: A Capsule Review by smcdow · 2008-02-18 06:07 · Score: 1

This doesn't address the requirement that binary data be encoded/decoded to/from base64.

--
In the course of every project, it will become necessary to shoot the scientists and begin production.
Re:XML: A Capsule Review by smcdow · 2008-02-18 06:10 · Score: 1

Unless I've missed something, the xop:Include element would be base64 encoded.

--
In the course of every project, it will become necessary to shoot the scientists and begin production.
Re:XML: A Capsule Review by tjansen · 2008-02-18 07:39 · Score: 1

In the data-model: yes. When being transmitted: no.

XOP optimizes only the transport of base64 encoded binary objects. When you parse the file with an XOP-capable parser, the element would look to you like a base64 string.

Here, let me fix that for you ... by trolltalk.com · 2008-02-18 05:33 · Score: 4, Insightful

If everyone would just use one of the already written XML producers or parsers (the big ones, the ones that work) life would be much easier around here from time to time.

If everyone would just go back to using simple delimited ascii text life would be much easier around here.

--
Kevin Smith on Prince

Re:Here, let me fix that for you ... by FlyingGuy · 2008-02-18 06:06 · Score: 0

MOD PARENT UP+

XML, The answer to the question that nobody asked!

--
Hey KID! Yeah you, get the fuck off my lawn!
Re:Here, let me fix that for you ... by surajbarkale · 2008-02-18 06:16 · Score: 1, Funny

Ever tried parsing CSV?

--
With Great Power Comes No Love Life! - Samit Basu
Re:Here, let me fix that for you ... by kyz · 2008-02-18 06:28 · Score: 5, Insightful

I have, and I can tell you that it's a waste of time.

It amazes me how something that looks so simple can have so many corner cases, and how they can be solved so differently by different implementations.

CSV is fine if you want to store data that has no quote marks, commas, carriage returns or linefeeds. For everything else, please use a better specified format, preferably one that has a formal definition. Like XML, for example.

--
Does my bum look big in this?
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 06:48 · Score: 2, Informative

"Ever tried parsing CSV?"
All the time. It's not that hard. Also, if you're worried about such things as quoting, etc., you can always use fixed-width fields - makes indexing, looking up, and modifying values REAL FAST. Compare that to the mess of xml.

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by msuarezalvarez · 2008-02-18 06:54 · Score: 1

That will work until you need to use non-ascii content, to include the delimiter in the data itself (so that you need an escaping mechanism), you want to represent hierarchical data, you want to be able to compose data from different sources without semantic collisions, and what not.

So you solve these issues, turning your `simple delimited ascii text' format into something a bit more complicated, and next you will be wanting to interchange instances with others, so you will have to start coming to an agreement with others on how to do everything exactly.

Guess what: that's what the people who came up with XML did!

Unix is not the only thing that will be reinvented poorly many times...
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 06:57 · Score: 1

Plain ascii text != csv.
You can delimit your data with nulls, for example (1 null byte per field, 2 null bytes per record). Even javascript can parse that out, and it's unicode-compatible.
Or you can use fixed-width fields and records, in which case reading a record is as simple as an lseek (header_size + record_len * (recno - 1)). Generating indexes on the data is also much quicker, as is editing the data. With a fixed-width field and record, there's no need to rewrite the rest of the file if your new value is not the same width as the old value.
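
A rough sketch of the offset arithmetic described above, run against an in-memory file. The header line, column widths, and helper names are assumptions made for illustration, not part of any existing format.

```python
import io

def pack(first, last, age):
    """Pad one record to a fixed 24 bytes: 10 + 10 + 3 columns plus a newline."""
    return f"{first:<10}{last:<10}{age:>3}\n".encode("ascii")

HEADER = b"FirstName:10:LastName:10:Age:3\n"   # hypothetical header line
RECORD_LEN = len(pack("", "", ""))              # 24 bytes per record

def offset(recno):
    # header_size + record_len * (recno - 1), with recno counted from 1
    return len(HEADER) + RECORD_LEN * (recno - 1)

def read_record(f, recno):
    f.seek(offset(recno))
    return f.read(RECORD_LEN)

def write_record(f, recno, record):
    assert len(record) == RECORD_LEN
    f.seek(offset(recno))
    f.write(record)            # overwrites in place; later records do not move

f = io.BytesIO(HEADER + pack("Joe", "Blow", "42") + pack("Mary", "Doe", "24")
               + pack("", "Cowboyneal", ""))
write_record(f, 2, pack("Maria", "Doe", "25"))
print(read_record(f, 2))
print(read_record(f, 3))
```

The same seek-and-overwrite pattern should apply to a file opened in "r+b" mode, as long as every record really is the same length.
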
Transforms? Again, a lot quicker, since it's much quicker to read than an xml tree.
10 years later, xml still sucks in comparison. The attempts to "fix" the problems ("binary xml", anyone?) are jokes.

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by thrillseeker · 2008-02-18 07:10 · Score: 3, Interesting

I knew we would (d)evolve to punch cards eventually.
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 07:12 · Score: 1

"That will work until you need to use non-ascii content, to include the delimiter in the data itself (so that you need an escaping mechanism), you want to represent hierarchical data, you want to be able to compose data from different sources without semantic collisions, and what not."
Using a single null (0x00) to delimit each field, and two nulls to delimit each record, is UTF-friendly. As for non-ascii contents, just encode them in base64 (which you would probably be doing anyway in a cdata section).

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by msuarezalvarez · 2008-02-18 07:21 · Score: 1

So you do not see the use of empty fields?
Re:Here, let me fix that for you ... by CaptainPinko · 2008-02-18 07:22 · Score: 3, Insightful

ASCII doesn't even support the letters needed by the majority of the world's languages.

--
Your CPU is not doing anything else, at least do something.
Re:Here, let me fix that for you ... by msuarezalvarez · 2008-02-18 07:26 · Score: 1

And what if you are not dealing with a table with record and fields but with a tree?
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 07:37 · Score: 1

"And what if you are not dealing with a table with record and fields but with a tree?"
Trees and graphs are not a problem - just like they aren't in regular table design (though it IS ugly). One field holds the parent node record id, or is blank if it's a top-level node. A second field can hold the node type. Extend your schema as required.
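
A minimal sketch of the parent-id scheme, assuming three hypothetical columns (record id, parent id, node type) and that the records have already been read into tuples:

```python
from collections import defaultdict

# Hypothetical records: (record id, parent id or "" for a top-level node, node type).
records = [
    ("1", "", "document"),
    ("2", "1", "section"),
    ("3", "1", "section"),
    ("4", "2", "paragraph"),
]

def build_children(records):
    """Map each parent id to the ids of its children, in file order."""
    children = defaultdict(list)
    for rec_id, parent_id, _node_type in records:
        children[parent_id].append(rec_id)
    return children

def walk(children, types, node_id, depth=0):
    """Yield (depth, id, type) in depth-first order."""
    yield depth, node_id, types[node_id]
    for child in children.get(node_id, []):
        yield from walk(children, types, child, depth + 1)

children = build_children(records)
types = {rec_id: node_type for rec_id, _parent, node_type in records}
for root in children[""]:
    for depth, node_id, node_type in walk(children, types, root):
        print("  " * depth + f"{node_type} (id {node_id})")
```

Rebuilding the hierarchy takes a full pass over the records, which is the "ugly" part: the nesting is implied by ids rather than visible in the file.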

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 07:40 · Score: 1

"ASCII doesn't even support the letters needed by the majority of the world's languages."
UTF does - so just use UTF - no big deal, and a lot easier to parse out than xml, which is butt-ugly to parse, terrible to index, doesn't support random access read/write, etc.

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by eknagy · 2008-02-18 07:44 · Score: 1

CSV is fine if you want to store data that has no quote marks, commas, carriage returns or linefeeds.
You can store quote marks, commas, carriage returns and linefeeds just fine in CSV - just try it in OpenOffice.org. If you can't write a proper parser for CSV, please use one that was written by a professional programmer (and STFU).
Thanks.
Re:Here, let me fix that for you ... by Anonymous Coward · 2008-02-18 08:10 · Score: 0

Um, nice straw man. There are many real reasons to prefer XML to CSV, but that isn't one of them. In fact, several of those items have to be replaced in XML too. You're not allowed to have ampersands, less than signs, or greater than signs in XML, just to name a few. Sure, the CSV method of escaping the forbidden characters is weird (why not use \n for newline, \t for tab, and \" for quotes like everyone else?), but it is nicer than XML's escaping method.
Re:Here, let me fix that for you ... by msuarezalvarez · 2008-02-18 08:10 · Score: 1

That being proposed as an alternative in the context of a XML sucks thread is... hmmm... sad.
Re:Here, let me fix that for you ... by kyz · 2008-02-18 08:16 · Score: 1

But that's not "CSV", that's "OpenOffice.org's CSV", and it's different from "Foo 2.0's CSV", "MegaApp's CSV" and many other differing, incompatible CSVs out there.

For example, how should I store the following three columns of data?
Column 1: Hello"there
Column 2: Hello there
Column 3: Hello,"'",\

"Hello\"there","Hello \013 there","Hello,\"'\",\\\013"

might be one answer. But it's only one answer, and I just invented it now. It is not the only answer ever implemented in CSV. It is not the canonical, centrally agreed standard answer like exists for XML parsing. THERE IS NO CANONICAL CSV STANDARD. Nowhere can you tell someone writing "CSV" that they're writing it badly or parsing it badly, because there's NO STANDARD.

As another example, you could choose to quote doublequotes by doubling them:

"Hello""there","Hello \013 there","Hello,""'"",\\\013"

You might choose to quote doublequotes by using singlequotes for cells which have doublequotes, and simply break if the cell has both singlequotes and doublequotes. And then you could ship this software and make anyone who wants to interact with your "CSV" output have to write a special case for you.

You might want to never consider that data might include newlines, so just emit them in the clear, and make the data indistinguishable from two separate rows. This can only be resolved by knowing how many columns should be present, and to read on past newlines until you have enough columns.

Even "professional programmers" can't anticipate how a future program might implement CSV. There are hundreds of possible ways to escape data, and everybody has chosen a different one. No standards.

"Professional programmers" show their professionalism by demanding full and accurate specifications up-front, rather than spending time implementing something only to be told it's "wrong". You want CSV input or output? OK, whose CSV?

--
Does my bum look big in this?
Re:Here, let me fix that for you ... by kyz · 2008-02-18 08:19 · Score: 1

In fact, several of those items have to be replaced in XML too.

Yes, and there is ONE SINGLE STANDARD THAT EVERYONE AGREES ON for how to quote special characters. You can actually be conforming or non-conforming to this standard. You can't just make it up on the fly and be right no matter what you do, like you can with CSV.
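
As a small illustration of that single escaping convention, here is a sketch using Python's standard library helpers. The sample text is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr, unescape

text = 'Hello, "world" & <friends>'

escaped = escape(text)        # replaces &, < and > for use as element content
print(escaped)
print(quoteattr(text))        # picks a quote style and escapes for attribute use

# Any conforming parser reverses the escaping the same way.
assert unescape(escaped) == text
assert ET.fromstring(f"<c>{escaped}</c>").text == text
```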

--
Does my bum look big in this?
Re:Here, let me fix that for you ... by glesga_kiss · 2008-02-18 08:37 · Score: 1

I use CSV and XML regularly. I only use CSV when I encounter XML's big weakness...the ability to cheaply append to a file several million times over a few hours.

If anyone knows a way to do this efficiently in XML I'd love to hear it. From what I understand, all of the document writers will recreate the whole file on each write.
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 08:40 · Score: 1
"That being proposed as an alternative in the context of a XML sucks thread is... hmmm... sad."
Don't knock it if you haven't tried it.
Advantages:
1. quick to parse out with any tool at hand
2. easy to index
3. supports random read/write
4. supports an infinite hierarchy and arbitrary entities
5. can be easily normalized for storage in tables in a database
Contrast this with xml:
1. complicated parsers
2. not indexable (hence the push for "binary xml" a while back ...)
3. sequential read only - writing at any point may require spooling back out the rest of the file
4. bloated
--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by Anonymous Coward · 2008-02-18 08:49 · Score: 0

Who needs those silly accents and non-English languages anyway.
Re:Here, let me fix that for you ... by kyz · 2008-02-18 09:11 · Score: 1

Define an XML format which lets you have any number of a top level element, then just keep appending like you would CSV. In most languages, you can generate XML to a file or into a buffer. Just write the XML to your buffer, and then append the buffer to your file.
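
One way this might look in practice, as a sketch only. The element name, attribute, and file name are hypothetical. Note that a file holding many top-level elements is a sequence of fragments rather than a single well-formed document, so this reader wraps it in a synthetic root before parsing.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def append_sample(path, poll_id, value):
    """Serialize one <sample> element and append it; the file is never re-read."""
    elem = ET.Element("sample", {"id": str(poll_id)})
    elem.text = str(value)
    with open(path, "ab") as f:
        f.write(ET.tostring(elem) + b"\n")

def read_samples(path):
    """Wrap the appended fragments in a synthetic root so a standard parser accepts them."""
    with open(path, "rb") as f:
        body = f.read()
    root = ET.fromstring(b"<samples>" + body + b"</samples>")
    return [(e.get("id"), e.text) for e in root.iter("sample")]

path = os.path.join(tempfile.mkdtemp(), "poll.xml")
for i in range(5):
    append_sample(path, i, i * 10)
samples = read_samples(path)
print(samples)
```

Each append costs one small write regardless of file size. The trade-off is that the consumer has to know about the wrapping step.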

Of course, that's presuming you need/want XML output for whatever you're appending to. Do you? If it's just an internal format and it's not for interchange with any other applications, there's no real need for XML (or CSV, or any format).

--
Does my bum look big in this?
Re:Here, let me fix that for you ... by mrchaotica · 2008-02-18 09:27 · Score: 1
Two things immediately spring to mind:
- Are you actually only appending? Or could you be "appending to the front," or appending and updating something (e.g. a count) at the beginning of the file?
- Presumably, most XML libraries are designed to support arbitrary changes, so they probably work by reading the whole file in, applying the changes, and then writing the whole file back out. The makers probably don't feel the need to optimize for your case. Maybe you should try modifying one?
--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:Here, let me fix that for you ... by mrchaotica · 2008-02-18 09:33 · Score: 1

One field holds the parent node record id, or is blank if it's a top-level node.

But if the field is blank, the delimiting nulls (as per your previous post) are adjacent and it becomes indistinguishable from a record delimiter. You've just added an ambiguity! If it's that easy for you, the inventor of the schema, to screw it up, how the heck is any other random dumbass going to manage to deal with it?
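
The ambiguity can be reproduced in a few lines. This is a sketch assuming the one-null-per-field, two-nulls-per-record layout from the earlier post and three hypothetical columns (record id, parent id, node type), read by code that trusts the double null as the record boundary.

```python
def encode(records):
    """One null between fields, two nulls after each record."""
    return b"".join(b"\x00".join(fields) + b"\x00\x00" for fields in records)

records = [
    [b"1", b"", b"root"],      # top-level node: blank parent id
    [b"2", b"1", b"leaf"],
]
stream = encode(records)

# A reader that splits on the two-null record delimiter first:
naive = [chunk.split(b"\x00") for chunk in stream.split(b"\x00\x00") if chunk]
print(naive)
```

Two records went in and three come out, because the blank parent field puts two nulls side by side in the middle of the first record.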

Extend your schema as required.

Okay, but how are you going to notify the arbitrary number of unknown third parties that are using your data?

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 10:15 · Score: 1

One field holds the parent node record id, or is blank if it's a top-level node.
But if the field is blank, the delimiting nulls (as per your previous post) are adjacent and it becomes indistinguishable from a record delimiter. You've just added an ambiguity! If it's that easy for you, the inventor of the schema, to screw it up, how the heck is any other random dumbass going to manage to deal with it?

Wrong - if you know how many fields there are in each record (that's what headers are for, right?), then two nulls side-by-side won't be mistaken for the two-null record delimiter ...
Also, you can use an ascii 0x20 (a space) for a blank field ... there's no problem with semantics there :-)
Third, you can always replace a blank (if you're being anal) with a "TOP_NODE" identifier
Fourth, if you're using a fixed-width field, the problem never occurs.
It's a LOT simpler than xml.

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by Anonymous Coward · 2008-02-18 10:28 · Score: 0

CSV is fine if you want to store data that has no quote marks, commas, carriage returns or linefeeds. For everything else, please use a better specified format, preferably one that has a formal definition. Like XML, for example.
And in XML you need to escape or encapsulate them, just like ASCII CSV....
XML does not belong everywhere. And I don't view it as much of a "standard". It is free-for-all SGML/HTML, is what it is. Not terribly innovative.
XML has the same problems as does CSV. Just that it's bloated beyond belief. Imagine looking at a 1GB (small) database dump in XML....
There are plenty of places XML does NOT belong. And most people who propel XML as a standard are full of it. It, like CSV, is a loose definition at best.
Re:Here, let me fix that for you ... by mrchaotica · 2008-02-18 10:55 · Score: 1
Wrong - if you know how many fields there are in each record (that's what headers are for, right?), then two nulls side-by-side won't be mistaken for the two-null record delimiter ...

If you know how many fields there are in each record, then why did you need a special record delimiter to begin with? Sounds like a design mistake, which isn't surprising since it was ad-hoc...

Also, you can use an ascii 0x20 (a space) for a blank field ... there's no problem with semantics there :-)

Yeah there is: What if you actually want your node to contain a single space?

Third, you can always replace a blank (if you're being anal) with a "TOP_NODE" identifier

What if I want to name a node "TOP_NODE," too?

Fourth, if you're using a fixed-width field, the problem never occurs.

But now I have to:
- know the maximum length of my data fields before hand, and
- never want to distinguish between blank-padded and non-blank-padded data!
Boy, this schema of yours really isn't working out so well, is it? Not only is it not capable of representing everything it might need to represent, it gets more complicated every time you try to work around the bugs I find in it. Maybe you ought to think it through some more <secret>like the folks who made XML did</secret>. ; )
--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 11:20 · Score: 2, Informative

If you know how many fields there are in each record, then why did you need a special record delimiter to begin with? Sounds like a design mistake, which isn't surprising since it was ad-hoc...

Wrong - the special null delimiter is needed only for variable-length (and zero-length) fields and records. For fixed-length fields and records, no delimiter is needed.
For example: First Name\0x00Last Name\0x00Age\0x00\0x00
Joe\0x00Blow\0x0042\0x00\0x00
Mary\0x00Doe\0x0024\0x00\0x00
\0x00Cowboyneal\0x00\0x00\0x00
In the above example, Cowboyneal has no first name and no age.
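
A sketch of a reader for the layout above. It assumes the header names are never empty, so the first double null ends the header and gives the field count, and that no field value contains a null byte. The function names are made up.

```python
NUL = b"\x00"

def encode(records):
    """One null between fields, two nulls after each record."""
    return b"".join(NUL.join(fields) + NUL + NUL for fields in records)

def decode(stream):
    """Use the header's field count to group tokens instead of hunting for double nulls."""
    header_end = stream.index(NUL + NUL)       # header names are assumed non-empty
    names = stream[:header_end].split(NUL)
    n = len(names)
    tokens = stream[header_end + 2:].split(NUL)
    # Each record yields n field tokens plus one empty token from its doubled terminator.
    rows = [tokens[i:i + n] for i in range(0, len(tokens) - 1, n + 1)]
    return names, rows

stream = encode([
    [b"First Name", b"Last Name", b"Age"],
    [b"Joe", b"Blow", b"42"],
    [b"Mary", b"Doe", b"24"],
    [b"", b"Cowboyneal", b""],
])
names, rows = decode(stream)
print(names)
print(rows)
```

Counting fields does recover the empty values, but the reader has to count rather than simply split on the record delimiter.
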
What's so hard to understand about that? For a fixed-length field/recordset, just include a header ... FirstName:10:LastName:10:Age:3\n
Joe_______Blow_______42\n
Mary______Doe________24\n
__________Cowboyneal___\n
Both are human-readable, both are easy and intuitive to parse out, the second one is self-documenting and fully supports random access, etc (and neither one is new - the first is used on most *nixes, with either a : or | instead of a null, databases have been using the latter format for decades).
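
A minimal sketch of reading the self-documenting fixed-width variant. It assumes the header is a flat name:width list as shown, that "_" stands for the padding character, and that values are left-aligned. That last choice is an assumption of this sketch, not something the format dictates.

```python
def parse_header(line):
    """Turn 'FirstName:10:LastName:10:Age:3' into [('FirstName', 10), ...]."""
    parts = line.split(":")
    return [(parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2)]

def format_record(values, layout, pad="_"):
    """Pad each value out to its column width."""
    return "".join(v.ljust(width, pad) for v, (_name, width) in zip(values, layout))

def parse_record(line, layout, pad="_"):
    """Slice one fixed-width line by the header widths and strip the padding."""
    out, pos = {}, 0
    for name, width in layout:
        out[name] = line[pos:pos + width].strip(pad)
        pos += width
    return out

layout = parse_header("FirstName:10:LastName:10:Age:3")
record_len = sum(width for _name, width in layout)

for values in (["Joe", "Blow", "42"], ["Mary", "Doe", "24"], ["", "Cowboyneal", ""]):
    line = format_record(values, layout)
    assert len(line) == record_len
    print(line, parse_record(line, layout))
```

Stripping the padding is also where information can be lost: a value that legitimately begins or ends with the pad character cannot be told apart from padding.
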
By contrast, xml is an abortion. Heck, I'll go further - xml is the ultimate triumph of navel-gazing over real-world experience.

--
Kevin Smith on Prince
Re:Here, let me fix that for you ... by glesga_kiss · 2008-02-18 11:40 · Score: 1

The problem with that approach is that the objects are generated by a serialisation framework that normally uses its own document writer. Writing my own might be an option, but it throws away part of the benefit of XML in that you can use standard libraries that do the dirty work and it also creates a dependency between my code and the schema generation.

What may work however is a writer that uses the serialisation framework at a node level to append its output to an existing output stream. I may give that a go as CSV has its limitations.
Re:Here, let me fix that for you ... by glesga_kiss · 2008-02-18 11:47 · Score: 1

Are you actually only appending?

Yes, the data is a large amount of polling information that will be consolidated later offline.

Maybe you should try modifying one?

-kyz also suggested that and I might give it a go, using what I mentioned in reply to his post.

Thanks for the replies!
Re:Here, let me fix that for you ... by einhverfr · 2008-02-18 12:46 · Score: 1

CSV is a row-based format.

Better to replace it with an RDBMS than with XML (which is a hierarchical format).

--

LedgerSMB: Open source Accounting/ERP
Re:Here, let me fix that for you ... by syousef · 2008-02-18 16:11 · Score: 1

It amazes me how something that looks so simple can have so many corner cases, and how they can be solved so differently by different implementations.

CSV is fine if you want to store data that has no quote marks, commas, carriage returns or linefeeds. For everything else, please use a better specified format, preferably one that has a formal definition. Like XML, for example.

Or perhaps just agree on a standard for quote marks, commas, carriage returns, line feeds and any other common corner cases? If you don't have the discipline to do this in CSV, any custom XML schema will turn into horse doo doo very quickly.

--
These posts express my own personal views, not those of my employer
Re:Here, let me fix that for you ... by Anonymous Coward · 2008-02-18 16:52 · Score: 0
XML has awful corner cases too, like
- Does my parser know U+0085 NEXT LINE and U+2028 LINE SEPARATOR are now whitespace?
- Does the element name normalization match between the schema and my documents?
- Can this document really be flagged standalone or does it use default attribute values anywhere?
- Am I exchanging data with anyone who doesn't handle or at least ignore namespace prefixes?
RFC 4180 is the definition of text/csv. Commas, quotes, and newlines are all supported in quoted values. Has it been done wrong? Of course. I also routinely need to talk people out of processing XML using hand-written regexes.
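
For what it's worth, a sketch of the quoted-value round trip using Python's standard csv module. Its default "excel" dialect doubles embedded quotes, which is the convention RFC 4180 describes. The sample row is loosely based on the three awkward columns from earlier in the thread.

```python
import csv
import io

row = ['Hello"there', "Hello\nthere", 'Hello,"\'",\\']

buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)          # quotes are doubled, fields quoted as needed
encoded = buf.getvalue()
print(repr(encoded))

# newline="" lets the reader handle line breaks inside quoted fields itself.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
assert decoded == row
```

This only shows that one writer and one reader agree with each other. It says nothing about files produced by tools that chose a different escaping scheme.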
Re:Here, let me fix that for you ... by IntlHarvester · 2008-02-18 20:58 · Score: 1

My real world experience indicates that using an xml library is generally a lot nicer than trying to parse some random shithead's undocumented homebrew delimited format.

--
Business. Numbers. Money. People. Computer World.
Re:Here, let me fix that for you ... by mrchaotica · 2008-02-19 02:52 · Score: 1

Wrong - the special null delimiter is needed only for variable-length (and zero-length) fields and records. For fixed-length fields and records, no delimiter is needed.

No shit, Sherlock! My point was that you changed your mind between posts, because you have no design. How the heck are third parties supposed to read your data if you're doing that?!

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-19 04:09 · Score: 1

Grow up - none of these is an "original" format - they've been in use for decades, well before the xml crap. You might want to look at the various system delimited formats - oh, wait, that predates the Internet by several decades, so it's not as "cool" as xml.
The right tool for the right job - and xml is rarely the right tool.
Case in point - the code I'm working on at work - totally unusable because of bad decisions - the use of xml, the stl, an in-memory tree that is really stupidly implemented, other badly designed classes, reference-counting for garbage collection, locking of the whole process instead of individual threads, etc., - none of which are necessary in what is supposed to be a high-performance server. It's like they were allergic to malloc / free and simple data structures ...

--
Kevin Smith on Prince

Is XML just SGML redux? by OrangeTide · 2008-02-18 05:34 · Score: 1

XML doesn't seem like a big deal. SGML was around since the mid-80s, making it over 20 years old. XML is stricter in many ways, and layers some useful concepts on top of SGML. But otherwise it seems to have a lot of the same uses and syntax as SGML itself.

As a side note, I dislike it when people use XML inappropriately, like using XML-RPC when something based on ASN.1 might be more appropriate. (How many wannabe MMORPG projects have I read that are "XML-RPC" based? too many). I'm sure there are good uses for XML, but there are a lot of people out there who apparently aren't aware that there are bad uses for XML.

--
“Common sense is not so common.” — Voltaire

Re:Is XML just SGML redux? by PinkPanther · 2008-02-18 14:37 · Score: 1

XML doesn't seem like a big deal. SGML was around since the mid-80s, making it over 20 years old.

Mid-80s...or perhaps the 60s. Anyways, "ancient history"....
Oh, and notice that if you look at the above link, and follow the link to the OED project...then you'll see references to our friend Mr. Bray yet again.
There is sanity to these progressions....

--
It's a simple matter of complex programming.
Re:Is XML just SGML redux? by OrangeTide · 2008-02-18 17:10 · Score: 1

Mid-80s...or perhaps the 60s. Anyways, "ancient history"....

Reading Wikipedia apparently doesn't make you an expert. GML won't look familiar to anyone who has used SGML. It's like trying to argue that ALGOL and Pascal are the same thing or that BCPL and C are the same.

Anyways, that's not important. What I'd really like is some clarification of your point. Because I don't see what you're getting at.

--
“Common sense is not so common.” — Voltaire

Re:Java and XML, bad tastes that are worse togethe by bckrispi · 2008-02-18 05:36 · Score: 4, Insightful

I'll take an Ant XML build file over an "is that a tab or a space" Makefile any day...

--
Xenon, where's my money? -Borno

Re:10 Years and still waiting by EMN13 · 2008-02-18 05:36 · Score: 3, Informative

I use it in web development constantly, and have for about 8 years. It's great for documents mostly since it's much easier to process than a home-grown set up.

You want to transform the document, you can use any of a number of techniques, and trivially guarantee that the resulting document is at least syntactically valid. If you use a home-grown format (or HTML), you'll need to resort to regular expressions, or a custom parser - which works fine up to a point. Regex's are error prone (it's quite difficult, for instance, to make an untrusted HTML document safe with regex'es), and parsing is difficult, and doesn't solve the transformation step very elegantly - whereas XPath and others are absolutely brilliant for quickly distilling the stuff you need from a document.
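
A small sketch of the "distilling" step. It uses Python's ElementTree, which supports only a subset of XPath, and a made-up document whose element and attribute names are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration only.
doc = ET.fromstring(
    "<article>"
    '<section id="intro"><title>Intro</title><p>First.</p></section>'
    '<section id="body"><title>Body</title><p>Second.</p><p>Third.</p></section>'
    "</article>"
)

# Path expressions pick out nodes without any hand-written parsing.
titles = [t.text for t in doc.findall("./section/title")]
body_paragraphs = [p.text for p in doc.findall(".//section[@id='body']/p")]

print(titles)
print(body_paragraphs)
```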

But on the parsing side... take a look at ANTLR, it's just great :-).

Your comments seem tainted with inexperience. by sidragon.net · 2008-02-18 05:39 · Score: 3, Insightful

In general, if you have data to be structured and serialized, XML is one way to do it. If you think XML a poor choice, then could you suggest an alternative? Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).

[N]ot only does Java represent a poor trade off between the annoyances of a strongly typed language and the speed of a dynamic interpreted one ...

Would you provide evidence aside from personal anecdotes, and possibly consider evidence to the contrary?

[Java] has a horrible mess of dependency issues that nobody really solves besides.

Perhaps you meant “modern software” instead. Any complex application these days relies on dozens of libraries and services to perform tasks. Not quite sure where exactly you are having difficulties, so I cannot elaborate further.

[XML] is too ugly and unreadable ... But as a general tool for Internet plumbing it's awful.

XML is intended for consumption by machines first, people second. You might also argue that in-memory data structures are ugly and unreadable.

Re:Your comments seem tainted with inexperience. by hoggoth · 2008-02-18 07:18 · Score: 1

> If you think XML a poor choice, then could you suggest an alternative?

YAML for the win!
YAML is concise, easy to read, easy to write, easy to parse, easy to edit.
It has high signal-to-noise ratio, and is effortless for the human eye.
It can represent any data structure I can imagine.
It has libraries for any popular language I can think of.

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Re:Your comments seem tainted with inexperience. by argent · 2008-02-18 09:22 · Score: 2, Informative

If you think XML a poor choice, then could you suggest an alternative?

Depends on the problem you're trying to solve.

A hell of a lot of the stuff I'm seeing in XML these days would be better off as token-separated self-describing tables (tables where the column names are the first row), or a modestly extended token-separated format like CSV.

For binary data something derived from Electronic Arts semi-self-describing interchange file format is good, examples in current use are MIDI File Format and Portable Network Graphics...

For arbitrary self-describing data there's always ASN.1.

For tagged arbitrary chunks of data descendants of RFC-822 are common.

For shallow-nested keyword-value data there's Microsoft's INI files.

And, of course, Lisp S-Expressions do absolutely everything XML does, more compactly, and are easier to parse.
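
To give a feel for the "easier to parse" claim, here is a toy reader. It is a sketch under simplifying assumptions: atoms and double-quoted strings only, no escape sequences inside strings, and no friendly error for unbalanced parentheses. The sample expression is made up.

```python
import re

# Tokens: an open paren, a close paren, a quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse_sexpr(text):
    """Parse one S-expression into nested Python lists and strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1                      # consume the closing paren
            return items
        if tok == ")":
            raise ValueError("unexpected )")
        if tok.startswith('"'):
            return tok[1:-1]              # drop the surrounding quotes
        return tok

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return result

tree = parse_sexpr('(person (name "Joe Blow") (age 42))')
print(tree)
```

A production reader would need escapes, comments, and proper error reporting, so this understates the work somewhat.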

Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).

But XML doesn't solve that problem. I've found that the amount of code it takes to extract data from an arbitrary XML file even with an XML parser at hand is not significantly less than the amount of code it takes to parse and extract data from any other self-describing format.
Re:Your comments seem tainted with inexperience. by Stu+Charlton · 2008-02-19 01:35 · Score: 1

CSV is a false sense of security. Unless you're using a battle-tested parser, many (lazy) programmers are killed by corner cases.
ASN.1 is a beast to parse.
Lisp S-Expressions are out of fashion -- I believe many are actively hostile to them, actually, as retaliation for having to (pretend to) learn it in school. (Sad but true)
INI files -- I don't believe there are many multi-platform parsers.

The one I agree with: RFC-822 is a simple, multi-platform alternative.
JSON probably too.

But XML doesn't solve that problem. I've found that the amount of code it takes to extract data from an arbitrary XML file even with an XML parser at hand is not significantly less than the amount of code it takes to parse and extract data from any other self-describing format.

The DOM shouldn't count! There are better parsers out there.

XML does solve the problem of having an interoperable and expressive markup language. It's being used a bit too violently, for sure, but that's because the industry is slow at figuring out the difference between syntactic and semantic interoperability.

--
-Stu
Re:Your comments seem tainted with inexperience. by argent · 2008-02-19 08:31 · Score: 1

CSV is just a special case of token-separated files, and both TSV and CSV parsers do have well known failure modes, BUT so do XML parsers, and with XML you have whole classes of failure modes that are absent in strict table formats... like parsers that don't distinguish between nested tags and attributes, or that *do* distinguish between them when the file creator didn't intend them to be distinguished, and so on. There are so many optional features in XML where more than one alternative is reasonable, and if you don't pick the right ones you're hosed.

The XML advocate then comes back and says "use a battle-tested parser". But:

* Somehow "use a battle-tested CSV parser" never occurs to them.
* Even with a battle-tested *parser*, you have to deal with things like:
<key name="foo">bar</>
<key name="foo" value="bar"/>
<foo>bar</>
<foo value="bar"/>
<key>foo</><value>bar</>
<cell><key>foo</><value>bar</></>
... ... ... ... lots more ellipses ...
All of which may fit some DOM and all of which produce something that you or I can see are meant to be the same thing... and I've seen all of them used to mean the same thing. Apple's property list files are by no means the worst abuse of XML out there.
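
A sketch of the kind of glue code this forces on the consumer. The normalizer below is hypothetical and handles only the shapes listed. The `</>` short closes are written out as full end tags, since a standard XML parser requires them.

```python
import xml.etree.ElementTree as ET

VARIANTS = [
    '<key name="foo">bar</key>',
    '<key name="foo" value="bar"/>',
    "<foo>bar</foo>",
    '<foo value="bar"/>',
    "<cell><key>foo</key><value>bar</value></cell>",
]

def key_value(elem):
    """Best-effort guess at the (key, value) pair an element is meant to carry."""
    key_child, value_child = elem.find("key"), elem.find("value")
    if key_child is not None and value_child is not None:
        return key_child.text, value_child.text
    name = elem.get("name", elem.tag)       # name attribute, else the tag itself
    value = elem.get("value", elem.text)    # value attribute, else the text content
    return name, value

for variant in VARIANTS:
    print(variant, "->", key_value(ET.fromstring(variant)))
```

The parser accepts every variant without complaint. Deciding that they all mean the same thing is left to code like `key_value`, which is exactly the part no parser can standardize.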

But a good XML parser has to be able to deal with them all, and produce hooks to pull useful information out of any possible DOM, because some asshole programmer is going to throw them all at it some time. And once it's parsed you STILL have to deal with it... and if there are multiple implementations generating the same file you know some damn fool is going to pick a different plausibly equivalent option for the above lines (especially if they're using a 'battle tested XML generator') so you have to handle that...

If you only have one generator it's simpler, until they reimplement it and pick a different option.

Matching quotes in CSV is a lot easier. Really.

ASN.1 is not my favorite, though I've dealt with worse beasts (DCA containing mixed EBCDIC and ASCII, for example), but at least there are battle-tested parsers for it.

S-expressions have some unfortunate history, but trust me... there are people who hate XML at least as much.

INI files: there are open source multi-platform parsers for them. There's one in Samba, for example.

XML does solve the problem of having an interoperable and expressive markup language.

Indeed it does, and if it was only used for that purpose without trying to push it as a general solution for information transfer where it's NOT appropriate I would have no problem with it.
Re:Your comments seem tainted with inexperience. by Stu+Charlton · 2008-02-20 03:51 · Score: 1

My partial hope is that some non-XML syntax of RDF eventually catches on as a way to serialize data.

It's pretty simple in some ways, but capable of much sophistication.

--
-Stu
Re:Your comments seem tainted with inexperience. by argent · 2008-02-20 06:02 · Score: 1

Interesting. RDF still seems a little verbose, even in N3 format. For tabular data a token-separated layout (of which CSV is an elaboration) would seem better, and would work for RDF... for example:
<http://www.slashdot.org> type bookmark
<http://www.slashdot.org> visited "somedate"
<http://example.com> type bookmark
<http://example.com> visited "anotherdate"
<http://example.com> password-hash "8193787837893"
...
Even in N3 format that's longer than this:
URN,type,visited,password-hash <http://www.slashdot.org>,bookmark,somedate, <http://example.com>,bookmark,anotherdate,8193787837893 ...
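
For anyone who wants to try the comparison, here is a rough Python sketch of pivoting subject/predicate/object triples into the token-separated layout above. It assumes every subject has at most one value per predicate, and the function name triples_to_csv and the key column name are made up for illustration; this is not taken from any RDF toolkit.

```python
import csv
import io

def triples_to_csv(triples, key_column="URN"):
    """Pivot (subject, predicate, object) triples into one CSV row per subject."""
    columns = []   # predicates, in the order they are first seen
    rows = {}      # subject -> {predicate: object}
    for subject, predicate, obj in triples:
        if predicate not in columns:
            columns.append(predicate)
        rows.setdefault(subject, {})[predicate] = obj

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([key_column] + columns)
    for subject, values in rows.items():
        # A subject that lacks a predicate gets an empty field.
        writer.writerow([subject] + [values.get(c, "") for c in columns])
    return out.getvalue()

triples = [
    ("<http://www.slashdot.org>", "type", "bookmark"),
    ("<http://www.slashdot.org>", "visited", "somedate"),
    ("<http://example.com>", "type", "bookmark"),
    ("<http://example.com>", "visited", "anotherdate"),
    ("<http://example.com>", "password-hash", "8193787837893"),
]
print(triples_to_csv(triples), end="")
```

The catch is the assumption: as soon as a subject can carry two values for the same predicate, the flat table stops working and the triple form starts earning its verbosity.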
Re:Your comments seem tainted with inexperience. by argent · 2008-02-20 11:55 · Score: 1

Looking further, RDF seems to be a rather specialized language, and honestly except for the simplest examples N3 doesn't seem to be significantly better than XML - it's more complex and actually seems less human-readable, based on the examples I've found. And for the cases where N3 is really a win, token separated values are a bigger win.

Why is editing XHTML "doing it wrong"? by tepples · 2008-02-18 05:39 · Score: 1

Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong.

So am I doing it wrong by posting this comment using Slashdot comment markup, which appears similar to XHTML? Which XHTML editor do you recommend, and is it really faster than manually adding <p>...</p>, <em>...</em>, and <a href="...">...</a> tags using Notepad++?

Re:Why is editing XHTML "doing it wrong"? by colmore · 2008-02-18 05:55 · Score: 2, Insightful

xhtml is one very small dialect of xml.

when you are entering html style markup tags, you are using xml. but xml is a much much larger subject than that. hand editing a website is fine. (if the documents are getting huge, they should be split into smaller files and automated somehow, anyway) hand editing, say, Open Office's xml format or any of the fairly arcane XML formats used for interprocess communication is not.

XML is sort of designed to be the second best data format for any application. There are a lot of times when something like /etc/passwd is more legible and appropriate. And there are times when the volume of data requires binary. XML is good because it is widely known and when the originating application is lost, the data can still be (with moderate difficulty) understood.

It's very similar to Java really. It got hyped for a specific web use that didn't really materialize, but its ability to be generic, widely-spoken, and safety-checked means it has found widespread use across the entire computer industry in places that aren't quite as visible to end-users as simple web applications or document formats.

--
In Capitalist America, bank robs you!
Re:Why is editing XHTML "doing it wrong"? by einhverfr · 2008-02-18 12:27 · Score: 1

XML is sort of designed to be the second best data format for any application.

I disagree.

XML is excellent at two tasks and sucks at everything else.

1) Human-readable object serialization format with possibilities of transformation into other object models (notice I didn't say "formats").
2) Interfaces between applications based on the above advantage (I would argue that XHTML is an example of this).

As soon as you start going away from this (say, to try to use it for database-related tasks), or try to make your application natively work in XML, you are doing it wrong. XML should be used where it is the right tool for the job and not used anywhere else.

XML is probably the second-worst format for things like:
1) Mass data storage
2) Complex information management

Most things like "master document management" really come down to attempts to do the legitimate two uses but a lot of other things go way outside what the format can reasonably handle.

--

LedgerSMB: Open source Accounting/ERP
Re:Why is editing XHTML "doing it wrong"? by gullevek · 2008-02-18 14:01 · Score: 1

100% agree. XML rocks to talk between apps (eg php -> flash), but sucks big time at being a database (been there, tried, and learned a lesson)

--
"Freiheit ist immer auch die Freiheit des Andersdenkenden" - Rosa Luxemburg, 1871 - 1919

Re:Java and XML, bad tastes that are worse togethe by nguy · 2008-02-18 05:48 · Score: 1

Incompetent use of a language/API doesn't equate to a bad language/API

No, but incompetent design of a language/API does, in fact, equate to a bad language/API.

Regex by tepples · 2008-02-18 05:48 · Score: 1

To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...

But not all languages are regular languages to be described by regular expressions. Is there a standardized form of confrex or consenex?

I wish we could use something like XML (in that it could use DTDs as schemas, and had support for DOM methods along with XQuery and XPath), but with a more effecient format (binary), and with the ability to encode references.

Your wish is W3C's command.

Re:Regex by fireboy1919 · 2008-02-18 06:00 · Score: 1

But not all languages are regular languages to be described by regular expressions. Is there a standardized form of confrex or consenex

This is a red herring.

No, not all natural languages are regular, and even most computer languages are not regular.

But I'm pretty sure that all languages (or to go more primitive, algebras) that can be expressed as XML can be parsed by a regular expression.

Can you disprove this?

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:Regex by msuarezalvarez · 2008-02-18 06:48 · Score: 1

You cannot use regular expressions to decide whether a string is an instance of the following DTD or not:

<!ELEMENT a (a | b) >
<!ELEMENT b EMPTY >

This is quite basic language theory.
Re:Regex by the-matt-mobile · 2008-02-18 06:58 · Score: 1

You are confusing the difference between whether an XML document is "well-formed" vs. "valid". Assuming that your Regex engine supports the "Depth" syntax for balanced sets, you can in fact parse any well-formed XML document with a not-so-"regular", regular expression. Once the document is parsed (ie: the XML is well-formed), THEN you can use DTDs or XSDs to determine whether the XML is "valid". The validity check cannot be done by a regex - you're right about that.

Of course, this is not an endorsement of using regexes for this purpose - for the sake of humanity, please don't do this! But, from a regex functionality standpoint, it is possible to write a regex to check for well-formed XML.
Re:Regex by TheRaven64 · 2008-02-18 07:02 · Score: 5, Informative

You fail Computer Science 101. Regular expressions are exactly as expressive as finite automata. A finite automaton is incapable of solving the matching brackets problem, since that requires a potentially infinite number of states in order to keep track of the number of open brackets in an input stream. Because of this, a regular expression cannot be used to parse any XML schema that allows an arbitrary depth of nesting, since parsing such a form would require counting the open and close tags to make sure they match, which is not possible with a regular expression.
This is why regular expressions are typically used for lexical analysis (tokenisation) not syntactic analysis (parsing).
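
To make the lexing/parsing split concrete, here is a minimal Python sketch, not a real XML parser: the regular expression only finds tags (tokenisation), and a stack does the part a finite automaton cannot. The names TAG and tags_balanced are invented for the example, and comments, CDATA sections and quoted ">" characters in attributes are deliberately ignored.

```python
import re

# Used only for lexing: it finds tags, it does not pair them up.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")

def tags_balanced(text):
    """Return True if every opened tag is closed by a matching tag, in order."""
    stack = []  # unbounded memory: exactly what a finite automaton lacks
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                  # <x/> opens and closes in one token
        if closing:
            if not stack or stack.pop() != name:
                return False          # close tag with no matching open tag
        else:
            stack.append(name)
    return not stack                  # anything left on the stack was never closed

print(tags_balanced("<x><x><x></x></x></x>"))
print(tags_balanced("<a><b></a></b>"))
```

The stack can grow as deep as the input nests, which is the "potentially infinite number of states" in practice.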

--
I am TheRaven on Soylent News
Re:Regex by WilliamSChips · 2008-02-18 07:04 · Score: 2, Informative

No, you cannot with a regex. If you can, it's not really a regex, it's something different.

--
Please, for the good of Humanity, vote Obama.
Re:Regex by the-matt-mobile · 2008-02-18 07:30 · Score: 0, Troll

Yes, thank you for the unnecessary terminology lesson, but I believe that I made it pretty clear in my post that I was referring to Regex technologies, not a formal DFA/PDA engine for a "regular" language. Thanks for playing though.
Re:Regex by the-matt-mobile · 2008-02-18 08:42 · Score: 2, Insightful

How is this insightful? Yes, from a strictly comp-sci definition of a "regular expression", you are exactly right. But this is not a comp-sci class and this is not a theory lesson! In the real world where real programmers write real (crappy) code, a parser that parses only regular languages is not very useful. All modern regex parsers handle more than just regular expressions - back referencing, depth parsing, lookahead/lookbehind are all common features of modern regex engines that violate the rules of parsing a "regular language" using a simple memory-less DFA/PDA state machine. Real regex parsers use (GASP) *memory* to do their parsing. So, while you wallow in semantics and theory, people out there are doing real (and granted silly) things with regex parsers because they can. For the purpose of this discussion, the original poster is right that it is possible (though incredibly unholy) to determine well-formed-ness of XML via a modern regex parser even though XML is not a regular language.
Re:Regex by tepples · 2008-02-18 10:01 · Score: 1

For the purpose of this discussion, the original poster is right that it is possible (though incredibly unholy) to determine well-formed-ness of XML via a modern regex parser even though XML is not a regular language.

Or, shorter: "Regex has become a misnomer in modern libraries." Do I understand your point correctly?
Re:Regex by the-matt-mobile · 2008-02-18 10:04 · Score: 1

yup (how's that for terse?)
Re:Regex by Flagran · 2008-02-18 12:19 · Score: 1

Except that even most modern "regexp" libraries only allow a limited number (usually nine, in my experience) of back references. To check that closing tags matched opening tags you'd need arbitrarily many back references. Is there any regexp library that does this?

--
Make love, not sigs
Re:Regex by the-matt-mobile · 2008-02-18 13:08 · Score: 1

You can use named references instead of numbered, which have the syntax (?<name>...). This would give you practically unlimited back references. However, to deal with arbitrary tag nesting, you'd not want to use back references. You'd want to use balancing groups (http://blog.stevenlevithan.com/archives/balancing-groups). Not all regex libraries support this functionality. Actually, you'd really just want to use DOM or SAX and not reinvent the wheel, but there's no accounting for taste.
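
For the "don't reinvent the wheel" route, a well-formedness check with an off-the-shelf parser is only a few lines. This is a sketch in Python using the standard library's ElementTree (standing in here for DOM or SAX); the helper name is_well_formed is made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Let a real XML parser decide; it also copes with comments, CDATA and entities."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<x><x></x></x>"))
print(is_well_formed("<x><y></x></y>"))
```

Note that this only answers "well-formed or not"; checking against a DTD or XSD is a separate validation step, as discussed above.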
Re:Regex by Anonymous Coward · 2008-02-18 14:29 · Score: 0

Can you teach a hick XML? Yes.
Can you teach a hick Regex? No.
Do you work with a team of hicks? Yes.

Problem solved.
Re:Regex by syousef · 2008-02-18 15:18 · Score: 1

Can you disprove this?

The guy who got modded up telling you that you fail Comp Sci was rude, but then that seems to be what is rewarded on slashdot these days. He's right inasmuch as you are displaying your ignorance, but it's possible to be more helpful.

Regular expressions certainly aren't enough to parse XML.

You need to read the following book and take a course on compilers.
http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools

I could give you a bunch of rules to google, but frankly I'm rusty on this stuff, and didn't get as deep an understanding as I wanted in the time I had when I studied this. In fact I'd love to go back and re-take my compiler course. It isn't simple stuff, but it is the very heart and soul of the computer science you use every day. You certainly don't need to understand it to write most business software, but it helps.

--
These posts express my own personal views, not those of my employer
Re:Regex by Flagran · 2008-02-18 15:40 · Score: 1

That's pretty interesting. Expression matching is what my company does, so seeing how other implementations work is always interesting. I had no idea that .NET had their own Regexp world. We started with a system that was all Perl regexps, and then Jakarta, and then java.util.regex but found that our large set of expressions got unmaintainable in all of these systems. Now we have our own, in-house expression matching engine. This has worked out very well for several years, but now it, too, is experiencing major growing pains. Why can't human language all be in Polish notation? Malagasy (http://en.wikipedia.org/wiki/Malagasy_language/) comes pretty close, though.

--
Make love, not sigs

Re:Java and XML, bad tastes that are worse togethe by Omnifarious · 2008-02-18 05:49 · Score: 1

OFX servers for financial institutions. Without name dropping, check out the list of banks, brokerages, tax services, and credit card providers (Quicken) out there successfully serving up client data.

I'm aware of OFX, and it is something I consider a non-evil use of XML. It is all about the data, and the data is high-volume, structured and text-like, so something like XML makes sense for representing it.

OTOH, name dropping gets nowhere with me. Large institutions routinely adopt very stupid technologies for the most ridiculous of reasons. I'm much more interested in what a small, nimble high-tech company like Automated Trading Desk is doing than what Chase-Manhattan is doing. Of course, ATD appears to have gone to an all-flash homepage, which is an impressive level of stupidity, so maybe they've gotten all grown up now.

I have to admit, I'm clueless about your Java dependency issues. The only way I can see that ever happening is if you're dumping all of your classes into the default top-level package; and that's major user error if you are.

I do a build with Maven and it pulls down at least 20 different Java libraries and packages them all up with my program for even the most innocent of dependencies. Not only that, but then when something is deployed it tends to get deployed with all of its dependencies. No sense of a standard place to put libraries or trying to make sure that you don't have 20 different versions of a library around for the 10 different apps that use it. It's a nightmare.

And when I complain to Java people they tend to tell me "Oh, enterprises like it that way, it means they can stay in crufty code land forever and never have to upgrade anything if they don't want to!" which I read as "We don't really want to actually spend any time trying to make our development process vaguely reasonable, we just want to toss code on the wall and wait for things to stick.". It's pathetic and makes for intolerable integration issues for larger projects. I guess it all fits with the idea of Java being for programmers who don't actually want to think about the code they write.

--
Need a Python, C++, Unix, Linux develop

Re:Java and XML, bad tastes that are worse togethe by Black-Man · 2008-02-18 05:52 · Score: 1

That's because OFX IS A DEFINED STANDARD - a standard driven by Intuit. I guess you're too young to remember NPC - a competing standard? Or having to support BOTH? Oh yeah... that was great fun.

You tell me what is a standard in Ant? Nice taking his comments out of context.

Re:Java and XML, bad tastes that are worse togethe by fartrader · 2008-02-18 05:54 · Score: 2, Informative

Java is clearly moving away from the massive over-use of XML in everything from configuration to messaging. From Java 5 onwards, annotations are rapidly becoming the configuration mechanism of choice, where infrastructure configuration is placed in the source code directly, in a way that's significantly less obtrusive than writing code to manage things like persistence and transactions yourself, and significantly easier to follow than placing it in many XML files. Anyone who has migrated from EJB 2.1 to 3.0 for example should be much happier now that the various XML files needed to get it to run are going the way of the dodo. This use of annotations to replace XML is an emerging trend popular in many frameworks, from EE 5 through to Hibernate and Spring. On the messaging side there are a slew of code generation tools and XML-to-POJO (annotation-based) mappings that keep you away from raw XML - yes it's another layer of abstraction but it keeps you away from the coding horrors of SAX, DOM, and yes even the comparative simplicity of JDOM.

Re:10 Years and still waiting by somersault · 2008-02-18 06:07 · Score: 1

I thought that was caused by people adding comments boxes to webpages? You don't need XML to do web 2.0 type stuff :o

--
which is totally what she said

Re:Java and XML, bad tastes that are worse togethe by bcrowell · 2008-02-18 06:12 · Score: 1

Java and XML are similar in that both of them got over-hyped. They're also similar in that sometimes they really are the right solution -- just not as often as PHBs seem to think. I've had exactly one application where I started designing the file format, and realized, "Oh heck, I'm reinventing XML," so I went with XML and it was the right choice. For config files, the advantage I can see is that although XML may not be optimal for every type of config file, it does provide an alternative to the traditional Unix philosophy of having a different, goofy syntax for every single program's config file. Re Java, what was really a disaster, in hindsight, was applets. They were overhyped, the CPUs weren't fast enough to give acceptable performance, the VM and its libraries are still too huge to give attractive startup times, AWT was a botch and had to be replaced, and implementations of browser plugins still suck -- in fact, my browser crashes every single goddamn time I visit this applet. Because Sun blew it so bad with applets (with a little help from MS), we've ended up instead with the de facto standard being flash, which is basically a totally proprietary system. (Yeah, I know about Gnash, Haxe, etc. Let me know when you can buy a Flash book and make the examples work using a totally open-source software stack.)

--
Find free books.

XML lite by hey · 2008-02-18 06:12 · Score: 1

There needs to be some description of an XML lite.
For config files and such.

- No doctype needed
- tags are case insensitive
- Can do comments with # character instead of
- Etc

Re:XML lite by k8to · 2008-02-18 07:04 · Score: 1

I like that your example of an xml comment turned into a web comment, and thus could not be read. I think it underscores the "rock solid" nature of both the software that processes this stuff in the wild, and the robustness of the web all at once.

--
-josh

not XML by Anonymous Coward · 2008-02-18 06:13 · Score: 0

Will someone tell me if this is stupid, smart, or both?
http://www.syntaxerr.org/~daniell/sss.html

That's easy to disprove... by warrax_666 · 2008-02-18 06:14 · Score: 1

Try writing a regex for parsing documents consisting of arbitrarily deeply nested elements. Say, documents of the form

<x><x><x><x>...</x></x></x></x>

See?

--
HAND.

Re:That's easy to disprove... by fireboy1919 · 2008-02-18 07:46 · Score: 1

I'd do: .* ...and run it recursively if I knew that this was a possibility built in to the language that I needed to address.

Of course...that almost never comes up.

regex=FSM. Languages that include regex!=FSM.

Regex+Languages that include regex=>parsing made extremely simple.

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:That's easy to disprove... by Hatta · 2008-02-18 10:59 · Score: 1

What do regular expressions have to do with the flying spaghetti monster?

--
Give me Classic Slashdot or give me death!

Inferior to S-expressions by 5pp000 · 2008-02-18 06:17 · Score: 1

TFA is a fun read. Too bad XML sucks. As Jerome and Philip Wadler write, "[T]he essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."

Lisp had the same problem solved 40 years earlier. While a lot of people find S-expressions verbose, XML is quite a bit more verbose. Slava Akhmechet has a nice essay on the relationship between the two notations.

--
Your god may be dead, but mine aren't!

Re:Inferior to S-expressions by 5pp000 · 2008-02-18 06:39 · Score: 1

Whoops -- the authors of the linked paper should have been given as Jérôme Siméon and Philip Wadler. Sorry for the error.

--
Your god may be dead, but mine aren't!
Re:Inferior to S-expressions by Shados · 2008-02-18 07:00 · Score: 1

Really, XML does solve the problem. The only issue with it is that it's designed to solve ALL problems, instead of using the usual 80/20 rule... instead of being optimized for most problems, it uses the lowest common denominator to try and catch the 20 other %...and everything that makes XML suck comes from that extra 20%.

If it was easier to handle dates in JSON without schemas, we'd have one heck of a winner there though.
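
A small Python sketch of what the date problem looks like in practice. JSON has no date type, so the usual workaround (one convention among several, and an assumption here) is to ship an ISO 8601 string and agree out of band which fields are dates. The field name "posted" is invented for the example.

```python
import json
from datetime import datetime, timezone

stamp = datetime(2008, 2, 18, 7, 0, tzinfo=timezone.utc)

# The standard encoder refuses a datetime outright.
try:
    json.dumps({"posted": stamp})
    encoded_natively = True
except TypeError:
    encoded_natively = False

# Workaround: send a string and rely on both sides knowing it is a date.
wire = json.dumps({"posted": stamp.isoformat()})
decoded = json.loads(wire)

# The receiver only sees a string until it applies knowledge
# that the document itself does not carry.
restored = datetime.fromisoformat(decoded["posted"])
print(type(decoded["posted"]).__name__, restored == stamp)
```

That "knowledge the document does not carry" is exactly what a schema would otherwise supply.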

Re:10 Years and still waiting by iamacat · 2008-02-18 06:30 · Score: 3, Insightful

Here is another obvious rule: If a computer, at any time at all, has to parse or generate XML in large amounts, you are doing it wrong. There is really no need to resend the same string 100000 times, encode multi-megabyte binary data as BASE64 or lose floating point precision by encoding to or from strings. If need be, an efficient binary format can represent the data with an arbitrary schema. Communicating parties can exchange their schemas at runtime and avoid sending attributes that the other end is not going to use.
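
Two of those costs are easy to measure. The Python sketch below is only an illustration with made-up data: it shows the Base64 size penalty and what happens to a double when it is written with a fixed number of decimals. To be fair to text formats, the precision loss comes from the truncated formatting, not from text as such; a shortest-round-trip representation such as Python's repr() recovers the value exactly.

```python
import base64
import struct

payload = bytes(range(256)) * 12         # 3072 bytes of arbitrary binary data
as_text = base64.b64encode(payload)      # Base64 emits 4 characters per 3 bytes
overhead = len(as_text) / len(payload)

value = 0.1 + 0.2
short_text = "%.6f" % value              # a typical fixed-precision text field
round_tripped = float(short_text)        # no longer the same double

packed = struct.pack("<d", value)        # 8 bytes, exact
unpacked = struct.unpack("<d", packed)[0]

print(len(payload), len(as_text), round(overhead, 3))
print(round_tripped == value, unpacked == value, float(repr(value)) == value)
```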

Re:Java and XML, bad tastes that are worse togethe by AlXtreme · 2008-02-18 06:32 · Score: 1

I don't know about Thrift being a real contender in the web/internet-based services area. Really, code generation? How 80's. Haven't we learned enough from Sun RPC that this is a PITA, give me a proper library dammit! And AFAIK D-Bus is for local IPC, good luck sending messages over a network without a couple of hoops to jump through.

I can see your viewpoint: if you want to squeeze as much performance as possible out of your application you might want to investigate Thrift, D-Bus or simply write your own TCP protocol. It's not rocket science. However in a world where companies expect to exchange data and organizations want to link databases from different vendors together, I'd rather have a poor-but-workable standard than none at all.

SOAP is a complete pile of bloatware. It puts OpenOffice to shame on this front. However I'd rather have a nice Python library that lets me throw around objects and gets the job done than a performance improvement of 50% and a lot of extra work. It's simply not worth it most of the time; that's premature optimization.

Having said that, I prefer XML-RPC and REST-style interfaces. The simpler, the better.

--
This sig is intentionally left blank

Piling on with my 2 cents of XML hate by Anonymous Coward · 2008-02-18 06:33 · Score: 0

I like XML as a data format, but I am sick and tired of lazy people using it as a programming language format. If you want to design your own language, do it properly, and don't drown other folks in angle brackets, double quotes, entity train wrecks and so on.

Only maybe XSLT gets a pass on this, even though XSLT is a godawful horrible mess.

Thanks, Tim... by Anonymous Coward · 2008-02-18 06:39 · Score: 0

..for 3 years of optimism... ...followed by 7 years of bloated files, bandwidth increases, unenforced constraints (hell, *undescribed* *non-existent* constraints), duplicate "unique IDs", unreadable "human readable" documents, unenforced constraints, ambiguous schema, confusion between syntax and structure, wasted stack space, stupid whitespace issues, stupid encoding issues, infinite numbers of documents representing the same data, setting data management theory back about 40 years (hierarchical, text-only), and angle brackets. Lots of fucking angle brackets.

XML: lets incompetent people feel smarter by making their tools more limited.

PS: I had an issue last week parsing a 4GB XML file.. I solved it with "grep", isn't that funny?

Re:Java and XML, bad tastes that are worse togethe by shutdown -p now · 2008-02-18 06:41 · Score: 1

Some bad uses for XML: Unbounded data streams; e.g. streaming media

The success of XMPP, which is entirely centered around the concept of an XML stream, seems to disprove this.
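
The trick that makes an unbounded XML stream workable is incremental parsing: never wait for the root element to close, and hand off each child as soon as it is complete. Here is a rough Python sketch of that idea using the standard library's XMLPullParser. It is not an XMPP implementation (no namespaces, no authentication), and the element names and the function name stanzas are invented for the example.

```python
import xml.etree.ElementTree as ET

def stanzas(chunks):
    """Yield (tag, text) for each completed child of the root as data arrives."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None
    for chunk in chunks:
        parser.feed(chunk)                # chunks may split tags anywhere
        for event, element in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = element        # the stream's never-closing root
            else:
                depth -= 1
                if depth == 1:            # a direct child of the root just ended
                    item = (element.tag, element.text)
                    root.clear()          # forget it so memory stays bounded
                    yield item

# The root is never closed, as in a long-lived connection.
incoming = ["<stream><mess", "age>hi</message><pre", "sence/>", "<message>bye</message>"]
for tag, text in stanzas(incoming):
    print(tag, text)
```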

Databases

Normally true, but for small catalogs of a few hundred records at most, one may consider XML for its ease of handling and recoverability.

Also, for tree-like structures, XML/XQuery databases can often beat relational (once you start getting into 10+ joins in the latter, that is). Of course good XML databases don't really store XML in text; they merely use the XML Infoset as their data model. Still, XQuery is pretty convenient, and much more readable than SQL.

Re:Java and XML, bad tastes that are worse togethe by fedtmule · 2008-02-18 06:44 · Score: 1

From parent:

Some good uses for XML:

Ephemeral representations of atomic, structured data; usually for transport.
Config files. More verbose and the syntax is far better at keeping you from fat fingering a setting and blowing up your app. If you can't clearly read XML, you need glasses.

Am I to understand this as saying that verbosity is a good thing? If you really think so, then try looking up the definition in a dictionary.

And people do not have trouble reading XML because they lack glasses. They have trouble reading XML because its signal-to-noise ratio is so low. In other words, people have trouble reading XML because it is so verbose.

Re:10 Years and still waiting by TheRaven64 · 2008-02-18 06:44 · Score: 4, Informative

Does anyone still use latex2html? All of the TeX users I know who care about HTML output switched to tex4ht years ago. It produces a variety of XML formats, including XHTML (with MathML) and OpenDocument.

--
I am TheRaven on Soylent News

Not so fast... by the-matt-mobile · 2008-02-18 06:46 · Score: 1

What you want is called a balanced group. The .NET flavor of regexes has the ability to parse those (Depth keyword). See here.

Of course, by definition, an arbitrarily nested structure is not "regular", but regex's have been adapted to do all sorts of things that really fall into the realm of what a CFG should do.

Re:10 Years and still waiting by youthoftoday · 2008-02-18 06:46 · Score: 1

mod -1 we-know-what-he-meant

--
-1 not first post

Re:Java and XML, bad tastes that are worse togethe by farnsworth · 2008-02-18 06:51 · Score: 1

I have to admit, I'm clueless about your Java dependency issues.

Usually this is because the container you are using depends upon version X of XML Library A (usually to read its own config files, or other boring stuff) while some of your own code or some third-party API you use depends upon version X+1 of that same Library A. It's not an impossible problem to get around, but it's a problem that exists in almost every non-trivial app I've ever worked on.

--

There aint no pancake so thin it doesn't have two sides.

10 years eh? by tristian_was_here · 2008-02-18 06:58 · Score: 0

What was I doing 10 years ago? Well I was kissing girls and such (being 10 years old then) I wish I could say the same again.

Re:10 Years and still waiting by Planesdragon · 2008-02-18 07:10 · Score: 1

Here is another obvious rule: If a computer, at any time at all, has to parse or generate XML in large amounts, you are doing it wrong.

Depends on what you're doing.

One computer storing temporary data? XML is worthless. A computer storing data for use on said same computer? XML brings little to the table.

One computer program writing something that a different computer program will read from a file system at a later date? Look at XML. If you save a non-trivial amount of processor or developer time, go with it.

And let's ignore the fact that AJAX really doesn't work without XML, shall we? Because that kind of defeats the original whiny argument.

Devolution by Ilan Volow · 2008-02-18 07:19 · Score: 1

Because the only thing that's more scary and complex than the overly-complicated RDF we have today is the under-planned, overly-extended JSON and YAML that we'll have five years from now, whose original form is twisted and contorted beyond recognition in an attempt to make it do things in the future that XML was designed to do from the get-go.

--
Ergonomica Auctorita Illico!

Re:10 Years and still waiting by jmorris42 · 2008-02-18 07:21 · Score: 1

> Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong.

Amen! Which is why I absolutely HATE HATE HATE XML config files. Because they aren't human readable and editing one is an invitation to disaster. There are no editors so XML is only useful for apps to communicate with each other. And there are equally useful ways for that to be implemented.

Seriously, there is no editor. I'm told you can buy them for Windows if you spend insane quantities of cash, but I don't do Windows. Conglomerate claims to be working toward the ability to edit XML for *NIX but I only tried it once. Installed an RPM and fed it a Fedora comps.xml file.... and waited. Until the OOM killer put it out of my misery.

--
Democrat delenda est

What's happened with XML reminds me of... by AmazingRuss · 2008-02-18 07:21 · Score: 1

what happened with DBASE and its kin. It's easy enough to use that any idiot can...and you end up with schema that reflect that idiocy.

XML isn't the problem. Idiots writing XML is. I'm beginning to think that a certain level of difficulty is necessary as a screening device.

Re:What's happened with XML reminds me of... by einhverfr · 2008-02-18 14:36 · Score: 1

I dunno.....

In the LedgerSMB community (and even the core team) the merits of XML have been a matter of relatively perpetual debate. In general this means that it becomes important to justify the value that XML brings to things.

Personally, I have fought against the idea that our documentation should be moved to DocBook XML as the master format. My major concerns are that:
1) It doesn't buy us any format support since LaTeX (the current format of the master) can be converted into DocBook XML 5 using tex4ht.
2) Signal to noise ratio in XML vs LaTeX is *far* lower.
3) Verbosity introduces opportunities for error.
4) Semantically, LaTeX is richer than DocBook XML is, and although you can convert DocBook to LaTeX losslessly, you can't convert even the semantic elements from LaTeX to DocBook losslessly. For example, LaTeX tables and floats both map to DocBook floats.

--

LedgerSMB: Open source Accounting/ERP
Re:What's happened with XML reminds me of... by tomhudson · 2008-02-18 15:46 · Score: 1

". In general this means that it becomes important to justify the value that XML brings to things."
That 10 years later this is still such a flame issue shows that xml is simply the wrong tool for SO many things ...
In the beginning, there was toggling hex codes on the front panel ... punch tape ... card stacks.
Nobody complained about formatting, etc.
Then we had assembler ... again, not too much complaining about the actual format.
Then there was c ... and holy brace wars
Then there was java ...
andWeGetAllTheseLongFactoryMethodsAndOtherRidiculousShit()
ToThePointThatWeCantUseAn8ColumnTab()
... and ...
everythingIsAClassEvenIfItShouldntBe()
soWeHavetoAutoBoxUnboxPriimitiveTypes()
... so we end up with the 4-column "tab" to help make up for too many levels of nesting by artificially forcing everything into "it has to be a class" ...
Then there was xml ... sort of like giving cans of spray paint to kids - they go around tagging EVERYTHING ... the more tags the better ... no wonder xml is really ghetto ...
It seems the less rigour required to use a language, the more likely the end result is going to look and act fugly ...
Re:What's happened with XML reminds me of... by einhverfr · 2008-02-18 16:42 · Score: 1

Ok, the issues I mentioned above are limited to documentation.

On the issue of web services, yes, XML brings some value and this can be justified (we are looking at RESTful web services).

So I am left to conclude that XML is good as:
1) A program-neutral method of structured information interchange
2) A program-neutral format for object serialization such that it could be transformed into other programs' object models.

I don't think it is really useful for anything very far outside of either of the above issues.

--

LedgerSMB: Open source Accounting/ERP

Re:10 Years and still waiting by mini me · 2008-02-18 07:27 · Score: 1

I thought that was caused by people adding comments boxes to webpages?

The comment boxes are part of Web 1.0, but the RSS feed to those comments is Web 2.0. Web 2.0 defines the machine readable web. Documents designed for computers instead of humans.

You don't need XML to do web 2.0 type stuff

While there are some people using technologies like JSON and YAML, for the most part you do need XML for Web 2.0 stuff.

Re:Java and XML, bad tastes that are worse togethe by MBCook · 2008-02-18 07:31 · Score: 1

That's Maven's job. It's supposed to get all the JARs you tell it to.

It's not Maven's job to figure out if you actually use a JAR (which gets complicated when code depends on JAR A, which depends on JAR B, which....).

The usual way to handle something like this is to use Maven to keep things up to date on your machine. You can deploy all those JARs with your program (as you seem to be doing) or you can keep them somewhere else on the server and update them manually. Maven makes sure you have the requisite stuff when you checkout someone else's project, and once you put it on the server that code is already there in the classpath so you don't have to upload all those JARs. If you open other projects all the time, having Maven pull random JARs for you can be a real plus compared to hunting them down yourself.

That said, I'm not a Maven fan. Maybe I'm just too old-fashioned. Maybe it's because I don't know the tool very well.

I guess it all fits with the idea of Java being for programmers who don't actually want to think about the code they write.

I could say the same about your "shoehorn everything into Maven and hope it works" mentality.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Sure by warrax_666 · 2008-02-18 07:39 · Score: 1

That was just the standard trivial example -- it stands to reason that some people have hacked around it since it's such a common practical limitation. There are also other examples, say, anything requiring arbitrary amounts of (token) lookahead to resolve ambiguities.

--
HAND.

Re:Java and XML, bad tastes that are worse togethe by Omnifarious · 2008-02-18 07:40 · Score: 1

I don't know about Thrift being a real contender in the web/internet-based services area. Really, code generation? How 80's. Haven't we learned enough from Sun RPC that this is a PITA, give me a proper library dammit! And AFAIK D-Bus is for local IPC, good luck sending messages over a network without a couple of hoops to jump through.

The environment has changed. Dynamic languages allow the code generation to be done at runtime. I think Thrift has a good chance of succeeding in this sort of environment. Of course, IMHO, in order for that to really come into its own, Thrift must insist that any Thrift service support a standard API that allows downloading the API description.

I too prefer REST-style interfaces. I prefer technologies that encourage things to be done this way. RPC technologies almost universally try to make things 'easy' by making network messages look like function calls. And I think this is all the wrong approach for a variety of reasons, one of which is that it tends to lead to very non-RESTy interfaces.

--
Need a Python, C++, Unix, Linux develop

Re:10 Years and still waiting by bytesex · 2008-02-18 08:03 · Score: 1

"Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong."

I hate to break it to you, but that was one of the /points/ of XML; the human read/editability. In fact, if that is /not/ the point of XML, there is no reason to use it at all; you would just use some binary format.

--
Religion is what happens when nature strikes and groupthink goes wrong.

Re:Java and XML, bad tastes that are worse togethe by CoughDropAddict · 2008-02-18 08:14 · Score: 3, Informative

So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference. Please stop doing that. Tabs and spaces are different characters, even if the language you're using today treats them the same. If you're a VIM user, please learn to use "list" and "listchars."

so the second coming was 10 years ago? by spectro · 2008-02-18 08:32 · Score: 0, Flamebait

I mourn the day XML was born and I would put out a bounty for the heads of anybody related to it (along with millions of suffering developers).

I long for the day something simpler, cleaner and prettier is created although pretty much anything you can come up with will do (except perl of course)

btw: Why the hell are we still using HTML?, can somebody please come up with a better markup language for the new generation of browsers instead of patching and complicating HTML even more?

--
HTML is obsolete. It's time for a new, simpler and richer markup language.

Re:Java and XML, bad tastes that are worse togethe by rukidding · 2008-02-18 08:39 · Score: 0

Amen to that!

--
...

Re:Java and XML, bad tastes that are worse togethe by glesga_kiss · 2008-02-18 08:43 · Score: 1

"Oh, enterprises like it that way, it means they can stay in crufty code land forever and never have to upgrade anything if they don't want to!" which I read as "We don't really want to actually spend any time trying to make our development process vaguely reasonable, we just want to toss code on the wall and wait for things to stick."

For very large systems it works well that way. You don't want to have to retest every module due to a library upgrade in one; the ideal situation is where you can unit test the module and its interface and just know that the rest of the application works based off that. With a J2EE container you can easily deploy several related but abstracted services that use different versions of their libraries. Once written, a module can be left untouched as long as it meets your needs.

RFC 4180 by eknagy · 2008-02-18 08:44 · Score: 1

RFC 4180

Re:10 Years and still waiting by Anonymous Coward · 2008-02-18 08:51 · Score: 0

You should be ground up and fed to the pigs, you fucking faggot.

Re:10 Years and still waiting by jalefkowit · 2008-02-18 08:52 · Score: 1

And let's ignore the fact that AJAX really doesn't work without XML, will we? Because that kind of defeats the original whiney argument.

I agree with most of what you wrote, but this assertion is just incorrect. Plenty of "AJAX" systems use non-XML formats to ship data around. One obvious alternative is JSON, but others exist too.

(Unless you're talking about "AJAX using XML" in the sense of "AJAX manipulating the DOM", but that's not really accurate either, since most sites don't provide well-formed XML as output and they still use AJAX techniques just fine.)
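To make the JSON point concrete, here is a minimal sketch of the same small payload in both formats. It is written in Python rather than browser JavaScript purely to compare the data formats; the payload, field names, and helper functions are all made up for the illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same information in two formats.
json_payload = '{"user": "alice", "unread": 3}'
xml_payload = "<inbox><user>alice</user><unread>3</unread></inbox>"

def from_json(text):
    """JSON parses straight into native types (str, int, dict)."""
    return json.loads(text)

def from_xml(text):
    """XML yields text nodes; the caller decides which ones are numbers."""
    root = ET.fromstring(text)
    return {
        "user": root.findtext("user"),
        "unread": int(root.findtext("unread")),
    }

# Both routes end up with the same data structure.
print(from_json(json_payload) == from_xml(xml_payload))
```

Either format works as a transport; the difference is mostly how much mapping code sits between the wire format and the object you actually want.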

--

Read my blog.

RFC 4180 is not an official standard by kyz · 2008-02-18 09:06 · Score: 1

It's an informational guideline on what MIME data of type text/csv should contain, and it's ignored by the majority of CSV implementations.

--
Does my bum look big in this?

Re:10 Years and still waiting by Coelacanth · 2008-02-18 09:08 · Score: 1

Well, it is convenient for the developer, rather than the end user, to be able to read the stream. But I agree, there should be a standard binary format for proven applications. I believe standards have been developed for binary XML, but nothing in widespread use. Also, because of the structured format, XML is incredibly compressible, and I use xmill to save my XML data files in a few percent of their expanded size.
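As a rough illustration of the compressibility claim, here is a minimal Python sketch. It uses the standard-library zlib module as a stand-in (xmill itself is not used here, and its results will differ), and the record layout and element names are invented for the example.

```python
import zlib

def make_sample_xml(n_records):
    """Build a repetitive XML document; element names are invented for this sketch."""
    rows = [
        f"<record><id>{i}</id><name>item{i}</name><status>ok</status></record>"
        for i in range(n_records)
    ]
    return ("<records>" + "".join(rows) + "</records>").encode("utf-8")

def compression_ratio(data):
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)

sample = make_sample_xml(10_000)
ratio = compression_ratio(sample)
print(f"{len(sample)} bytes compress to {ratio:.1%} of the original size")
```

The exact ratio depends heavily on the data; highly repetitive tag structure like this is close to a best case for a general-purpose compressor.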

Re:10 Years and still waiting by Coelacanth · 2008-02-18 09:17 · Score: 1

Funny person. Of course there are editors, but then the human is editing the content, not the XML. That's fine, if the structure is sufficiently complex. But how often are you searching for one bloody key-value pair in a config file that's 10 times longer than it needs to be, when a properties file would do just fine?

255 chars should be enough for anyone by Serious+Callers+Only · 2008-02-18 09:36 · Score: 1

"All the time. Its not that hard. Also, if you're worried about such things as quoting, etc., you can always use fixed-width fields - makes indexing, looking up, and modifying values REAL FAST. Compare that to the mess of xml." I know, I use 255 chars al

Re:255 chars should be enough for anyone by trolltalk.com · 2008-02-18 10:19 · Score: 1

It works with fields of any size and any content, unless the language you're using is retarded and has strings of 255 chars max length, or character sets of one byte per character. In c, you don't care - treat it all as data bytes.
It works with wide-character and unicode character sets, so again, no problems with more than 255 characters.

--
Kevin Smith on Prince

Re:255 chars should be enough for anyone by Serious+Callers+Only · 2008-02-19 22:26 · Score: 1

The point being that if you define a max size for a field, you're bound to meet some data larger than you expected which needs to be truncated to fit.

Re:255 chars should be enough for anyone by trolltalk.com · 2008-02-20 04:09 · Score: 1

"The point being that if you define a max size for a field, you're bound to meet some data larger than you expected which needs to be truncated to fit."

If you're really worried about that, reserve the first 8 bytes for the data size. 2^64 bytes ought to be enough for anyone ;-)

This isn't new - Pascal pStrings reserved one byte at the head for the actual string length. The pString is 256 chars long, with the "real" first byte being the actual length. This is also why you were limited to 255 characters - not because they were null-terminated, but because one byte was reserved for the width. Then when Borland implemented longer strings, they reserved the first bytes for the length.

This is actually a decent implementation for two reasons:
1. you no longer have to worry about embedded nulls;
2. you get the length of the string without having to scan it for the terminating null
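
A minimal sketch of the length-prefix idea, in Python for brevity. The 8-byte big-endian prefix and the function names `pack_field` and `unpack_fields` are assumptions made for this illustration, not any particular Pascal or Borland layout.

```python
import struct
from typing import List

def pack_field(data: bytes) -> bytes:
    """Prefix the payload with its length as an 8-byte big-endian unsigned integer."""
    return struct.pack(">Q", len(data)) + data

def unpack_fields(buf: bytes) -> List[bytes]:
    """Walk the buffer, reading each 8-byte length and then that many payload bytes."""
    fields = []
    pos = 0
    while pos < len(buf):
        (size,) = struct.unpack_from(">Q", buf, pos)
        pos += 8
        if pos + size > len(buf):
            raise ValueError("truncated field")
        fields.append(buf[pos:pos + size])
        pos += size
    return fields

# Embedded NUL bytes and empty fields need no escaping or terminator.
blob = pack_field(b"hello") + pack_field(b"nul\x00inside") + pack_field(b"")
print(unpack_fields(blob))
```

The reader never scans for a terminator: it reads the length, then jumps straight past the payload to the next field.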
--
Kevin Smith on Prince

Re:10 Years and still waiting by ninkendo84 · 2008-02-18 09:38 · Score: 1

Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side.

Or you can just use a print stylesheet like you're supposed to. You know, that thing that browsers support by default?

--

$ make love
make: don't know how to make love. Stop

Re:10 Years and still waiting by narcc · 2008-02-18 09:45 · Score: 1

"Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong."

Human readability is one of the oldest "selling points" for XML (and SGML, for that matter). If the format isn't intended to be human readable, why not a binary format instead?

The fact that XML is often difficult (sometimes impossible) for humans to read and manipulate is a failure of XML to meet its design goals.

--
Required reading for internet skeptics

Re:10 Years and still waiting by CRCulver · 2008-02-18 09:48 · Score: 1

CSS print stylesheets have many limitations. And as far as I know, it's impossible to print (X)HTML with appropriate alignment and hyphenation based on language, which with transforming XML to LaTeX or FOP is easy to get.

Re:10 Years and still waiting by CRCulver · 2008-02-18 09:52 · Score: 1

Oops, that should read "...to LaTeX or XSL:FO is easy to get".

Re:10 Years and still waiting by iamacat · 2008-02-18 10:05 · Score: 1

One computer program writing something that a different computer program will read from a file system at a later date? Look at XML. If you save a non-trivial amount of processor or developer time, go with it.

"And let's ignore the fact that AJAX really doesn't work without XML, will we? Because that kind of defeats the original whiney argument."

What kind of alternative format were you thinking about that switching to XML actually saves processor time? As for the rest of the argument, you are mixing up cause and effect. It is only because other people chose to use XML that web browsers and many development tools include a parser. Otherwise, an efficient schema-driven binary format would work just as well in AJAX and Visual Studio. In fact, your example places undue restrictions on the program that will read the data at a later date. For example, you can not have a command line tool that finds and prints out one record out of a million because XML makes no provisions for indexing or even locating a record by number.
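
The record-lookup point can be sketched as follows. Without an external index, finding record N in an XML file means parsing everything in front of it. This is a minimal Python illustration using the standard-library `xml.etree.ElementTree.iterparse`; the `<records>`/`<record>` layout and the `nth_record` helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def nth_record(stream, n, tag="record"):
    """Return the n-th (0-based) <tag> element, scanning from the start of the stream."""
    seen = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            if seen == n:
                return elem
            seen += 1
            elem.clear()  # discard the contents of records we are skipping
    return None

# Hypothetical document with 1000 records.
doc = "<records>" + "".join(
    f"<record><id>{i}</id></record>" for i in range(1000)
) + "</records>"

hit = nth_record(io.BytesIO(doc.encode("utf-8")), 742)
print(hit.findtext("id"))
```

The cost is linear in the position of the record, where a format with fixed-width records or an offset table could seek directly to it.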

Ant make by toby · 2008-02-18 10:26 · Score: 1

Firstly it's easy to never make the tab/space error. I've used make heavily for years and I don't make that error. What's wrong with your tools?

Secondly Ant and make aren't even comparable in power or capabilities. Ant files are large, hard to read, and are the "training wheels" version of makefiles: There's so much you just cannot do, or cannot do with similar ease.

The Makefile is an exemplary manifestation of the UNIX philosophy: a concise, powerful DSL that does one thing well.

(OTOH, at my day job we do build our Java applications with Ant but are not afraid to use make where it makes sense (haha).)

If you can't learn to use make, maybe you should stick to Visual Studio...

--
you had me at #!

Re:10 Years and still waiting by David+Gerard · 2008-02-18 11:09 · Score: 2, Insightful

It Depends. We have systems that are arranged in a long content chain. One machine sends data to the next machine, maybe by pull, maybe by push. Next machine does ... something ... with it, passes it to next machines. Maybe the developers talk to each other, or remember why their predecessor made the system do that, or maybe they don't. XML is really Just The Thing for the job. And the fact that it can be tweaked by a human (e.g. the sysadmin who has to fix a broken thing) is fantastically useful.

--
http://rocknerd.co.uk

Re:10 Years and still waiting by Hatta · 2008-02-18 11:16 · Score: 1

Just because it's ASCII doesn't mean it's human-compatible.

If it's not supposed to be human-compatible, then why is it ASCII?

--
Give me Classic Slashdot or give me death!

Re:Java and XML, bad tastes that are worse togethe by Omnifarious · 2008-02-18 11:16 · Score: 2, Insightful

The answer to one particular parsing stupidity is not to introduce an altogether different set of parsing stupidities to fix it. XML is not a programming language, and making it into one is a pretty distressing and contorted thing to do.

--
Need a Python, C++, Unix, Linux develop

XML == the CSV of Y2K by DulcetTone · 2008-02-18 11:31 · Score: 0, Troll

Is there a real difference between the two when you get right down to it?

--
tone

Re:Java and XML, bad tastes that are worse togethe by Anonymous Coward · 2008-02-18 11:31 · Score: 0

I'll take a Makefile (whose biggest problem can be solved with find-and-replace in any text editor) over an Ant XML build file any day.

I could go off on the problems with Ant for pages, but instead, I'll just point out that the problem you have with Make has a corresponding problem in Ant. What happens when somebody uses Latin-1 smart-quotes in an Ant buildfile? (I've seen this exactly as many times as I've seen tabs in Makefiles.)

Answer: If our BuildBot ever fails due to spaces in the Makefile, we can all see right away who did it, and ask him to fix his editor. Your editor is screwed up, so fix it. We could do exactly the same with Ant and smart-quotes.

Ant and Make are equal on bad characters. Make is far easier to read and write, and more powerful. End of story, as far as I'm concerned.

XML Acid test by joke_dst · 2008-02-18 11:52 · Score: 1

You make a good point. I know I've tried (and failed) to make a "good enough" XML parser in the past...

Is there anything like an Acid test for XML? Some XML document (or set of) with a bunch of pitfalls that you can test against?

Re:Is XML just SGML redux? Yup, but simpler. by refactored · 2008-02-18 12:03 · Score: 1

XML _is_ just a simplified subset of SGML

Almost the whole ruddy point.

Personally I'm certain somebody could take a Big step back and say stuff all backwards compatibility with SGML and hence XML and do all (worthwhile) things that XML does simpler and in a lot fewer bytes.

Mod up parent by goombah99 · 2008-02-18 12:13 · Score: 1

nice post.

--
Some drink at the fountain of knowledge. Others just gargle.

Re:10 Years and still waiting by einhverfr · 2008-02-18 12:19 · Score: 1

Personally I disagree with you. XML *is* good at exactly two things:

1) Object serialization format with transformation possibilities, as long as you don't mind the verbosity
2) Interfaces using the above benefits between programs.

To be fair you are basically talking about using XML as a serialization format for hypertext and then transforming to other formats, but in the end it suffers from:

1) Verbosity (like all SGML dialects) compared to something like LaTeX (which is, I believe, better at multi-format document maintenance). Verbosity is an issue because if you have a human editing it, this increases the likely error rate.
2) People think of XML as an information storage device (out to replace the RDBMS). This is just wrong.

The further you get from the two uses I outlined earlier, the worse XML does...

--

LedgerSMB: Open source Accounting/ERP

Re:10 Years and still waiting by einhverfr · 2008-02-18 12:33 · Score: 1

I agree that it depends on what you are doing. But reading/writing XML just for a 1-app format makes no sense. Nor does it matter that the data is going to be read later. In that case, I would suggest an RDBMS for many sorts of things.

XML is very good as an object serialization format when you need the ability to transform the object model into that used by another application. So XML in your application would only be a good idea if:

1) Application A was writing files for application B to process
or
2) Application A was trying to write data directly to application B's interfaces.

Beyond that, XML is worthless.

--

LedgerSMB: Open source Accounting/ERP

Use an alternative. by jd · 2008-02-18 12:40 · Score: 1

Freshmeat lists something like a dozen systems that will allow you to deliver data between two systems in a format-neutral way. Some are file libraries, some are full-blown client/server systems. There are probably ten times as many such methods out there in the world that aren't getting listed anywhere. Of those I do know about, these are not what you'd call hobbyist experiments. They're often funded by huge numbers of industrial leaders, Governments, etc. Nor would I consider the use trivial - I see such software heavily used in high-end markets such as for MRIs, weather system mapping and other fairly heavy-duty scientific projects. Doesn't mean that you'd want to use NetCDF 4, PACT, the Storage Resource Broker, the Local Data Manager, OPeNDAP, TAO CORBA, Sun RPC, or any of the billions of other ways of ensuring type neutrality is preserved. But it does seem... odd... that researchers with little time and less budget would spend time and effort on developing non-XML answers unless XML simply isn't complete enough, consistent enough, and/or scalable enough.

From other comments here, I'd say the consistency is a biggie, but I'm going to guess all three problems exist somewhere in the range of possibilities, or special-case solutions would never have got out the door.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:10 Years and still waiting by einhverfr · 2008-02-18 12:40 · Score: 1

It depends on what development environment you are in as to whether XML is efficient to parse.

In Perl (and probably C/C++) I would think that the verbosity of the format would be a limiting feature but that this would not be too bad. I wouldn't think that processor time would be saved by moving to a more terse format, but I/O time might be...

In Java, the fact that the language does not efficiently handle text strings is a major limiting factor and the verbosity only makes this worse. Hence XML and Java is one combination I would try to avoid... I would think that a custom binary serialization format in Java would be *way* faster and use maybe 20% of the memory that an XML format would use in the parsing stage.

The major advantage to XML is that it is a useful language for interchange between applications of structured data. I.e. one application can serialize its data into a form which can be transformed into an object model of the other application. However, it still trades efficiency for human readability and the fact it is based on older standards (SGML). In other words, it is accepted as the method of choice for such interchange, is human readable, and reasonably familiar, but is inefficient.

--

LedgerSMB: Open source Accounting/ERP

Re:Java and XML, bad tastes that are worse togethe by Omnifarious · 2008-02-18 13:19 · Score: 1

And then you're left with software with all kinds of weird little glitches because someone fixed a problem in a library and nobody ever bothers to upgrade to a newer version. Or somebody uses one version of a library to build a data structure or update a database and somebody else uses a different version and they get all confused about what the data really is.

Either you publish interfaces that are not based on any programming language at all and stick to those or you upgrade your libraries. Having a whole ton of different versions of various libraries wandering around your organization seems a recipe for disaster.

--
Need a Python, C++, Unix, Linux develop

Re:Java and XML, bad tastes that are worse togethe by einhverfr · 2008-02-18 13:59 · Score: 1

My concerns about Java and XML have to do with the way Java internally represents text. Yes, I know it is popular with the buzzword-driven businesses and those businesses with historic ties to Sun, but it is also grossly inefficient and as you say something to be used as sparingly as possible in that environment.

I am not sure about config files, however. I think the .ini file format was actually good because it was simple and didn't raise the semantic issues that XML does. However, I have also seen issues where people use .ini files where XML would have been more appropriate (I saw someone try to do arbitrary depth menus using a .ini file).
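
A minimal sketch of that trade-off, in Python with only the standard library. Flat settings fit an .ini file naturally, while arbitrary-depth menus map onto nested elements. The section names, menu labels, and helper functions below are invented for the illustration.

```python
import configparser
import xml.etree.ElementTree as ET

# Flat key/value settings: a natural fit for .ini.
INI_TEXT = """
[database]
host = localhost
port = 5432
"""

# Arbitrary-depth nesting: awkward in .ini, direct in XML.
MENU_XML = """
<menu label="File">
  <menu label="Export">
    <item label="PDF"/>
    <item label="CSV"/>
  </menu>
  <item label="Quit"/>
</menu>
"""

def read_ini(text):
    """Parse .ini text into a dict of sections; all values stay strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

def menu_paths(node, prefix=()):
    """Walk nested <menu>/<item> elements and return each item's full label path."""
    path = prefix + (node.get("label"),)
    if node.tag == "item":
        return [path]
    paths = []
    for child in node:
        paths.extend(menu_paths(child, path))
    return paths

print(read_ini(INI_TEXT))
print(menu_paths(ET.fromstring(MENU_XML.strip())))
```

Encoding the same menu tree in an .ini file would need some ad hoc convention (dotted section names, parent keys) that the parser knows nothing about.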

--

LedgerSMB: Open source Accounting/ERP

Re:Java and XML, bad tastes that are worse togethe by tomhudson · 2008-02-18 14:02 · Score: 1

"So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference."

OMG there's more than ONE of them??? I've got the same problem at work - a guy who uses windows and the MOUSE to cut-n-paste c code. NOTHING lines up.

... and then there's his php and javascript code ... a list of parameters a mile long, each on its own line, in the 140th column ...

If this keeps on, I'm going back to assembler. At least it's clean-looking, and I have yet to see anyone who writes assembler f$ck up the formatting TOO badly! (And no holy brace wars ...)

And yes, I'm serious about assembler - I've been playing around with it for the first time in 15 years this weekend. For some things, it's just so much easier than c.

Re:10 Years and still waiting by Maury+Markowitz · 2008-02-18 14:35 · Score: 1

Wait, let me be sure I read this correctly...

simple XSL stylesheets

Wow. No, I wasn't just imagining it.

Maury

Re:Java and XML, bad tastes that are worse togethe by syousef · 2008-02-18 15:32 · Score: 1

Yay! Nothing like the combination of XML and Java to bring out the haters

The word is critics not 'haters'. I'm guessing you're in your late teens or early 20's by your use of such pathetic slang.

I can show you plenty of crappy C/C++ code freely browsable in some open source libraries. Does that mean C++ sucks?

It sure does when it becomes the standard.

Some good uses for XML:

* Ephemeral representations of atomic, structured data; usually for transport.
* Config files. More verbose and the syntax is far better at keeping you from fat fingering a setting and blowing up your app. If you can't clearly read XML, you need glasses.

Using XML for transport is laughable and is a bad use for XML. Binary transport is much more efficient, and doesn't require the time or complexity of a modern parser. If the content is human readable, the binary will also be human readable. If not, you don't waste cycles converting back and forth just so a lazy incompetent programmer has an easier time debugging. Any good programmer doesn't have a hard time printing a binary value from any decent debugger.

Now config files. A good portion of the code I have to deal with every day (probably 20%) is in goddamn XML config and with the brilliance of Aspect Oriented programming Java style even infrastructure level code intercepts are now XML resulting in a fine mess to try to trace anything. Unlike printing a binary value, tracing through layers of XML is not easy. What's worse it ruins your type checking and config errors come up at runtime instead of compile time.

--
These posts express my own personal views, not those of my employer

freedom to innovate.... by mevets · 2008-02-18 15:53 · Score: 1

It would be great if someone came up with a way to attach a little bit of executable to the data/reference that could activate an object within an application. That way, you would merely access the data, and it would appear like an active object inside your program, or OS if appropriate.

Re:Java and XML, bad tastes that are worse togethe by syousef · 2008-02-18 16:07 · Score: 1

Thank you. It's good to occasionally run into a Java programmer that realizes just how bad it's gotten. It's all being driven by consultancies selling their own brand of programming religion....and like a cult it makes me sick how many intelligent people fall for these methodologies and frameworks hook, line, and sinker.

--
These posts express my own personal views, not those of my employer

Re:10 Years and still waiting by syousef · 2008-02-18 16:13 · Score: 1

There is nothing simple about XSLT. It is a nutty and extreme idea. Unless your HTML and XML are so incredibly simple as to render the format duality useless the style sheets start reading like gibberish.

--
These posts express my own personal views, not those of my employer

Re:Java and XML, bad tastes that are worse togethe by jrentona · 2008-02-18 18:05 · Score: 1

XML not good for unbounded streams?

Why? You must be confusing XML with DOM. In fact, XML!=DOM.

Expat works just fine for streams. While the folks at Sun/apache found an infinite number of ways to wrap Expat so that it no longer works with streams; it doesn't mean YOU have to be a lemming and just use whatever crap is most available. Simply wrap Expat using the JNI. You get the speed of C in your comfortable little pointerless java womb.
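
A minimal sketch of stream-style parsing with Expat. It uses Python's standard `xml.parsers.expat` binding rather than the JNI wrapper described above, and the `<events>`/`<event>` document and helper names are invented for the illustration. The parser is fed one chunk at a time and fires callbacks as it goes, so no tree is ever built.

```python
import xml.parsers.expat

def count_elements(chunks, name):
    """Feed XML to Expat chunk by chunk and count start tags called `name`."""
    count = 0

    def on_start(tag, attrs):
        nonlocal count
        if tag == name:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    for chunk in chunks:
        parser.Parse(chunk, False)  # not final: more data may follow
    parser.Parse(b"", True)         # signal end of input
    return count

def event_stream(n):
    """Yield a document in small pieces, as a socket or pipe might deliver it."""
    yield b"<events>"
    for i in range(n):
        yield f"<event seq='{i}'/>".encode("utf-8")
    yield b"</events>"

print(count_elements(event_stream(5), "event"))
```

For a stream that never ends, the final `Parse` call would simply never be made; the handlers keep firing as each chunk arrives.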

James
Beverly, MA

No, I'm that guy by patio11 · 2008-02-18 18:25 · Score: 1

I wrote a custom filter for Eclipse which inserts tabs in place of any whitespace. Except when it doesn't, because we all know variety is the spice of life. It also replaces as many characters in 1iteral strings as possible with Unicode which looks the same but is different, which will teach that lazy bastard in the next cubicle why we do not use string literals as hash keys. For the finale, it rewraps long lines so that anyone editing the file and then using Eclipse's auto-format will see every long line shifted one character or token to the right, which borks diff something fierce.

I also considered replacing all ls used in literals with 1 but even I'm not that evil.

Signed,
That Guy

P.S. Who caught the 1? Yeah, like I said, evil.

--
Help poke pirates in the eyepatch, arr.

Re:10 Years and still waiting by frn123 · 2008-02-18 18:29 · Score: 1

Save yourself a lot of trouble and use CSS @media instead..

Re:10 Years and still waiting by Anonymous Coward · 2008-02-18 20:11 · Score: 0

For emergencies. So that when it all goes wrong you fire up a text editor instead of a hex editor.

Re:10 Years and still waiting by bhaak1 · 2008-02-19 01:12 · Score: 1

Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side. By keeping your data in an XML format,

Show me a large website that keeps its data in XML and I'll show you a slow website.

For large amounts of data you need a database. Although there are now databases that have an xml datatype.

you can use simple XSL stylesheets to generate multiple types of output. See e.g. O'Reilly's XSLT Cookbook for dozens of very real-world examples (it's probably in your library).

"simple XSL stylesheets" LOL

XSL is unfortunately a functional programming language done wrong.

Most XML is parsed with real programming languages and converted to some specific output format.

How would one convert XML to PDF? Obviously not with XSL-FO if you want more than some simple text (Wikipedia has a rather detailed paragraph about its drawbacks).

Moreover the implementations are so lacking that I'll take LaTeX's quirks anytime (which are not that bad at all if you don't force LaTeX to do things it just can't do).

That's just one example of how XML technology has made coding easier. Others I'm sure will point out others.

Well, I thank $god that I don't have to mess around with binary formats generated by bad programmers. It's awful enough what they do to XML.

--
UnNetHack: NetHack Improved!

Re:10 Years and still waiting by bhaak1 · 2008-02-19 01:45 · Score: 1

LaTeX is restricted to certain types of print output.

Last time I needed multi format output, LaTeX provided PDF, Postscript, DVI (the more or less "native" output of current LaTeX-compilers) and with minimal work HTML, Text, RTF and Palm-Doc.

It emphatically cannot output HTML easily.

That's just wrong.

TeX4ht does this with "htlatex file.tex".

Additionally it supports outputting DocBook and ODF.

--
UnNetHack: NetHack Improved!

Re:Java and XML, bad tastes that are worse togethe by aevans · 2008-02-19 10:06 · Score: 1

You probably like python, too.

Re:Java and XML, bad tastes that are worse togethe by aevans · 2008-02-19 10:18 · Score: 1

So Java EE has discovered hard coding? Isn't that wonderful! Because annotations are basically hard-coded variables for the code-generator (preprocessor) to run before compilation. I used to think they were cool too, until after about a week of XDoclet I saw Rod Johnson's preview of J2EE without EJB. And then he went on to invent (or at least popularize) XML programming. Now he's pushing pre-processor directives to generate XML to generate Java code to be compiled and then have bytecode injected to the compiled code. What's next? GOTO implemented as continuations stored in a OO-database (probably with a distributed associative array for caching)?

260 comments