XML Co-Creator says XML Is Too Hard For Programmers
orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied with XML lately, and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always drawn a divided response from the technical community. The anti-XML community has several sites stating their positions."
Sounds like Visual Basic programmers are complaining or something.
TROLL... Goatsex link. Somebody please mod parent down.
They should just be glad they're not coding in COBOL, INTERCAL or Befunge!
Note to self: get smarter troll to guard door.
Well, programming *is* a hard task, and simplifying it is about building layers and layers of better abstractions to machine code and binary data.
Without XML, what would you normally do? Create a flat text file and read it using whatever syntax you like that day. I agree XML is ugly as hell to type in manually, but at least it's a standard, and every programming language in use today can handle it in a standard way - DOM, SAX, whatever.
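To make the "DOM, SAX, whatever" point concrete, here is a minimal sketch in Python (its standard library ships both interfaces) that pulls the same titles out of one small document twice: once by loading a DOM tree, once with SAX callbacks. The document and its element names are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

# A small, made-up document used only for illustration.
DOC = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Neuromancer</title></book>
</library>"""

def titles_dom(text):
    """DOM style: build the whole tree in memory, then walk it."""
    tree = minidom.parseString(text)
    return [t.firstChild.data for t in tree.getElementsByTagName("title")]

class TitleHandler(xml.sax.ContentHandler):
    """SAX style: the parser calls us back as it streams through the text."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        if self._in_title:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False

def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles

print(titles_dom(DOC))  # ['Dune', 'Neuromancer']
print(titles_sax(DOC))  # ['Dune', 'Neuromancer']
```

The DOM version is shorter but holds the whole tree in memory; the SAX version streams, at the cost of tracking state by hand.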
First of all IDNRTA (I Did Not Read The Article).
Writing XML by hand sure is no picnic. But I don't see writing XML by hand as something we should strive to do.
XML is great for file formats. It's waaay better than binary formats. It's not as compact, but that is rarely an issue these days. Having a standard, structured, text-based, and editable-by-hand-when-necessary format is a godsend. Period.
.: Max Romantschuk
Sure it sucks, but it's a *standard* that everyone can use, and there are many libraries for it, so you don't need to write your own parsing code.
I'm no programming guru; as a matter of fact, I would classify myself as one of the 'Visual Basic' programmers the guy above mentioned (no, I don't know VB). However, the XML I have come into contact with has struck me as very easy to learn. The one thing I found too complicated was the situation that arises when several XML types are mixed...
The problem with XML seems to be that the formats change too fast, and many never seem to be backwards compatible. I wouldn't mind coding for XML if I knew that an application would be viable for more than a few months.
-Cnik
Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.
Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.
There are a number of tools and libraries (for Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.
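A small sketch of the variability problem described above, in Python rather than Perl: the two made-up `<img>` snippets below are equivalent XML, but a naive regular expression that expects `src` first on a single line misses the second one, while ElementTree (Python's standard-library parser) reads both.

```python
import re
import xml.etree.ElementTree as ET

# Two equivalent <img> elements: attribute order and line breaks differ.
SNIPPETS = [
    '<img src="a.jpg" alt="A"/>',
    '<img alt="B"\n     src="b.jpg"/>',
]

# A naive pattern that assumes src comes first, right after the tag name.
NAIVE = re.compile(r'<img src="([^"]+)"')

def src_by_regex(snippet):
    m = NAIVE.search(snippet)
    return m.group(1) if m else None

def src_by_parser(snippet):
    # A real parser does not care about attribute order or whitespace.
    return ET.fromstring(snippet).get("src")

for s in SNIPPETS:
    print(src_by_regex(s), src_by_parser(s))
# a.jpg a.jpg
# None b.jpg
```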
The last book on XML I read and understood was XML for Dummies.
While I do not think XML can be called 'difficult', I certainly would not want to undertake a large project based on it, especially given its formlessness. (Yes, it can prove extremely useful for support, but I cringe at using it as a structural base.) Languages like Java are/have become very intuitive, with great community support, and a Java developer can usually understand a Java program he has never seen quickly. I have yet to see XML come out of the dark ages, and until it decides to define exactly what it is or what it wants to be, I don't think it will.
I wonder if there is an XML standard to describe internet trolls...
-Cnik
XML parsing (just like the TeX language) has a problem: when there are errors in the input files, the situation diverges into two different cases, one requiring infinite memory and the other infinite time to deal gracefully with the errors.
None of this would ever have been needed had CS been taught properly. There are other concepts for describing how files are to be organized. Some of the systems date from the 1950s. BNF (which seems to work very well for programmers describing file formats to other programmers) dates from the early 1960s. What was needed was a BNF-type grammar that is machine readable.
Would XML have ever taken off if the web used something sane and not a hacked version of a nasty text formatting system from decades ago?
XML isn't intended for web pages. That's what you missed:
Its biggest use right now is data interchange. Moving bits between one magic widget and another. And for that, HTML sucks. It just can't represent arbitrary data. Programming languages (C++, Java) are for instructions, not data.
XML fits in perfectly where it's at use-wise. Tim Bray is talking about programming for it: The available interfaces are very counter-intuitive, and that's what Bray's getting at.
When an author says his work was not well done, that should be a sure fire red-flag that perhaps the whole thing should be aborted like an unviable fetus.
You should try reading O'Reilly's upcoming "Undocumented XML Hacks". From what I can tell, it will be really insightful.
Yeah, we don't need no steekin' standard! We can design our own protocols, we can design our own configuration language, we have all the free time we need to think of those things! They're *fun* to do!
When you're writing an application and you have to decide what format messages should be written in, or what type of file configuration data should be stored in, most people say, "Why, XML, of course. That way we're guaranteed that it is extensible, transformable, and readable by anyone who would ever need to read it." Granted, there are lots of other document formats for which that is the case, but they are not industry standard. As long as there is a schema, everyone will accept it. And if it's not in the format that they would like, they are free to run it through an XSL transformation. Easy as pie.
XML is not hard, but it is a discipline. It requires a lot of reading and a fair amount of practice, but once you have it down, that's it. And from now on, your document storage design decisions (barring any space/memory constraints) are made for you.
Perhaps you should go out and buy some duct tape then? Hmm?
I've been a programmer for 22 years and XML has never interested me. I've seen the "great new thing" come and go over the years. APL, PL/1, OS/2, etc. XML looks amazingly like the old "great new things" that went onto the heap.
Note: PHP and Linux look a lot like the things that DIDN'T go on the heap. Simple to understand, easy to use, powerful. If a non-programmer can grasp it easily, it usually doesn't go on the heap.
On the web, a big problem is that the content of the page is mixed in with the formatting. So, this content cannot be displayed easily on a PDA, phone or even across different browsers to an extent.
Separating the content from how it is displayed makes it easier to display it in pretty much any format. From a single XML document you could create a page that looks great on Mozilla, great on IE, on a WAP-enabled phone, Opera, a microwave, a fridge - whatever!
XML is NOT a programming language. It is more like a way of describing data, and one MAJOR benefit in my opinion is that it is human as well as machine readable. I can ask my 'pointy haired boss' to make an amendment to an XML document and he will pretty much be able to read it quite easily.
It has plenty of uses, such as a way of sharing data. There is no reason, for example, why an XML source could not be used in other webpages, as an input source for a database, or even as a way of getting output from your C++ program into my Java app, my ASP.NET page or even another C++ program!
XML is ASCII with tags and content between the tags...
This sounds simple, but experienced programmers (unlike the one in the topic) do not NEED to go deeper into the theory, since XML programming *IS* pretty stereotyped.
Think of an XML file as a filesystem where tags are directories and content is files... and you'll know how to get ANY job done with only a handful of functions.
But now the inventor himself has flamed XML... so why should I care about it?
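The filesystem analogy maps fairly directly onto path-style lookups. A minimal sketch using Python's ElementTree; the document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
DOC = """<home>
  <alice>
    <notes>buy milk</notes>
    <notes>call Bob</notes>
  </alice>
  <bob>
    <notes>fix the build</notes>
  </bob>
</home>"""

root = ET.fromstring(DOC)

# Like "ls alice/notes": a relative path below the root.
print([n.text for n in root.findall("alice/notes")])  # ['buy milk', 'call Bob']

# Like a recursive "find . -name notes": matches at any depth.
print([n.text for n in root.iter("notes")])  # ['buy milk', 'call Bob', 'fix the build']

# Like "ls": the names of the immediate children ("directories").
print([child.tag for child in root])  # ['alice', 'bob']
```

The analogy is loose: unlike file names in a directory, sibling elements may share a name (the two `notes` elements above), and an element can hold text and children at the same time.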
Since you apparently know nothing about XML, try reading the article. You'll learn something new, and you won't have to talk out your ass on this topic.
XML's not a language -- it's a grammar, a guide of sorts, for hierarchical data storage. You design file formats that conform to XML. The goal is that it's easy to read that file format in any language or platform (given an XML processor/parser for that platform), since your data is stored in plain human-readable UTF-8-encoded text.
Might as well poke fun at the rest of your idiocy -- as it happens, HTML 4 is pretty close to being XML-conformant, and the W3C's now pushing XHTML which is fully conformant.
Granted, a lot of people treat XML as another buzzword, the way that OOP once was. It's not a magic bullet -- it's just a guide to making cross-platform file formats, and it works pretty well for that.
I would think that loading the whole thing into memory wouldn't be a problem for the 'to the iron' guys he mentions. The best use I can think of there is configuration files, in which case you want the whole thing anyway, and you usually only load it once at startup.
The problem is that a vocal minority either haven't taken the time to understand how XML works, or aren't trying to use it appropriately.
I'm not saying XML as it is today is perfect, but it's broadened my skill as a web developer and has allowed me to help my company do things it was only imagining 2 years ago.
you know how you php/perl/python weenies make fun of "HTML Programmers?" That's how real programmers feel about you.
Sounds more like his web/Flash stuff are clients to the data, and he's trying to avoid doing transformations to accommodate them ahead of time.
Naw, whiners like that don't vote. They can't be bothered to vote. They'd rather let a judge do their rearranging.
Tim Bray thinks that callback-based XML APIs are a bit awkward to use. He would prefer to use something like a pull parser (see http://www.xmlpull.org for examples in Java) to the current Perl XML APIs.
And he would probably want to be able to parse parts of documents ("XML Fragments"), rather than whole documents.
I agree with his views (not using Perl too much, though). But this is *not* the end of XML or anything. Tim just has some thoughts about how the XML API could be better in Perl. Not very exciting, perhaps...
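A rough idea of what pull-style parsing over arriving fragments looks like, sketched with Python's `ElementTree.XMLPullParser` rather than the Java XmlPull API at the linked site. The caller feeds chunks and asks for finished elements, so the loop stays in the caller's code. The chunk list is made up to simulate data arriving in pieces.

```python
import xml.etree.ElementTree as ET

def _finished_items(parser):
    """Yield the text of every <item> whose end tag the parser has seen."""
    for _event, elem in parser.read_events():
        if elem.tag == "item":
            yield elem.text

def pull_items(chunks):
    """Feed `chunks` (data arriving piecemeal) and yield items as they complete.

    The caller drives the loop with a plain for statement instead of
    registering callbacks with the parser.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        yield from _finished_items(parser)
    parser.close()
    # Some input may only be released once the parser knows the document ended.
    yield from _finished_items(parser)

# One document split at arbitrary points, even in the middle of a tag.
chunks = ["<feed><item>one</it", "em><item>two</item>", "<item>three</item></feed>"]
print(list(pull_items(chunks)))  # ['one', 'two', 'three']
```

Events are read again after `close()` because a parser may hold back part of its input until it knows the document has ended.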
You mean BNF is for humans!?
Among other things ...
(1) They need to eliminate the doctype can of worms. Unfortunately, this cries out for an alternative solution for character entities.
(2) Namespaces need to be simplified and better integrated into the core of the language. Expanding on this, there need to be much better mechanisms for modularizing parts of the markup so that it isn't necessary to parse and hold everything in memory to make sense of it.
(3) There needs to be clean-up and standardization of element IDs and references, integrating it with (1) and (2).
Do others have more? Should this be done compatibly with XML?
I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.
ahem...
In SOVIET RUSSIA, XML standardizes YOU!! Let's bomb the french! Anyway, XML is for loosers!
Personally, I don't want to go back. At least XML is a bit more regular.
He's stating that he'd basically like other coders to write more code the way he sees fit.
[quote]
while () {
next if (XX);
if (X|||X)
{ $divert = 'head'; }
elsif (XX)
{ &proc_jpeg($1); }
# and so on...
}
[/quote]
Repeat after me: I will never leave parsing XML up to a regexp, especially if my XML may contain CDATA and comment sections. I will never...
Unless you are 100% certain the file you are parsing is directly under your control (i.e. no comments, no CDATA, attributes always in the same order, same indentation, same bloody encoding [pardon my French]), well, you will just have to access the data using some kind of DOM or abstract tree representation.
I don't think he thinks no one uses XML, he seems to deplore the fact that some people don't get it at all and resort to heavy duty tools for trivial tasks [thus justifying his example above].
Basically XML is quite simple, but that's not the point; the problem is that XML bundles ACTUAL DATA. It's all about the complexity of that data, not the API used to access it [although writing a DOM implementation is a real pain].
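The X...X matches in the quoted loop are hypothetical syntax, so they cannot be run as written. As a loose approximation only, here is the same shape of loop in Python driven by a real parser's event stream (`ElementTree.iterparse`): skip `meta`, note `h1`-`h4` headings, collect `.jpg` images. The input is invented and deliberately includes a comment and a CDATA section containing markup-like text, which the parser ignores and which a line-oriented regex could easily mistake for real elements.

```python
import re
import xml.etree.ElementTree as ET
from io import StringIO

# Made-up XHTML-ish input. The comment and CDATA contain text that
# *looks* like markup but must not be treated as elements.
DOC = """<html><head><meta name="x" content="y"/></head><body>
<!-- <h1>not a real heading</h1> -->
<h1>Intro</h1>
<p><![CDATA[ <img src="fake.jpg"/> ]]></p>
<img src="photo.JPG"/>
<h2>Details</h2>
</body></html>"""

def scan(text):
    headings, jpegs = [], []
    for _event, elem in ET.iterparse(StringIO(text), events=("end",)):
        if elem.tag == "meta":
            continue                    # like "next if ..." in the quoted loop
        if elem.tag in ("h1", "h2", "h3", "h4"):
            headings.append(elem.text)  # stands in for $divert = 'head'
        elif elem.tag == "img":
            src = elem.get("src", "")
            # A regex is fine here: it only sees one attribute value.
            if re.search(r"\.jpg$", src, re.IGNORECASE):
                jpegs.append(src)       # stands in for &proc_jpeg($1)
    return headings, jpegs

print(scan(DOC))  # (['Intro', 'Details'], ['photo.JPG'])
```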
XML was never intended to be a replacement for HTML or anything else.
XML is fundamentally very simple and easy to understand. It is only DOM and other such atrocities that make it hideous. DOM is a prime example of how to make a technology designed for simplicity and flexibility and turn it into a hideous morass.
No, I plan to kill him.
The documents are generally displayed as HTML on the web, but they're also read by a couple different programs for different purposes. When I first started here, it was mostly a mess of poorly hand-written HTML, but thankfully there were *only* about 20k documents at the time.
I was charged with the task of writing said programs to read these damn files. Unfortunately, they weren't all marked up the same...
Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.
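The property described above (badly marked-up documents simply don't load) is the parser's well-formedness check. A minimal Python sketch with ElementTree; note that it only checks well-formedness, not whether a document follows any particular schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if `text` parses, else (False, the parser's message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

print(is_well_formed("<doc><p>ok</p></doc>"))        # (True, None)
print(is_well_formed("<doc><p>unclosed</doc>"))      # (False, 'mismatched tag: ...')
print(is_well_formed("<b>this<i>sort</b>of</i>")[0])  # False: bad nesting is rejected
```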
Yay for XML! :)
So, to sum up, XML is doing what it was meant to do, no less. Unfortunately, it's probably also doing a bit more. XSL, anyone? Yeck. Why not just have a standard XML scripting language? Why the need for the language to be valid XML itself?
arggh!!! fuck'in XML tags!! lol
<?xml version="1.0" encoding="bork"?>
<troll>
<sovietrussiathing>In SOVIET RUSSIA, XML standardizes YOU!!</sovietrussiathing>
<offtopic>Let's bomb the french!</offtopic>
<flamebait>Anyway, XML is for loosers!</flamebait>
</troll>
XML isn't a replacement for Java or C++. Neither is HTML. You're looking at three separate areas there.
HTML is a page description language.
C++ and Java are data processing languages.
XML is a data description language.
You can certainly describe a page using XML, and I see no reason why you couldn't construct a programming language using XML syntax, but how on earth are you going to store data in C++ or Java?
Why is IDNRTA an excuse? This is 100% irrelevant. He's talking about XML being hard to access in programs, not it being hard to type in.
XML is a tree-structured language. Like Lisp SEXPs, only much more hyped and more of a pain to type. It's not simply ASCII with tags. If it were, there would be no such thing as bad nesting, it would be a true markup language, and I'd be able to do [b]this[i] sort [/b]of[/i] thing.
XML is NOT A MARKUP LANGUAGE. It's Lisp-reinvented-badly. Again. Sigh. Only this time, it's not other-scripting-language-becomes-lisp, it's other-data-format-becomes-lisp.
Java Programmers: Take a look at the Java Architecture for XML Binding (JAXB), available in the Java Web Services Developer Pack V 1.1 (see article here). From my basic understanding of it, it "binds" XML to a set of Java content classes, saving you the time and effort of traversing a DOM tree or dealing with SAX. I have yet to use it, but it looks perfect for my application, which uses an XML-based configuration file.
Actually, I'd be interested if anybody here has used this yet? Is it ready for prime time?
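The binding idea can be sketched without JAXB itself: map an XML configuration onto a typed class once, and let the rest of the program use the class. The Python below is a hand-written stand-in, not JAXB (which generates such classes from a schema); the config format and field names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical configuration format, for illustration only.
CONFIG = """<config>
  <server host="example.org" port="8080"/>
  <debug>true</debug>
</config>"""

@dataclass
class ServerConfig:
    host: str
    port: int
    debug: bool

def load_config(text):
    """Hand-written 'binding': map the few fields we care about onto a class."""
    root = ET.fromstring(text)
    server = root.find("server")
    if server is None:
        raise ValueError("missing <server> element")
    return ServerConfig(
        host=server.get("host"),
        port=int(server.get("port")),
        debug=(root.findtext("debug", default="false").strip() == "true"),
    )

cfg = load_config(CONFIG)
print(cfg)  # ServerConfig(host='example.org', port=8080, debug=True)
```

After loading, the rest of the code reads `cfg.port` and never touches the tree again, which is the convenience a generated binding aims to provide automatically.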
Then please tell me what language I should be using to clone Japanese people !
XML is just one of the tools in our collective toolbox. Use it where it helps you solve a problem. Don't bother if it doesn't.
... it's a convenient format to store and retrieve hierarchical information, that's all.
Writing an XML document is easy. I looked at a sample document and was able to produce xml documents without reading any books on the subject.
Parsing is another issue. Last night I spent some time parsing XML data in Perl that was being retrieved from a daemon I wrote in C. Producing the XML output was easy. Parsing it in Perl was hard. I think maybe the author is talking about the lack of really good, easy-to-use libraries (abstractions) for parsing XML data. I'm a believer that a lot of work in the back end produces ease of use in the front end. In other words, I'd like to parse XML data with ease in just a few lines of code in the application. All the work would be done in the library. XML::Parser proves that this is just not the case.
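For comparison, this is roughly the "few lines in the application" style the poster is after, sketched in Python with the standard-library ElementTree rather than Perl's XML::Parser. The daemon's output format is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical status output from a daemon; the element names are made up.
REPLY = """<status>
  <job id="17" state="running"/>
  <job id="18" state="done"/>
</status>"""

# Each <job> becomes a plain dict of its attributes.
jobs = [dict(job.attrib) for job in ET.fromstring(REPLY).iter("job")]
print(jobs)  # [{'id': '17', 'state': 'running'}, {'id': '18', 'state': 'done'}]
```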
Now, I have to say: a universal syntax for tree-structured data is very useful: experience since the 1970s with one such universal syntax, Lisp, has shown that. It is unfortunate that XML is about the worst imaginable implementation of that idea. XML combines being a nuisance to type with having comparatively complex semantics and lots of redundant features.
What is ironic is that the same "real world programmers" who wax ecstatic about XML also condemn Lisp as too complicated and too difficult to read. The universal syntax that XML aspires to, Lisp syntax delivered many decades ago. It's just that prejudice and ignorance caused people to re-invent the wheel (and in square form, too) in the form of XML.
I am pretty torn between whether XML is a blessing or a curse. We really need something like it, but XML is so bad that it may not even live up to the level of "poorly designed industry standard but better than nothing".
IMHO
Solved this problem using flat files long, long ago; of course it has to be re-invented using lots of extra data, aka "tags".
Wrote my first EDI parser in COBOL (110, 810, 850 btw; first production EDI program at Boeing... 1990), and it wasn't that difficult. We had implementation docs that laid things out logically, and you could grok the files without having to view tag hell. XML compared to X12: X12 wins.
XML isn't any better, but it's sold lots of new fancy parsing software, and it looks like HTML... cool.
I could go on, but the water's boiling....
On the 1st of January, 2003, Bjarne Stroustrup gave an interview to the IEEE's 'Computer' magazine.
Naturally, the editors thought he would be giving a retrospective view of twelve years of object-oriented design, using the language he created.
By the end of the interview, the interviewer got more than he had bargained for and, subsequently, the editor decided to suppress its contents, 'for the good of the industry' but, as with many of these things, there was a leak.
Here is a complete transcript of what was said, unedited and unrehearsed, so it isn't as neat as planned interviews.
Interviewer: Well, it's been a few years since you changed the world of software design, how does it feel, looking back?
Stroustrup: Actually, I was thinking about those days, just before you arrived. Do you remember? Everyone was writing 'C' and, the trouble was, they were pretty damn good at it. Universities got pretty good at teaching it, too. They were turning out competent - I stress the word 'competent' - graduates at a phenomenal rate. That's what caused the problem.
Interviewer: Problem?
Stroustrup: Yes, problem. Remember when everyone wrote Cobol?
Interviewer: Of course, I did too
Stroustrup: Well, in the beginning, these guys were like demi-gods. Their salaries were high, and they were treated like royalty.
Interviewer: Those were the days, eh?
Stroustrup: Right. So what happened? IBM got sick of it, and invested millions in training programmers, till they were a dime a dozen.
Interviewer: That's why I got out. Salaries dropped within a year, to the point where being a journalist actually paid better.
Stroustrup: Exactly. Well, the same happened with 'C' programmers.
Interviewer: I see, but what's the point?
Stroustrup: Well, one day, when I was sitting in my office, I thought of this little scheme, which would redress the balance a little. I thought, 'I wonder what would happen if there were a language so complicated, so difficult to learn, that nobody would ever be able to swamp the market with programmers?' Actually, I got some of the ideas from X10, you know, X windows. That was such a bitch of a graphics system that it only just ran on those Sun 3/60 things. They had all the ingredients for what I wanted: a really ridiculously complex syntax, obscure functions, and pseudo-OO structure. Even now, nobody writes raw X-windows code. Motif is the only way to go if you want to retain your sanity.
Interviewer: You're kidding...?
Stroustrup: Not a bit of it. In fact, there was another problem. Unix was written in 'C', which meant that any 'C' programmer could very easily become a systems programmer. Remember what a mainframe systems programmer used to earn?
Interviewer: You bet I do, that's what I used to do.
Stroustrup: OK, so this new language had to divorce itself from Unix, by hiding all the system calls that bound the two together so nicely. This would enable guys who only knew about DOS to earn a decent living too.
Interviewer: I don't believe you said that...
Stroustrup: Well, it's been long enough, now, and I believe most people have figured out for themselves that C++ is a waste of time but, I must say, it's taken them a lot longer than I thought it would.
Interviewer: So how exactly did you do it?
Stroustrup: It was only supposed to be a joke, I never thought people would take the book seriously. Anyone with half a brain can see that object-oriented programming is counter-intuitive, illogical and inefficient.
Interviewer: What?
Stroustrup: And as for 're-useable code' - when did you ever hear of a company re-using its code?
Interviewer: Well, never, actually, but...
Stroustrup: There you are then. Mind you, a few tried, in the early days. There was this Oregon company - Mentor Graphi
The hype and promise of XML has gone too far. It's a boon for document type data. Semantic content like documentation, on-line content, even spreadsheets and email. (e.g., why isn't there a standard address book format based on XML that any application on any platform can use interchangeably?)
But using XML to build relational databases is slipping a round peg into a square hole. You need something to putty the corners.
The criticism of XML is accurate, correct and valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing-work and 10% business-solution. That 90% plumbing-work leaves opportunity for _a lot of bugs_ to be created and for any solution using XML to become a resource-hog.
Having a standard interchange format like XML is a fun-thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.
Admitting something is too hard is too hard for programmers.
Now I'll go read the article.
Try the SSAX XML parser- has the streaminess of SAX, the objectiness of DOM.
Also neatly illustrates the essential equivalence of XML to a small subset of Lisp.
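To illustrate the XML/Lisp correspondence mentioned above, here is a small Python sketch (not SSAX, which is a Scheme library) that renders an element tree as a Lisp-style S-expression. It deliberately drops attributes and whitespace-only text; that is a simplification, since attributes are part of what makes XML's data model larger than plain nested lists.

```python
import xml.etree.ElementTree as ET

def quote(s):
    """Render a string as a double-quoted Lisp string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexp(elem):
    """Render an element as (tag text-or-child ...), ignoring attributes."""
    parts = [elem.tag]
    if elem.text and elem.text.strip():
        parts.append(quote(elem.text.strip()))
    for child in elem:
        parts.append(to_sexp(child))
        # Text that follows a child element is stored on the child's tail.
        if child.tail and child.tail.strip():
            parts.append(quote(child.tail.strip()))
    return "(" + " ".join(parts) + ")"

DOC = "<note><to>Tove</to><body>Hi</body></note>"
print(to_sexp(ET.fromstring(DOC)))  # (note (to "Tove") (body "Hi"))
```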
We use XML heavily in a project I'm working on at my company. Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation. Naturally we also make heavy use of DTD and SAX. Lots of XML related technologies.
I can tell you now that XML is a Bad Thing. It strives to excel at too many things at once, and becomes inefficient and complex as a result.
XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with. Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form. XML document tree traversal = 10000x more complex than getting column data out of a ResultSet... Unfortunately, it is also a billion times slower to parse XML than it is to perform a medium-complexity database query.
The real problem is that XML only partly addresses the problems that relational databases solved years ago (organizing data and making it accessible), but it does so without any of the efficiency benefits of a well-designed database server. In my opinion, 90+ percent of the places where XML is being used today would be better served by using columns in a relational database table to store object fields. You get indexing, you get universal, simple and efficient searching, and you get speed.
XML has too many faults to really list in one short post. The truth of the matter is that it tries to do too many things and DOESN'T DO ANYTHING WELL. Sort of like if someone tries to be skilled in all musical instruments but ends up being, at best, mediocre in a few of them.
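A minimal sketch of that split: keep XML at the boundary, but load the records into relational columns where indexing and queries are available. It uses Python's built-in `sqlite3` and ElementTree; the table, columns and records are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up records; in the poster's scenario these would be object fields.
DOC = """<people>
  <person><name>Ann</name><age>34</age></person>
  <person><name>Bo</name><age>27</age></person>
</people>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.execute("CREATE INDEX person_age ON person (age)")  # the indexing mentioned above

# One pass over the tree turns each <person> into a row.
rows = [(p.findtext("name"), int(p.findtext("age")))
        for p in ET.fromstring(DOC).iter("person")]
conn.executemany("INSERT INTO person VALUES (?, ?)", rows)

# Searching is now a query rather than a tree traversal.
print(conn.execute("SELECT name FROM person WHERE age > 30").fetchall())  # [('Ann',)]
```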
We did once use XDR as a middleware, but this we replaced with XML. It is 100 times bigger and 100 times slower. Use XML as an import/export format, don't try to use it as a middleware.
One good thing, it doesn't crash when both ends don't agree on the content of the data and the bytes are not aligned. Instead it just silently ignores the data. (Did I say this was a good thing?).
...and for doing generic markup in a relatively simple way, it's good.
For storing arbitrary data, and use as a message format (as in SOAP), it's not so good because it has markup-like features, such as the distinction between attributes and elements and the distinction between text and element nodes. (The latter in particular is a huge pain, I wish people would agree to only use text nodes in leaf elements.)
This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements.
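The text-versus-element distinction shows up directly in parser APIs. In Python's ElementTree, character data is split between an element's `text` and the `tail` of the child before it, so mixed content has to be reassembled, whereas data with text only in leaf elements reads off directly. The snippets are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with child elements.
para = ET.fromstring("<p>Hello <b>bold</b> world</p>")
b = para.find("b")
print(repr(para.text))             # 'Hello '
print(repr(b.text), repr(b.tail))  # 'bold' ' world'
# The full character data has to be reassembled from several places.
print("".join(para.itertext()))    # Hello bold world

# Data-style XML with text only in leaf elements is simpler to consume.
rec = ET.fromstring("<user><name>Ann</name><age>34</age></user>")
print({child.tag: child.text for child in rec})  # {'name': 'Ann', 'age': '34'}
```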
However, it's the standard, so we might as well just shut up and use it.
My opinions have no special importance but it *is* important to remember that XML is a markup format that is being used mostly for things other than markup.
That would sort out a lot of the mess.
The use of "XML" in any kind of media article, press release, or from the mouth of anybody not immediately involved in software development should be banned.
I can't help thinking that most of XML's image problems are a direct result of dot.com fiasco (itself, primarily a media induced f*** up) where anything remotely "interweb" was blown out of all understanding and proportion.
In fact, whilst we're at it, why don't we just abolish technical journalists. By definition they don't know what they're talking about, otherwise they'd be doing it.
Go on. Off you go.
Troll or broke? lol
For any sentence, substitute "tab-delimited" for "XML" and see whether what's being said still makes sense.
It might be too late to correct some things in XML.
The good thing about XML is that, whatever emerges in the future, it will always be possible to convert old documents into any new form using simple tools.
The critics have a point, though: unlike LaTeX or HTML, which can be written easily by hand, XML can become too bloated to be authored directly by humans.
MathML has a similar problem:
LaTeX: $x^5+3x-9=0$
MathML:
<mrow>
<mrow>
<msup>
<mi>x</mi>
<mn>5</mn>
</msup>
<mo>+</mo>
<mrow>
<mn>3</mn>
<mo>⁢</mo>
<mi>x</mi>
</mrow>
<mo>-</mo>
<mn>9</mn>
</mrow>
<mo>=</mo>
<mn>0</mn>
</mrow>
You can write complicated formulas in LaTeX directly, but it is almost impossible to do so in MathML, where one has to rely on tools to generate it (e.g. export it from Mathematica or use TeX -> MathML converters). Wouldn't it be nice if browsers understood a basic version of LaTeX? (That it is possible has been shown by IBM's techexplorer plugin.)
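As the comment says, MathML is normally produced by tools rather than typed. A minimal sketch of such a generator in Python, building the same kind of presentation markup as the hand-written example for a polynomial equal to zero. The helper names are hypothetical, and the sketch only handles nonzero integer coefficients on non-negative integer powers of x.

```python
import xml.etree.ElementTree as ET

def sub(parent, tag, text=None):
    """Append a child element with optional text and return it."""
    el = ET.SubElement(parent, tag)
    if text is not None:
        el.text = text
    return el

def term(parent, coef, power):
    """Append one term such as 9, 3x or x^5 (coef >= 1, power >= 0)."""
    if power == 0:
        sub(parent, "mn", str(coef))
        return
    if power == 1:
        var = ET.Element("mi")
        var.text = "x"
    else:
        var = ET.Element("msup")
        sub(var, "mi", "x")
        sub(var, "mn", str(power))
    if coef == 1:
        parent.append(var)
    else:
        prod = sub(parent, "mrow")
        sub(prod, "mn", str(coef))
        sub(prod, "mo", "\u2062")  # invisible times, as in the hand-written example
        prod.append(var)

def poly_mathml(terms):
    """terms: list of (coef, power) pairs; builds '<sum of terms> = 0'."""
    eq = ET.Element("mrow")
    lhs = sub(eq, "mrow")
    for i, (coef, power) in enumerate(terms):
        if i > 0:
            sub(lhs, "mo", "+" if coef > 0 else "-")
        elif coef < 0:
            sub(lhs, "mo", "-")
        term(lhs, abs(coef), power)
    sub(eq, "mo", "=")
    sub(eq, "mn", "0")
    return eq

markup = ET.tostring(poly_mathml([(1, 5), (3, 1), (-9, 0)]), encoding="unicode")
print(markup)
```

Called with `[(1, 5), (3, 1), (-9, 0)]`, it yields markup with the same nesting as the hand-written MathML for $x^5+3x-9=0$.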
Let's use RegeXPath!
while(){
if(X/*/body/(?h([1-4]))X){
echo "found header $header\n";
}
}
These were the specs, gentlemen, start your hacking!
I made an XML parser in half a dozen lines in a functional language (Haskell; it was not hard). But nowadays, is there any need for a programmer to build their own parser? Why reinvent the wheel if there are already good libraries to parse the thing? GNOME has one... Java has one... Why waste your time writing another parser?
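Why a toy parser can be short, and why the real thing isn't: the Python sketch below handles only a tiny, assumed subset (elements and text, with no attributes, comments, entities, CDATA, namespaces or encodings). It is not the poster's Haskell parser and is not a substitute for a library.

```python
import re

# Toy grammar assumed for this sketch (far smaller than real XML):
#   element := "<" name ">" (element | text)* "</" name ">"
# No attributes, comments, entities, CDATA, namespaces or self-closing tags.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")

def tokenize(text):
    """Yield regex matches for tags and runs of text, left to right."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        yield m
        pos = m.end()

def parse(text):
    """Return a nested (name, children) tuple; children are tuples or strings."""
    stack = [("#root", [])]
    for m in tokenize(text):
        closing, name, chars = m.groups()
        if chars is not None:
            if chars.strip():  # ignore whitespace-only text
                stack[-1][1].append(chars)
        elif closing:
            if len(stack) == 1:
                raise ValueError("unexpected </%s>" % name)
            node_name, children = stack.pop()
            if node_name != name:
                raise ValueError("expected </%s>, got </%s>" % (node_name, name))
            stack[-1][1].append((node_name, children))
        else:
            stack.append((name, []))
    roots = stack[0][1]
    if len(stack) != 1 or len(roots) != 1 or not isinstance(roots[0], tuple):
        raise ValueError("expected exactly one complete root element")
    return roots[0]

print(parse("<a><b>hi</b><c>there</c></a>"))
# ('a', [('b', ['hi']), ('c', ['there'])])
```

Everything the toy grammar leaves out is where a real parser's size goes, which supports the poster's advice to use an existing library.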
What's going on? I think the problem started because pure XML is semantic free. It's just syntax. Semantics are added on with other layers of software. And therein lies the problem. Anybody could add these layers. And it was sort of a race. So anybody who spent a lot of time trying to design a simple intuitive api lost out to those who rush out half baked, over complicated and inconsistent apis.
When I design an API, I put a huge amount of thought into it. How do you fit it into and exploit the current abstraction model? Are the features absolutely necessary? Is it intuitive, i.e. are the semantics straightforward and understandable?
I am one of the few people who've actually simplified APIs on 2nd releases. If I'd been in charge of Java, Swing would have never happened. I'd have taken the AWT and simplified and fixed that up.
Fight bloatware! Put minimalists in charge of api design.
(troll
(sovietrussiathing "In SOVIET RUSSIA, XML standardizes YOU!!")
(offtopic "Let's bomb the french!")
(flamebait "Anyway, XML is for loosers!"))
Unfortunately XML alone doesn't guarantee data interchangeability between programs. And XML Schema doesn't do it either. Knowing whether or not Tag1 can be in Tag2 doesn't tell you what Tag1 or Tag2 mean or if they correspond to a data structure that you need or can use. For that you need data modeling.
For data modeling in XML I've looked at a huge number of languages: RDF, ISO STEP Part 28, and XMI were my favorites (though in my opinion XMI only starts getting interesting with version 2.0, which isn't even finished yet). Each has a few advantages and disadvantages. And of course there are a lot more than just these. But the problem is that these are all very young standards, and APIs that would make them useful are not abundant.
So maybe the author's right that XML is not yet good enough, but I think a lot of progress is being made.
I see the article's gripe as another instance of an increasingly common problem: in all common languages, complicated loop structures aren't reusable. In the article, he wants to have a library (the XML parser) provide an efficient method for iteration over the tree structure in his XML file, and he rightly notices that the language doesn't support that very well.
There are 2 basic ways to reuse a loop in languages such as Perl or Java or C. Way number one is to use callbacks: package up the loop body in a function and pass it into the library. As the author notes, this is syntactically annoying. It can also be inefficient: compilers usually can't optimize out the function call, so if the amount of work per iteration is small there can be a lot of overhead.
Way number two is to use iterator-like syntax (a la Java iterators): provide a function which returns you the next object in line and then write a simple for-style loop. This is syntactically somewhat less annoying, but still subjects you to some overhead.
The closest I've seen to a solution to this problem is compile-time computation such as templates in C++ or macros in LISP. These have not been particularly popular for people to use (probably because they're hard to use), and they're not available in many common languages. Does anyone know any better answers?
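For illustration only, here is a minimal Python sketch of the two styles described above, using the standard library's xml.etree.ElementTree (the log document and its "level" attribute are made up). Python generators are one language feature that lets a library write a traversal loop once and hand elements to an ordinary for loop; the sketch says nothing about the per-iteration overhead the comment worries about.

```python
import xml.etree.ElementTree as ET

DOC = "<log><entry level='info'>start</entry><entry level='error'>disk full</entry></log>"


def walk_with_callback(root, visit):
    """Style 1: the library owns the loop and calls `visit` once per element."""
    for element in root.iter():
        visit(element)


def walk_with_generator(root):
    """Style 2: the library exposes the loop as an iterator (preorder, document order).

    The generator's suspended frame holds the traversal state, so the caller
    can write an ordinary for loop with the body inline.
    """
    stack = [root]
    while stack:
        element = stack.pop()
        yield element
        # Push children reversed so the first child is popped (visited) first.
        stack.extend(reversed(list(element)))


root = ET.fromstring(DOC)

# Callback style: the loop body lives in a separate function.
errors = []


def collect_error(element):
    if element.get("level") == "error":
        errors.append(element.text)


walk_with_callback(root, collect_error)

# Iterator style: the loop body stays where the caller wrote it.
for element in walk_with_generator(root):
    if element.get("level") == "error":
        print("error:", element.text)
```

ElementTree's own root.iter() is already an iterator of this kind; the hand-written generator only makes the saved traversal state visible.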
Why is there no standard XML DTD to express DTDs?
alternate rendering of question: I understand that XML was trying to keep as close as possible to SGML but ... if either language is a good choice for representing structured data, and a DTD is itself structured data, why is XML not a good choice for representing DTDs?
"There are four boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order." Ed Howdershelt
One of the biggest problems with XML is that its information model is quite complex, far more complex than the average problem needs. When looking for an alternative, you should look for solutions that have addressed this fundamental problem.
Those who like Python should look at YAML.
What was his argument again? Reading the whole thing into memory is too slow? OK, agreed, hence SAX. When you're a Perl programmer, everything is a regular expression. Look, Perl was the first language I learned. I'm all for Perl; it's wonderful, poetic and fun. And it handles XML perfectly.
Are you telling me that using relational databases is easier than XML? That you can just sit down and start doing it without reading some books or at least a couple of online tutorials? That's nonsense. The benefits of XML outweigh its shortcomings IMHO, especially Schema validation. I love knowing that I don't have to rewrite the same goddamn code to make sure my input is sane! I make a schema for it and voila. Yes, the schema spec is big. But have you read the full SQL spec? Of course not. You use a nice little subset and get your work done. Same with the schema spec. I use about 4 tags for 90% of the documents I need to create.
So let's summarize XML in a couple of rules (there is one caveat, see below):
1. Every element is in between angle brackets.
2. Close every tag you open, in the reverse order (like a stack, but this is far too complicated a subject for people programming; there are NO stacks in computers... right).
Does anyone force you to use XML? Of course not. That's a weak argument, but it's true. XML gives you the choice not to reinvent a structured data format. I'm not a programming guru by anyone's hallucination. I've been working with XML for a while now (3 years) and it's been terrific. Yes, you have to learn some stuff, and yes, some of the APIs are a bit terse, but show me something that isn't. What I've come to realize is that if you want to move forward, you do have to change. Programmers bitch and whine about how end users don't want to change their UI. Well, this sounds like programmers who don't want to move their brains a little, stop seeing things as regular expressions, and start seeing them as XML.
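Rule 2 really is just a stack. A toy Python sketch of that check, assuming simplified markup: the regular expression is an illustration, not a parser, and it ignores comments, CDATA sections and '>' inside attribute values.

```python
import re

# Matches <name ...>, </name> and <name .../>; '<?...?>' and '<!...>' never match
# because the character after '<' (or '</') must start a name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^>]*?(/?)>")


def tags_balanced(text):
    """Return True if every end tag closes the innermost open element."""
    stack = []
    for match in TAG.finditer(text):
        is_end, name, self_closing = match.group(1), match.group(2), match.group(3)
        if self_closing:
            continue  # <br/> opens and closes in one token
        if is_end:
            if not stack or stack.pop() != name:
                return False  # closes something other than the innermost open element
        else:
            stack.append(name)
    return not stack  # anything still open is an error too


print(tags_balanced("<a><b></b></a>"))  # properly nested
print(tags_balanced("<a><b></a></b>"))  # closed in the wrong order
```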
Stop trying to reinvent the wheel every time you need to parse a document, and move up an abstraction. And it strikes me as odd that one of the co-creators doesn't seem to "get it". The whole point of making a standardized format is so that you can abstract the parsing, transformation and validation functionality. Just my 2 cents CAD. Andrew
The largest problem I've seen with XML is that the content people who often create it do not have the technical knowledge to properly do so.
It's even worse when they've had a little bit of training and try to mark up the data themselves, then not validate the document.
But it's still young. Easy-to-use tools are being developed that will let technical people be technical and content people be content people.
I'm in sheer horror at the number of you that "have never tried XML", or think it seems "too confusing".
Go back to your legacy languages and methods. XML solves hundreds of problems for my company on a daily basis. It is the well from which the hope of web services springs.
Let's not be scared of something just because we don't understand it. I teach a class to complete newbies every week. It's 4 hours, and no one has left it without understanding XML. Buy a book and quit your fucking whining!
The problems with XML are in areas of the standard that are complex *and* rarely used, as with every software system. Problems start with the correct handling of entity references, and none of the implementations I have tested recently has yet achieved a correct implementation of XML Schema. Even worse, it is almost impossible to write an XSD for a complex case that will validate correctly on a second XML processor, even if it works perfectly with the chosen first processor.
And I think there will not be a conforming SVG or SMIL browser in the next ten years, because these specs are too complex to be understood the same way by different programmers.
XML is great, but some guy at the W3C went way too far, too fast, making standards that are too complex to be properly understood by mere mortals who need to think about more than just XML.
p.
Without order, nothing can exist. Without chaos, nothing can be created.
If I understand it correctly, the author is lamenting that neither of the standard ways of parsing XML in a scripting language fits the straightforward model of scanning for something relevant and then acting upon it, where the two models are: 1) read in the whole file and make a tree (takes up too much memory, is slow, etc.); or 2) use a callback interface.
The style of Perl script he was seeking was a simple loop model:
    while (<>) {
        next if /ignorable/;
        ...
        if (/thing-one/) {
            ...
        }
        elsif (/thing-two/) {
            ...
        }
    }
To me the thing that distinguishes this the most from the provided XML parsing interfaces is that it has a minimal amount of state.
So isn't what is needed a corresponding structure to the while loop above that iterates over the tree nodes of the XML-encoded data structure, in a depth-first preorder traversal (to avoid having to build the whole tree first)? One could imagine a parser object that scans through the XML file returning nodes (and their parent history) while maintaining an absolute minimum of state. If one wanted to build an in-memory representation of a subtree given a node, then one can always do so when one finds the node one wants.
Such an interface wouldn't be good for integrity verification or the like, but for the sort of application the author was talking about, it would seem ideal. Much less flexible than the normal models, sure, but much easier to work with when the problem fits this sort of description. Perhaps I'm underestimating the difficulty of the task, but it doesn't sound too hard to write, given that it is doing so much less than the fully-featured XML parsing interfaces.
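Python's standard library happens to offer something close to the interface described above: xml.etree.ElementTree.iterparse reports start and end events as the file is read. The sketch below (the feed document and tag names are invented to mirror the Perl loop) keeps only a stack of open element names and clears each element after use. It acts on end events so that an element's text is complete; iterparse still builds elements incrementally, so this is a sketch of the idea rather than a zero-state scanner.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<feed>
  <ignorable>noise</ignorable>
  <thing-one id="1">first</thing-one>
  <group><thing-two id="2">second</thing-two></group>
</feed>"""


def scan(source):
    """Yield (path, element) as each end tag is read, in document order.

    The only state kept here is the stack of open element names.
    """
    path = []
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(element.tag)
        else:
            yield "/".join(path), element
            path.pop()
            # Drop text, attributes and children the caller has already seen;
            # an empty shell stays attached to the parent.
            element.clear()


found = []
for path, element in scan(io.BytesIO(DOC)):
    if element.tag == "ignorable":
        continue
    if element.tag == "thing-one":
        found.append(("one", element.get("id"), element.text))
    elif element.tag == "thing-two":
        found.append(("two", path))
print(found)
```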
The other problem is the awkwardness of using XML in O-O languages, as addressed in the article Tim Bray links to from his own. Though I haven't used this particular program, this seems to be the problem that FleXML is trying to address. When you don't need all of the flexibility that XML can provide, but instead have a fixed schema that your XML representation follows, why not have your parser automatically built to read it? People have used lex/flex for scanning text files for decades; in these days of XML Schema, it should be even easier. If FleXML lives up to its promise, it will be. Has anyone here used FleXML who is willing to comment on how well it addresses these sorts of problems?
I've been working on EDI applications for many years now. I view XML as another attempt to solve the same problem as the ANSI X12 standards. The problem is, 'that problem' was never *the* problem.
In the old days (in my industry), there was a COBOL oriented file structure called the National Standard Format (NSF). It was typically documented as a set of maybe 10-20 hierarchical record formats. The mechanics for reading the files were immediately obvious. The problem was understanding what needed to be done with the data. Of course, there was often a need for a new data element and it got shoved into some filler field, resulting in the National Standard Format becoming the Nearly Similar Format.
To resolve this issue, the industry jumped on the ANSI X12 bandwagon. ANSI X12, like XML offered a flexible, platform-independent standard for representing hierarchical data structures.
Platform-independent means that it's equally difficult to use on all platforms. The 10 pages or so of NSF COBOL record layouts were replaced by a couple of binders worth of standards. One for X12 and one containing the various industry-specific transaction sets. Expensive tools emerged to read the new files and cram them back into the familiar and more workable structures.
'Flexible standards' turned out to be an oxymoron. There are so many options that it is extremely difficult to anticipate what sort of odd interpretations you'll be forced to deal with. And deal with them we must, because the Feds have mandated the way in which we must exchange data (HIPAA).
And still we find ourselves needing extra pieces of data for specific trading partners that we put into places that are beyond the standard.
I'd rather use XML than ANSI X12, but I'd rather not use either. They add much complexity and infernal flexibility in order to 'solve' what used to be a trivial task: agreeing on a data format.
If we wanted something truly useful, we'd forget about markup languages and specify an open database format similar to Access that actually has value beyond the narrow problem being addressed.
Programmers would still puzzle over the meaning of an XMLized resolv.conf file. The flat file is so intuitive in this case. XML may be the replacement for "the comma-delimited flat file" (btw, that's "tab-delimited flat file" for you Windows freaks), but that doesn't mean it's not overkill. Sometimes a byte is a byte, and sometimes a line of information separated by a field separator is all you need.
The hash that is returned could contain all the information that can be determined from the XML doc (and maybe the DTD as well), such as type, etc.
I don't know what's going on in Perl 6, but it seems like Perl needs some kind of built-in way of running through an xml file by tags, in a way similar to the standard line by line file reading operator. Rather than grabbing a single line at a time, or having to slurp in the whole file before whacking it up, you should be able to pass a regex to the input operator so that it will stop when it gets to the end of a chunk of text defined by an end tag.
Obviously, there are ways of getting around this by using a line-by-line approach, but I'm pretty sure that if such a thing existed and was easy to implement, it would get used a lot and would make Perl far more xml friendly.
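As it happens, Perl's input record separator $/ can already be set to a literal string such as an end tag, though not to a regex. A rough Python equivalent of that idea, assuming the end tag never appears inside comments or CDATA (read_until is a hypothetical helper, not a library function):

```python
import io


def read_until(stream, terminator, block_size=4096):
    """Yield successive chunks of text, each ending with `terminator` (a literal string).

    Like reading line by line, except a "line" ends at an end tag instead of a
    newline. Any text after the last terminator is yielded as a final chunk.
    """
    buffer = ""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buffer += block
        # A terminator split across two blocks is found once both halves are buffered.
        while True:
            index = buffer.find(terminator)
            if index < 0:
                break
            end = index + len(terminator)
            yield buffer[:end]
            buffer = buffer[end:]
    if buffer:
        yield buffer


SAMPLE = "<records><record>a</record><record>b</record></records>"
for chunk in read_until(io.StringIO(SAMPLE), "</record>"):
    print(repr(chunk))
```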
Congratulations! Now we are the Evil Empire
I managed to get XML handling working fine using libxml++ for my project. It was easy, quick and painless, and that was using the source code examples as well!
For it to have been any easier, it would have most likely required magic!
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
From the article:
The O-O factory, now chiefly represented by Java and C#, where the Big Company Programmers building Big Systems on Big Iron live.
So someone is actually using C#? In a big company, building big systems? And most surprisingly, on big iron?!?
Save your wrists today - switch to Dvorak
This is a serious question. What is there to understand about XML that cannot be explained in about half a sheet of A4?
XML got one thing right over unadorned S-expressions - document packaging, specifically versioning and character-set labeling. XML inherited this from SGML, and it's one of the few things it took from there that was actually worth keeping.
For a good laugh, read the Origin and Goals section of the XML spec. Of the ten goals for XML listed there:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
I'd say two of them were met, but were bad ideas (SGML compatibility, terseness unimportant), and five of them were completely missed (ease of use, human legibility, quickly designed, formal and concise, ease of creation).
Thirty per cent is a failing grade, folks...
To a Lisp hacker, XML is S-expressions in drag.
"Computer - Holodeck program number 5"
(7 of 9 nekkid, he he he...)
From excellent karma to terrible karma with a single +5 Funny post...
Very true. The article mentioned a similar problem with SOAP, using XML to encode parameters for RPC calls and transmitting them using HTTP. Utter tripe, but it'll probably become accepted by force since Microsoft uses it in .Net :\
Part of it is also the fact that XML's strengths are in hierarchical data. If I write something that works with tabular data, you know what I'm gonna use? CSV. Simple and works, and if it ain't broke, don't fix it. *grin*
(My biggest beef with XML actually came once I tried to write my own processor -- writing a fully compliant parser rules out cheap, stupid recursive descent parsing; you have to use LR or LALR. And that's after you've put together the required code to handle UTF-8 and the myriad other encodings... I'm tempted to write a program that takes as input an XML schema and target encoding, and outputs a table implementing an NFA/DFA that blindly parses a data file and just chokes fatally on any errors.)
Ha ha, didn't see the redirect.php? Fag.
I agree with this, to an extent. If you don't like/need all the fluff, don't use it. XML is only as complicated and inefficient as you want it to be.
XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.
It's not just about writing parsers for a single program. What happens when you have several programs that read the same type of file? What if said file type is somewhat complex? XML keeps things simpler and easier in these cases.
Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form.
What on earth are you talking about? YOU define the format of your XML data. If it doesn't need to be complicated, don't complicate it!
XML document tree traversal = 10000x more complex than getting column data out of a ResultSet...
Again, what? Keep the XML simple, and it will be just as easy.
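A minimal sketch of the "keep the XML simple" point, using Python's xml.etree.ElementTree: with flat, row-shaped XML (the element names here are invented), pulling out a column value is one findtext call per field, roughly the role ResultSet.getString plays for a query result.

```python
import xml.etree.ElementTree as ET

DOC = """<result>
  <row><id>1</id><name>widget</name><price>2.50</price></row>
  <row><id>2</id><name>gadget</name><price>10.00</price></row>
</result>"""

root = ET.fromstring(DOC)
rows = []
for row in root.iter("row"):
    # One findtext per column, much like reading a column from a result-set row.
    rows.append((int(row.findtext("id")), row.findtext("name"), float(row.findtext("price"))))
print(rows)
```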
Unfortunately it is also a billion times slower to parse XML than it is to perform a medium-complexity database query.
Then XML isn't the proper solution for your problem. Just because some dipshit tries to force XML to do things it isn't optimized for doesn't make XML any less useful.
*snip* the rest of your comments comparing XML to relational databases.
XML files are not high performance databases... Use the right tool for the job, and you will be much happier.
It sounds to me like XML isn't your problem. Your problem is the "genius" at your company that needs to be beat over the head with a clue stick. If I were you, I'd be sure to beat him hard.
Sticking feathers up your butt does not make you a chicken - Tyler Durden
OK, I don't find XML a challenge, but there is really a sharp learning curve in trying to describe even a simple Web service in WSDL.
Of course, if you write it in C#, it will make the WSDL for you, but writing WSDL descriptions of "legacy" Web services is quite painful.
This is why Physics Majors will always make fun of computer science majors. Heaven forbid, any of you think.
Well.. maybe. Or Maybe not. But Definitely not sort of.
So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry!) Just imagine the chaos that will ensue once MS Office saves all documents in true XML.
My take on the problem Tim's really talking about: inconsistency and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a language domain, because vendors want to put their own spin on everything. The alphabet soup that results confuses the hell out of people. This has even happened in the open source world, where I can do a Google search on "php xml parsing" and read articles on no fewer than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract the data and store it in our database," you need a standard approach. That's not to stifle thought and innovation; yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each author offering his opinion and a slight variation on how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about, and not necessarily the one that time has proven to be the most efficient.
Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?
XML isn't intended for web pages. That's what you missed:
Clearly it IS intended for web pages. The only future of HTML at W3C is XML-based. The only modular form of markup today that allows combination of web standards in a web page is XML.
Now I don't really agree. I've written my share of SAX parsing code ranging from simple to shockingly complex. The real problem is that as your problem becomes more complex, the state machine you build in your SAX parser is going to get more and more outrageous. Lots of booleans, or integers, or other miscellaneous state flags sitting around. It tends to make code that's unreadable by anybody except the author, and even then, I don't know if I could sit down and read some of the massive SAX-based Java-from-XML code generators I built a few years back without serious headaches. No, please don't tell me I'm a bad programmer, I am not interested in hearing unfounded criticism. My code is well structured and well documented, as much so as it could be, given that the structure of an event oriented parser is just plain convoluted.
Obviously if you are doing something really simple, none of this likely matters. And if you are doing something non-time or non-resource critical, you can generally get away with using DOM/tree-based parsing.
But it would be nice to have an alternative syntax that describes what you are ACTUALLY looking for when you are parsing a document. Something more readable than a big ole' event/callback state-machine mess. Alternative syntax (and semantics) for stream-based XML parsing. And that's what this guy is proposing, though his proposal is a bit strange, since it sort of looks like an oversimplified version of an event-callback parser, but maybe I just need to see a more complete concept or prototype of it than that one example. As for me, what I'd like to see is some "state-machine"ish way of describing what you are actually doing in the event parsing, that is compact and hopefully readable in a logical, linear way, so you don't have chunks of code in different methods all over the place flipping little state flags manually. But perhaps in the end any system ends up reducing to a variant of the existing event/callback parsing model, and you just can't gain any syntactic simplicity without major loss of expressivity. I just haven't thought about it enough.
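One common way to cut down the flag soup, sketched here with Python's standard xml.sax module: keep the open-element path on a stack and dispatch on the whole path in endElement. This is still the ordinary event/callback model, not the alternative syntax the comment asks for, and the order document is invented.

```python
import xml.sax

DOC = b"""<order>
  <customer><name>Ann</name></customer>
  <items><item><name>bolt</name><qty>3</qty></item></items>
</order>"""


class PathHandler(xml.sax.ContentHandler):
    """Track open elements on a stack instead of one boolean flag per context."""

    def __init__(self):
        super().__init__()
        self.path = []     # names of the currently open elements
        self.text = []     # character data seen since the last start/end tag
        self.results = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        where = "/".join(self.path)
        value = "".join(self.text).strip()
        # Dispatch on the full path: the same tag ("name") means different things
        # depending on where it appears, with no inCustomer/inItem flags needed.
        if where == "order/customer/name":
            self.results.append(("customer", value))
        elif where == "order/items/item/name":
            self.results.append(("item", value))
        self.path.pop()
        self.text = []


handler = PathHandler()
xml.sax.parseString(DOC, handler)
print(handler.results)
```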
Really.
There's *still* nothing out there that can take my structs, write them out to XML, then load them back again when needed, seamlessly.
The embedded sphere - where XML is *USEFUL*, and where *C* is *ALSO USEFUL* - has no chance with XML right now.
It's either libexpat and a monster callback module, or bust.
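The comment is about C, where this would mean libexpat plus hand-written code per struct. Purely as an illustration of what generic field-by-field mapping looks like, here is a Python sketch using dataclasses and xml.etree.ElementTree; it assumes flat records whose fields are int, float or str, and the Sensor type is hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Sensor:
    """Hypothetical flat record standing in for a C struct."""
    id: int
    label: str
    reading: float


def to_xml(record):
    """Write each dataclass field as a child element named after the field."""
    element = ET.Element(type(record).__name__)
    for field in fields(record):
        ET.SubElement(element, field.name).text = str(getattr(record, field.name))
    return ET.tostring(element, encoding="unicode")


def from_xml(cls, text):
    """Rebuild a record, converting each child's text with the field's annotated type."""
    element = ET.fromstring(text)
    # field.type is the class itself (int, str, float) because annotations are not postponed here.
    values = {field.name: field.type(element.findtext(field.name)) for field in fields(cls)}
    return cls(**values)


sample = Sensor(7, "temp", 21.5)
text = to_xml(sample)
print(text)
print(from_xml(Sensor, text) == sample)
```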
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
Wrong. This is precisely what XML was intended for. Go and read the Spec.
Where we went wrong was in using XML for spreadsheet/database-style rectangular data, for which it was never designed, and for which it is grotesquely unsuited.
As a twist on this, I know people who use XML to describe the syntax of configuration text files that are mostly just full of
specifications. The text files themselves are left short and easy for humans to edit, but the computer learns the syntax from the XML. What would be nice is an Emacs mode for automatically shifting between "simple text file mode" and "fully packed in XML air bubbles mode". The former might have fancy highlighting, electric indentation, etc. based on the underlying XML. The latter could show you all the gory detail, such as dates split up into microscopic elements that can be checked exhaustively in the XML Way for validity.
"Provided by the management for your protection."
There aren't any.
XML is bad like Democracy is bad. It's just better than the alternatives.
I had a problem at work when we switched from AutoCAD to Solidworks. Our manufacturing software couldn't read the new BOM files, which were Excel's .xls. Without ever having looked at our system's BOM files before, I wrote a program that read the .xls and built a proper XML BOM file our system could read. If our system hadn't been using XML, who knows how long it would have taken me to figure out the intricacies of a proprietary file format.
OddManIn: A Game of guns and game theory.
Early in the history of AI, there was a lot of argument about procedural versus declarative knowledge representation - whether it was better/more powerful to represent knowledge as code or data structures. The consensus they finally came to was that it really doesn't matter - any sufficiently complex declarative knowledge representation becomes something you can embed procedures in, and procedural systems need to structure their code (or else you can't reason about it) so much that it starts to look declarative.
The Lisp 'code is data' philosophy is just the acceptance of this consensus.
Your document is not well-formed!
<?xml version="1.0" encoding="bork">
should be:
<?xml version="1.0" encoding="bork"?>
RTFA = Read The Fucking Article.
....here's a document:
<foo>
<bar>
baz
</bar>
</foo>
Here's the XPath expression to get all "bar" nodes:
Nice and concise.
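The XPath expression itself did not survive in this copy. For comparison, Python's xml.etree.ElementTree supports a limited XPath subset in which "every bar element at any depth below the root" is written .//bar (in full XPath 1.0 the usual form is //bar):

```python
import xml.etree.ElementTree as ET

DOC = """<foo>
<bar>
baz
</bar>
</foo>"""

root = ET.fromstring(DOC)
# ".//bar" is relative to the root element; ElementTree rejects a bare "//bar" here.
bars = root.findall(".//bar")
print([bar.text.strip() for bar in bars])
```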
Over on the PMD project we're replacing many of our Java rules (find empty catch blocks, empty if statements, etc) with XPath expressions. For example, here's the XPath expression that finds empty if statements:
Sweet, eh? Props to Dan Sheppard who came up with this excellent technique.
Tom
The Army reading list
Before XML there was (and still is) RFC822 which describes how headers are formatted in e-mail, HTTP and a slew of other protocols.
I've been down the route where I tried to use XML where something as simple as "key: value" would do, and before I knew it, my program had become bloated, relying on third-party XML libs, the config files were only marginally human-readable, and a lot of time was wasted thinking about the virtues of DOM vs SAX. In the end I learned that using XML for the sake of XML isn't worth it.
I think XML is OK if used appropriately - for example I think XML is perfect for something like storing word processing documents. But the idea that every config file and every bit of network traffic should be XML is stupid IMHO.
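For reference, Python ships a parser for exactly this kind of RFC 822-style "Name: value" text in its email package. This sketch uses email.parser.HeaderParser on a made-up config; whether email-header syntax suits a given config file is a judgment call, not something the comment prescribes.

```python
from email.parser import HeaderParser

CONFIG = """Host: example.org
Port: 8080
Log-Level: debug
"""

headers = HeaderParser().parsestr(CONFIG)
# Header lookup is case-insensitive, as in e-mail and HTTP.
print(headers["Host"], int(headers["port"]), headers["LOG-LEVEL"])
```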
grisha.org
((//PROGRAMMER[@KnowsXML='true']/@Skillz) >
(//PROGRAMMER[@KnowsXML='false']/@Skillz))
Now what's so hard about that?
describing datafiles with a determined structure. E.g. fixed length files, delimited files. I use XML in this way to configure my DB import/export program.
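A minimal sketch of that idea in Python, assuming a made-up layout format in which each <field> element gives a column name and a width in characters (the element and attribute names are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical layout description for a fixed-length record format.
LAYOUT = """<layout>
  <field name="id" width="4"/>
  <field name="name" width="10"/>
  <field name="qty" width="3"/>
</layout>"""


def load_layout(text):
    """Return [(name, width), ...] in the order the fields appear."""
    return [(f.get("name"), int(f.get("width"))) for f in ET.fromstring(text).iter("field")]


def parse_record(line, layout):
    """Slice one fixed-length line into a dict using the layout."""
    record, offset = {}, 0
    for name, width in layout:
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


layout = load_layout(LAYOUT)
print(parse_record("0001Widget    012", layout))
```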
I like using XML for certain things. I just find that it is really only useful for tasks that don't need to be fast and/or small. Working with XML can be annoying if you try to squeeze it into something it isn't ideal for - like trying to race an elephant against a cheetah.
There are some odd things afoot now, in the Villa Straylight.
I have to agree, most XML APIs are incredibly counterintuitive. The only one I like so far is Ruby's REXML (based on Electric XML for Java). If you like Ruby, and you're drowning in useless SAX/DOM code, give REXML a try.
Prescriptive grammar:linguistics
We use XML extensively for file formats; we have good DTDs and use almost all the features of XML. I don't know what the big deal is; I could teach it to any experienced Computer Scientist in a few hours.
Maybe the folks who think XML is "too hard" aren't hiring well! You know what you get when you simplify a programming language because your hires are too stupid to understand C++? You get Java, a crippled language that's all hype and no substance.
Best Buy can have you arrested
But wasn't the entire point of XML data exchange? You use XSLT to transform incoming data into the format your software wants. Your software doesn't NEED to be able to read an XML format, but it's a lot easier to knock off an XSLT file to transform incoming data to work with your app than to code your app to handle more than one file type.
You create another XSLT for outbound data to transform your proprietary format to XML so it can be consumed by another application, company, etc.
XML isn't made to be used as the be all, end all of file formats, it's made to be a simple, yet robust, generic format for transporting data between disparate systems running on any OS, in any programming language.
The other advantage is that XML is self-describing: I can glance at an XML file, see what all the data is, and write an XSLT to get what I need out of it for my application a lot more easily than from a glance at a flat text file holding the same information.
And considering there is an XML implementation for nearly every language out there that can be had for free, why are people bothering to write their own parsers? What a waste of time.
Gibble: Descriptive of an emotional state in which one's mind is scrabbling for some purchase on reality
How many projects have you worked on when the interfaces change as frequently as the business requirements?
the plumbing may be hard and buggy, but it's easy to test (it's easy to produce manual input and output tests and use diff), and normally only has to be done once.
thank God the internet isn't a human right.
> Without XML, what would you normally do?
How about Apple's plist format.
He is letting Perl get in the way of writing clear code. Several Python packages are available for processing the stream as it arrives.
The following example from an article by Uche Ogbuji, "Simple XML Processing With elementtree" [1], shows something akin to his Perl examples coded using a Python iterator approach. Note that the example has no regular expressions for recognizing XML syntax; that layer is abstracted out of the object processing. The article is worth a read if you want to see how easy XML programming can be.
import sys
# The elementtree package has shipped in the standard library as xml.etree since Python 2.5
from xml.etree.ElementTree import ElementTree
tree = ElementTree(file=sys.argv[1])
# Create an iterator over every element, in document order
elements = tree.iter()
# Iterate
for element in elements:
    # First the element tag name
    print("Element:", element.tag)
    # Next the attributes (available on the instance itself using
    # the Python dictionary protocol)
    if element.keys():
        print("\tAttributes:")
        for name, value in element.items():
            print("\t\tName: '%s', Value: '%s'" % (name, value))
    # Next the child elements and text
    print("\tChildren:")
    # Text that precedes all child elements (may be None)
    if element.text:
        text = element.text
        text = text[:40] + "..." if len(text) > 40 else text
        print("\t\tText:", repr(text))
    # Iterating over an element yields its direct children
    for child in element:
        # Child element tag name
        print("\t\tElement", child.tag)
        # The "tail" on each child element consists of the text
        # that comes after it in the parent element content, but
        # before its next sibling.
        if child.tail:
            text = child.tail
            text = text[:40] + "..." if len(text) > 40 else text
            print("\t\tText:", repr(text))
[1] http://www.xml.com/pub/a/2003/02/12/py-xml.html
Programmer A: "I have a neat, simple idea. You see, if we do [A] then things become simple."
Programmer B: "Cool! That makes [B] and [C] much easier! Thanks!"
Programmer A: "You're welcome!"
Programmer B: "So, it may as well make [B] and [C] part of your standard, too, since they are also great ideas."
Programmer A: "Well, erm..."
Programmer C: "Yeah, and since [D], [E], and [F] are also possible, we should incorporate that as well. You DO want to be a team player, don't you?"
Programmer A: "Yeah, but, well..."
Programmer B: "Yes, and to ignore [B][C][D][E][F] is just plain ignorant."
Programmer A: sigh
In this case A = XML, BCDEF = XSL, XSLT, XPath, XHTML, XSD, XML Schema, yadda yadda yadda
Of course not. It is a markup language, hence its name.
I've only just thought of this, so before that "oh no, we can't do that 'coz" moment hits me: why do I have to repeat the element name in the closing tag?
Surely if XML dictates that the tags must be balanced, then why can't I close anything with </> ?
That would at least reduce the size of XML files a little (or a lot in some cases).
A simple question - just begging for someone to point out the bleddin' obvious flaw... if there is one!
A lot of people do find XML quite scary, which is why a module was created that can be compiled into mod_php which has a non-threatening interface. It's quick, Open Source, and free to use for any purpose. A quick example of opening a config file, changing a value, and saving it again:
    $xmldoc = xml_load("myconfig.xml");
    xml_setelementvalue($xmldoc, "server.httpd.domain", "localhost");
    xml_output($xmldoc, "myconfig.xml");
There, that's not too scary, is it? ;-)
Phillip.
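For comparison, roughly the same three steps with Python's standard xml.etree.ElementTree. set_element_value and update_config are hypothetical helpers written for this sketch (they are not the PHP module's functions), and the dotted path is assumed to start below the root element. Note that ElementTree's default parser drops XML comments, so a round trip through it may lose them.

```python
import xml.etree.ElementTree as ET

CONFIG = "<config><server><httpd><domain>example.com</domain></httpd></server></config>"


def set_element_value(root, dotted_path, value):
    """Hypothetical helper: follow a dotted path of child names below `root`,
    creating any missing elements, and set the last one's text."""
    element = root
    for name in dotted_path.split("."):
        child = element.find(name)
        if child is None:
            child = ET.SubElement(element, name)
        element = child
    element.text = value


def update_config(filename, dotted_path, value):
    """Hypothetical helper: load an XML config file, change one value, write it back."""
    tree = ET.parse(filename)
    set_element_value(tree.getroot(), dotted_path, value)
    tree.write(filename, encoding="utf-8", xml_declaration=True)


root = ET.fromstring(CONFIG)
set_element_value(root, "server.httpd.domain", "localhost")
print(ET.tostring(root, encoding="unicode"))
```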
Property for sale in Nice, France
I've never tried to use XML in my own programs, but from a user's standpoint, I hate it.
I've seen many programs start using XML config files. I often have need to automate things, which sometimes involves writing scripts that alter a config file. If the config file is of the VAR=Value type, this is pretty easy to do, but if it's XML, you can either try to fudge it, or have to parse the &#$# thing, which makes the task more complex than it needs to be.
I've also seen a movement to replace EDI with XML. Now EDI standards already do everything that XML is supposed to do, but EDI is compact, and relatively easy to debug. XML explodes the size of these files from a few kilobytes to several meg!
By reading this sig, you agree to the terms of my sig license.
See also StAX, a stack-based API for XML.
StAX extends the SAX api providing support for the delegation of XML sub-trees to sub-tree specific handlers. It's great at pulling out useful bits from long XML streams, and is used extensively in bioinformatics applications.
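The delegation idea can be sketched with Python's xml.sax module. This is not the StAX API, just the pattern: a dispatcher forwards every event inside a chosen element to a sub-handler built for that subtree, then collects its result. The element names and SequenceHandler are hypothetical.

```python
import xml.sax

DOC = b"""<db>
  <meta>ignored</meta>
  <sequence id="p1">MKV</sequence>
  <sequence id="p2">GGA</sequence>
</db>"""


class SequenceHandler(xml.sax.ContentHandler):
    """Hypothetical sub-handler: knows only about one <sequence> subtree."""

    def __init__(self):
        super().__init__()
        self.seq_id = None
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "sequence":
            self.seq_id = attrs.get("id")

    def characters(self, content):
        self.chunks.append(content)

    def result(self):
        return self.seq_id, "".join(self.chunks).strip()


class Delegator(xml.sax.ContentHandler):
    """Forward all events inside selected elements to a fresh sub-handler.

    `factories` maps an element name to a zero-argument callable returning a
    handler with a result() method; results are collected in document order.
    """

    def __init__(self, factories):
        super().__init__()
        self.factories = factories
        self.active = None  # sub-handler currently receiving events, if any
        self.depth = 0      # nesting depth inside the delegated subtree
        self.results = []

    def startElement(self, name, attrs):
        if self.active is None and name in self.factories:
            self.active = self.factories[name]()
        if self.active is not None:
            self.depth += 1
            self.active.startElement(name, attrs)

    def characters(self, content):
        if self.active is not None:
            self.active.characters(content)

    def endElement(self, name):
        if self.active is None:
            return
        self.active.endElement(name)
        self.depth -= 1
        if self.depth == 0:  # end tag of the delegated element itself
            self.results.append(self.active.result())
            self.active = None


dispatcher = Delegator({"sequence": SequenceHandler})
xml.sax.parseString(DOC, dispatcher)
print(dispatcher.results)
```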
Sounds to me like what you really need to do is replace fgets with fgetxml. fgets stops at the end of line. fgetxml would stop at the end of the next tag instead (i.e. stop at ">" instead of LF.) End-of-line has no meaning in an XML file, so why process it with line-oriented I/O?
Or has this already been tried?
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
1) DO use XML for your configuration files if your configuration options aren't easily encompassed by a simple name/value pair model.
2) DO use XML for data interchange wherever the producer is not necessarily the consumer.
3) DON'T try to perform "calculations" on large XML data sets. (This seems to be the pitfall that Tim Bray is falling into.)
4) DO convert large datasets to a relational structure if that is a more natural form for manipulating them. (SQL is a QUERY language; XML is not.)
5) DON'T dismiss the benefits of DTD/Schema validation too readily.
6) DON'T assume that somebody editing XML data has to understand or even be aware of the XML model/underpinnings. Give them a *view* of the data appropriate to the task.
7) DON'T be complacent about the SAX vs. DOM (and never the twain shall meet) dilemma. Check out Ruby's REXML and Perl's XML::Twig and be happy.
http://www.germane-software.com/software/rexml_stable/
http://www.xmltwig.com/xmltwig/
8) DON'T misunderstand XSL. If you don't understand why it's a declarative language, don't try to use it for arbitrary information manipulation.
9) DON'T dismiss too readily the value of named closing tags for validation/editing sanity (you know who you are, you s-exp people).
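The "convert large datasets to a relational structure" advice can be sketched with Python's standard sqlite3 and ElementTree modules: load the XML once, then let SQL do the calculations instead of walking the tree. The <orders>/<order> layout and the column names are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML = """
<orders>
  <order id="1" customer="acme" total="19.99"/>
  <order id="2" customer="acme" total="5.00"/>
  <order id="3" customer="globex" total="42.50"/>
</orders>
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# Load once: each <order> element becomes one row.
rows = (
    (int(o.get("id")), o.get("customer"), float(o.get("total")))
    for o in ET.fromstring(XML).iter("order")
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Aggregate with SQL rather than by traversing XML nodes.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(total), 2) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```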
> However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.
It takes more than a set of good tools for a technology to become 'a widespread success'. A clear justification of why XML is better than existing standard marshalling techniques (ASN.1 DER, simple container LSB serialization, and others) would be a good starting point.
I'm probably beating the dead horse here but XML has at least two properties rendering it useless for any performance-aware application:
(a) unlike, say, TLV, it does not allow efficiently skipping parts of the data you don't need or aren't aware of. I.e. in order to skip a section, you need to read and parse it first.
(b) XML is a lazy man's ASN.1 DER. Everything is already there in DER, in a much more compact and elegant form. The only 'drawback' in the eyes of the XML crowd is that it's binary. Sure, everyone knows that encoding numbers as strings is a definite way to improve upon the performance and scalability of everything from network protocols (SOAP, BXXP, UPNP) to basic document processing. Right on.
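To make point (a) concrete: in a tag-length-value encoding, a reader can jump over a field it doesn't care about using the length prefix alone, without decoding the payload. The layout below (1-byte tag, 4-byte big-endian length) is a made-up toy format for illustration, not ASN.1 DER, whose length encoding is more elaborate.

```python
import struct
from typing import Iterator

HEADER = struct.Struct(">BI")  # toy format: 1-byte tag, 4-byte big-endian length


def encode(tag: int, value: bytes) -> bytes:
    return HEADER.pack(tag, len(value)) + value


def find_fields(buf: bytes, wanted: int) -> Iterator[bytes]:
    """Yield payloads whose tag == wanted; other fields are skipped by length."""
    pos = 0
    while pos < len(buf):
        tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if tag == wanted:
            yield buf[pos:pos + length]
        pos += length  # jump past the payload without inspecting its bytes


msg = encode(1, b"name") + encode(2, b"x" * 10_000) + encode(1, b"more")
print(list(find_fields(msg, 1)))
```

An XML reader, by contrast, has to scan every byte of the 10,000-byte element to find where it ends.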
The bottom line is that XML has probably reached its acceptance limits. Whoever took XML for granted, stuck with it, or is not willing to learn about alternatives will keep on whining about the tools being sucky. That's life, but OTOH it's only a small part of it.
3.243F6A8885A308D313
NT
XML Technology has issues, but, more importantly, many of them are caused by the application community themselves, for the following reasons:
1) XML is designed for large organizations. It is not really designed for small businesses.
Large businesses generate MORE data. Large amounts of data, and their meaning, are something that scales well with XML. However, XML doesn't scale down well to small businesses.
Small businesses usually cannot afford the people, or the time, required to meta-data EVERYTHING they do.
It is also probably impractical.
2) XML, because of its attempt to make data portable, requires a great deal of work to maintain. Companies have to build "standards" dictionaries in their organizations to organize the definition of data, and someone is charged with making sure that dictionary is actually reused.
Otherwise your XML definitions become a junk pile, sort of like a Lotus Notes database of unimaginable complexity and abysmal organization.
Also, because of the complexity of the problem (meta-data representation of your data), it was hoped that industry-wide STANDARDS could be used to organize these dictionaries.
Primarily so you don't have to make your own.
This has become a pipe dream, however, because XML usually attempts to define the DIFFERENCES between two organizations' data, and as a result an industry-specific standard that everyone could use becomes a standard that everyone modifies to suit their own needs in their respective industries. (I.e. you had tags for the energy industry, dairy, etc.)
Which is what should happen. Right?
Technology should allow you to make your business process UNIQUE, not commoditize it.
But in short, industry-led XML dictionary standards have not been as widely applicable to business problems as many thought they would be.
That's OK, though, really: even if your business partner doesn't use the same tag, or the same tag to represent equivalent information, at least you can build database relations to do the conversion for you, with very little writing of code. (See my message below.)
3) Even when they build XML applications, most companies fall short by making the mistake of not building a general purpose object framework to support their XML dictionary.
Oh, many companies I talk with THINK they are, but they really are not building a general purpose object framework. This creates the typical problems associated with writing large amounts of code whenever they have to implement their XML data dictionaries in applications.
XML requirements/specifications are particularly harsh on organizations that have software "coders" that do not understand the following:
a) Object Inheritance (i.e. it is understood...but POORLY so.)
b) Poor understanding of Functional Decomposition of objects. (i.e. Programmers have a poor grasp of how to combine objects.)
c) Poor understanding or NONE AT ALL of Pattern construction in software design. (i.e. Factory patterns, Singleton patterns, iterator patterns)
Without COMPLETE understanding of the above, an XML framework will quickly degrade into a "junk framework" of static functions and structured design principles, decreasing reuse to almost ZERO.
There is something about XML that causes enormous coding issues if you attempt to solve business software problems with structured design techniques. That something is due to the data. Since the data is abstract to begin with, people writing software attempt to describe it as discrete functional method definitions; for example, writing a method to translate a specific tag. That is the wrong way to approach it. Normally you should be using PATTERNS, more specifically Factory Patterns, to create/define XML tag actions and values.
Otherwise, you spend WAY TOO MUCH TIME writing code.
Structured design is too feature-poor to handle an XML framework of any sort of usefulness.
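One possible reading of the factory suggestion, sketched in Python: tag names map to handler classes through a registry, so supporting a new tag means registering a class rather than writing another tag-specific method. The registry, the handles decorator and the price/title handlers are illustrative assumptions, not part of any framework named in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Type


class TagHandler:
    """Default handler: returns the element's text unchanged."""

    def convert(self, elem: ET.Element):
        return elem.text


_REGISTRY: Dict[str, Type[TagHandler]] = {}  # hypothetical tag -> handler map


def handles(tag: str) -> Callable[[Type[TagHandler]], Type[TagHandler]]:
    """Class decorator that registers a handler class for a tag name."""
    def register(cls: Type[TagHandler]) -> Type[TagHandler]:
        _REGISTRY[tag] = cls
        return cls
    return register


def handler_for(tag: str) -> TagHandler:
    """Factory: unknown tags fall back to the plain-text default handler."""
    return _REGISTRY.get(tag, TagHandler)()


@handles("price")
class PriceHandler(TagHandler):
    def convert(self, elem):
        return (float(elem.text), elem.get("currency", "USD"))


@handles("title")
class TitleHandler(TagHandler):
    def convert(self, elem):
        return elem.text.strip()


book = ET.fromstring(
    '<book><title> XML </title><price currency="EUR">49.99</price><isbn>123</isbn></book>'
)
record = {child.tag: handler_for(child.tag).convert(child) for child in book}
print(record)
```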
Got Geometrodynamics? Awe, too hard to figure out? Too bad.
Check out the XML Digester from the Apache Jakarta project, it makes parsing xml and populating data objects from xml very easy. The problem is just as someone else stated - people are using sledgehammers (SAX and DOM parsers) to swat at flies. Perhaps the Digester should be ported to Perl?
David Charboneau
Let's decompose the XML parsing "problem" (if one actually exists) into smaller components that we can reasonably discuss. XML parsing is too broad a topic to intelligently discuss, but if you limit it to XML parsing in Java you suddenly have a topic small enough to be manageable. So let's discuss XML parsing in Java.
When XML was first introduced, there were no standard libraries in the JDK to facilitate parsing. What's more, the few projects out there varied wildly in how you actually used their DOM tree or SAX callback mechanism. This isn't necessarily a Bad Thing (tm), it's the same problem every emerging technology faces: immature tools. This is basic biology - lots of competing implementations (life forms), each struggling for community (resources).
So, time goes by, and eventually a handful of implementations emerge dominant. Some dominate due to performance, and some dominate because of ease of use of the API. The victors in this game then sometimes go through a merging process of their own, where the performance victors lend technology to ease of use API victors. After a lot of merging (and flames usually), one or two projects emerge out of the XML kingdom as the dominant players. In my opinion, in the world of Java these are Xalan (Xerces) and Dom4J.
During the maturation process, Sun comes along and looks at the technology and says "Wow this XML stuff is really here to stay. What implementations are out there, and what similarities exist between them? How can we facilitate growth of these projects?" They realize that certain classes (like org.xml.sax.InputSource) are common entities in both projects (even if the class InputSource doesn't exist), and they standardize it. For a reference to all of the XML standards implemented in the JDK, do a search on java.sun.com for JAXP, JAXM, and JAXB (just to name a few).
At this point, the XML projects come back and add support so that they can be "JAXP compatible" (again, this is part of the biological process of evolution). This ensures that the projects work well with whatever Sun ships in the JDK.
In the end (which is really where we are now) you end up with a pluggable architecture, where the JDK provides some common functionality or interfaces that are implemented by open source projects.
Java XML parsing was damn hard back in the day - you had to marry your code to a specific project. But these days, with the standardization that has taken place (thanks Sun!), as long as you write code that makes use of the JAXP specification you can plug any JAXP-compliant parser into your app and things *should* work.
The difficult problem is getting other entities (Application Servers for example) to get up-to-date with the standards. WebLogic 6.1 comes with a non-JAXP compliant parser, and thus doesn't work with the latest JDK, Xalan, etc.
Do it for da shorties
The problem is not XML as such, but that programming parsers is hard, really hard. It's one of the most difficult programming tasks in computer science, more difficult than graphics, compression, or even crypto. So all the effort of the parser developers goes into getting the S/W right and not into making the APIs simple to use.
Though I'd say you're far too nice regarding goal 7...
If I were designing a better XML, here are the things I'd try:
* Dump attributes. The semantic difference between text/data and attribute/metadata makes some sense in SGML, but is hopelessly bogus for XML. Make everything elements.
* Replace closing labeled tags with a generic "close-element" tag like </>. This should get you back the terseness you give up by making attributes into elements.
This would turn:
<foo bar="baz"><mumble>grumble</mumble></foo>.
into
<foo><bar>baz</><mumble>grumble</></>.
which is close enough for my taste to:
(foo (bar "baz") (mumble "grumble"))
To a Lisp hacker, XML is S-expressions in drag.
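A small Python sketch of the mapping described above: attributes are demoted to child elements (emitted before the real children, as in the example), and the tree is printed as an S-expression. It is illustrative only; it ignores namespaces, tail text and comments, and does not escape quotes inside values.

```python
import xml.etree.ElementTree as ET


def to_sexp(elem: ET.Element) -> str:
    """Render <tag attr="v">...</tag> as (tag (attr "v") ...children)."""
    parts = [elem.tag]
    for name, value in elem.attrib.items():   # attributes become child forms
        parts.append(f'({name} "{value}")')
    if elem.text and elem.text.strip():       # leading text, if any
        parts.append(f'"{elem.text.strip()}"')
    for child in elem:
        parts.append(to_sexp(child))
    return "(" + " ".join(parts) + ")"


print(to_sexp(ET.fromstring('<foo bar="baz"><mumble>grumble</mumble></foo>')))
# prints: (foo (bar "baz") (mumble "grumble"))
```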
And that about sums it up. We use XML only when we're integrating with some external system that talks via XML. Internally, we always put data into a database rather than an XML document.
Entropy sucks.
It might be a good idea to create a tool that takes a DTD and a binding between the XML structure and the in-programme data (such as structures, arrays, objects) and creates the necessary parser and interface.
Claus
This is the whole point. If you are trying to address the standard, you are dealing with a very complex set of details.
See www.yaml.org. YAML is a project that evolved from SML-DEV. SML-DEV attempted to define a subset of XML that would be both useful and simple enough to avoid XML's biggest headaches.
After much wrangling (this was about the same time XML came up with the namespaces rules that blew up any chance for a reasonable data model for XML), the best we could come up with was Common-XML (http://www.simonstl.com/articles/cxmlspec.txt). While it does avoid some of XML's built-in boobytraps, and I'd strongly recommend any XML user to read it, it doesn't solve the inherent problem - XML is not a good match for common programming data structures, and at the same time *data* XML files are not very human readable.
It isn't XML's fault, really; XML is a great mark-up language. However, it sucks as a data serialization language, for the above reasons. So, figuring one should use the right tool for the right job, two of us SML-DEV people (Clark and myself) decided to give up on XML compatibility and try to design a data serialization language from scratch. We immediately combined efforts with Brian, the author of Perl's Data::Denter (and Inline::C).
The result is YAML (YAML Ain't Markup Language). After almost two years of working on it, the spec has stabilized and is as good as frozen (it is in "last call" and we plan on announcing a release candidate in April), there is healthy participation in the mailing list, implementations in Perl and Ruby, and active work on additional languages.
YAML is great for data serialization, configuration files, messaging, etc. Take a peek - you might like what you see. (OK, this is a shameless plug for my open source project. That's a valid use for Karma if I've seen any...)
And with XML, you create a text file and put in whatever tags you'll like that day....
Saying anything is "hard" is not really a challenge to a language. Languages per se do not solve problems alone, they convert an algorithm into text. The grammar changes, but it's not a big deal.
The pre-packaged libraries/modules/class objects you incorporate can encapsulate complexity. I think this is what this guy needs. But since he acknowledges that they already exist and that there is no standard answer, it's because...
XML has been made so flexible that the "standard" is a large set. Much larger than, say, a programming language grammar. Everything starts with simplicity, and then Need/Desire expand it. So, we get C -> csh -> C++ -> Java -> C# etc. as the needs for programs and platforms become re-prioritized.
XML, on the other hand, imposes a model, but then leaves a grammar open to its use. There's elegance there. However, the need to process data in bulk from many parsers must go away. I'm ignoring the Much Sadness(tm) rant about callbacks. Callbacks are simply another programming model, and I consider it elegant when used correctly.
If any particular XML file is too large for processing, even OS's learned that runtime libraries were a happy addition. Break it into multiple files, for example.
Complexity be damned. It brings out the innovation in us.
mug
>>' XML has always a divided response among
always a long way to go until submissions are proofread reliably.
One of the co-creators of XML not only doesn't like it, but
1) Doesn't know what it's for. He's still looking for a reason to use it, but concluded that since he invented it, we all might as well use it anyway
2) Doesn't understand how it is parsed. He states that writing an XML parser "isn't that hard" and gives as evidence that many people have done so. (Not him, of course.)
3) Realises that the best thing about XML is that it makes searching with regexps easier. It's almost as if it was some kind of "markup" or something, to help you find sections in a document.
4) Is naive, not just about programming (take a look at his code samples) but about many other things (see his rants on business and truth). His attitude seems best described as "hopeful optimism combined with wishful thinking and oversimplification".
How many XML books and articles have you read that start out:
Wouldn't it be great if there was a uniform data format so that we wouldn't have to do anything. We could just smile at a computer and it would tell us how to hook up with that hot chick over in marketing.
And then progresses to:
Look, it's easy. If we put these tags around words -- it's just like HTML! -- then we know what they mean. 49.99 becomes the <price> of the <book> (in currency=USD), of course!
And then we're let in on the secret:
Well, XML is actually a lot more arcane and complex than HTML. It is based on SGML, which was written in the 1970s--ooh! Aren't you sick of how many HTML weenies there are now? You can't even seriously demand <rate segment="hour" currency="USD">60</rate> for webpages anymore, damn frontpage!
And finally, we're left to speculate that what if XML were really like HTML, and you had links, and tags could have meaning, and heck, maybe it will even search itself someday.
Then the second wave of books came. Telling us how to use a parser. But of course never telling us what it's for and why we'd want to parse something. Wave 2.5 had a minor footnote saying "Don't use DOM on really large documents. It turns out that you actually have to use physical computer resources, like memory, when parsing XML."
And now we're in the third wave, which started out simply enough with "look, we can replace flat files with XML and still retrieve the data!" No more properties files, or reading by line. Now, instead of escaping whitespace, you can escape parentheses, quotations, apostrophes, ampersands, question marks, exclamation points, semicolons (and colons), and I'm sure I'm missing some, but who cares... it looks just like HTML! So now, the weenies who were afraid to edit config files can edit config files that look just like HTML!
The other, more diabolical part of this wave is the illusion that those promises about linking, searching, and so forth have reached fruition. Look, all you have to do is brute force. It's almost as easy as regexp, and you don't have to remember all those silly little regexp characters that look like binary files -- you can use XML instead!
Don't know about the solution in other languages, but Tim should give XML::Twig a try. Memory efficient tree parsing, and a joy to use if you're used to thinking in Perl. You can get over your fear of callback-based APIs by using anonymous subroutines. The only thing it doesn't do is standards - which Tim seems to discard anyway. So go get it from CPAN, and be happy.
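XML::Twig is Perl. A roughly comparable pattern in Python's standard library, offered only as an analogue, is ElementTree.iterparse: handle each complete <record> subtree as it finishes, then clear it so memory stays flat. The record/log tag names and the callback are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def process_records(source, tag, handle):
    """Call handle(elem) for each complete <tag> subtree, then free it."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)            # the first event is the root's start
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            handle(elem)               # the whole subtree is available here
            root.clear()               # drop processed records from memory


data = io.BytesIO(
    b"<log>"
    + b"".join(b'<record id="%d"><msg>m%d</msg></record>' % (i, i) for i in range(3))
    + b"</log>"
)
seen = []
process_records(data, "record", lambda e: seen.append((e.get("id"), e.findtext("msg"))))
print(seen)
```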
You look beautiful! Incidentally, my favourite artist is Picasso.
Your `xmldiff' example is ludicrous. It is most definitely not 5000 times slower; xmldiff is obviously doing something extraordinarily stupid. Furthermore, diff is simply reading the files simultaneously byte by byte and comparing them, while xmldiff has to do much more processing because it's comparing XML, not the raw data. That's like comparing diff to a C++ compiler.
Galeon handles its own bookmarks file in the blink of an eye.
XML does not take so much more processing time than parsing any equally complex text data.
XML isn't meant to replace RDBMs, RDBMs aren't meant to replace flat text lists, etc. Comparing different tools for doing the wrong job is just ridiculously silly.
Use the right friggin tool for the job... jeez...
Sticking feathers up your butt does not make you a chicken - Tyler Durden
YAML: http://www.yaml.org/ YAML Cookbook: http://yaml4r.sf.net/cookbook Take a look at YAML. If you've done XML work, you'll see a million great uses for it. It's very simple to learn, rather speedy to parse, and gaining implementations in the Ruby, Python and Perl communities.
XML does not have namespaces. XML has a reserved character, the colon (:), that can be used in place of an underscore in an element name. XML parsers look for this colon and treat any tag foo:bar as if it were a separate tag from bar alone. Or even foo. Never mind that foobar, foo_bar, fooBar and FoOBar are all different tags as well. There is no namespace in XML!!!
Just GREAT. And now, what to do with the f**** xhtml standard which totally f***** up web page authoring? Would the author post a "mea culpa" to W3C, and ask for a withdrawal of xhtml? I guess not. And if that stuff catches on, there goes hand-crafted html down the toilet.
Or would any of you "it is easy to spend a few days getting used to this Java library to interface with XML" people like to go typing
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
instead of
<html>
???
-><- no
This is amazing because I found parsing newline delimited text to be "irritating, time-consuming, and error-prone" as well!
Isn't it intriguing how we reached the same conclusion using such profoundly different technologies? Gee, the more things change, the more things stay the same...
And if you think MS is too entrenched to worry about them going away, just take a look at IBM. They used to own the industry, now they are just a big player; no longer the trend-setter. Even if MS manages to adapt, VB is unlikely to be a stable platform that can be relied on to run your enterprise apps without continually adapting to the changes they make.
One of the things that XML (to me) seems really suited for is what a lot of hype made it out to be in its early days: a document format for web pages.
Nowadays everyone claims that XML should be machine processed and rarely needs to be written by humans, but I would love to just be able to type something like:
<blogpost>
<name>mivok</name>
<date year="2003" month="3" day="18"/>
<subject>XML sucks</subject>
<body>
Blah Blah
</body>
</blogpost>
and then have a CSS file apply styles to each of the tags as required, and just display it.
As it is now, with a bit of XSL to convert <tagname> to <div class="tagname"> and wrap the document with surrounding html tags, and using .name { style: whatever; } instead of name { style: whatever; } in the stylesheets (i.e. stick a dot in front to change from a tag-name selector to a class selector), pretty much the same effect can be achieved.
The advantage of this is that all documents are validated as a requirement of being displayed... no more invalid html, because it's not possible. (Yes, I know about xhtml, and it's very nice - at least xhtml 2 is - but if browsers were forced to choke on invalid data, then in the process of testing web page display - people do test their pages, don't they? - they would discover an error and correct it.)
And of course its then easy to convert the data into any other xml based format with a couple more stylesheets.
My only problem is that I've not found a decent stylesheet parser that will just take a file, run it through one or more stylesheets, and display it; that will run over CGI without requiring any weird libraries to be installed, or some XXX version of PHP that my ISP doesn't happen to run; and that will let me say something like 'yeah, but before you transform it, just include this xml file here for a header and footer'.
But then again, I haven't looked amazingly hard.
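The comment does the <tagname> to <div class="tagname"> step with XSL. Purely to illustrate the same transformation, here is a Python sketch using ElementTree instead of an XSLT processor: every element becomes a div whose class is the original tag name, and the result is wrapped in a minimal HTML page that links a stylesheet. Carrying attributes over as data-* attributes, and the file name blog.css, are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def to_divs(elem: ET.Element) -> ET.Element:
    """Turn <tagname ...> into <div class="tagname">, recursively."""
    div = ET.Element("div", {"class": elem.tag})
    for name, value in elem.attrib.items():   # assumption: keep attrs as data-*
        div.set(f"data-{name}", value)
    div.text, div.tail = elem.text, elem.tail
    for child in elem:
        div.append(to_divs(child))
    return div


def wrap_page(doc: ET.Element, css_href: str) -> str:
    """Wrap the converted tree in <html><head><link ...></head><body>...</body></html>."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "link", rel="stylesheet", href=css_href)
    ET.SubElement(html, "body").append(to_divs(doc))
    return ET.tostring(html, encoding="unicode", method="html")


post = ET.fromstring(
    '<blogpost><name>mivok</name><date year="2003" month="3" day="18"/>'
    "<subject>XML sucks</subject></blogpost>"
)
print(wrap_page(post, "blog.css"))
```

The CSS then targets .subject, .name and so on, as the comment describes.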
This really isn't all that hard a problem. Mangle the SAX processor to build a DOM tree for each record and when you get a closing tag call a doSomething function. Put it in a library and now all you have to do is write the proper doSomething functions for your tasks.
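A minimal Python sketch of that idea using the standard xml.sax module: a ContentHandler feeds the events for each <record> into an ElementTree TreeBuilder and passes the finished subtree to a callback (the comment's doSomething). RecordHandler and the record/db tag names are made up for the example, and the sketch assumes records are not nested inside each other.

```python
import xml.sax
import xml.etree.ElementTree as ET


class RecordHandler(xml.sax.ContentHandler):
    """Builds a small element tree per <record> and hands it to a callback."""

    def __init__(self, record_tag, do_something):
        super().__init__()
        self.record_tag = record_tag
        self.do_something = do_something
        self.builder = None                    # active only inside a record

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.builder = ET.TreeBuilder()
        if self.builder is not None:
            self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        if self.builder is not None:
            self.builder.data(content)

    def endElement(self, name):
        if self.builder is None:
            return
        self.builder.end(name)
        if name == self.record_tag:            # record complete: call back
            self.do_something(self.builder.close())
            self.builder = None


results = []
xml.sax.parseString(
    b"<db><record><id>1</id></record><record><id>2</id></record></db>",
    RecordHandler("record", lambda rec: results.append(rec.findtext("id"))),
)
print(results)
```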
Or you could just URL-encode your XML records and put one per line. It's not very elegant or standard, but it'll work and should be dirt easy to implement.
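A rough sketch of the one-record-per-line fallback using Python's urllib.parse: quote() percent-encodes the newlines (and everything else outside the safe set), so each record fits on one line and line-oriented tools such as grep or sort keep working. The sample records are invented.

```python
from urllib.parse import quote, unquote

records = ["<rec>\n  <name>a&amp;b</name>\n</rec>", "<rec><name>c</name></rec>"]

# Encode: one percent-encoded record per line (safe="" also encodes "/").
lines = [quote(r, safe="") for r in records]
text = "\n".join(lines) + "\n"

# Decode: read back line by line and restore the original XML.
decoded = [unquote(line) for line in text.splitlines()]
print(len(text.splitlines()), "lines")
```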
I thought I was the only one that thought XML sucks! As everyone has pointed out, the file format is not the problem. It is the APIs to parse them that are painful.
I work in a company that was dizzy with XML love. Some misguided tech leads evangelized it as the solution to all our problems. Anyone with half a brain knows that it doesn't solve problems. XML just provides a data format.
XML is just a file format that gives you a regular syntax and saves you from the chores of parsing. DTDs give you semantics, but they are not part of XML. They must be created by you, for your project.
I was so frustrated with Java SAX parsers, I wrote some Java classes that load the XML as a big String and then use String operations to get and set certain tags. I became a happy programmer. This of course only works on small files, but for my situation it was sufficient.
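The same string-only approach, translated to Python as a sketch. It carries the caveats the comment implies: small files only, the tag occurs once, the tag carries no attributes, and no comment or CDATA contains the same text. get_tag and set_tag are illustrative names.

```python
def get_tag(doc: str, name: str) -> str:
    """Return the text between the first <name> and the following </name>."""
    open_tag, close_tag = f"<{name}>", f"</{name}>"
    start = doc.index(open_tag) + len(open_tag)   # ValueError if missing
    end = doc.index(close_tag, start)
    return doc[start:end]


def set_tag(doc: str, name: str, value: str) -> str:
    """Return a copy of doc with the first <name>...</name> body replaced.

    The value is inserted verbatim; escape '&' and '<' yourself.
    """
    open_tag, close_tag = f"<{name}>", f"</{name}>"
    start = doc.index(open_tag) + len(open_tag)
    end = doc.index(close_tag, start)
    return doc[:start] + value + doc[end:]


cfg = "<config><port>8080</port><host>example.org</host></config>"
cfg = set_tag(cfg, "port", "9090")
print(get_tag(cfg, "port"), get_tag(cfg, "host"))
```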
"No matter where you go, there you are." -- Buckaroo Banzai
I have found the standard Java Properties file format and API solves 90% of the problems that XML zealots would claim XML is good for solving.
"No matter where you go, there you are." -- Buckaroo Banzai
Repeat after me:
You don't overload whitespace.
You don't overload whitespace.
You don't overload whitespace.
You don't overload whitespace.
Lately, I have been using XML schema and JAXB to greatly simplify my life. JAXB can take a schema (grammar) and create objects that represent the elements and attributes in an XML document. Once these objects are created, and this is pretty darn easy, the XML can be unmarshalled into the objects with a couple lines of code. Then, navigating the XML document is simply like traversing any other document object model. There are no tag names to remember! No run-time errors that are hard to track down. Plus, you can always run the newly created classes through JavaDoc to get the API for the classes that JAXB creates and passes to you.
Now I no longer write parsing code at all. My code space is more centered around the domain model instead of utilities. Try it, you might like it ;)
The API that you describe exists. It is XPath. The next post, a C# example, illustrates a possible use of the XPath API.
XPath is especially great for getting a single value. It eliminates the need to walk a DOM tree or use callbacks. However, it does not help in the more general case of stream-processing an XML file.
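The C# example isn't part of this excerpt. As a stand-in, Python's xml.etree.ElementTree supports a limited XPath subset in find()/findall(), which covers the single-value case. Full XPath 1.0 (for instance selecting an attribute node directly with @passwd) needs a third-party library such as lxml, so here the attribute is read with .get(). The sample document is invented.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<users>"
    '<user name="shane" home-dir="/home/shane"/>'
    '<user name="tim" home-dir="/home/tim"/>'
    "</users>"
)

# One path expression instead of a tree walk or SAX callbacks.
user = doc.find("./user[@name='tim']")
print(user.get("home-dir") if user is not None else "no such user")
```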
Hard to diagnose. Very hard to visualize the XSLT processing. And then there is the XMLized scripting language you have to learn. I wonder how many people use XSLT vs. printf() for generating XML.
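The printf() route works until a value contains '&', '<' or a quote. A short Python comparison: plain string formatting, the same formatting with xml.sax.saxutils escaping, and an ElementTree build whose serializer escapes automatically. The sample title is invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

title = 'Fish & Chips <"deluxe">'

# printf-style: malformed as soon as the data contains markup characters.
naive = '<item name="%s">%s</item>' % (title, title)

# printf-style with explicit escaping: well-formed.
escaped = "<item name=%s>%s</item>" % (quoteattr(title), escape(title))

# Tree-building API: the serializer does the escaping.
elem = ET.Element("item", name=title)
elem.text = title
built = ET.tostring(elem, encoding="unicode")

for s in (naive, escaped, built):
    try:
        ET.fromstring(s)
        print("well-formed:", s)
    except ET.ParseError:
        print("malformed:  ", s)
```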
Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.
I agree, I was waiting for someone from document land to chime in here. I think the big complaint seems to be applying XML to something other than documents.
For me the big advantage of XML is as a simplification of SGML. The parsers are simpler. XSL and XSL:FO are significantly easier than DSSSL and FOSI (at least to me). Basically, XML has all the advantages of SGML while getting rid of a lot of the complications.
So, for everyone bitching about XML, please bitch about it possibly being shoehorned into applications it isn't designed for. And, if you don't like it as a document format, try writing a Perl parser for SGML.
Dastardly
I'm surprised to see only one posting about YAML. It seems to have several advantages over XML:
1) it's easy for humans to read and write
(and for that matter, if you're programmatically generating YAML, it's at least as easy to generate as XML)
2) It's more compact than XML which is important for serialization in an RPC or distributed object scheme. I suspect that YAML also compresses a lot more than XML, too (compress->serialize->decompress)
But XPath, at least its implementation in current languages, takes a string as its path. If you specify an element which doesn't exist in the XML, this error will not be caught until run time. Whereas if the compiler knew about the grammar of the XML file, it could tell you immediately 'there cannot be an element at this level' or 'no such attribute'. You could even hit Tab in your editor to see what the available subelements are at the current point in the tree.
Also, knowing the grammar (DTD or XML Schema or whatever) of the XML will help generate more efficient code, better than an XPath implementation could be because the general XPath has to work with all possible XML files, not just those restricted to a certain grammar.
It's like the difference between the putative code
int x = a.b[6]->c["hello"];
which is checked at compile time and compiles down into efficient code, and
int x = tree_query("a/b[6]/c['hello']");
which walks some data structure at run time. It's better if the language can help you with the data structures.
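Python has no compiler to enforce this, but a hand-written binding gets part of the way: parse once into a typed dataclass, and a static checker such as mypy can then flag a typo like post.subjct, where a string query would only fail at run time. The BlogPost class is a hypothetical example, and the binding is written by hand rather than generated from a DTD or schema as JAXB does.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class BlogPost:
    """Hypothetical typed view of a <blogpost> document."""

    name: str
    subject: str
    body: str

    @classmethod
    def from_xml(cls, text: str) -> "BlogPost":
        root = ET.fromstring(text)
        if root.tag != "blogpost":
            raise ValueError(f"expected <blogpost>, got <{root.tag}>")

        def required(tag: str) -> str:
            value = root.findtext(tag)
            if value is None:            # missing element: fail at load time
                raise ValueError(f"<blogpost> has no <{tag}>")
            return value.strip()

        return cls(required("name"), required("subject"), required("body"))


post = BlogPost.from_xml(
    "<blogpost><name>mivok</name><subject>XML sucks</subject>"
    "<body>Blah Blah</body></blogpost>"
)
print(post.subject)  # attribute access: a misspelling is visible to a type checker
```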
-- Ed Avis ed@membled.com
If you don't know WHY you need the data you plan on handling, or how to HANDLE the data you need, then XML WILL do nothing but complicate things. If you do know what data you need and how you are going to handle it, then XML is useless. Take the time to understand your data path instead of wasting time building ambiguous data structures. I don't understand the idea of building applications to capture unknown data sources to do things with the data. And if you do know what the data is, then skip the XML parsing bloat and DO IT. You don't need to be a super genius to intuitively know that XML is shit. If you can make your project work without it (all projects can), save yourself the hassle.
XML is not hard, it is just a big piece of toilet for many (several?) of the tasks for which it is being proffered. A fundamental example is that it is increasingly being touted as a messaging grammar, which is just bollox, it is bytey (ie bloated on the wire), expensive to encode, expensive to parse and it ain't a grammar which means that the touted benefits of "meta"information fail to materialise.
The thing that gets me is the whole "ties" disparate systems together crap. One system talks about the "colour of objects" and the other talks about the "hue of items" and there ain't nothing about XML that helps with mapping the fact that "colour = hue" and "object = item" other than a programmer and XML adds _zero_ value to that process, XSL is _just_ a toy version of a compiler compiler. Use a real grammar to solve real grammatical problems.
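To illustrate that the mapping has to come from a programmer: a Python sketch in which the colour=hue and object=item correspondences live in a plain dictionary that a person wrote, and the code merely renames tags. The vocabularies are the comment's example; the document layout is invented.

```python
import xml.etree.ElementTree as ET

# Hand-written knowledge that XML itself cannot supply.
RENAME = {"object": "item", "colour": "hue"}


def translate(elem: ET.Element, mapping: dict) -> ET.Element:
    """Return a copy of elem with tags renamed per mapping (others unchanged)."""
    out = ET.Element(mapping.get(elem.tag, elem.tag), elem.attrib)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(translate(child, mapping))
    return out


src = ET.fromstring("<objects><object><colour>red</colour></object></objects>")
print(ET.tostring(translate(src, RENAME), encoding="unicode"))
# prints: <objects><item><hue>red</hue></item></objects>
```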
end rant.
"The first thing to do when you find yourself in a hole is stop digging."
No, you'd use xpath silly. For instance, I made myself a simple XSLT and shell script that uses Xalan to easily get access to different XML elements.
$ xpath
usage: xpath [-sx] match-path file.xml [output file]
-s value-of
-x copy-of
$ xpath -s "/users/user[@name='shane']/@passwd" /etc/passwd.xml
foo
$ xpath -x "/users/user[@name='shane']" /etc/passwd.xml
<user name="shane" passwd="foo" home-dir="/home/shane"/>
1. C++ is clearly off topic.
2. This is a hoax. Worse, this is any old hoax. Someone has just filed off the original dates provided and changed them to 2003. The first mention I can find is this 1998 post which might represent the original version.
3. By failing to link to any original source or otherwise provide attribution (other than the incorrect claim that it's from an interview with IEEE's Computer), you are at best infringing on someone else's copyright, or at worst misrepresenting the work as your own.
Three strikes, you're out.
The article does humorously point out some of C++'s shortcomings, but to just repost it here now is wrong.
Search 2010 Gen Con events
I know what type of calendar you got for christmas.
This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements
That's why I use small parsers like SmallXMLParser in Java when speed and simplicity are most important. It's free and open source.
Okay, it's C/C++ only, and non-validating, but my XMLIO processor was designed to avoid his concerns. It's a push/pull hybrid (you tell it what you want, so you pull only what you need, but it can push elements/data at you and automatically pack into your data-structures).
It's designed not to even bother parsing sections of the XML stream you aren't interested in, which makes it perfect for low-memory situations (it is used by various cell-phone and test equipment manufacturers).
I am working on a project right now in C++ that has to send / read back an XML file. I don't really see many issues with managing the data once I parse the XML file using MSXML DOM objects and place the data I need into my classes that manipulate it. The most current DTD I am working on, however, is a bit overdesigned to the point of stupidity, but with some foresight I managed to eliminate writing a lot of code by capitalizing on the inherent properties of an XML node. As for doing this in Perl or some other language, I am not sure how much time it takes to generate the structures to manage a DTD, but in my experience it only took me 1 1/2 days to create all the code/classes I needed to read and manage 60 different elements, not counting the attributes/enumerations/lists that each element may have. To me this is a pretty fair trade-off when you include the readability of the XML file versus managing a TXT file or some other type of container scheme. Don't get me wrong, I'm just as lazy as the next programmer, and I am kinda pissed that it took 1 1/2 days away from me when I could have been surfing pron or /.
Your argument could apply equally well to Windows, you know?
Perl was designed to be a powerful text processing tool. The core operations and expressiveness for dealing with text processing have been elevated to first-class language features in Perl, whereas XML support is provided via a library (module) and is much less mature.
XPath is an expression language designed to express structures and patterns in XML, similar to the way regexps were designed to describe patterns in unstructured text.
In his prescribed solution for the "Scripting Basket", I think XPath is precisely the Perl enhancement for which he was grasping.
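For readers who haven't used XPath, here is a minimal sketch of the idea using Python's `xml.etree.ElementTree`. Note that ElementTree supports only a limited subset of XPath; a full XPath 1.0 engine needs a third-party library such as lxml. The `<basket>` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for illustration.
DOC = """
<basket>
  <item type="book"><title>Programming Perl</title><price>39.95</price></item>
  <item type="cd"><title>Kind of Blue</title><price>11.99</price></item>
  <item type="book"><title>XML in a Nutshell</title><price>29.95</price></item>
</basket>
"""


def book_titles(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    # One path expression replaces a hand-written walk over the tree:
    # any descendant <item> with type="book", then its <title> child.
    return [t.text for t in root.findall(".//item[@type='book']/title")]


if __name__ == "__main__":
    print(book_titles(DOC))  # ['Programming Perl', 'XML in a Nutshell']
```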
I should add that I sympathize with his complaint about being forced into either a streaming or an in-memory parsing model. For processing large, relatively flat data in chunks, it might be ideal to catch an event for each larger subtree and then be able to fetch all of its children for walking or other direct manipulation.
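The hybrid model wished for above can be approximated with Python's `xml.etree.ElementTree.iterparse`. The document is streamed, but each time a chosen subtree finishes, its children are available as an ordinary in-memory tree. This is a sketch that assumes a hypothetical flat `<log>` made of `<record>` elements.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Stream the document, yielding each finished <record> as a dict.

    `source` is a filename or a binary file object.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # By the time a subtree's end event fires, all of its
            # children have been built and can be walked directly.
            yield {child.tag: child.text for child in elem}
            # Release the record's children. The empty shell stays
            # attached to the root, which is usually acceptable.
            elem.clear()


if __name__ == "__main__":
    data = (b"<log><record><id>1</id><msg>boot</msg></record>"
            b"<record><id>2</id><msg>ok</msg></record></log>")
    for rec in iter_records(io.BytesIO(data)):
        print(rec)
```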
XML Journal published my article (renamed).
http://seapod.org/writing/markup-madness.html
Seems to be the case in practice as well.
Steve Klingsporn
steve at buzzlabs dot com
Taking his basic question, "XML is too hard for beginners": you're correct that there are lots of tools and libraries. Here are a few: http://castor.exolab.org/ From the website: "It's basically the shortest path between Java objects, XML documents and SQL tables. Castor provides Java to XML binding, Java to SQL persistence, and then some more." Pretty amazing, I say. Sun's JAXB is similar. As for pull parsers, Dennis Sosnoski has some interesting articles at IBM developerWorks: http://www-106.ibm.com/developerworks/xml/library/x-injava/#6
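Python's standard library also ships a pull parser, `xml.etree.ElementTree.XMLPullParser`. This is a sketch of the general pull style, not of the Java APIs covered in the articles above. The caller feeds data in whatever chunks it has and asks for events when it is ready for them. The tag-counting task is only an illustration.

```python
import xml.etree.ElementTree as ET


def count_tags(chunks):
    """Feed XML bytes in arbitrary chunks and pull start events on demand."""
    parser = ET.XMLPullParser(events=("start",))
    counts = {}

    def drain():
        # read_events() returns only events completed by the data fed so far.
        for _event, elem in parser.read_events():
            counts[elem.tag] = counts.get(elem.tag, 0) + 1

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # pick up anything made available by close()
    return counts
```

For example, `count_tags([b"<a><b/>", b"<b/><c>", b"</c></a>"])` splits the document at awkward points and still counts one `a`, two `b` and one `c`.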
I am not the author of the post you responded to, but I felt compelled to comment.
Plagiarism, in the most commonly used sense, is taking credit for someone else's words or ideas. Since he posted as an anonymous coward, he is unable to take credit. Therefore, he didn't commit plagiarism in the usual sense.
He deserves the lesser charge of failure to cite. As long as we are throwing out accusations, I would accuse you of libel http://dictionary.reference.com/search?q=libel , but since he's an AC, I can't claim that it damages his reputation. Hmm, never mind. :)
XPath eliminates the need to walk a DOM tree?
That's hirarious.
Exactly. It should be noted that this is also the purpose of CSS. Of course, it depends on the complexity of the page. If your page is complex and requires quite different content (maybe fewer menu items for PDAs or something), you may need to go XML + XSL -> XHTML + CSS. Otherwise you can just go XHTML + CSS.
while()
freedom...
...people stop referring to it as a language. It's not.
And any "programmers" that have trouble with XML probably aren't worth hiring or keeping around, anyway: it's a sure sign of a poseur. Businesses need to broom these idiots out of their payrolls, and let us real programmers get on with our work.
This mirrors what I've heard VB/ColdFusion dummies whine about C, Perl, and Java: it's "too hard". If it's too hard, pay me more, let me do it, and have the company fire your dumb ass.
Moderators on crack: the parent is not a troll; he's just about right.
Read any introductory article on XML, or the first chapter of a book, and it's so plain and simple and inviting and looks like a great idea. By page 50 of the book you're crawling through a dense pile of industrial trash. A book on XML I bought lists over thirty classes in the OpenXML implementation. Over THIRTY classes, that's hundreds of methods; do I want to dig into this just to read and write a simple file of records, when simple and robust alternatives exist? Hell, no.
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
It's kind of funny how once we're occupying Iraq, something like 80% of the active army will be in use and we won't have the resources to deal with any new threats.
or any friends to help us.
That's funny isn't it?
That's an old April Fools' joke.
Repeat after me: Before I freak out...
I will always check the date.
I will always check the date.
I will always check the date.
I will always check the date...
I also am not the author of the post you responded to, but I felt compelled to comment.
Plagiarism, in the most commonly used sense, is taking credit for someone else's words or ideas. Since he posted as an anonymous coward, he is unable to take credit. Therefore, he didn't commit plagiarism in the usual sense.
He deserves the lesser charge of failure to cite. As long as we are throwing out accusations, I would accuse you of libel http://dictionary.reference.com/search?q=libel , but since he's an AC, I can't claim that it damages his reputation. Hmm, never mind. :)
Be sensible, keep things simple, use a DB when it makes sense, use XML when it makes sense.
If I see another XML standard (take XLink, for example) that turns something simple into rocket science, I'll go mad.
Oh, without question, it has potential. But I have a rude question to ask. Have you ever run an IT department? Running infrastructure for non-techies reliably means, "You should put in X moronic thing. My fourteen year old nephew, who's really smart, saw it on television and he says that it's *much better* than what you're proposing."
How things *could* be bears only the most trivial resemblance to how they end up when every little budget item has to be approved by people who say that "the floppy on my hard drive must have a virus".
I was making about the same amount that you were (>$60K), with great bennies and plenty of opportunity for advancement, and I wouldn't go back to doing IT management without a commitment of $20K or more per year to be spent at my sole discretion, plus an assistant I would choose and train whose schedule was entirely mine to decide. Users are idiots. Or at least enough of them are to make a job like that look to me like the next closest thing to purgatory.
Rustin
Data is the lever, rigor the fulcrum, brains the force that drives it all.
My. Testy, aren't we?
Well, first of all, you posited a case where "Another possibility is, we could be able to trade LAN admin skills for free rent, building-manager style. Apartment complexes might start building up their own hotspots and such, and they'll need someone to handle the tech support. Handymen at complexes get free rent, so does the super, why not the tech guy?"
Hm. Handyman. Building Manager. If you really think that any position that can be compared to those will be free of luser bullshit, then may I suggest an antidote.
"Can-do attitude?" If you really are so naive as to think that the right attitude is all that is needed to end up with a well architected system, then along with your ignorant, superior I Am A Real Geek, You Are A Mere Peon trash-talking, well, I'll take that as a big unambiguous "no" to my question. You clearly have no clue, let alone experience getting budget allocations, departmental approvals, and, hardest of all, the continuing support of people who see computers as magic boxes where anything that isn't what they want is YOUR FAULT.
"I erased my hard drive with Norton and now all my files are gone. Fix it."
"I installed unapproved, bootleg, security software, lost the manual, and forgot the password. Fix it."
"Somebody I don't like has a better computer than me. Fix it."
"We refused your budget allocations for five years running and now we can't use the current software or cool new web sites. Fix it. Oh, and don't spend any money or change any configurations or reduce any other existing capability to do it."
Silly?
Dime a dozen?
Arrogant and egotistical?
Fuck you and the government job you rode in on. I don't know quite how you turned a reasonable question asked with careful commentary on its sensitive nature into some penis measuring contest but, well, you clearly don't know shit about actual operations work, let alone operations management.
I ask you again, what experience do you have actually running a support department? How many users, let alone department heads have you ever had to report to re support? What is the largest organization or project supporting non-techies that you have ever been responsible for?
I don't want to hear about your filling in for a month or two once answering phone calls. Have you ever in your life been the senior person, the person at whose desk the buck stops, for any sort of operations? Any support of non-techies at all?
You actually think that your server would be some sort of sanctum sanctorum? Whoever owned that building would most likely have keys, passwords, and overrides to everything you did. And when the owner chose a service provider whose bandwidth fell apart at key times, you would get to kiss the ass of every influential tenant who felt like berating somebody.
You actually think that, as an employee of the building, you could just give tenants "an information sheet" for wireless and then be free from blame when *they* fucked it up? Yeah, right.
Look, I don't know much about you as a programmer. You clearly don't know shit about me.[1] But you have made it mighty clear that you don't understand what a senior tech support job is. I made a point of specifying that I personally would not take that sort of job. Why you so emphatically are displaying a stick up your ass the size of the federal deficit doesn't even interest me very much.
You want to show me how wrong I am? Go for it, baby. There are plenty of buildings these days that include "digital services" in the rent. Find me people holding the sorts of jobs we're talking about and get *them* to agree that I'm building a strawman. Until then, well, when discussing a subject that's already been declared fraught, try not to get snippy with people who know far more than you do. Sometimes we bite back.
[1] I'll give you a big hint: there's a reason that I could take your exposed conduit proposal
The problem is not XML as such; programming parsers is hard, really hard.
Um, no, it's not. Parsing languages which you define has been basically understood for years. It's trickier to parse XML than it used to be, but only because XML has grown, not because the task is inherently difficult.
<flame>
Really?
You are either very smart or very stupid, and my money is on the latter, since I stated that programming parsers is difficult and you contradicted me by stating that parsers are well understood.
Well, I understand parsers pretty well, well enough to understand the distinction between LL, LR, LALR(0), and LALR(1) parsers. However, I also understand them well enough to know that most programmers cannot deal with anything other than a simple recursive descent parser.
Well, programming parsers is difficult, so difficult that it is actually at the edge of what is possible for human programmers, which is why parser generators such as yacc and bison are necessary.
However, if you really are as smart as you make out, feel free to point out the obvious ambiguity in my statements above and prove me wrong!
</flame>
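For a sense of what the "simple recursive descent parser" mentioned above looks like, here is a minimal Python sketch for a deliberately tiny XML subset: elements and text only, with no attributes, entities, comments, DTDs or encoding handling. Each grammar rule becomes one method, and nested elements are handled by the methods calling each other recursively. The class and function names are made up for illustration, and everything the sketch leaves out accounts for much of the size of a conforming XML parser.

```python
import re

# Grammar for a deliberately tiny XML subset (no attributes, entities,
# comments, processing instructions or DTDs):
#   element := "<" NAME ">" content "</" NAME ">"
#   content := (element | TEXT)*
NAME = re.compile(r"[A-Za-z_][\w.-]*")


class TinyXMLParser:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def expect(self, literal: str) -> None:
        # Consume an exact literal or fail with the current offset.
        if not self.text.startswith(literal, self.pos):
            raise ValueError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def name(self) -> str:
        m = NAME.match(self.text, self.pos)
        if not m:
            raise ValueError(f"expected a tag name at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def element(self):
        # One method per grammar rule; content() calls back into element()
        # for nested tags, which is the "recursive" in recursive descent.
        self.expect("<")
        tag = self.name()
        self.expect(">")
        children = self.content()
        self.expect("</")
        if self.name() != tag:
            raise ValueError(f"mismatched closing tag for <{tag}>")
        self.expect(">")
        return (tag, children)

    def content(self):
        items = []
        while self.pos < len(self.text) and not self.text.startswith("</", self.pos):
            if self.text.startswith("<", self.pos):
                items.append(self.element())
            else:
                # Text runs up to the next "<" (or the end of input).
                end = self.text.find("<", self.pos)
                if end == -1:
                    end = len(self.text)
                items.append(self.text[self.pos:end])
                self.pos = end
        return items


def parse(text: str):
    """Parse one root element into nested (tag, [children]) tuples."""
    p = TinyXMLParser(text)
    tree = p.element()
    if p.pos != len(text):
        raise ValueError("trailing data after the root element")
    return tree
```

For example, `parse("<a>hi<b>there</b></a>")` returns `('a', ['hi', ('b', ['there'])])`, and `parse("<a></b>")` raises `ValueError` for the mismatched tag.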
brainfuck!
brainfuck!
brainfuck!
Damn, and all this time I've been thinking that this "Anonymous Coward" person was one of the most brilliant (and most prolific) posters on this discussion board. Now you've demonstrated that he's nothing but a fraud. I think I've lost all of my respect for Anonymous Coward.
-CausticPuppy "Of all the people I know, you're certainly one of them." -Somebody I don't know
Comma delimited works fine, until you order some parts with description like "Bolt, Hex, Brass, 30mm, fixing for the use of", and all the other fields get shunted across, and the PHB is wondering why it doesn't work.
And yes, I've seen that happen.
One advantage of the comma is that it's easier to hardcode as a literal in your program than a tab, and on this occasion that was why it had been done that way.
Then there are European number formats...
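The delimiter collision described above is the problem that quoting in CSV exists to solve. This is a minimal Python sketch; the post doesn't say which language was involved. It uses a made-up record with the same description. A naive join/split shunts the fields across, while the standard `csv` module quotes the field and reads it back intact.

```python
import csv
import io

# Hypothetical record: part number, description, quantity.
row = ["1042", "Bolt, Hex, Brass, 30mm, fixing for the use of", "250"]


def naive_split_roundtrip(fields):
    # What a hand-rolled comma-delimited format does: join, then split.
    return ",".join(fields).split(",")


def csv_roundtrip(fields):
    buf = io.StringIO(newline="")
    # The default QUOTE_MINIMAL dialect quotes any field that contains
    # the delimiter, so the embedded commas survive.
    csv.writer(buf).writerow(fields)
    buf.seek(0)
    return next(csv.reader(buf))


if __name__ == "__main__":
    print(len(naive_split_roundtrip(row)))  # 7: the description got split
    print(csv_roundtrip(row) == row)        # True
```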
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Mr. Jones related an incident from "some time back" when IBM Canada Ltd. of Markham, Ont., ordered some parts from a new supplier in Japan. The company noted in its order that acceptable quality allowed for 1.5 per cent defects (a fairly high standard in North America at the time).
The Japanese sent the order, with a few parts packaged separately in plastic. The accompanying letter said: "We don't know why you want 1.5 per cent defective parts, but for your convenience, we've packed them separately."
-- Excerpted from an article in The (Toronto) Globe and Mail
- this post brought to you by the Automated Last Post Generator...