XML Co-Creator says XML Is Too Hard For Programmers

Too hard? by Ledskof · 2003-03-18 01:11 · Score: 5, Funny

Sounds like visual basic programmers are complaining or something.

--
This is my sig. The post is over.

Re:Too hard? by Omkar · 2003-03-18 01:32 · Score: 2, Informative

blah blah blah...right tool for the right task...blah blah blah.
Seriously, don't knock VB until you need to code a quick dbaccess (or other simple) app in a couple of days for internal use. Easy languages have their places!
Re:Too hard? by hatchet · 2003-03-18 01:39 · Score: 1

Actually the article says "write code", not "program". So as a matter of fact, you write XML code and you program in C.
I believe that explains some things...
Re:Too hard? by Anonymous Coward · 2003-03-18 01:41 · Score: 0

watch yourself being modded down now.. Tsk tsk i would think twice before trolling.
Re:Too hard? by Anonymous Coward · 2003-03-18 01:43 · Score: 0

I'd use Delphi over Visual Basic any day of the week. Despite being cast as a Microsoft shill on this site countless times, that won't stop me from proclaiming that Visual Basic was one of the biggest mistakes in the computing arena, and the easiest way to kill a programmer's career (by destroying their programming mindset) is to get them programming in VB.
Re:Too hard? by twitter · 2003-03-18 01:50 · Score: 1

Sounds like visual basic programmers are complaining or something.

Yeah, that ugly green text on a white background is a dead give away, har har har!

His complaints point to why free software works better. In the free software world you start with a library of functions and can share your improvements, which last forever. In the closed software world, you pay for a library that gets tossed out when it's time to be sold another one. Before you get to move on to the next big closed source fad, you get to waste your time working around the faults of the tool you bought, and no one but the folks in your own particular company benefits from your work. It's a sad, sad song.

--
Friends don't help friends install M$ junk.
Re:Too hard? by Ledskof · 2003-03-18 01:57 · Score: 1

I've coded a quick db front end for internal use in less than a couple of days.

So I'm qualified for knocking it right?

I know VB has its purpose :) It's just so fun to knock cause it's so bloated.

--
This is my sig. The post is over.
Re:Too hard? by WPIDalamar · 2003-03-18 02:08 · Score: 3, Insightful

If the full set of XML is too hard to use, then don't use the full set of features! I regularly write programs that read/write xml style documents, but with only the most basic xml functionality. The main benefit is so that other programs can also read & write these files. It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.
Re:Too hard? by Anonymous Coward · 2003-03-18 02:18 · Score: 0

What are you talking about? XML is an open standard and there are multiple open source XML parsers.
Re:Too hard? by khuber · 2003-03-18 02:31 · Score: 5, Insightful

It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.
Yeah, the world needs more half-assed barely functioning and noncompliant XML parsers.
Seriously I think it's much more robust to just use a normal XML parser. You get all the character set support. If someone hacked up their own parser at work I would reject it in a code review. There's no sense in maintaining your own XML parser these days; they are a commodity.
-Kevin
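
For what it's worth, here is a minimal sketch of what "just use a normal XML parser" can look like, assuming Python's standard xml.etree.ElementTree. The sample document, the tag names, and the customer_names helper are made up for illustration. The point is only that the stock parser handles the encoding declaration and entity references, which a hand-rolled subset parser would have to reimplement.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Latin-1 encoded bytes containing an entity reference.
RAW = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<orders><order id="1"><customer>Jos\u00e9 &amp; Co.</customer></order></orders>'
).encode("iso-8859-1")


def customer_names(xml_bytes):
    """Return the text of every <customer> element, decoded by the parser."""
    root = ET.fromstring(xml_bytes)  # the parser honors the declared encoding
    return [c.text for c in root.iter("customer")]


print(customer_names(RAW))
```

The calling code never touches byte decoding or "&amp;" handling itself; that is the part that tends to go wrong in homegrown parsers.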
Re:Too hard? by Pxtl · 2003-03-18 02:39 · Score: 2, Interesting

Whoever modded this troll is a jingoistic zealot. The poster is just saying that VB, for all its faults, is good for database RAD. Which many people would agree with.
Re:Too hard? by Anonymous Coward · 2003-03-18 02:58 · Score: 0

XML is not hard in the sense that it is conceptually hard - it isn't - it's hard in the sense that it is extremely cumbersome to use, and often used for things for which it is too slow and not really appropriate, anyhow.
Re:Too hard? by kryonD · 2003-03-18 03:04 · Score: 1, Interesting

don't knock VB until you need to code a quick dbaccess (or other simple) app in a couple of days for internal use.

Maybe if you're a beginning programmer. My shop codes exclusively in C and I can even create rather complex apps in a few days because:

#1 I know what I'm doing, and..

#2 It's called libraries....be it STL, MFC, MyStack.h or whatever. Code re-use is the key to rapid and robust application development.

And my code is platform independent and usually weighs in at less than 100K (for a simple DB app). Web-based, Web-based, Web-based...can I make it any clearer? I'm talking about real-world, mission-critical data applications where bandwidth is paid for and no one gives a fsck if the button turns neon pink and spins in a circle when you mouse over it.

VB has its place in small businesses and first year programming courses where it's not a big deal if the code is messy, non-portable, slow and bloated. If your company is paying full time salaried VB programmers who have no other skills, start familiarizing yourself with the procedures involved in signing up for unemployment. Your company is eventually going to grow to a point where VB totally fails and you find that your job was the one cut in order to dish out the money for someone else's software that actually works. Either that or your company just goes tits up. Dot Com anyone?

--
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
Re:Too hard? by Anonymous Coward · 2003-03-18 03:19 · Score: 0

NOOOO.... Don't subject first-year programmers to VB! It'll wreck their minds.
Re:Too hard? by arkanes · 2003-03-18 03:30 · Score: 5, Insightful

You know, using VB is just code reuse. It's just reusing more code than you're use to. It's got some serious strengths. The app you write in a couple days the VB programmer can toss out after lunch. How about data aware controls? Those are a pain in the ass in C/C++, although you can make it easier by using third party components. Like ActiveX controls. Which are a pain in C/C++, but are painless in VB. On the other hand, your code won't be small, and you'll be linking to a massive runtime, and you're using a language whose syntax makes me feel dirty.
Oh, and if you're making web-based apps, wtf are you using C for?
Re:Too hard? by sib183 · 2003-03-18 03:44 · Score: 1

Ledskof wrote:
Sounds like visual basic programmers are complaining or something
----

So this snippet from the article is in Visual Basic? That is news to me...

my ($state_var1, $state_var2) = (0, '');
my (%collector1, $collector2);
while (<STDIN>) {
  next if (/rexexp-for-something-I-ignore/);
  if (/something-I'm interested-in/)
  { $state_var1 = &foo($1, $4, \%collector1); }
  elsif (/something-else/)
  { $state_var2 = &bar($_, $state_var1); }
  elsif (/yet another/)
  {
    $state_var_1 = $state_var2 + $collector1{baz};
  }
  else { print; }
}
Re:Too hard? by Xerithane · 2003-03-18 03:44 · Score: 0, Offtopic

Whoever modded this troll is a jingoistic zealot.

Did you know if you search google for jingoistic that the first link is to the dictionary definition of it. That cracks me up.

--
Dacels Jewelers can't be trusted.
Re:Too hard? by crazyphilman · 2003-03-18 03:48 · Score: 1

Hey! No fair picking on VB programmers! Our life is hard enough as it is, with the crappy syntax we've got to put up with, and the general flakiness of the language. Do you have any idea how hard it is to write something in VB that works well??? Or, worse, how hard it is to maintain something written by a novice VB person (usually something like a liberal arts major who just kind of floated into a programming position)??? Yeesh. You C++ and Java guys don't know how good you've got it. Show some sympathy!!!

God, I miss my Java days. Sigh... I keep telling myself, "It may be VB, but it pays the bills..."

--
Farewell! It's been a fine buncha years!
Re:Too hard? by Billly+Gates · 2003-03-18 03:50 · Score: 2, Insightful

Sounds like a similar argument I hear for C++.

I do not know any programmer who uses all of the features of ANSI C++. This may have something to do with the fact that no C++ compiler is actually 100% ANSI compliant. There are just so many different kinds of templates that most programmers do not use most of them because less experienced programmers will not be able to read the code.

I never got into the XML hype. SOAP is cool but XML otherwise is just an ASCII text file with tags. I have not written a lot of XML programs, but SGML is fine for documents and is easier to read. Websites that need to display a lot of information can gather it from a database.

--
http://saveie6.com/
Re:Too hard? by Evil+Grinn · 2003-03-18 03:51 · Score: 2, Flamebait

My shop codes exclusively in C and I can even create rather complex apps in a few days because:

#1 I know what I'm doing, and..

#2 It's called libraries....be it STL, MFC, MyStack.h or whatever.

STL and MFC are C++, not C. Presumably you know the difference between C and C++, since you "know what you're doing". I must assume then that you are trying to gloss over the distinction between C and C++ so as not to further confuse the VB programmers among us.

--
where there's fish, there's cats
Re:Too hard? by Anonymous Coward · 2003-03-18 03:53 · Score: 0

neither you nor the previous poster "coded a quick db front end". what you both did was to grab a code sample and modify it for your purposes. you could have done this with a decent language too, but you didn't. now you both have left behind unmaintainable messes that are still being expanded, bloated, and maintained, bug-riddled as they are.
i once worked at a place that used old DOS machines running WordPerfect with keyboard macros to process their datafeeds. i'm sure they started with the same logic as you and your friend.
Re:Too hard? by crazyphilman · 2003-03-18 04:00 · Score: 1

Obviously all you know is C. It must be some kind of "geek pride" thing.

VB can be used to write some very complex client-server apps, which is really what VB is for. It integrates very nicely with databases using ADO and OLEDB, and makes it possible to write multi-tier database apps very, very quickly. The prototypical app here would be an n-tier app with web pages on a DMZ machine, dlls on internal machines, and a database backend (usually Oracle). The firewall guys set up the dlls so that only the website can access them, and in the dll, you only expose the specific interface you want to expose. If you do it correctly, it's very hard for someone to mess with your work.

Another nice feature is that VB is very nicely integrated with other services like email, Exchange Server services, and so on. Although some people like to use these capabilities to write virii and worms, the REST of us use them to write groupware.

Anyway, why the hell are you still writing in C? I thought Perl, Java, and PHP4 were the gold standard for web apps... Aren't you afraid of buffer overruns??? Lord knows half the system calls in C are vulnerable...

--
Farewell! It's been a fine buncha years!
Re:Too hard? by ckaminski · 2003-03-18 04:03 · Score: 1

God, I miss my Java days. Sigh... I keep telling myself, "It may be VB, but it pays the bills..."
I keep saying the same thing, but it goes more like this these days... "It may be a 4GL, but it pays the bills.". I'd *KILL* for some VB action right now... Sigh, when oh when will I ever be able to use Java or C++ again....
-Chris
Re:Too hard? by EriondII · 2003-03-18 04:06 · Score: 2, Insightful

Signing up for unemployment? Hardly! I know of many industries that rely exclusively on VB. Fortune 500 companies including the one I work for. We are currently in the process of writing an ERP in VB, and with phase 1 rolled out, no such issues exist. This is a complete Sales Order Entry system that connects with and replaces old COBOL and Progress legacy systems. Speed is not even an issue and I would wager our code base including COM+ components and XML/XSL Views is more robust and useful than some shops' C libraries.

And VB is not the only language I know or program in. I use Java, C, COBOL, and Progress (ever heard of it? Thought not.) for many other tasks within the organization. It's just a matter of using the best tool for the best job. I try not to be too tunnel-visioned on one language and figure out how to make the best use of each.
Re:Too hard? by crazyphilman · 2003-03-18 04:15 · Score: 1

Life sucks these days. The market for *real* programmers has been destroyed by corporate America. I feel like I'm extinct.

We should hold a national Irish Wake for the "real programmer". Everyone will be required to bring their favorite intoxicating beverage and at least one good music CD. The wake would last for a week, and we'd do our best to remain totally plotzed for the entire period.

Hmm... How about getting about a zillion of us and cramming into Wall Street with as much alcohol, fast food, and music as we can carry? Imagine that: a million-drunk-programmer-march. We'd take Wall Street and hold it as long as we can (we wouldn't FIGHT the cops; we'd get 'em drunk and let 'em join the party). It'd be like Woodstock, only rowdier.

Sigh... Now, THAT would be cool.

--
Farewell! It's been a fine buncha years!
Re:Too hard? by Ledskof · 2003-03-18 04:28 · Score: 1

No. Whether or not the other guy did this I don't know, but I'm definitely not going to make a firm assumption that he did, because that'd just make me a jackass. I'm talking about writing code back in 97. Was there even that much sample VB code in 97?

I appreciate the judgement of my code (that you've never seen) as well.

I don't promote or use VB. It was a thing of the past for me. I was doing IT for a company that had a really crappy front end for a simple inventory database. So, I took a couple of Que books off a house dev's shelf and learned enough to write the front end. It was well documented, commented, and written from scratch. And it was simple. It's not like I was writing code for a big company's payroll software or something.

--
This is my sig. The post is over.
Re:Too hard? by Anonymous Coward · 2003-03-18 04:51 · Score: 0

Huh? You clearly haven't read the article and decided to jump on the /. propaganda bandwagon.

There's absolutely NO WAY any rational person could draw the conclusion you just did.

As for XML - it *IS* an open standard.

God, I hate newbies.
Re:Too hard? by murdocj · 2003-03-18 05:35 · Score: 1

You might want to try reading the article. It was pretty interesting.
Re:Too hard? by johnnyb · 2003-03-18 06:02 · Score: 1

SOAP is an overbloated implementation of XMLRPC. XMLRPC is nice for simple RPC, CORBA is nice for an OO architecture. SOAP is nice for document transfer but not much else.

I like XMLRPC because I can have Flash communicate with my Perl modules.

--
Engineering and the Ultimate
Re:Too hard? by EastCoastSurfer · 2003-03-18 06:37 · Score: 4, Insightful

The market for *real* programmers has been destroyed by corporate America.

I think that the *real* programmers that you have talked about all write libraries now. These guys all have jobs at the tool makers like MS, Apple, etc...

Businesses in general don't want (and generally don't need) *real* programmers, they want software engineers. They want someone who can sit down, work out some requirements and provide a timely, cost effective solution. It has taken me some time to fully realize this, but the right technical solution is not always the right business solution. The PHB could really care less if the app is written in VB, C, Java, as long as the application works to within their parameters. It is those parameters that are specified by the people paying for the software that will direct the language/technology you ultimately use.
Re:Too hard? by bpfurtado · 2003-03-18 06:40 · Score: 1

I will put it simply: VB is not even an OO language. It doesn't have inheritance, one of the 3 pillars of the OO paradigm. Knowing this is enough for me.
Re:Too hard? by Anonymous Coward · 2003-03-18 07:01 · Score: 0

If people stopped using garbage like MS Access, VB would probably dry up and die. Access is primarily used by uninformed Windows loving sheep. The other Windows loving script kiddies who work for them are then forced to use VB, which would disappear completely if MS dumped Access.
Re:Too hard? by blibbleblobble · 2003-03-18 08:36 · Score: 1

I regularly write programs that read/write xml style documents, but with only the most basic xml functionality. The main benefit is so that other programs can also read & write these files.

Yup, 100 lines of code, or a couple of extra libraries in your program. It's so much better than the 4 lines of code needed to read your entire INI file into a hash.
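
To make the comparison concrete, here is a rough sketch of the "read the INI file into a hash" side, written in Python rather than whatever the poster had in mind. The read_ini helper and the sample text are invented for illustration, and it ignores escapes and line continuations, so treat it as the simple case only.

```python
def read_ini(text):
    """Parse INI-style text into {section: {key: value}} (simple cases only)."""
    data, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in ";#":  # skip blanks and comments
            continue
        if line.startswith("[") and line.endswith("]"):
            section = data.setdefault(line[1:-1], {})
        elif "=" in line and section is not None:
            key, value = line.split("=", 1)
            section[key.strip()] = value.strip()
    return data


SAMPLE = "[db]\nhost = localhost\nport = 5432\n; a comment\n[ui]\ntheme=dark\n"
print(read_ini(SAMPLE))
```

The flip side is that everything comes back as flat strings with one level of nesting; Python's standard configparser module covers the same ground if you would rather not maintain even this much.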
Re:Too hard? by 1g$man · 2003-03-18 09:16 · Score: 1

VB.NET does.

Not that it justifies using VB, any...
Re:Too hard? by Anonymous Coward · 2003-03-18 09:49 · Score: 0

The app you write in a couple days the VB programmer can toss out after lunch. How about data aware controls? Those are a pain in the ass in C/C++, although you can make it easier by using third party components. Like ActiveX controls. Which are a pain in C/C++, but are painless in VB.
I suggest you look at Borland C++ Builder. IT has all the nice fluff of VB (data aware controls, etc.) without the braindead VB language.
I'll be amazed if anyone actually reads this.
Re:Too hard? by whereiswaldo · 2003-03-18 09:53 · Score: 2, Interesting

This is the lamest story I've ever heard on Slashdot. I almost left for good after reading this. If the next week's worth of news doesn't get any less lame, I probably will.

Slashdot, don't be fucking lame. This is news for *nerds*, not for simps and wannabees. XML too hard? Then you shouldn't be a programmer cause that's about as easy as it gets unless you're just a hobbyist.
Re:Too hard? by pyrrho · 2003-03-18 10:08 · Score: 1

> The app you write in a couple days the VB programmer can toss out after lunch.

I have not found this to be the case, ever. Claimed a lot, but not true, in my experience.

Especially since it is the CASE tools that accomplish this for VB, and they "work" for C++ too. Of course CASE tools introduce bloat and inefficiency, but the CASE generated C++ is still more efficient than the VB equiv.

imnsho

--
-pyrrho
Re:Too hard? by Anonymous Coward · 2003-03-18 10:25 · Score: 0

I did!
Re:Too hard? by kryonD · 2003-03-18 11:02 · Score: 2, Interesting

Obviously all you know is C. It must be some kind of "geek pride" thing.

I've been programming for 16 years...here is a short list of the languages I have used in real-world (i.e. I got paid) applications:

C, C++, COBOL, VB (eventually rewritten in C when it hit the scalability wall), Intel x86 ASM, Motorolla 6809 ASM, and Motorolla 6502 ASM.

The list of languages I have worked with either in private, or an academic setting is quite large and are not listed above because I either wouldn't trust them for real work, or my employer wouldn't trust them.

ADO and OLEDB...Oracle

Proprietary. Proprietary. Proprietary, but at least somewhat portable; however, waaayyy too expensive unless you are dealing with massive amounts of data/users or are coding for government/businesses that require namebrand stuff.

some people ... write virii ... the REST ... write groupware.

This is true. However, I have yet to run into anything that I couldn't replicate in C/C++ using RFC standards. Some of the more nifty features of Exchange would need some reverse engineering, but I've never had the need to provide them.

why the hell are you still writing in C? I thought Perl, Java, and PHP4 were the gold standard for web apps... Aren't you afraid of buffer overruns??? Lord knows half the system calls in C are vulnerable...

Don't get me started on the gross mis-management job Sun has done on JAVA. It has never lived up to Sun's promise of being platform independent. Security is another problem depending on whether you are talking about client side, or server side. What happens if you have a customer whose security policy disables JAVA on the browsers? For server side, I challenge you to name something you can do in JAVA that you can't do just as easily in C/C++. The language has its advantages, but most of them can be reproduced in other languages with minimal effort.

Perl and PHP are very nice for simple straight forward page production. However, I code for US DOD and the security issues with both of those as well as a general distrust of anything open source has prevented their use on a general basis. I have seen some stuff done for DOD in those languages, but it was either in violation of policy, or contracted out and not on a .mil server. Additionally, they are interpreted languages. If you need to pull 4 million items into memory, consolidate the duplicates, calculate usage stats over multiple time periods, then filter out those that don't meet a usage to property hit list, Scripted languages are either way too slow, or simply incapable of doing that kind of complex filtering on a large quantity of data. The above process can be done in about 400 lines of C code, most of which is copy and pasted loops and if statements and it's fast.

Buffer overruns are easy....don't rely on the server to feed your script data. Write the code to pull the data from the server and set a cutoff limit where extra data is ignored. Write a simple filter command to break attempts at embedding malicious SQL commands in data and you're done. You can do this in any language, but yet you still occasionally see AIVAs about buffer overflow vulnerabilities in everything under the sun.

System calls? Don't know what to tell you there. Been coding web based stuff for two years in C and never had to make one. Or are you referring to anything that handles I/O as a system call? If so, read your input one character at a time and COUNT them...stop when you hit your buffer's pre-defined limit. If you do hit a limit, have the app make a log entry. Either your code has failed to expect a weird user need that requires sending large amounts of data, or someone is trying to attack your script....the latter is far more likely. I'd rather have a random user complaint once in a blue moon for lack of flexibility, than all my users pissed because someone rooted the box and defaced the web site.
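
The count-it-yourself idea can be sketched roughly like this. It is in Python, where the runtime already prevents the memory overrun itself, so it only illustrates the "enforce your own limit and log when it is hit" logic; the MAX_INPUT value and the read_bounded name are assumptions for the example, not anything from a real codebase.

```python
import io
import logging

MAX_INPUT = 16  # hypothetical limit; pick one that fits the application


def read_bounded(stream, limit=MAX_INPUT):
    """Read at most `limit` bytes; log and discard anything beyond that."""
    data = stream.read(limit + 1)  # one extra byte reveals oversized input
    if len(data) > limit:
        logging.warning("input exceeded %d bytes; extra data ignored", limit)
        data = data[:limit]
    return data


print(read_bounded(io.BytesIO(b"x" * 100)))
print(read_bounded(io.BytesIO(b"short")))
```

Note that it reads in one bounded call rather than one character at a time; the limit is what matters, not the chunk size. On a pipe or socket a single read may return fewer bytes than requested, so real code would loop until the limit or end of input.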

--
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
Re:Too hard? by Anonymous Coward · 2003-03-18 11:04 · Score: 1, Interesting

If you think SOAP is nice for document transfer you should check out HTTP. It's great. And most firewalls let it through. You can use HTTPS to encrypt, too!
Re:Too hard? by arkanes · 2003-03-18 11:51 · Score: 1

Assuming that by CASE you meant RAD, I'd still disagree with you. I hate VB and can certainly do GUI layout in it faster than I can in C++. Maybe if I used C++ builder, but I detest it, so no.
Re:Too hard? by Caine · 2003-03-18 12:58 · Score: 1

I've been programming for 16 years...here is a short list of the languages I have used in real-world (i.e. I got paid) applications:

C, C++, COBOL, VB (eventually rewritten in C when it hit the scalability wall), Intel x86 ASM, Motorolla 6809 ASM, and Motorolla 6502 ASM.

You've been programming for 16 years in all those languages? Mighty impressive. Strange you don't know things like it's Motorola (since you wrote it twice, I assume it wasn't a mis-spelling).

Proprietary. Proprietary. Proprietary, but at least somewhat portable; however, waaayyy too expensive unless you are dealing with massive amounts of data/users or are coding for government/businesses that require namebrand stuff

Yes, people use proprietary programs, and no, they aren't that expensive. Shocking huh?

This is true. However, I have yet to run into anything that I couldn't replicate in C/C++ using RFC standards. Some of the more nifty features of Exchange would need some reverse engineering, but I've never had the need to provide them.

I don't know what to say. None of those sentences make any sense. "in C/C++ using RFC standards"...wtf are you supposed to mean by that? You can implement things totally unrelated to any RFC by using RFC standards? That would be impressive. "Here I'm using IP over Aviation carrier RFC standard to access the database".

However, I code for US DOD and the security issues with both of those as well as a general distrust of anything open source has prevented their use on a general basis.
First you complain about proprietary software and now open source? You're one fun troll aren't you :)

most of which is copy and pasted loops and if statements

Why the hell would you want to copy loops and ifs? The reason you ever copy code is to AVOID branching.

If so, read your input one character at a time and COUNT them

Never ever fucking read something one character at a time.

Seriously, fun troll.
Re:Too hard? by chris_mahan · 2003-03-18 13:02 · Score: 1

I'll second that.

In general, PHBs don't care what it's programmed in, and since Intel servers are cheaper than programmers, they don't even care too much if they're scripting languages.

The key is that the software integrator needs to be a good analyst, have good business acumen, give the users what they want quickly, and be willing to use the best tool for the job.

--
"Piter, too, is dead."
Re:Too hard? by kryonD · 2003-03-18 13:50 · Score: 1

You've been programming for 16 years in all those languages? Mighty impressive. Strange you don't know things like it's Motorola (since you wrote it twice, I assume it wasn't a mis-spelling).

I wrote that response in a hurry, so it may have been a bit unclear, and yes, I did mis-spell Motorola. I have not been using ALL of those languages for 16 years. In the past 16 years, those are the languages I have used to write code that I got paid for. I started on Basic like most students did back then. Started using assembly about 12 years ago. still occasionally use it now to write device drivers for folks doing hobby work.

Yes, people use proprietary programs, and no, they aren't that expensive. Shocking huh?

Last time I checked, the per processor fee for Oracle 9i was almost $25K. I call that expensive when MySQL and PostgreSQL are basically free.

wtf are you supposed to mean by that? You can implement things totally unrelated to any RFC by using RFC standards?

I was responding to integrated services such as email. It is actually quite easy to open a socket on port 25 to a SMTP(RFC based) server and send a MIME(RFC based) compliant email(RFC based). Perhaps reading the parent post would help you out.
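
As a rough illustration of "build a MIME message and hand it to an SMTP server", here is a sketch using Python's standard smtplib and email.message modules instead of a raw socket in C. The addresses, the localhost server, and the build_message/send helpers are placeholders made up for the example.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text, MIME-compliant message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host="localhost", port=25):
    """Deliver the message over SMTP; assumes a server is listening there."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


msg = build_message("app@example.com", "ops@example.com",
                    "Nightly report", "All jobs finished.")
print(msg.as_string())  # call send(msg) only where an SMTP server is reachable
```

Nothing here depends on Exchange or any vendor library; the wire format is just headers plus body, which is why it is also doable by hand over a socket.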

First you complain about proprietary software and now open source? You're one fun troll aren't you :)

Again, was writing the response quickly. I am a huge proponent of open source. The DOD, on the other hand, generally mistrusts anything that is freeware, or not backed by a credible company. Thus I am stuck coding for M$ based products.

Why the hell would you want to copy loops and ifs? The reason you ever copy code is to AVOID branching

Branching is a result of decision making no matter how you code it. If there are a lot of deciding factors, you can either use multiple decision statements, or squeak out a few extra percentage points in performance by making one giant, hard to read decision statement. Loops are unavoidable when reading data from a DB connection unless of course you always knew the number of returned rows at compile time.

Never ever fucking read something one character at a time.

OK, 2 at a time, 10 at a time, 10000 at a time. The point was if your code is depending on an outside source limiting its input, then your code is likely vulnerable to a buffer overflow attack. There are other methods to avoid it, but controlling the input in your code by doing the count yourself is a sure bet on avoiding overflows.

Seriously, fun troll

I think you missed the point.

--
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
Re:Too hard? by twitter · 2003-03-18 14:40 · Score: 1

A mindless coward writes: Huh? You clearly haven't read the article and decided to jump on the /. propaganda bandwagon.

There's absolutely NO WAY any rational person could draw the conclusion you just did.

As for XML - it *IS* an open standard.

God, I hate newbies.

Let me help you, again, with my reasoning. ASCII is open too, but programs to manipulate characters may or may not be. Witness Notepad, a dinky closed source editor with much room for improvement. The author said that writing functions to deal with XML was difficult, so sad C# does not come to his rescue. Just as VIM and KDE's advanced editor are superior to Notepad, free XML libraries will be superior to non-free ones.

Be nice to newbies, trolls and dumb animals. They might be brighter than you, might come around and can be put to useful work in any case.

--
Friends don't help friends install M$ junk.
Re:Too hard? by dwsauder · 2003-03-18 14:42 · Score: 2, Insightful

This is the lamest story I've ever heard on Slashdot. I almost left for good after reading this. If the next week's worth of news doesn't get any less lame, I probably will.
Slashdot, don't be fucking lame. This is news for *nerds*, not for simps and wannabees. XML too hard? Then you shouldn't be a programmer cause that's about as easy as it gets unless you're just a hobbyist.
Somehow, I think you don't understand what the story is about. Something can be easy, but for lazy programmers (and if you understand Larry Wall's Perl culture, then you know that laziness in a programmer is a virtue) it ought to be simpler so that we can enjoy our work more. There are some programming techniques that are just too repetitive, and doing them over and over and over can make a programmer go crazy, no matter how easy it is. Well, that's the way it is with XML. Sure, XML is as easy as it gets. But if you have to write so much repetitive code, you look for ways to automate it all. A major point of Tim's complaint about XML is that apparently no one has done anything to make programming with XML less boring and repetitive.
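
One way to get closer to the line-at-a-time Perl idiom quoted earlier in the thread is a streaming pull loop. Here is a small sketch using iterparse from Python's standard xml.etree.ElementTree; the log/entry document and the error_messages helper are invented for illustration, and this is only one possible approach, not what the article proposes.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document for the sketch.
SAMPLE = (b"<log><entry level='info'>started</entry>"
          b"<entry level='error'>disk full</entry>"
          b"<entry level='error'>timeout</entry></log>")


def error_messages(source):
    """Stream through the document, collecting the text of error entries."""
    errors = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            if elem.get("level") == "error":
                errors.append(elem.text)
            elem.clear()  # drop the element's contents once handled
    return errors


print(error_messages(io.BytesIO(SAMPLE)))
```

The loop body is the only part that changes from task to task, which is about as close as this API gets to "while there is input, match and act".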
Re:Too hard? by Anonymous Coward · 2003-03-18 15:23 · Score: 0

Me too!

BTW: Borland Delphi also fits into this area, but no one in the US seems to remember that.
Re:Too hard? by Anonymous Coward · 2003-03-18 15:28 · Score: 0

[GrammarNazi] Than You're Used To [/GrammarNazi]
Re:Too hard? by pyrrho · 2003-03-18 17:28 · Score: 1

By CASE I meant Computer Aided Software Engineering, e.g. the system within Microsoft Visual Studio where C++ code (using MFC) is generated for you by the "wizard". So yes, I guess something like C++ Builder.

Any CASE tools do tend to be detestable. However, more so than VM based languages? That messy generated code is, at least, accessible; the messy code in a VM is just your environment, so live in it.

--
-pyrrho
Re:Too hard? by jd_esguerra · 2003-03-18 17:36 · Score: 1

The app you write in a couple days the VB programmer can toss out after lunch.

To get an idea of just how poorly I code, you should replace "toss" with "crap." In fact, feces could probably generate more reusable code than I do.
Re:Too hard? by jd_esguerra · 2003-03-18 17:50 · Score: 1

The PHB could really care less if the app is written in VB, C, Java, as long as the application works to within their parameters. It is those parameters that are specified by the people paying for the software that will direct the language/technology you ultimately use.
In the area I work (I'm actually a MechEng), the language is often dictated by the customer as one of the design requirements. There is usually a dialog with the customers about "what language will be used where," but ultimately it is the customer who has to specify which language will be used. This does help to lock in some of the performance parameters that might be language dependent. So, yes, in many cases, the PHB doesn't have to care what language the app(s) are written in.
Re:Too hard? by Anonymous Coward · 2003-03-19 02:16 · Score: 0

funny yet wise, but hardly relevant to the discussion at hand michael
Re:Too hard? by Tet · 2003-03-19 03:13 · Score: 2, Informative

Motorolla 6809 ASM, and Motorolla 6502 ASM.
Of course, while the 6809 was indeed a Motorola chip, the 6502 was made by MOS (a company started by former Motorola employees). The initial 6501 was pin compatible with the 6800, and Motorola sued, resulting in the 6502, which had a different pin layout.
Other than that, I agree with your comments.

--
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Re:Too hard? by crazyphilman · 2003-03-19 04:33 · Score: 1

EastCoastSurfer said: "I think that the *real* programmers that you have talked about all write libraries now. These guys all have jobs at the tool makers like MS, Apple, etc..." (then he goes on to say that bosses want software engineers, not real programmers, and etc).

To which I reply:

Uhm, no. Companies do not want "software engineers"... They want outsourcing companies in India, and H1-Bs for the jobs that are still here. In case you haven't noticed, most programmers have been laid off, and the job market is at its worst state in decades. What jobs *are* available are generally for slave wages, with completely unreasonable and illogical minimum skill requirements. Want a laugh? Check out "fuckthatjob.com". It chronicles companies with the worst job placement ads, like companies that try to bill full-time programming jobs as "internships" (which, by the way is totally against the law, but no one seems to care).

You'd have to have been living under a rock to not be aware of this. Half the developers in NYC are probably close to their last unemployment check by now. The other half are sweating, with the axe over their heads. I'm guessing from your post that you're tucked away somewhere safe, and in deep denial about the state of our field.

--
Farewell! It's been a fine buncha years!
Re:Too hard? by markhb · 2003-03-19 05:26 · Score: 1

> If you think SOAP is nice for document transfer you should check out HTTP. It's great. And most firewalls let it through. You can use HTTPS to encrypt, too!

+1 interesting? +1 interesting? Funny maybe, but interesting? Where do I go to nominate this one for the CSM Hall of Fame??

--

Remainder of my .sig: be the majority of voters.

--
Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.
Re:Too hard? by EastCoastSurfer · 2003-03-19 05:48 · Score: 1

I didn't realize I was responding to the state of the job market, but I am aware that it is not too great right now. Actually it is not too good for anyone currently. The economy as a whole isn't doing that well.

They want outsourcing companies in India, and H1-Bs for the jobs that are still here.

I've dealt with these. If you know exactly what you need, have requirements defining the entire process and don't want/need any innovation then yes this way can work. Most companies need innovation so while you can farm out the mundane part of churning out the code you still need someone to design and architect the solution.

What jobs *are* available are generally for slave wages, with completely unreasonable and illogical minimum skill requirements.

So what constitutes slave wages? 25k/year, 50k/year, 150k/year? The economy just passed through a period where there were a lot of jobs overpaying what they were worth. Dot-coms paid many people much higher wages than what they were really worth and now these people expect to find that same money again elsewhere. I'm sorry, but that isn't going to happen.

Half the developers in NYC are probably close to their last unemployment check by now.

Have you thought of moving? The cost of living is extremely high in NYC as is. Plus you are competing with many more people for any given job. I spoke to a recruiter the other day about finding a candidate and he said that he is actually seeing the job market starting to improve in his area(southeast) for skilled IT people.

I'm guessing from your post that you're tucked away somewhere safe, and in deep denial about the state of our field.

No one is ever safe. You or I or anyone else can pretty much be replaced at any time. The company I work for is not having the best of times right now, but that's how it is in a downturn economy. I am also not in denial about our field. In fact I think our field is going through a pruning process right now. There are many who don't have the drive or competence to call themselves IT professionals and this downturn will weed them out.

I also see our field changing during these times. Businesses want IT professionals who don't just understand the tech side of how things work but also the business side. Coding this new Whizbang app in the new language C+++ is cool and all, but is it good for the client/company? Most companies want solutions that will help them add to their profitability. Very few pay IT people just for fun.
Re:Too hard? by crazyphilman · 2003-03-19 07:51 · Score: 1

I think you are basically optimistic in nature, whereas I am a sour pessimist. However, I can see why you would have your point of view, just as you can see where mine comes from. As I respect your view, I'm sure you'll understand and respect my opinion that this isn't merely a "pruning" but rather a plowing-under and asphalting over. In my opinion, it's pretty much over for IT in this country (except for civil service, academia, and government).

As far as many people working for dot-coms being overpaid, that wasn't true where I worked. I won't go into too many details, but I worked in software engineering at the time, and none of us were overpaid. I had one of the better salaries among my friends because some of my work was getting patented and it was making the company a lot of money, and even then I was only making 60K. I only got that because I threatened to quit -- the 45K they had been paying me was putting me into debt (I know from firsthand experience how expensive NYC is). So much for all those well-paid dot-com programmers, eh?

As for me, I got smart and moved far upstate. I work in civil service, now, and I'm in a strong union, so I have great benefits AND job security (remember job security?). I'm never going back to private industry... I'd go to BOCES and become a plumber first. And, yes, I am a real-live programmer (currently I'm building enterprise systems for my employer, and loving every minute of it -- you private industry guys have no idea how nicely a department can be run when your bosses aren't driven by greed).

Anyway, we'll agree to disagree. I really hope you don't find out firsthand how bad I think it's going to get. Honestly. It's going to get rough, and I don't think it's going to come back, ever. Ask the steel workers, or the auto workers, how things turned out for THEM. It's almost the same exact situation, played out in three different decades.

--
Farewell! It's been a fine buncha years!
Re:Too hard? by perfessor+multigeek · 2003-03-19 09:55 · Score: 1

What do you think about the viability of programmers transitioning to less desktop computer/server work? Things like writing code to run automated sprinklers with AI tied into temperature and light sensors?

From where I stand, I'm seeing a huge market cropping up for all those devices with built-in intelligence that have been vaporware for years. The example above is something a horticulturalist would love to have, especially if they could specify species and thereby what reflectance and temp data would result in what water use. Other examples include sophisticated controllers for all those LED lighting systems just now in development, systems for salt water fish tanks, water treatment systems, vision/AI hybrids to watch animal behavior on farms, and the umpty-bazillion toys with mech/electronics capabilities.

It looks to me like electronics to drive physical devices is about to experience a boom as big as or bigger than the dot-com boom. So I'm curious, how do you think C++ --> Labview --> White River looks?

Rustin

--
Data is the lever, rigor the fulcrum, brains the force that drives it all.
Re:Too hard? by crazyphilman · 2003-03-19 12:06 · Score: 1

Well... Actually you bring up a very interesting possibility. But it's not going to be any kind of corporate job; I think companies are going to make the interfaces for such devices very point-and-click. The actual programming is going to be done in the cheapest possible location (e.g. India, or Malaysia, or someplace like that). However, human nature being what it is, most people will find themselves somewhat, ah, "challenged" in dealing with the devices. And, I don't think they'll be able to hack them without assistance. So...

I think there's a good possibility that we could find grey-market work tinkering with and hacking devices like this for individuals. If we can acquire the right interface for doing actual programming on the device, we could theoretically replace a lot of the existing stuff on it with our own, and sell that too.

This kind of thing would be local, word-of-mouth stuff, like handyman jobs used to be. It's not as good as a *normal* programming job, but at least we won't starve.

Another possibility is, we could be able to trade LAN admin skills for free rent, building-manager style. Apartment complexes might start building up their own hotspots and such, and they'll need someone to handle the tech support. Handymen at complexes get free rent, so does the super, why not the tech guy? So you spend some time keeping hackers out, and replacing broken cables, etc. Could be cool.

--
Farewell! It's been a fine buncha years!
Re:Too hard? by crazyphilman · 2003-03-19 12:18 · Score: 1

I just realized I didn't answer the whole question... Sorry, ha ha, I'll add this one in.

I think that a core group of programmers will do what I did, i.e. get a good job in civil service or government and stay programmers until they retire. Others will gravitate back to academia, which works pretty well too. I think you can write off private industry; it's toast.

I think many programmers are going to end up doing all sorts of other work, mostly some variety of tech support at a much lower wage. It's what they know best, you know? So... Probably they'll gravitate towards community and local colleges, end up in the basement with the tech support team, keeping their network alive and hackerless. Some will end up doing programming for these institutions too.

There'll probably always be work at small offices, running cable and keeping the PCs running, but that'll be very low-wage. People might be able to work as a contractor, billing by the hour with a bunch of clients and a small pick-up or something, kind of like a landscaper but without the lawnmowers.

I don't know... Looks kind of bleak to me, but not hopeless.

--
Farewell! It's been a fine buncha years!
Re:Too hard? by perfessor+multigeek · 2003-03-19 15:30 · Score: 1

Another possibility is, we could be able to trade LAN admin skills for free rent, building-manager style. Apartment complexes might start building up their own hotspots and such, and they'll need someone to handle the tech support. Handymen at complexes get free rent, so does the super, why not the tech guy? So you spend some time keeping hackers out, and replacing broken cables, etc. Could be cool.
Funny you should mention that. I just finished rereading a cyberpunk novel where the main character makes her living doing just that for several years at an arts cooperative. Finding better printer drivers for the fabric printers, keeping out hackers, running the renderfarm. It's portrayed as a very comfy life with a deeply seamy underside (collecting obligatory data for the cops, living among but not of a community) in a way that rings true to me as a former tech support guy.

I agree. We should expect to see a lot more techs moving to less "business-y" jobs where they hold hands of clueless users in more widely spread aspects of their lives. This sounds likely. For most techs it may also be really miserable but better than leaving tech.

Well, I guess that we'll start seeing it soon.

Rustin

--
Data is the lever, rigor the fulcrum, brains the force that drives it all.
Re:Too hard? by crazyphilman · 2003-03-21 03:44 · Score: 1

It sure beats handing out fries at McDonald's. Besides, it might be fun. Set up the core as Linux, run cable from your apartment building's central server to all the apartments (or use Wi-Fi), let people hook up whatever they want to the LAN, maybe set up some kind of groupware, so they can all talk to one another... It has potential.

--
Farewell! It's been a fine buncha years!
Re:Too hard? by dfn5 · 2003-03-28 03:48 · Score: 1

Oh, and if you're making web-based apps, wtf are you using C for?
To build PHP.

--
-- Thou hast strayed far from the path of the Avatar.
Re:Too hard? by Anonymous Coward · 2003-03-28 08:28 · Score: 0

your sig has guaranteed misspelled.
Re:Too hard? by Sayjack · 2003-03-28 13:03 · Score: 1

It's better to go with a general purpose parser because your needs today are not your needs of tomorrow. Also, you may not use the more advanced features of XML but what happens when some other vendor sends you a file which does? What happens on the day you realize that a parser which supports W3C schema is the best thing since sliced bread? What happens when you decide that a parser which supports XPath could have saved you thousands of lines of code?

In Java I plug into something based upon jaxp so that I can seamlessly switch parsers if necessary. And this move saved me loads of trouble when the early version of Xerces started failing after the product went to test. Plugging in Crimson took 5 minutes and worked flawlessly. I imagine that there are similar alternatives (like jaxp) in other languages.

A good rule of thumb:

It's better to have a feature and not need it than to need a feature and not have it.
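
To make the XPath point concrete, here is a minimal sketch in Python rather than Java. The order document and its element names are made up for illustration, and the standard library's ElementTree only implements a subset of XPath, so treat this as a rough picture of the idea and not a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are invented.
ORDERS = """<orders>
  <order id="1" status="open"><item sku="A1" qty="2"/></order>
  <order id="2" status="closed"><item sku="B7" qty="1"/></order>
  <order id="3" status="open"><item sku="C3" qty="5"/><item sku="A1" qty="1"/></order>
</orders>"""

root = ET.fromstring(ORDERS)

# One path expression replaces a hand-written loop over orders,
# a status check, and an inner loop over items.
open_items = root.findall("./order[@status='open']/item")
open_skus = [item.get("sku") for item in open_items]
```

The nested loops this replaces are short here, but they grow quickly once the document gets deeper, which is presumably where the "thousands of lines" come from.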

--
-- Good judgement comes with experience. -- Experience comes with bad judgement.

Re:He's not the only one! by Anonymous Coward · 2003-03-18 01:14 · Score: 0, Redundant

TROLL... Goatsex link Somebody please mod parent down.

Hah. by termos · 2003-03-18 01:15 · Score: 3, Funny

They should only be glad not to be coding cobol, intercal or befunge!

--
Note to self: get smarter troll to guard door.

Re:Hah. by jkrise · 2003-03-18 01:29 · Score: 1

They should be glad they didn't discover E=mc^2... I'd be very uncomfortable

--
If you keep throwing chairs, one day you'll break windows....
Re:Hah. by Anonymous Coward · 2003-03-18 01:38 · Score: 0

They should only be glad not to be coding cobol, intercal or befunge!
What's wrong with Cobol? I only coded in it for a class, but it seemed perfectly fine for report generation and such. In fact, it was great for those that had fixed column sized printouts, etc. It's a bit wordy, but otherwise a great language.
Re:Hah. by Anonymous Coward · 2003-03-18 01:45 · Score: 0

you do realize you can do fixed width output with other languages.

Now tell us what happens when your fixed width field is 80 characters and you now need 100.
Re:Hah. by Surak · 2003-03-18 01:52 · Score: 3, Funny

It's a *bit* wordy?

Son, there are professional Cobol programmers who HAVE NO FINGERS LEFT.

Join the Cure. We're trying to raise $5 billion to cure Cobol Fingers through transplants.

Call 1-800-I-REALLY-REALLY-USED-TO-BE-A-COBOL-PROGRAMMER

Today!

--
My journal has hot /. gossip.
Re:Hah. by Anonymous Coward · 2003-03-18 04:31 · Score: 0

Yet another clueless dick. You should be praying for a brain transplant -- yours is dead.
Re:Hah. by Anonymous Coward · 2003-03-18 05:28 · Score: 0

Anyone that can dial that is obviously not a Real COBOL Programmer.
Re:Hah. by Anonymous Coward · 2003-03-18 06:56 · Score: 0

How am I supposed to dial without fingers, you insensitive clod?

yeah yeah,skip the
how did you type this - and - why don't you use other pointy body parts to type - comments :)
Re:Hah. by desktopheap · 2003-03-18 20:44 · Score: 1

this is the funniest thing i've seen today.
thank you.
that is all.

( mention JCL and i might puke on your shoes, tho )

--
Jesus died for your sins. Make it worth his time.

Really? by leecho · 2003-03-18 01:16 · Score: 3, Interesting

Well, programming *is* a hard task, and simplifying it is about building layers and layers of better abstractions to machine code and binary data.

Without XML, what would you normally do? Create a flat text file and read it using whatever syntax you'll like that day. I agree XML is ugly as hell to type in manually, but at least it's a standard, and every programming language in use today can handle it in a standard way - DOM, SAX, whatever.

Re:Really? by protocoldroid · 2003-03-18 01:23 · Score: 1

agreed. XML isn't perfect, but, what programming language is? When I'm stuck outputting a text-like file (that i'll have to input again), I definitely use XML. I like the standard, and it makes me more organized. Saying "XML is hard" is like saying "math is hard". all depends on the situation and the person using it. (1 + 1 = 2) == ([hotdog] all beef frank [/hotdog])
Re:Really? by Anonymous Coward · 2003-03-18 01:26 · Score: 5, Funny
To paraphrase:

XML is like:
  * SGML without configurability
  * HTML without forgivingness
  * LISP without functions
  * CSV without flatness
  * PDF without Acrobat
  * ASN.1 without binary encodings
  * EDI without commercial semantics
  * RTF without word-processing semantics
  * CORBA without tight coupling
  * ZIP without compression or packaging
  * FLASH without the multimedia
  * A database without a DBMS or DDL or DML or SQL or a formal model
  * A MIME header which does not evaporate
  * Morse code with more characters
  * Unicode with more control characters
  * A mean spoilsport, depriving programmers the fun of inventing their own syntaxes during work hours
  * The first step in Mao's journey of a thousand miles
  * The intersection of James Clark and Oracle
  * The common ground between Simon St. L and Henry Thomson
  * The secret love child of Uche and Elliotte
  * Microsoft's secret weapon against Sun's Open Office
  * Sun's secret weapon against Microsoft's Office
  * The town bicycle
Re:Really? by phrantic · 2003-03-18 01:37 · Score: 2, Insightful

If programming was easy everyone could/would do it.

Yeah i am sure that someone can make a compiler that allows you to feed in pseudo code in clear English, written with crayons on the back of a cereal packet, but you are robbing Peter to pay Paul, you will have to take the hit somewhere....

--
--My sig is bigger than your sig--
Re:Really? by Anonymous Coward · 2003-03-18 01:43 · Score: 0

XML is like:

* SGML without configurability
* HTML without forgivingness
* LISP without functions
* CSV without flatness
* blah blah blah
Given that SGML is overconfigurable, HTML is too forgiving, LISP functions are overkill for a data format and CSV is too flat, I'd say these are points in XML's favour. I'm sure your other quibbles are just as trivially wrong, but I can't be bothered to read them.
Re:Really? by Anonymous Coward · 2003-03-18 01:49 · Score: 0
you forgot one...
- Slashdot without the goatse
- Jon Katz without Columbine
- Bill Clinton without a sex-drive
- a GNU/Hippie without the body odor
- Iraq without Saddam Hussein
Re:Really? by Anonymous Coward · 2003-03-18 01:50 · Score: 2, Interesting

While Lisp functions are overkill for a dataformat, lisp syntax (sexps) are not, and are more mature, simpler, and clearer than XML.

Even for C programs, I tend to use Lisp sexps as my persistent file format.
A simple Lisp parser is smaller and faster than an XML parser, for no loss of expressivity.
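
For a sense of scale, here is a rough sketch of such a reader, written in Python rather than C. It is not the parent's parser: it assumes atoms are bare words or integers and deliberately skips quoted strings and comments, which a real file format would need.

```python
def parse_sexp(text):
    """Parse one s-expression into nested lists of strings and ints."""
    # Pad parentheses with spaces so a plain split() tokenizes them.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        try:
            return int(tok)
        except ValueError:
            return tok  # anything that is not an integer stays a symbol

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


config = parse_sexp("(config (port 8080) (host example.org))")
```

Here `config` ends up as nested lists, with the port as an integer. Adding string quoting and escapes is where most of the remaining size would go.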
Re:Really? by Anonymous Coward · 2003-03-18 01:57 · Score: 0

Heh.

No kidding.

In related news, Dennis Ritchie said C is too hard for BASIC programmers to learn.
Re:Really? by Anonymous Coward · 2003-03-18 02:23 · Score: 0

Without XML, what would you normally do?

Use a DBMS.
Re:Really? by dubious9 · 2003-03-18 02:38 · Score: 0, Offtopic

First, please mod myself and parent down as off-topic.

Dear Governor Bush

Blatant disrespect. President Bush did not break any laws becoming president. Both he and Gore tried as hard as they could. Tell me Gore would not have pursued similar actions if the roles had been reversed. Two, the presidential election is not a popular vote. Blame the system if you are unhappy with the result. Dubya is president. Get over it.

Just like when you went AWOL while the poor were shipped to Vietnam in your place.

G.W. was a fighter pilot in the reserves. There was a good chance that at any time his unit could have been called up. Ignoramus.

Two, I support war. I personally have several close friends in Kuwait, and offshore. The timing may be dubious, but Saddam needs to be disarmed and he's had more than a decade to do it. What reason do you think he'd have to disarm himself?

Point: Pacifism and indecision got the world into a heap of trouble before World War 2.

Also, when my friends are out there fighting Iraqis, they will be fighting a lot of French military equipment. Also French stubbornness pushed a gung-ho president into war, when they could have at least considered a new resolution, and not flatly rejected it even before the Iraqis did. If the French wanted to delay war they could have. They just wanted war under no circumstances. How could that be a rational option?

Please someone explain the anti-war justification to me. I consider myself an open-minded individual; for what reasons should we not go to war?

--
Why, o why must the sky fall when I've learned to fly?
Re:Really? by Anonymous Coward · 2003-03-18 03:03 · Score: 0

Do you know how slow and/or buggy the XML parsers available for many languages are?

Creating a custom format for everything is obviously doing extra work, but using XML for everything rather than using something appropriately scaled for the task is most certainly not what I would consider good design.

Using something overcomplicated and idiosyncratic (XML cannot be read using a normal lexer and parser, it is context-sensitive in very ugly ways) results in unnecessary code bloat and more bugs.
Re:Really? by Anonymous Coward · 2003-03-18 03:52 · Score: 0

Looks like you just fired the first salvo of the 'war on spelling'.
Re:Really? by stand · 2003-03-18 03:54 · Score: 4, Informative

It is customary to attribute quotations when you publish them. Otherwise it's called plagiarism. Credit where credit is due and all that.

Unless, of course this particular AC is Rick Jelliffe, in which case I apologize.

--
Four fifths of all our troubles in this life would disappear if we would just sit down and keep still. -C. Coolidge
Re:Really? by captredballs · 2003-03-18 05:50 · Score: 1

Because it is hastily planned and executed attack that will destabilize an already unstable part of the world, also setting a precedent for "preemptive strikes" , which are fundamentally opposite to the legal ideals our country was founded on, thereby invalidating any claim we had to the moral high ground.

And don't forget that American soldiers will be breathing the American nerve gas we gave them in the 80's.

--

I suppose I'm not too threatening, presently, but wait till I start Nautilus
Re:Really? by Anonymous Coward · 2003-03-18 06:17 · Score: 0

Because it is hastily planned and executed attack that will destabilize an already unstable part of the world,

There is a definite possibility of this, but there is also the possibility that a free, democratic, secular and prosperous Iraq will usher in a new era of Middle East stability.

also setting a precedent for "preemptive strikes" , which are fundamentally opposite to the legal ideals our country was founded on

If you thought that somebody was aiming and preparing a gun at you, you could justifiably fire back first and it would be considered self defense. Also the person aiming the gun has unjustifiably used it in the past, there is proof that he has a gun, and he admitted the he once had a gun not too long ago.

Also, we are not convicting Iraq of a crime against us; we are preventing further abuses against his own people and his neighbors, evidence of which is well documented, as well as protecting ourselves.

And don't forget that American soldiers will be breathing the American nerve gas we gave them in the 80's.

When combating unsavory characters, you have to deal with unsavory characters and it frequently comes back to haunt you. Also see the US anti-aircraft missiles in Afghanistan. What would your alternative be?
Re:Really? by Feztaa · 2003-03-18 06:43 · Score: 1

I agree XML is ugly as hell to type in manually,

I disagree. Recently I've seen a number of programs change their configuration file syntax from "hacked up mishmash" to wellformed XML, and readability increased dramatically. With the old syntaxes, I had to look up how to do this or that in the manual, I constantly forgot how to write it... it was very annoying. On the other hand, the XML is pretty much self-documenting. All the tags have long, descriptive names, etc, and everything Just Makes Sense(tm).

XML might be a bitch to parse from the programmer's perspective, but it's a treat to parse from the user's perspective (well, my perspective as a user, anyway).
Re:Really? by Anonymous Coward · 2003-03-18 07:42 · Score: 0

Hey, quit hogging the bong.
Re:Really? by Anonymous Coward · 2003-03-18 09:23 · Score: 0

Well, programming *is* a hard task...

I strongly believe that like anything, programming is only difficult if you don't understand what you are doing.

XML technologies are not particularly difficult to grasp. It's just text manipulation and tree traversal. If that's too hard, maybe you should look into finding a new career.
Re:Really? by Anonymous Coward · 2003-03-18 11:18 · Score: 0

A flat file Windows ini-type config file can just as easily use long descriptive names.
RLV=123
is just as likely as
ReallyLongVariable = 123
and MUCH nicer than
<CONFIG>
<ReallyLongVariable>
<Value>
123
</Value>
</ReallyLongVariable>
</CONFIG>
but no more (or less) intuitive. However, it's pretty fucking obvious how to comment out lines from ini files; how do you comment out (or comment for that matter) an xml file? How is xml more self-documenting than ini? That's just a load of horse shit.
What your example shows is that someone made a really bad flat file format, and then put some thought into it later, and made it xml instead of flat file. I bet if you took that godawfully huge xml config file, and flattened it, keeping all the variable names intact, you'd like it even better.
Re:Really? by Anonymous Coward · 2003-03-18 11:42 · Score: 0

None of those programs can be downloaded from the apache.org domain.
Re:Really? by Bedouin+X · 2003-03-18 13:36 · Score: 1

That could just as easily be:

Which isn't so bad.

--
Dissolve... Resolve... Evolve...
Re:Really? by juhaz · 2003-03-18 15:06 · Score: 1

However, it's pretty fucking obvious how to comment out lines from ini files;

Commenting lines out of random flat text config files is _OBVIOUS_ of all things? How on earth ANYONE could arrive at that conclusion after seeing any number of such files generated by different applications is way beyond me.

There is no real way to know whether you comment something out of an unknown file with no previous comments by beginning the line with #, //, ;, %, ' or <INSERTYOUROWNHERE>, or maybe there's no commenting out lines and it's c-style /* */ pair, or maybe it's not even possible to comment anything out!

how do you comment out (or comment for that matter) an xml file?

Like in an HTML file:

<!-- comment -->

It may be a bit harder than ^#, but at least it's CONSISTENT across every possible xml file on the planet and doesn't change according to the mood of the programmer, phase of moon, solar spot activity and alignment of planets. Also multi-line comments are better suited for xml because you usually don't comment out a row, but element that may or may not span multiple rows.
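
A small sketch of the multi-line point, using Python's standard ElementTree parser. The config document is invented for illustration; the only thing being shown is that the `<!-- -->` pair works the same way whatever the element names are, and that a stock parser skips the commented-out element.

```python
import xml.etree.ElementTree as ET

# Hypothetical config; the <port> element is commented out across lines.
DOC = """<config>
  <!--
  <port>
    8080
  </port>
  -->
  <host>example.org</host>
</config>"""

root = ET.fromstring(DOC)

# By default ElementTree drops comments, so only <host> is left.
active_tags = [child.tag for child in root]
```

One caveat that is standard XML and not specific to this sketch: comments do not nest, and `--` may not appear inside one, so commenting out a block that already contains a comment takes a bit more care.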

But XML is great for computers... by Max+Romantschuk · 2003-03-18 01:18 · Score: 1, Insightful

First of all IDNRTA (I Did Not Read The Article).

Writing XML by hand sure is no picnic. But I don't see writing XML by hand as something we should strive to do.

XML is great for file formats. It's waaay better than binary formats. It's not as compact, but that is rarely an issue these days. Having a standard, structured, text-based, and editable-by-hand-when-necessary format is a godsend. Period.

--
.: Max Romantschuk :: http://max.romantschuk.fi/

Re:But XML is great for computers... by Max+Romantschuk · 2003-03-18 01:21 · Score: 3

First of all IDNRTA (I Did Not Read The Article)

OK... This is exactly why you SHOULD read the articles... I just posted blatantly off topic due to an annoying quick-read = misread mistake... yay me :)

Mod me down, I deserve it ;)

--
.: Max Romantschuk :: http://max.romantschuk.fi/
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 01:23 · Score: 0

First of all, IDNRYSP (I did not read your stupid post), but I agree with everything you said, except for your stance on XML, the situation in Iraq, and your favorite pop-sicle flavor.

And I mean it!
Re:But XML is great for computers... by CoolVibe · 2003-03-18 01:26 · Score: 5, Insightful

Having a standard, structured, text-based, and editable-by-hand-when-necessary format is a godsend. Period.
You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.
My point: XML is over-used for a lot of things. In some places it makes sense, but in many places it doesn't.
Re:But XML is great for computers... by Ed+Avis · 2003-03-18 01:48 · Score: 5, Insightful

You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.

You just gave the best argument for adopting XML as widely as possible. Yes, all these can be parsed (with the possible exception of sendmail's config files which may be Turing-complete) but they all require *different* code for each config file. If they were in XML you'd still need different semantic code, of course, but a whole wodge of syntax issues (how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc) would be dealt with. Maybe not in the way that you or I think is perfect - IMHO XML is a little bit verbose compared to say Lisp- or Tcl-style encodings. But they would be dealt with *once*. No need to learn a new or almost-the-same-but-slightly-different set of syntactic conventions for every single config file.
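
As a rough illustration of "dealt with once", the sketch below pushes a string full of awkward characters through Python's standard ElementTree writer and reader. The element names are made up; the point is only that quoting and escaping are handled by the library, not by per-file code.

```python
import xml.etree.ElementTree as ET

# A value containing quotes, an ampersand, angle brackets and a newline.
tricky = 'say "hi" & <bye>\nsecond line'

root = ET.Element("config")            # hypothetical element names
ET.SubElement(root, "motd").text = tricky

# The writer escapes what needs escaping...
serialized = ET.tostring(root, encoding="unicode")

# ...and the reader undoes it, with no format-specific code in between.
restored = ET.fromstring(serialized).find("motd").text
```

In a hand-rolled line-oriented format, each of those characters would need its own escaping rule, and every reader of the file would have to agree on it.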

Maybe XML is over-used for a lot of things, but making up your own file format is definitely over-used a lot more. Simple line-oriented files are reasonable to have as plain text, for everything else please avoid the temptation to reinvent the wheel by devising a new syntax and block structure.

--
-- Ed Avis ed@membled.com
Re:But XML is great for computers... by Xformer · 2003-03-18 02:13 · Score: 1

You would still need different parsing code for each file. Sure, the general XML portion of the syntax would be the same, but you would still need an understanding of the expected data that would be different in every case.

If there's one good thing about XML in those cases, it's expandability. Suppose you wanted to add an XYZ field in the middle of a password file record. With the current format, that would be quite difficult. With XML, you'd just define a new element (optional or required) that could then be inserted and not throw off parsing of any other part of the file.
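
A minimal sketch of that expandability argument, in Python. The `<user>` layout and the colon-separated line are simplified stand-ins, not the real passwd format, and the `xyz` field is the hypothetical addition from the paragraph above.

```python
import xml.etree.ElementTree as ET

def read_user(xml_text):
    """Look fields up by name, so extra elements are simply ignored."""
    user = ET.fromstring(xml_text)
    return user.findtext("name"), int(user.findtext("uid"))

old_xml = "<user><name>alice</name><uid>1000</uid></user>"
new_xml = "<user><name>alice</name><xyz>extra</xyz><uid>1000</uid></user>"

same_result = read_user(old_xml) == read_user(new_xml)

# The positional equivalent: inserting a field shifts everything after it.
old_line = "alice:1000"
new_line = "alice:extra:1000"
uid_before = old_line.split(":")[1]   # the uid
uid_after = new_line.split(":")[1]    # now the new field, not the uid
```

An old reader keeps working against the new XML, while the positional reader silently picks up the wrong column.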

It would still be a nightmare to maintain, though, unless you made heavy use of management utilities.

--
All I want is a kind word, a warm bed and unlimited power.
Re:But XML is great for computers... by EvilTwinSkippy · 2003-03-18 02:22 · Score: 1

My point: XML is over-used for a lot of things. In some places it makes sense, but in many places it doesn't.
Ok. Name one.
(Silence)
You see, a standard actually requires WORK to adhere to. I used to work with the SECS protocol for electronic transmissions in the semiconductor industry. Let me tell you, that was about 60 pages, single spaced. In the end we had a Tcl library that encoded and decoded from the SECS standard into something our software could use.
Now once you have a standard encoding and decoding suite I fail to see why it matters if you code it in XML, morse code, or smoke signals.
What I like is the fact I have only 1 parser to deal with, and that parser works with my current scripting solution. AND that format has a parser for most of the software I intend to ship software to. QED.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 02:26 · Score: 4, Interesting

Right, so instead of using one regexp for /etc/hosts and another regexp for /etc/passwd, I'd have to use ten pages of getTheGodDOMObjectFromTheGodDOMXMLFile crap for /etc/hosts.xml and another ten pages for /etc/passwd.xml.

How, exactly, has XML simplified *anything*?
Re:But XML is great for computers... by FatalTourist · 2003-03-18 02:34 · Score: 1

First of all IDNRTA (I Did Not Read The Article).

Second of all, RTFA (Read the Fine Article)

--

Escape Pod Films: Sketch Comedy and Web Series
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 02:39 · Score: 0

One example of overuse is xml config files. I used to do all configuration that way until I woke up and started using scripts. It is much cleaner to define a builder with all the config methods: setPort(), setHostname(), etc. Embed a script engine in your program, pass the builder into the script, the script calls the methods, done.
Get over the fact that code is in your config and look at it this way. The difference is that you're embedding the config data in a method call instead of between tags. Not much of a difference if you ask me.
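
Roughly what that looks like, as a sketch rather than anyone's actual setup: the ConfigBuilder class, its method names (modelled on the setPort()/setHostname() mentioned above) and the sample script are all assumptions, and Python's own exec() plays the role of the embedded script engine.

```python
class ConfigBuilder:
    """Hypothetical builder exposing one method per setting."""

    def __init__(self):
        self.port = None
        self.hostname = None

    def set_port(self, port):
        # Validation lives next to the setting it guards.
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        self.port = port

    def set_hostname(self, hostname):
        self.hostname = hostname


# The "config file" is just a script that calls the builder.
CONFIG_SCRIPT = """
config.set_hostname("example.internal")
config.set_port(8000 + 80)   # a script config can compute values
"""


def load_config(script_text):
    builder = ConfigBuilder()
    # exec() runs arbitrary code, so this is only reasonable for trusted config files.
    exec(script_text, {"config": builder})
    return builder


cfg = load_config(CONFIG_SCRIPT)
print(cfg.hostname, cfg.port)
```

The trade-off is the one the post already admits: the config is now code, so a bad config file can do anything the program can.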
Re:But XML is great for computers... by Ed+Avis · 2003-03-18 02:46 · Score: 2, Interesting

(Replying to AC post, please mod it up if you can.)

I admit that interfaces like DOM are rather clunky. But your regexps would break if a new field were added to /etc/passwd, or probably even if the format were changed to allow comments. So files like /etc/passwd become fossilized over time.

The answer is a better interface for reading XML files, one that knows about the format (which is described in a DTD or other grammar) and can present a neat interface like

passwd.user["abc01"].real_name

(or whatever the syntax of your preferred language looks like). DOM is so awkward because it knows nothing about whether a element would be present, or whether there might be more than one of them, or whether whitespace before and after the element is significant, so it has to provide an API to explicitly wade through all that just in case you want it. A tool like FleXML which knows that must appear exactly once and in a particular place can put it into a single field.

(Actually FleXML isn't ideal for this example because the parsing code it generates will stop working when the file format is extended, if new elements started appearing inside . But if you made the generated code only a little bit slower it could skip over these extensions to the file format, so existing apps would continue to work when new things were added to the DTD.)

The answer I think is for programming languages which better support XML, which can read a document and put it into the language's native data structures. Libraries like Perl's XML::Simple try to do this, but they do so without any knowledge of what the legal documents are, so the resulting interface is still rather awkward.
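
To make the passwd.user["abc01"].real_name idea concrete, here is a hand-written sketch of the kind of wrapper a format-aware tool might generate. It is not FleXML output and not XML::Simple; the passwd.xml layout, the User and Passwd classes and the extra shell element are all invented for illustration. The loader knows that real_name appears once per user, so it can expose it as a plain field, and it skips elements it does not recognise.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical passwd.xml layout; <shell> stands in for a later extension to the format.
PASSWD_XML = """<passwd>
  <user name="abc01">
    <real_name>A. B. Clark</real_name>
    <shell>/bin/sh</shell>
  </user>
</passwd>"""


@dataclass
class User:
    name: str
    real_name: str


class Passwd:
    def __init__(self, xml_text):
        self.user = {}
        for elem in ET.fromstring(xml_text).findall("user"):
            real_name = elem.findtext("real_name")
            if real_name is None:
                # The format knowledge lives here: exactly one <real_name> is required.
                raise ValueError("user record without <real_name>")
            name = elem.get("name")
            # Unknown children such as <shell> are ignored, so extensions do not break us.
            self.user[name] = User(name, real_name.strip())


passwd = Passwd(PASSWD_XML)
print(passwd.user["abc01"].real_name)
```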

--
-- Ed Avis ed@membled.com
Re:But XML is great for computers... by rabidcow · 2003-03-18 03:32 · Score: 2, Informative

most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)

all these can be parsed but they all require *different* code for each config file.

Nonsense, if you're smart about your parser, you'll need about 3. If you're not smart about your parser, you'd probably design lousy XML anyway.

how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc

afaik, most config files ignore these issues, but you could easily separate these options from the core of the parser. Pass them in as a traits class or something.
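
A minimal sketch of the "pass them in as a traits class" idea, assuming nothing more than a separator and a set of comment characters as the variable parts (the Traits class and the HOSTS/PASSWD instances are made up here). Note that it deliberately ignores quoting and escaping, which is exactly the part the other side of this argument says is hard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Traits:
    separator: Optional[str]   # None means "any run of whitespace"
    comment_chars: str         # each character starts a comment running to end of line


def parse(text, traits):
    records = []
    for line in text.splitlines():
        # Naive comment stripping: no notion of quoted strings or escapes.
        for ch in traits.comment_chars:
            line = line.split(ch, 1)[0]
        line = line.strip()
        if line:
            records.append(line.split(traits.separator))
    return records


# One parser core, one traits object per file format.
HOSTS = Traits(separator=None, comment_chars="#")
PASSWD = Traits(separator=":", comment_chars="")

print(parse("# local entries\n127.0.0.1  localhost lo\n", HOSTS))
print(parse("abc01:x:1001:100:A. B. Clark:/home/abc01:/bin/sh\n", PASSWD))
```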
Re:But XML is great for computers... by arkanes · 2003-03-18 03:39 · Score: 1

An even BETTER solution is to use a standardized data format that doesn't add 30% overhead to all your files because of syntactical bloat, and that actually has a finalized, unambiguous spec so that there aren't 40 million ill-formed parsers out there.
Re:But XML is great for computers... by Ed+Avis · 2003-03-18 04:03 · Score: 1

I'm curious, what are the ambiguities in the XML specification?

--
-- Ed Avis ed@membled.com
Re:But XML is great for computers... by Ed+Avis · 2003-03-18 04:05 · Score: 1

I was thinking more from the viewpoint of some person who didn't design the file format of named.conf or whatever, but wants to write his own tool to parse it. If it uses an ad hoc file format he first has to parse that and then do semantic stuff. If it's XML, at least the first stage is already dealt with, and there are fewer things to worry about like 'have I handled nested comments correctly'.

--
-- Ed Avis ed@membled.com
Re:But XML is great for computers... by crazyphilman · 2003-03-18 04:09 · Score: 1

Max said: "XML is great for file formats. It's waaay better than binary formats. It's not as compact, but that is rarely an issue these days. Having a standard, structred, text-based, and editable-by-hand-when-necessary format is a godsend. Period."

I tend to agree. If you're going to store data in a flat file, it's a pretty good idea to set up an XML DTD for it, for a couple of reasons:

1. You can write up a stylesheet, and code a small viewer, so that you can peek at the data manually when you want to (and maybe even tweak it). Because of the way XML is set up, this is pretty easy to do.

2. XML is way easy to parse, especially if you are using a language with regexps. It just plain makes life easier.

I think the problem most people have is when they go beyond the basic idea of XML, i.e. a tagged file format for data files. It's all the weird extra tools that have been built up over the years that are making XML into a royal pain in the ass. Why do we need all these weird transformational tools? What's with all the parsing libraries? Why all this complexity? All you really need is the basic concept of it. Eecccch.

Sometimes I wonder whether the REAL industry behind things like XML is in selling you unnecessary courses and software.

--
Farewell! It's been a fine buncha years!
Re:But XML is great for computers... by Dalroth · 2003-03-18 04:33 · Score: 2, Interesting

In C# at least:
XmlDocument Doc = new XmlDocument();
Doc.Load("/etc/passwd.xml");
string Password = Doc.SelectSingleNode("/users/user[@name='dalroth']/@password").Value;
Really doesn't seem that difficult to me. Bryan
Re:But XML is great for computers... by Zaiff+Urgulbunger · 2003-03-18 04:45 · Score: 2, Insightful

Indeedy.

And I've said it before, but I'll say it again -- XML as most people see it is *just* the serialised form of an XML structure. The same as Databases don't actually have to store lists of data in the order that you read it in.

But as you quite rightly point out, having a standard, very accessible (if slightly verbose), method to create and edit data structures is indeed a god send!

Here's an idea (which I've also said before!) - imagine if all those config files were XML based. So you could edit them using a text editor - same as now except slightly more cumbersome to edit.
But we're agreed that being able to use a basic tool such as a text editor is a good thing right?

Okay, so next up from that would be an XML editor so you can navigate the structure to find the element you want to tweak. The nice thing here is that you've got a standard tool that works with any XML file and therefore any config file.

You can also build standard tools to work with these standard files so automating the update of a number of config files would be easy.

Now let's go back to the whole thing about serialisation -- we're just manipulating data structures. The text-based, serialised form of these structures is called XML. The good thing is being able to edit with a text editor -- available on *any* platform including non-current platforms where no active development is occurring.

But we're not limited, and we can build tools to work with these structures more efficiently. And we don't *have* to use the serialised form if we don't want to -- it just happens that at this point in time, where the tools are not as evolved as they will be, it makes sense to use the text-based form.

In the future we could for example have a file system that is structured like an XML file? So then all those separate config files become part of the one structure, and thus even easier to manage.

I'm rambling, so I'll stop now! My points are simply that, yep, XML isn't perfect, but don't get too hung up on its being large, verbose text files. It isn't -- that's just how it is currently being presented. Instead look at how it bridges the divide between old-school proprietary, closed, binary formats and the accessibility of text files.
Re:But XML is great for computers... by 21mhz · 2003-03-18 04:49 · Score: 1

Add to that unified means for processing all those files. When I need to examine or modify my XML files programmatically, I often go no further than XSLT and off-the-shelf command line tools. Stylesheets taking 6-10 lines of actual code (neatly aligned and delineated at that) can deliver you from tedious and error-prone hack jobs using regular expressions and what have you. XSL transformations can be used for quick reporting, format upgrades, and even generation of XSL stylesheets themselves.

Now, try to whip up in 15 minutes a Perl script that modifies your favorite non-XML non-plain format (changing forwarders in the options section of named.conf is a good example that came to my mind recently) and see if you didn't screw up somewhere (remember, there can be nested blocks, comments, quoted strings resembling actual syntax and other joyful stuff). Simple? I hope you know your Perl well, my friends.

--
My exception safety is -fno-exceptions.
Re:But XML is great for computers... by ProfKyne · 2003-03-18 05:08 · Score: 1

Case in point: make's custom build file format vs ant's XML-based build file format. Better make sure you don't have any tabs in the first character column of a make build file!

--
"First you gotta do the truffle shuffle."
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 05:30 · Score: 0

It's not difficult, it's just awful.

gawk -F: '/^dalroth:/ { print $2; }' /etc/passwd
Re:But XML is great for computers... by Smallpond · 2003-03-18 05:32 · Score: 2, Insightful

Had you read the article, his point was that you shouldn't have to slorp in the whole file just to read one field. In fact, he's using perl and regexp to avoid having to do things like Doc.Load.

The author claims that existing tools are oriented toward either converting to a big internal data structure, or to processing gradually using callbacks, neither of which is optimal for small fast code or simple programming.
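
For what it's worth, some libraries do offer a middle path between "load everything" and "callbacks everywhere". The sketch below uses Python's standard xml.etree.ElementTree.iterparse, which hands back elements one at a time as they are completed; it is offered as an illustration of the style being asked for, not as what the article's author used. The users/user layout with name and password attributes is borrowed from the C# example earlier in the thread and is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def find_password(stream, wanted):
    # iterparse yields each element when its end tag is seen, so we can stop
    # reading as soon as the record we want turns up.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "user":
            if elem.get("name") == wanted:
                return elem.get("password")
            elem.clear()  # discard the contents of records we have already passed
    return None


SAMPLE = (b"<users>"
          b"<user name='ed' password='x1'/>"
          b"<user name='dalroth' password='s3cret'/>"
          b"<user name='zed' password='zz'/>"
          b"</users>")

print(find_password(io.BytesIO(SAMPLE), "dalroth"))
```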
Re:But XML is great for computers... by CoolVibe · 2003-03-18 06:12 · Score: 1

named.conf is pretty easy to parse. If you're unsure, you can always resort to yacc/bison to build a grammar file and build a parser that way. It's ages old, and really not that hard once you've rtfmmed a bit about it. And it's very flexible too.
Same goes for perl. You can do a lot with a well built regular expression. I'd handle the named.conf case by building a regexp that matches one zone, cut and parse everything neatly into a nice data structure (while ignoring whitespace and comments). Then I just regenerate the named.conf from the data I've gathered. Don't whine that that gobbles memory, because an XML parser does basically the same when modifying xml files.
In case of a forwarder entry, no problem, I've done it before, and no, it wasn't hard. I never needed XML to do it. I've done parsers in C too. It's just cutting the data up into convenient chunks and dealing with the data. Since the most basic config files in /etc have a simple structure, parsing them is really really easy.
I can see XML being used to create portable documents, but I fail to see the use for it regarding simple config files. XML means eXtensible Markup Language, not I Will Build A Kitchen Sink With This language.
Re:But XML is great for computers... by CoolVibe · 2003-03-18 06:15 · Score: 1

(sorry for reply to self)
Oh, xml is also great to store general data that would normally be delimited by tabs or colons, but another poster already commented on that.
Re:But XML is great for computers... by 21mhz · 2003-03-18 06:30 · Score: 1

To summarize, all you have to do to properly modify a value in all possible cases (remember, you need to track nested blocks, there can be ugly things in comments, and other unforeseen stuff) is to build a custom parser, use it to navigate the internal structure of the document and spit modified data back. For every other format in /etc, rinse, repeat.

With an XML-based file of similar complexity, you write a 10-liner XSL stylesheet. The rest of the work is already done for you.
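
The same kind of edit can be sketched without XSLT. The example below swaps in Python's xml.etree.ElementTree for the stylesheet, so it shows the "the parser already did the hard part" argument rather than XSLT itself. The XML rendering of named.conf is invented for the example, and the commented-out forwarders line inside the zone is there to show that text resembling real syntax inside a comment is not touched.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a named.conf (no such format ships with BIND).
NAMED_XML = """<named>
  <options>
    <forwarders><forwarder>192.0.2.1</forwarder></forwarders>
  </options>
  <zone name="example.com"><!-- forwarders { 1.2.3.4; }; --><file>db.example</file></zone>
</named>"""


def set_forwarders(xml_text, addresses):
    # insert_comments=True (Python 3.8+) keeps comments in the tree so they survive the rewrite.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    forwarders = root.find("./options/forwarders")
    if forwarders is None:
        raise ValueError("no <options>/<forwarders> element")
    for old in list(forwarders):
        forwarders.remove(old)
    for address in addresses:
        ET.SubElement(forwarders, "forwarder").text = address
    return ET.tostring(root, encoding="unicode")


print(set_forwarders(NAMED_XML, ["198.51.100.7", "198.51.100.8"]))
```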

--
My exception safety is -fno-exceptions.
Re:But XML is great for computers... by CoolVibe · 2003-03-18 06:41 · Score: 1

Not for every file in /etc... for some (like resolv.conf for instance) you don't even have to parse that hard. passwd/shadow are easy too (colon delimited), and sendmail.mc isn't the line noise you think it is.
XML is overkill for most stuff that's in /etc
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 06:45 · Score: 0

Uh, yeah but all those files have their own formats and semantics. They are also meant to be edited by hand by sysadmins, thus needing to be human readable as well as machine readable. Unix config files are usually flat formatted, easily edited in vi. I agree that XML is not needed in properties files for small programs, but when programs get complex and need structured data, what is better? EDI? RTF?
Re:But XML is great for computers... by rabidcow · 2003-03-18 06:46 · Score: 1

I was thinking more from the viewpoint of some person who didn't design the file format of named.conf or whatever, but wants to write his own tool to parse it.

Still, most of them are pretty similar. Probably the only reason there isn't a generic parser is that the parsing is so simple that no one thinks it's worthwhile.

Nearly all of them are a keyword followed by some number of whitespace-separated tokens. Lines starting with # are comments (and often ;). Meaning for the fields is derived from order rather than named attributes. You've got practically all the parsing you need in std::istream.

It's ad hoc, but it's still pretty standardized. There's more variation than XML, but it's still a lot easier to write a parser for. (granted you don't have to write one for XML, but *someone* did)
Re:But XML is great for computers... by xv4n · 2003-03-18 07:25 · Score: 1

totally agree. Been using ant for some time now and the xml build file doesn't seem to be that obscure once you get a hold of it. And the benefits compared to make are considerable.
Re:But XML is great for computers... by Ace+Rimmer · 2003-03-18 08:02 · Score: 1

How would he know that there is no other element (or whatever) he wanted if he hadn't read the whole document (regexps don't understand dtd, do they)? Sure, regexps also process the whole document.

Not to mention there are also some XML databases. So you can load documents into them and then ask for some portions more conveniently (and effectively).

--
:wq
Re:But XML is great for computers... by Sentry21 · 2003-03-18 08:43 · Score: 1

You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.

Ever handled a 500-entry named.conf? Ever found that you only have 499 } characters in the document?

The beauty of XML, as I see it, is not only that it can be easily parsed by any XML parser, but that it can be easily *validated*. I was playing around with XML parsing for the conf files of an IRC bot that a friend is writing, and here's what I found.

The format of the file he has now is user:hostname:somethingelse[:optionalpassword], which is fine, but it's overly terse for no good reason, and if you put the wrong data in the wrong place, the program will choke and die without a good reason - or the program will work fine and behave oddly - or it will work fine period and ignore the mistake. There's no way of knowing how it will behave. Named, for example, will give no outward indication when you load it up that the conf file is malformed, but will gladly and silently forget about all domains after the error.

With XML, on the other hand, mistakes can be detected. When I typo'ed something in my bot's XML file (I forgot what value I'd used) and then tried to run the test program, it errored out, not just saying 'problem parsing the file', but told me what line, what the problem was, and why it was a problem, even including the relevant lines and context and highlighting the error. This is one libxml2 function that I run, and it parsed through the whole file and told me what was wrong. If I'd had a DTD, it could have (as I understand it) told me if I was missing tags or had invalid tags in the file as well.
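
The same behaviour is easy to reproduce with other parsers. The sketch below uses Python's standard xml.etree.ElementTree rather than libxml2, so the messages will differ from the ones described above, and the bot config layout is made up. It only checks well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical bot config with a mistyped closing tag on the fourth line.
BROKEN = """<bot>
  <user name="dan" host="example.org">
    <password>secret</password>
  </usr>
</bot>"""

GOOD = BROKEN.replace("</usr>", "</user>")


def check(text):
    """Return None if the text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position   # where the parser gave up
        return line, column, str(err)
    return None


print(check(GOOD))
print(check(BROKEN))
```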

However, if I want to parse out hosts, bind zone files, named.conf, passwd, shadow, hosts.allow/deny, sendmail.mc (icky) and resolv.conf, I have to write special cases for every one of them, because there's no standard way to form a proper document tree. Apache's conf file is another good example - it's always always always 'Variable value' or 'Variable "Value string"', except when it's not. Special cases are a nightmare.

There are a lot of places where XML just plain doesn't make sense - resolv.conf for example, practically takes less time to write by hand than it would to parse by code, and there are so few possibilities. Other options, however, such as named.conf, could benefit.

In my personal case, however, it's all irrelevant, because I'm moving all my services to use LDAP, which is something no one ever considers, but should. Still...

--Dan
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 09:58 · Score: 0

You just gave the best argument for adopting XML as widely as possible. Yes, all these can be parsed (with the possible exception of sendmail's config files which may be Turing-complete) but they all require *different* code for each config file. If they were in XML you'd still need different semantic code, of course, but a whole wodge of syntax issues (how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc) would be dealt with.
Yes, but this is a problem that was solved in the 70s by the use of lex and yacc. Frankly the lex/yacc parsers for this sort of file are substantially smaller than the XML semantic code that you need along with the schemas, etc, etc, etc.
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 10:00 · Score: 0

>with the possible exception of sendmail's config >files which may be Turing-complete)
You are stupid beyond belief. Do you know what Turing complete even means? Do you know that most programming languages are Turing complete yet CAN BE PARSED?
Re:But XML is great for computers... by 21mhz · 2003-03-18 10:25 · Score: 1

sendmail.mc isn't the line noise you think it is.

Funny, but I do think that sendmail, were it designed these days, might well end up with configs in XML. It's so sweet for describing rules and painlessly extending them.

--
My exception safety is -fno-exceptions.
Re:But XML is great for computers... by Ed+Avis · 2003-03-18 11:29 · Score: 1

What I meant was that Sendmail might use a macro language which is Turing-complete, so you cannot properly parse the files without expanding the macros, and to expand the macros you end up running a whole Turing-complete programming language.

I meant that the *file format* may be a Turing-complete macro system. Obviously a language such as C is Turing-complete, yet its file format is decidable. But there are some languages where the syntax can be changed by executing bits of code at compile time - ie, where the line between compile time and run time is blurred. Perl is one such language; you cannot in general parse Perl code without also executing some of it. I believe TeX is another. Sendmail's config files might be the same.

--
-- Ed Avis ed@membled.com
Re:But XML is great for computers... by Anonymous Coward · 2003-03-18 11:48 · Score: 0

...and can present a neat interface like

passwd.user["abc01"].real_name

only you can't. There is absolutely no way to do that in XML without programming a fixed document structure in. If you do so, you are no longer using XML, you are using an HTML-like markup syntax. It is not extensible, and if it "passes" XML validation, it is only a coincidence, because nothing is gained by that, except that you can use an XML "parser" to do the functional equivalent of reading a line (or column) at a time.
Re:But XML is great for computers... by J.+Random+Software · 2003-03-18 14:24 · Score: 1

Certain details of the format have to be fixed (because if you don't know anything about your input you can't really consume it, only ignorantly transform it) but the format can still be extensible. For example, that could be implemented in XPath by
//user[@id="abc01"]/@real-name
which will search the entire document for the matching user element, no matter which other attributes are in use and which elements (LDAP info? groups?) may be nested around or inside it.
Re:But XML is great for computers... by fymidos · 2003-03-18 15:18 · Score: 1

So, i will need an xml parser/editor to fix a machine that has gone down and i am left with init 1.

Just for the record, it's not like there was no other way to have a config file. People chose to have a plain text file for many reasons, and it's just better this way.

XML for a spreadsheet file format - just great. For configuration files, no thanks it's just stupid.

--
Washington bullets will simply be known as the "Bulle
Re:But XML is great for computers... by fymidos · 2003-03-18 15:47 · Score: 1

I completely fail to understand the need for such a complex config file. What's next, maybe have main() on the config file, so we can reaaaaally configure the program?
I understand the need for (let's say) a window manager to have all the options of various programs on one file for speed, even though there is a better way to trade simplicity for speed and that is a binary file.
No single program should need something like this though.

--
Washington bullets will simply be known as the "Bulle
Re:But XML is great for computers... by Ed+Avis · 2003-03-18 22:56 · Score: 1

OK, you can parse an existing config file with yacc and lex or bison or Antlr or other parser generators.

But how do you know the grammar? How do you know that the parser you have written will handle all the instances of that file format?

The nice thing about XML is that you can publish a DTD for your file format, and that defines it. Everyone can understand what the DTD means since DTD is itself a standard. (Ditto XML Schema or whatever, although those are a lot more complicated.)

Now, where is the documented grammar for named.conf? Where is the information I need to write a parser for these config files? Nine times out of ten you just have to guess by looking at example files. Manual pages describing the file format are helpful, but they very rarely document everything (for example the page may say 'comments are from # to the end of the line' but does not specify what happens when # is inside a string, or when the end of line is escaped with \ ). Even if there is a formal grammar for the file format, it's not likely to be easy to generate a parser from it without some manual work, whereas XML DTD can be used to directly generate a parser with FleXML or other tools.
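
The '# inside a string' case is easy to demonstrate. Below are two readings of the rule "comments run from # to the end of the line", both of which a reasonable person might implement from a man page. The function names and the sample config line are made up; neither is claimed to match any real daemon's behaviour.

```python
def strip_comment_naive(line):
    # Reading 1: the first '#' anywhere on the line starts the comment.
    return line.split("#", 1)[0].rstrip()


def strip_comment_quote_aware(line):
    # Reading 2: a '#' inside a double-quoted string is ordinary text.
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == "#" and not in_string:
            return line[:i].rstrip()
    return line.rstrip()


LINE = 'motd "ticket #42 is fixed"  # note for admins'

print(repr(strip_comment_naive(LINE)))
print(repr(strip_comment_quote_aware(LINE)))
```

Two tools written against the same informal description now disagree about what the file says, which is the gap a published grammar is meant to close.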

The point of XML is not that it's technically superior to other file formats but that it is an agreed standard which lots of people understand. This applies also to the grammars for XML files. You can give your file format as a DTD and anyone familiar with DTD can use it, without worrying about side issues like whether \\ inside a single-quoted string becomes one backslash or two. Also there are tools like nsgmls which given a DTD can quickly syntax-check an XML file to see whether it conforms to that DTD. Again, the advantage is in having a common language to talk about file formats, not that XML is magically better than whatever 'Not-Invented-Here' format you might make yourself.

--
-- Ed Avis ed@membled.com
Re:But XML is great for computers... by botik32 · 2003-03-20 04:26 · Score: 1

Here's an idea (which I've also said before!) - imagine if all those config files were XML based. So you could edit them using a text editor - same as now except slightly more cumbersome to edit.
<XML>
<based>
<config files="are">
<a mess="." />
</config>
</based>
</XML>
Let me say it again, XML-based config files are a mess.
They are prone to error: forget a '>' and your config's toast. Of course, you could use an xml validator after you're done, but have you noticed the sudden rise in complexity of the work? Remember, this was supposed to be a _simple_ operation.
Suppose you are a *nix admin and want to edit your config file. I use a utility which keeps its configuration files in xml format and I am getting very frustrated every time I edit one of those. I am afraid I will mess up somewhere and the utility will ignore the rest of the configuration. I had _never_ had such worries with a simple text configuration.
Second, your assertion regarding the openness of the xml-based formats is debatable. Nothing stops a company from concocting an ugly xml schema which they would change every second release in a way that is completely incompatible with previous versions.
Out of sad experience, I believe XML should never be at the user end of an user interface, no matter how uber 1337 the user is. It simply is not an easily editable format, period. XML is great at other things, like encoding data oblivious to endianness, but config file formats it is not!
Re:But XML is great for computers... by Zaiff+Urgulbunger · 2003-03-20 09:14 · Score: 1
They are prone to error: forget a '>' and your config's toast. Of course, you could use an xml validator after you're done, but have you noticed the sudden rise in complexity of the work? Remember, this was supposed to be a _simple_ operation.

Yes, they are harder to edit using a text editor, so anything beyond a minor config tweak could result in problems. But:
1. That's only if you use a plain text editor. If you used an XML editor then your file is likely to remain at least "valid" (in an XML sense), so...
2. Garbage in garbage out -- if you mess up bad with either text or XML config files, then things aren't going to work right either way!
Of course, you could use an xml validator after you're done

You could do this. Given a schema, you could use a standard validation tool that could validate each parameter. Or you could just do the same as you presumably do at the minute which is restart the daemon/application and watch for errors!

The difference here is that with XML you *can* do the former but with plain text you can *only* do the latter.

..your assertion regarding the openness of the xml-based formats is debatable..

I never used the word "open" once! I said "accessible" in that you can use a text editor, and these being common currency on all platforms, the XML format presents no barrier to any user.

I know that's the same with a text file -- the point is simply that a new (superior in my view) format could be introduced without being a barrier to people without access to newer tools. Conversely, a binary format for example would present problems for people who did not have access to the appropriate binary editor tool.

Nothing stops a company from concocting an ugly xml schema which they would change every second release in a way that is completely incompatible with previous versions.

That could happen even with a text format, so on its own, I think that argument is moot! Although you have raised the point that with XML you *can* change the format (as in extend it) and still retain backwards compatibility -- something that is harder to do with plain text formats.

I believe XML should never be at the user end of an user interface

Hmmm, I think maybe the problem here is that I don't think of text editors being part of a UI. Using a text editor is, if you step back and think about it, a really ugly horrible way to modify the behaviour of an application. It gives you no indication of what you should do or how you should do it. It also gives no feedback if you do something correctly or incorrectly.

On the plus side, text editors are ubiquitous, and text based config files are very easily implemented.

Having read back a little on this thread (now look what you made me do!) I can see there are issues with XML files and streams, so using grep as one example isn't going to work so well. I fully understand that if you're used to a bunch of tools that all work perfectly and well then there isn't much incentive to change.

The problem I see with text files is that it is harder to make changes to the file layout (this might not be an issue since many of the daemons they control are stable in an evolutionary sense), and it doesn't scale. By the latter, what I mean is that with text config files you've got just that, a bunch of text files with different rules on syntax (albeit very similar). With XML, you could XLink them into one large structure - like LDAP? (I'm guessing here since I know nothing about LDAP!).

Here's a nice example - with an XML config file I could very easily create a form UI (in anything, but take XHTML as an example) and use XForms to relate the UI to the structure of my XML config file.

Thus, with very very little work (it really is too simple), I've created a friendly UI to editing this config file which includes *full* parameter checking.

There's good points

A good point by shish · 2003-03-18 01:19 · Score: 3, Insightful

Sure it sucks, but it's a *standard* that everyone can use, and there are many libraries for it so you don't need to write your own parsing code

--
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment

Re:A good point by jilles · 2003-03-18 02:13 · Score: 3, Insightful

Not only is it a standard, it appears to be the only widely accepted standard. Not using it currently boils down to going back to the hacked together, generally incompatible data formats of the past. Reinventing the wheel still is a popular way of passing time but it has never been very productive.

People often fail to see the point of widely adopted standards but the bottom line is that it makes it easier to reuse functionality that conforms to the standard. There are now both SAX and DOM based parsers for most common programming languages. Basically if you spend some time figuring out how these APIs work you can work with XML from almost any language.

That is not the problem. What is a problem is that everybody is introducing their own xml based languages and in many cases forget to publish the appropriate xml schema/dtd.

Now the guy who is complaining here is a perl programmer who has to process data that is passed to him in XML form. His point is that it is easier for him to throw together a bunch of regular expressions to do his thing than it is to use some off the shelf validating parser with a generic DOM/SAX based API. Good for him that his job is so simple that a bunch of regular expressions do the trick for him. I'd hate to maintain his code though and I suspect he doesn't have much reuse beyond the odd copy paste.

--

Jilles
Re:A good point by EvilTwinSkippy · 2003-03-18 02:25 · Score: 3, Insightful

Amen, and amen.
Yes, standards suck. But they suck in a way that is consistent and allows sucky things to talk to other sucky things.
I'll bet that 802.11b is a really crappy standard. But as long as I can pick up interchangeable devices for $50 at the local computer store I'll live in ignorant bliss.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
Re:A good point by arkanes · 2003-03-18 03:47 · Score: 1

Something you'll find repeated a number of times at places like the links in the post is that XML sucks, BUT the political advantages of widespread adoption may outweigh its technical weaknesses. That doesn't mean that something not-sucky that achieves widespread use wouldn't be better. Oh, and tell your perl dude that someone has written an XML module, check CPAN. Although he's got an excellent point, in that a simple data model is better than a complex one.
Re:A good point by grammar+fascist · 2003-03-18 05:38 · Score: 1

His point is that it is easier for him to throw together a bunch of regular expressions to do his thing than it is to use some off the shelf validating parser with a generic DOM/SAX based API. Good for him that his job is so simple that a bunch of regular expressions do the trick for him.

Okay...I think you missed the class that teaches how to avoid ad hominem. Still, you have a good point otherwise. Let's move his arguments to a common real-world example.

Let's say you're a clearinghouse for some specialized electronic transactions. You have clients who send you many small files. You package them and send them to their various destinations. Your customers on the other end return you a few extremely large files to be parsed, split, and have the data returned somehow to your original clients.

If you receive these files in XML, you have two options:

- DOM-like parsers. These will 1) be initially slow, and 2) take ungodly amounts of memory. If the files are large enough, this may rule out DOM altogether.

- SAX-like parsers. You either take the callbacks yourself or write a zillion wrapper classes to do the parsing. Either way, it's a pain in the butt and your future maintainers will have to suffer with debugging code that executes non-sequentially.

So where's the common stream-based solution that exists for nearly every other file format? Something like that would be ideal, don't you think?

The original author's problem with XML is that there aren't any.
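For what it's worth, the kind of stream-based interface being asked for here can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`, which hands back elements one at a time as they complete. The element names (`batch`, `txn`, `amount`) and the sample data are made up for illustration, and this says nothing about what was on offer for Perl at the time.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical data: element names are assumptions for illustration only.
SAMPLE = b"""<batch>
  <txn id="1"><amount>10.50</amount></txn>
  <txn id="2"><amount>4.25</amount></txn>
  <txn id="3"><amount>7.00</amount></txn>
</batch>"""

def stream_amounts(source):
    """Yield (id, amount) pairs one record at a time, in document order."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "txn":
            yield elem.get("id"), float(elem.findtext("amount"))
            # Free the finished record's children and text; the parent
            # still holds the emptied element itself.
            elem.clear()

records = list(stream_amounts(io.BytesIO(SAMPLE)))
print(records)
```

The loop reads sequentially, like line-at-a-time processing of a flat file, so there are no callbacks to trace through and no full tree of record contents held in memory.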

--
I got my Linux laptop at System76.
Re:A good point by jilles · 2003-03-18 06:01 · Score: 2, Interesting

Assuming that these data streams have something in common you'd probably spend a week or so developing a generic, maintainable solution using e.g. SAX and reuse that in each particular case. The ad hoc solution of using regular expressions probably saves you time in the short term, but in the long term you'll probably keep reinventing the wheel.

However, this is all beside the point since we've now established that there's nothing wrong with XML but that it's just the tools to manipulate it which are still lacking in certain ways. I'd be the first to agree that the SAX and DOM APIs are a bit overkill for some situations. However, concluding from that that XML is not a good solution goes too far IMHO.

--

Jilles
Re:A good point by snakecoder · 2003-03-18 09:05 · Score: 1

I second the Amen on the good point. Standards do suck, but learning them is a one-time process. One of the articles gives examples of code that is cumbersome, but you wrap that in a function one time and you reuse that code over and over. If you think XML files suck, you're going to love DTDs.

Once I learned the DOM parser, it was easy to use it to layer my own parse method on top. Now I really can use
myDom=myParse.getDoc("somedoc.xml")
Now if the element exists I can just write
myDom.price/(myDom.revenues - myDom.expenses) or some variation of that.
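A rough Python sketch of that sort of wrapper, assuming a flat document whose child elements hold numbers. The `Doc` class, the element names and the figures are all invented for illustration; this is not the poster's actual code.

```python
import xml.etree.ElementTree as ET

class Doc:
    """Thin wrapper: doc.price returns the first <price> child as a float."""

    def __init__(self, root):
        self._root = root

    @classmethod
    def from_string(cls, text):
        return cls(ET.fromstring(text))

    def __getattr__(self, name):
        # Only called for names not found normally, so _root is unaffected.
        text = self._root.findtext(name)
        if text is None:
            raise AttributeError(f"no <{name}> element")
        return float(text)

# Hypothetical document; names and numbers are assumptions.
my_dom = Doc.from_string(
    "<company><price>30</price><revenues>12</revenues>"
    "<expenses>7</expenses></company>"
)
ratio = my_dom.price / (my_dom.revenues - my_dom.expenses)
print(ratio)
```

The cumbersome part (parsing, lookup, type conversion) is written once in `__getattr__`, and the calling code reads like the one-liner above.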

So there Brian, not so hard after all.

--
-Nuke the moon
Re:A good point by Delphix · 2003-03-28 09:38 · Score: 1

"So let's all use a sucky standard" is the best you can say about XML? That's like saying let's all go use Windows because it's the "standard" on 94% of computers...

I for one am glad there are some people out there who aren't falling in line.
Re:A good point by MadAhab · 2003-03-28 17:24 · Score: 1
But the entire point is that week could easily be solved in 5 minutes now with more convenient and less over-engineered solutions - and you aren't likely to spend that week fixing the "inferior" ad hoc solution during the life of most code projects - and those that do live longer have proven that learning a local dialect isn't much of a problem compared to learning the software itself.
In this sense, XML is like the local communist party boss insisting that everyone choose new words for hammers and gaffs and threshers because they represent counter-revolutionary forces.
XML does suck. It sucks because:
1. The tools for programming with XML suck
2. The tools for writing XML suck
3. The tools for designing and distributing XML DTDs suck
It's not a failure of concept, it's a failure of execution. It's a proof of a dirty secret of software engineering, which is that formal practices are no better at producing quality results than entertainment companies are at producing quality entertainment; in fact, the best results usually come from small, unrestricted forces. XML the concept doesn't suck, but XML the reality definitely sucks.
--
Expanding a vast wasteland since 1996.

what? by Anonymous Coward · 2003-03-18 01:19 · Score: 0

i'm no programming guru, as a matter of fact i would classify myself as one of the 'visual basic'-programmers the guy above mentioned. (no i don't know VB). However, the XML i have come into contact with has struck me as very easy to learn. The one thing I experienced as too complicated was the situation that arises when several XML types are mixed...

Don't Blink by Cnik70 · 2003-03-18 01:20 · Score: 1

The problem with XML seems to be that the formats change too fast, and many never seem to be backwards compatible. I wouldn't mind coding for XML if I knew that an application would be viable for more than a few months.

--
-Cnik

Re:Don't Blink by samael · 2003-03-18 01:46 · Score: 2, Informative

You can use XSL to translate any XML document into a different format. So your old documents should be convertible.

If your subdialect keeps changing, that's down to the people defining the syntax, not the language itself.

--
My Journal

It's about tools, libraries by Anonymous Coward · 2003-03-18 01:20 · Score: 5, Interesting

Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.

Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.

There's a number of tools and libraries (with Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.

Re:It's about tools, libraries by kinnell · 2003-03-18 01:34 · Score: 4, Informative

As he says in the article, the reason he uses Perl regexp is that the tools/libraries have to read the entire file. If this is a long stream, it's grossly inefficient - you have to load the entire thing into a massive tree structure in memory. If the job can be done serially with regexps without using a noticeable amount of memory or time, then it is often better. This is the point of the article - there is a choice between using a method which is often grossly inefficient for real world problems (XML libs) and a fast but messy method (Perl regexp). Neither of these is really satisfactory, hence the complaint.

--
If I seem short sighted, it is because I stand on the shoulders of midgets
Re:It's about tools, libraries by p3d0 · 2003-03-18 01:44 · Score: 1

There are event-based parsers that don't need to read the whole file. I think expat is one of these.

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Re:It's about tools, libraries by skillet-thief · 2003-03-18 01:47 · Score: 1

This is what people usually say. However, I had a problem (using MS Access XML exports -- granted it's pretty simple XML) where a quick regex solution was a lot faster than learning how to use some other tool. Just chop up your different elements, get your tags and get out... Probably (definitely) not a universal solution, but in my case it saved me a lot of time and got the job done.

--
Congratulations! Now we are the Evil Empire
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 01:49 · Score: 0

He also refers specifically to processing XML using a callback model (e.g. SAX) as well, and claims it's unnatural. I don't think I agree with him wholeheartedly on that point, once you've wrapped your head around the basic idea it's not so hard to do something useful with an XML document. Certain kinds of Perl modules (XML::Twig), as you point out, offer yet another way of dealing with XML documents that's somewhere in between.

I would have preferred to see a specific example of something one would like to do with a callback-based interface and can't easily.
Re:It's about tools, libraries by Sique · 2003-03-18 01:51 · Score: 5, Interesting

No. It is not. It is about basic computer science.

XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction. What he is complaining about is exactly this: Lots of parsing to get a simple datum.

With regexp your parsing is much faster, because you can concentrate on substrings, you can parse them without using a stack, you can use them in stream context. But regexp are Regular Expressions (Chomsky Type 3 grammar), so they are in fact just a subset of XML and not able to parse XML completely.

One of the links in the article points to another rant, where the author wants some regulations for a limited XML. Unfortunately, the ideas he is proposing are in fact context sensitive, which makes them Chomsky Type 1 (context sensitive grammar) and a superset of XML instead of a simplified subset. Does anyone remember the Earley algorithm, with something that can be described as a multi-dimensional stack?

Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

--
.sig: Sique *sigh*
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 01:59 · Score: 2, Insightful

I don't buy it.

There's two ways: DOM-like, where you read the file and have tree-like access. It's simple, and here the inefficiency complaint holds, very much so for large files.

There's SAX-like, where you process events. Plain SAX is fast. It's somewhat inconvenient, but not that much worse than regexps. I've co-developed a large open source app using SAX: it works, it's efficient for large files, so SAX is certainly doable.
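To give a feel for how much (or how little) ceremony the event style involves, here is a small sketch using Python's standard `xml.sax` module. The `TitleCollector` class and the `feed`/`item`/`title` element names are assumptions for illustration, not taken from the application mentioned above.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # None means "not inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._buf = None

handler = TitleCollector()
xml.sax.parseString(
    b"<feed><item><title>One</title></item>"
    b"<item><title>Two</title></item></feed>",
    handler,
)
print(handler.titles)
```

The state that a regexp loop would keep in local variables lives in the handler object instead; that is the "somewhat inconvenient" part, but nothing beyond the current element's text is held in memory.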

But there's more: Tim Bray's blog message has created attention elsewhere, and on xml-dev one person introduced a Perl API based on SAX which lets you easily extract information from the stream. See:
http://lists.xml.org/archives/xml-dev/200303/msg00536.html

So... I still say: Proper tools exist. Use them, be happy!
Re:It's about tools, libraries by PigleT · 2003-03-18 02:01 · Score: 4, Informative

I agree that it's about tools and libraries. And this is what I think about them, too.

At work, I brush up against XML occasionally, mostly for documentation or data-resultset purposes. In my own time, I use it in my photo gallery - result-sets from database queries get converted to XML and then spat out through XSLT in Sablotron, straight to web. For all the hoops it goes through, it's actually still quite nippy.

However, I also dislike it intensely.

I've written a blog-like system-news announcement board using a Ruby CGI against postgresql as a backend. I can pull back a result-set - a simple table-thing with each row being a text announcement, half a dozen fields (when posted, by whom, etc). And I wanted to output this in HTML form for the web, in plain-text to send to a user who wanted it via email every day, and in s-exp form for my own gratification.
However, the first problem you run into is the formatting. A textarea in an HTML form gives no line-wrapping (wanted for plaintext output, but only in specific fields) and embeds ^M characters everywhere. When the output is HTML, those ^Ms want to become br tags. When the output is plaintext or sexp, they want to become \n. Simple, if ONLY there were a way of doing either elementary reformatting or search-n-replace in XSLT. There is, but s/// is about 10 lines' worth, if my googling is to be believed. That makes it non-optimal for one of its primary uses: making transformations on big blocks of text-based data, and it can't even edit within a node correctly? Pathetic.
Why shouldn't I just write 3 output methods in my Ruby CGI script that take the result-set directly to text, HTML or sexp formats, with the power of ruby to do a #gsub("^M", "\n") on just the fields I want, in a tiny few extra characters of code?

Now to tackle what you've said:

"Using Perl regexps to parse XML is silly"

No, it's not. Perl regexps are highly featureful, pre-existing code. I'd be surprised if libxml *didn't* use regexps in its XML parsers, frankly.

"e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling."

These things are not a problem. You can easily match an attribute occurring, as it does, within an opening-tag, and pull out both the name and the contents. Using that to set a variable of given name in your program - a highly important part, given that XML is a data-transfer format and it's the internal representation afterwards that is its whole raison-d'etre - is trivial. Thus, perl wins.
Multi-line matching is explicitly catered-for in perl, with /m or /s on the end of the regexp.

"There's a number of tools and libraries "...

Indeed there are. And you know what? When I've got a small paragraph (under 10 lines) of data that I want to transfer from A to B, the last thing I'm going to do is invoke a 600Kb library so I can use a pompous and fashionable set of functions to produce "XML", when perl/ruby/sh have all had perfectly valid "print" or "echo" commands for the past decade or more. If the output is valid XML, you've no reason to diss the method used to produce it.

As a final example, I've also had a few documents to be writing, of my own, at work. I've had two options: either sit down, set up emacs to handle XML sources smoothly so I can open and close tags at the push of a key-chord the way I *want* to create the stuff, or program a small sub-language. Lisp, in the form of _librep_, won the day, with a few small functions to produce strings based on the input. And guess what? Because this is a programming language rather than a mere text-transforming language, I made a CGI out of it, and can embed programs within my "data", too, without feeling the urge to write to the W3C about it.
Editing it is an absolute dream - opening and closing paragraphs of text is a piece of cake and fits the way I want to work. (Maybe you like looking at spiky angle-bracket characters, I dunno.)
In short, "programmed text" won the day for me.

--
~Tim
--
.|` Clouds cross the black moonlight,
Rushing on down to the circle of the turn
Re:It's about tools, libraries by kinnell · 2003-03-18 02:06 · Score: 1

I stand corrected ;-)

--
If I seem short sighted, it is because I stand on the shoulders of midgets
Re:It's about tools, libraries by radish · 2003-03-18 02:08 · Score: 1

That's the whole point of SAX parsers (I don't know any perl ones, I'm not a perl guy, but I bet there are several!).

--
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 02:13 · Score: 0

using perl regexp is basically an inlined SAX interface.

Sounds like he really wants the data to be in a random-access DB. Well, it's not. Get different data, or deal with it.
Re:It's about tools, libraries by kanthoney · 2003-03-18 02:20 · Score: 1

Isn't that what SAX parsers are for?
Re:It's about tools, libraries by protonman · 2003-03-18 02:34 · Score: 1

> so they are in fact just a subset of XML and not
> able to parse XML completely.

Which means "not everything which is theoretically possible in XML".

And since *practically* all XML has a finite number of elements, and everything which has a finite number of possibilities can be modelled by regexps just as well, regexps are perfectly capable of parsing XML.

It's not gonna be pretty, but it is theoretically possible with all practical XML. That's just computer science. Deal with it.

--
The man of knowledge must be able not only to love his enemies but also to hate his friends.
Re:It's about tools, libraries by Boiotos · 2003-03-18 02:38 · Score: 2, Interesting

Shouldn't SAX-based tools *not* have to load the entire thing into memory?
Bray's paper appears to express a strong preference for an XML that would work well with standard regex tools. In it he says, "If I use any of the perl+XML machinery, it wants me either to let it read the whole thing and build a structure in memory, or go to a callback interface." And then he adds that callback "is sufficiently non-idiomatic and awkward that I'd rather just live in regexp-land."
This, in turn, seems to be based on an article linked to in Bray and advocating the same thing.
It seems to me that to convince the larger world that this is necessary, some other options would have to be excluded. Aren't regexs of some sort going to be in v. 2 of XSLT? None of its successful implementations require loading the document into memory, and it nicely magics away the namespace kerfuffle that Gregorio's examples illustrate.
What I took away from the article was considerable amazement that one of the markup luminaries uses such low-level tools to process XML.
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 03:01 · Score: 1, Interesting

XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction.

Are you suggesting that to get to the 15005-th transaction in the file I have to fully parse all of the 15004 previous transactions? That could be true for some of the more complicated context-free grammars, but in XML you can easily discard a nonterminal without fully parsing it - just look for the matching closing symbol. This does not even require a stack, just a simple counter, in case your nonterminal is nested.
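The counter idea can be sketched like this in Python. It is a deliberately simplified scanner, not a conforming XML parser: the tag pattern ignores comments, CDATA sections and `>` inside attribute values, and the `txn` element name and sample data are assumptions for illustration. It still has to read past the earlier records, but it never builds a tree or keeps a stack.

```python
import re

# Opening, closing, or self-closing tags (a simplification, see above).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")

def nth_record(text, tag, n):
    """Return the raw text of the n-th (1-based) outermost <tag> element."""
    depth = 0      # how many <tag> elements are currently open
    seen = 0       # how many outermost <tag> elements have started
    start = None
    for m in TAG.finditer(text):
        closing, name, empty = m.groups()
        if name != tag:
            continue              # other elements are skipped uninspected
        if not closing:
            if depth == 0:
                seen += 1
                if seen == n:
                    start = m.start()
            depth += 1
        if closing or empty:
            depth -= 1
            if depth == 0 and seen == n:
                return text[start:m.end()]
    return None

doc = ('<batch><txn id="1"><amount>1</amount></txn>'
       '<txn id="2"><note><txn/></note></txn><txn id="3"/></batch>')
print(nth_record(doc, "txn", 2))
```

A single integer is enough here because only one element name is being tracked; checking that every other tag is also properly matched is a different job, and that one does need a stack.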

Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.
Which theorem says that XML has to be memory intensive and cannot be fast? Observe, that theorems about context-free grammars only guarantee the existence of hard context free grammars, not general hardness of the grammars.
Re:It's about tools, libraries by Sique · 2003-03-18 03:02 · Score: 3, Insightful

It is not about the number of elements. It is about the depth to which you can nest them. Think about normal algebraic terms (a+b*5-(3*(7-4))). It's often very reasonable to have such terms in XML. But they are unparseable via regexp, because regexp doesn't have a stack and can't count parentheses. And don't reply with RPN (reverse Polish notation) and argue that this is parentheses-free. It replaces the parentheses with a fixed number of operator arguments. And regexp can't count arguments either. Regexp in fact can't count at all (or only up to a predefined limit, which is mathematically equivalent).
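To make the stack point concrete, here is a minimal Python sketch of a nesting check. Several bracket types stand in for differently named XML tags; the function name and test strings are invented for illustration. The list used as a stack is exactly the piece of state that a regular expression, in the formal sense, does not have.

```python
# Map each closing bracket to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}

def is_well_nested(text):
    """True if every bracket in text closes the most recently opened one."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)          # remember what is currently open
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False          # wrong closer, or nothing open
    return not stack                  # leftovers mean something never closed

print(is_well_nested("(a+b*5-(3*(7-4)))"))
print(is_well_nested("(a+[b)]"))
```

The stack grows with nesting depth, not with input length, so a long but shallow input costs almost nothing extra.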

--
.sig: Sique *sigh*
Re:It's about tools, libraries by Len · 2003-03-18 03:07 · Score: 3, Informative

Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.
You're right, but the problem is that "deal with it" may equate to "don't use XML" in a lot of cases, which makes XML less of the universal data representation language than it wants to be.
When the parser uses a lot of memory (like DOM reading the entire input into a tree) it becomes inefficient, sometimes infeasible, to handle large input documents. That's one of the specific problems mentioned by Tim Bray and others.
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 03:10 · Score: 0

*schoooooop*
[As what you just said goes over my head...]
Re:It's about tools, libraries by Sique · 2003-03-18 03:15 · Score: 4, Interesting

No, I am suggesting that in general you have to use a stack machine. Surely you can use degenerate trees instead of fully balanced trees to store your data. And a concatenation of elements is a regular expression (and a degenerate tree). But then you are already making assumptions about the data you get. With such limiting assumptions you can easily streamline your code, but you are losing the full power of XML on the way. And you need a grammar that makes sure you don't mix terminals and nonterminals.

It starts already if you are using escape characters to mark nonterminals and escape those characters with themselves to mark them terminal. Those markings are still regular, but you already lose some speed-ups. For instance \\ matches \\" and \\\", but one means just \ and the end of the string, and the other one means \" and the string continues. The only way to stay out of the mess is to make sure you are using a strictly left-bound parser, first parsing for all escape characters and then for the nonterminals, which already makes your parser a (local) 2-pass parser.

--
.sig: Sique *sigh*
Re:It's about tools, libraries by jgerman · 2003-03-18 03:31 · Score: 1

I've got news for you... all of those tools use regexes. XML is a regular language, and parsers built to work with it use regexes to recognize the tokens.

--
I'm the big fish in the big pond bitch.
Re:It's about tools, libraries by Ed+Avis · 2003-03-18 03:32 · Score: 2, Interesting

There are two more methods: interfaces like SAX where you read individual tokens, and callback interfaces like Perl's XML::Twig where you can efficiently scan the whole file and only construct in-memory trees for the parts you're interested in.

The best method might be a lazy programming language where you can say

tree.a[4].b[6].contents

and only when this expression is evaluated will the necessary bit of the tree be parsed.

--
-- Ed Avis ed@membled.com
Re:It's about tools, libraries by protonman · 2003-03-18 03:41 · Score: 2, Interesting

I know, but I thought you'd get that with a finite number of elements, you can't nest them infinitely... (I'm counting tags as "elements" here, a bit sloppy I admit).

My point was that in *practical* XML you simply don't have stuff like [a][a][a][a]... ...[/a][/a][/a][/a].

As long as you want to parse a FINITE number of terms, you can do that with regexps.

If your example string with parentheses is the ONLY one you want to parse, I can do that (in sed/perl-like syntax) like this:

\(a\+b\*5-\(3\*\(7-4\)\)\)

If you want to parse all algebraic terms like in your example with a length less than 5 (!) you can start with this...

(\w|\d)
\((\w|\d)\)

(to get 9 and (0) and (a) i.e.)

and

\((\w|\d)[+*-](\w|\d)\)

to get (9+b),(a*b) etc.. etc..

I know, it's gonna be a LONG list, but since the number of possibilities is limited, it's not infinite! (and obviously, I can't use * on the parentheses!)

A problem arises you want to be able to parse a string of arbitrary length with an arbitrary number of parentheses. That's of course impossible for reasons you stated. :-)

But IN PRACTICE, the number of possibilities in your XML file is NOT arbitrary, it is fixed and predictable, so you can use regexps.

I'm nitpicking, I know, but it still is CS. :-)

--
The man of knowledge must be able not only to love his enemies but also to hate his friends.
Re:It's about tools, libraries by Ed+Avis · 2003-03-18 03:44 · Score: 1

Using regexps to parse XML is okay for a one-off, but it is, if not silly, then _unwise_ for larger projects.

For example if your input XML looks like this

One
Two

you might create regexps to parse it. But it would be equally valid for the XML to say

OneTwo

or even

OneTwo

These bits of XML are equivalent to the first, modulo whitespace (and the third example is exactly equivalent to the second, since whitespace after a tag name is ignored). Will your regexp parsing code handle them just the same?

Making your regexps general enough to handle all of these cases is a real pain, even if you know that elements will never nest inside each other. And if you are trying to match a nested data structure to arbitrary depth, this can't be done at all with just regexps.

Better surely to use an existing parsing library which has already been debugged and can smooth over all the syntactic variations, and which won't stop working when the line breaks come in different places or someone adds attributes to one of the elements.
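Since the markup in the examples above got lost, here is a reconstruction of the idea in Python with made-up element names (`list`, `item`): three equivalent layouts, a regexp written against the first one, and a real parser. The regexp is only one plausible naive attempt, not anyone's actual code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names; the original post's markup was not preserved.
VARIANTS = [
    "<list>\n  <item>One</item>\n  <item>Two</item>\n</list>",
    "<list><item>One</item><item>Two</item></list>",
    "<list><item >One</item ><item\n>Two</item\n></list>",
]

# A line-oriented regexp written with only the first layout in mind.
NAIVE = re.compile(r"^\s*<item>(.*)</item>\s*$", re.MULTILINE)

def with_regex(text):
    return NAIVE.findall(text)

def with_parser(text):
    return [item.text for item in ET.fromstring(text).iter("item")]

for v in VARIANTS:
    print(with_regex(v), with_parser(v))
```

The regexp could of course be loosened to cope with the second and third layouts, but each loosening is another case to think of and test; the parser already handles all of them because it implements the grammar instead of one layout of it.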

Trying to parse XML with regular expressions is like trying to parse C source code with regular expressions. It's okay for quick tasks, like grepping through your source tree for a particular variable, but shouldn't be used for the task of reading in a whole document.

--
-- Ed Avis ed@membled.com
Re:It's about tools, libraries by Ed+Avis · 2003-03-18 03:46 · Score: 1

Oops, Slashdot ate the XML I typed. (Why can't it convert characters to 'lt' and 'gt' entities?) But I hope you get the idea.

--
-- Ed Avis ed@membled.com
Re:It's about tools, libraries by ajs · 2003-03-18 03:52 · Score: 2, Interesting

Come Perl 6, of course, you'll have the best of both worlds:
$data = STDIN.getlines().join('');
if ($data =~ qr{ ^ (<xml>) $ }) {
    my XML $parsed = $1;
    if (my $n = $parsed.findnode('sometagiwant')) {
        print "Yep, it's there:\n$n\n";
    } else {
        print "Failed to find sometagiwant\n";
    }
}
And depending on what you want (memory vs speed) your "xml rule" in that regexp can do whatever annotation, datastructure building, etc that you want.
Re:It's about tools, libraries by Kallahar · 2003-03-18 04:37 · Score: 1

But I agree, XML may be good for small messages that need to be exchanged, but for all of my projects it has been far more efficient to store it in a flat file that is read in in a stream. Doing it by XML resulted in a 4 meg data file (vs about 50k in the flat file). And then the processing time and memory overhead on top of that.

Travis
Re:It's about tools, libraries by Cuthalion · 2003-03-18 04:51 · Score: 1

Of course this all depends on your DTD. But given that in XML open and close parentheses need to match and there can be several types of them, your regexp ends up just being an enumeration of possible XML files, which starts out ridiculous and very rapidly blows up to be enormous.

Go on, figure out how to regularly allow , , and cheese and produce meaningful error messages on something like

--
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!
Re:It's about tools, libraries by Cuthalion · 2003-03-18 04:54 · Score: 1

crap

all my >'s and <'s went away.

the example: parse <a> <c> </c> </a> and <a> <c> <d> </d> </c> cheese </a> and make good errors for <a> <d> </d> </e>
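For comparison, a small Python sketch of what an off-the-shelf parser reports on those three inputs. The helper name `check` is an assumption for illustration; the error text itself comes from the underlying expat parser, so its exact wording and column may vary.

```python
import xml.etree.ElementTree as ET


def check(text):
    """Return 'ok' for well-formed input, else the parser's error description."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        # ParseError carries the (line, column) where parsing failed.
        line, column = err.position
        return f"error at line {line}, column {column}: {err}"


for sample in (
    "<a> <c> </c> </a>",
    "<a> <c> <d> </d> </c> cheese </a>",
    "<a> <d> </d> </e>",
):
    print(sample, "->", check(sample))
```

This checks well-formedness only; deciding whether "cheese" is allowed inside an a element would additionally need a DTD or schema validator.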

--
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!
Re:It's about tools, libraries by Anonymous._.Coward · 2003-03-18 04:59 · Score: 2, Interesting

There's more than SAX and DOM out there. What about data binding tools? Generate some classes from your DTD/schema, call bind(xmlFile) and you've got objects to work with.

There are even partial matching binding architectures. The best one I've seen is SNAQue.

--
take a triptonica to subthunk
Re:It's about tools, libraries by 21mhz · 2003-03-18 05:13 · Score: 1

Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

Memory use of an event-based XML parser is proportional to the maximum depth of elements. As long as the elements in your documents don't nest to insane levels, you're on good terms with memory.
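A rough Python sketch of that claim, assuming the standard library's xml.sax module as the event-based parser. The handler keeps only a stack of open element names, so what it stores grows with nesting depth rather than with document size. The class and function names are illustrative.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Keeps only the stack of currently open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.max_depth = 0
        self.elements = 0

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.elements += 1
        self.max_depth = max(self.max_depth, len(self.stack))

    def endElement(self, name):
        self.stack.pop()


def measure(xml_text):
    """Return (number of elements seen, deepest nesting level reached)."""
    handler = DepthTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.elements, handler.max_depth
```

A document with a thousand sibling items under one root still only ever has two names on the stack.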

Next, maybe I'm ignorant CS-wise, but nobody has yet explained to me the benefits of using regular expressions on nested structures and not screwing up with the said nesting.

You're right, but the problem is that "deal with it" may equate to "don't use XML" in a lot of cases

What's the alternative? To roll your own half-assed format/parser, facing just the same problems or worse?

--
My exception safety is -fno-exceptions.
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 05:13 · Score: 0

err... the problem you are describing applies only to DOM. Ever hear of SAX?
Re:It's about tools, libraries by protonman · 2003-03-18 05:23 · Score: 1

You *KNOW* how I'm gonna answer here don't you? :-D

Yes, it IS like enumerating all possible XML files, but that should stop a scientist, right? ;-)

--
The man of knowledge must be able not only to love his enemies but also to hate his friends.
Re:It's about tools, libraries by protonman · 2003-03-18 05:30 · Score: 1

Crap. All my 'nt went away.

--
The man of knowledge must be able not only to love his enemies but also to hate his friends.
Re:It's about tools, libraries by Sique · 2003-03-18 05:38 · Score: 1

The principal problem remains the same (parsing the whole subtree), but with SAX it's just minimized because of the usage of tokens instead of the complete XML structure. It means that SAX just pushes the limits further away, but it finally hits the same barriers.

--
.sig: Sique *sigh*
Re:It's about tools, libraries by grammar+fascist · 2003-03-18 05:42 · Score: 1

What about data binding [rpbourret.com] tools? Generate some classes from your DTD/schema, call bind(xmlFile) and you've got objects to work with.

They're still DOM underneath, with all the disadvantages.

--
I got my Linux laptop at System76.
Re:It's about tools, libraries by Loma · 2003-03-18 05:54 · Score: 5, Informative

You have used many big words, and you may have your language levels incorrect, but you are clearly wrong in one respect:

Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

Well, I've written my own XML parser, as well as a compiler for a simplified version of C, so I think I'm somewhat qualified to talk on this. A generalized XML parser is not memory intensive, unless you are a very bad programmer. All you need is a depth-first stack, which will be as high as your XML tree is deep. And given that a stack of size N is capable of handling a tree of size X^N (where X is the number of children per node), you are definitely going to run out of disk space before you run out of RAM. In other words, the memory required for parsing an XML tree is trivial.

An XML parser is one of the simplest parsers imaginable. It's a sophomore task to create a state machine to process the generic L(1) (or is it L(0)?) XML grammar. And as you should know, a state machine for an L(1) grammar is as fast as you can get.

Anything you do with regular expressions will be much more complicated. As I'm sure you know, regular expressions are turned into state machines before being used to process the input. And almost all regular expression state machines are much more complicated than the state machine you need for an XML parser. In an XML parser, definite boundaries exist on elements such as:

'<' and '>'

Regular expressions are not this smart. For example, looking for the substring "abc" in the longer string "abababaaabbbabcabababac" already generates a state machine that is more complicated than the one needed for XML parsers.

Back to the "memory" intensive nature of XML parsers. If you parse your XML tree into a nested hashmap structure, then the memory needed will be proportional to the number of nodes in the XML tree. Maybe this is what you meant by "memory intensive". However, this is totally unnecessary. You can easily construct an XML parser to look for the specific elements you care about. Then you only get those elements, and you only need to allocate the memory for the elements required.
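A sketch of that last idea in Python, assuming the standard xml.sax module. The handler allocates storage only for the one element name it was asked for and ignores everything else in the document. `TagCollector` and `collect` are hypothetical names, and the sketch assumes the wanted element does not nest inside itself.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Collects the text of one element name and discards the rest."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.inside = False
        self.buffer = []
        self.values = []

    def startElement(self, name, attrs):
        if name == self.wanted:
            self.inside = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.inside:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.wanted:
            self.values.append("".join(self.buffer))
            self.inside = False


def collect(xml_text, wanted):
    handler = TagCollector(wanted)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.values
```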
Re:It's about tools, libraries by TheLink · 2003-03-18 06:21 · Score: 1

Because the Slashdot design doesn't cope with it.

It should store metadata about comments indicating whether a comment is in plain text, preformatted text or html. And if it's plain text, stuff is quoted/filtered accordingly. If it's HTML, you filter out the nasty stuff either before or after you store it depending on your security/performance policy, then you quote/filter accordingly.

By quote/filter accordingly I mean that if you are sending plain text to a browser you would convert the carriage returns to br tags, less-than signs to the appropriate entities and so on. If it is preformatted text, then extra whitespace should be respected. And if you are sending a plain text comment to a mobile phone, email account or some other program you would use a different filter.

That to me is the correct way to do things.

So I don't quite understand why PigleT (the grandparent poster) is having problems doing something similar. Maybe PigleT is making a slashdot style mistake.

Link.
--
- Too many replies beneath your current threshold
Re:It's about tools, libraries by Ed+Avis · 2003-03-18 06:54 · Score: 1

Well what you propose would work, and may well be the best way to do things (I like the idea of tagging content with its type before storing it, and implemented something similar when I once needed to make a simple message board). But essentially, the change needed is not to do half-baked 'filtering' of text by stripping out 'bad' characters, but just to HTML-escape it correctly, changing & to &amp; and so on. At the moment the Slashdot poster must do this by hand, even for supposedly 'plain text' mode where you shouldn't have to care about HTML.
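A minimal sketch of escaping rather than stripping, in Python. It assumes the comment is stored as plain text and escaped on output; `render_plain_text` is an illustrative name and says nothing about how Slashdot's own code is organised.

```python
import html


def render_plain_text(comment):
    """Escape markup characters, then turn line breaks into br tags."""
    escaped = html.escape(comment, quote=True)  # & < > and quotes become entities
    return escaped.replace("\n", "<br>\n")


print(render_plain_text("parse <a> <c> </c> </a> & so on"))
```

Because nothing is removed, the reader sees exactly the characters the poster typed.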

--
-- Ed Avis ed@membled.com
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 06:54 · Score: 0

AGREED! I stopped reading the article when he started talking about using regexp for extracting data and printf() for serializing XML. People in the 'scripting camp' need not bother with XML if they are going to be lazy. Most Perl programmers and scripters I know are willing to sacrifice simplicity, readability, flexibility, and overall quality for a few negligible performance boosts. Not that all Perl programmers are bad....
Re:It's about tools, libraries by hummassa · 2003-03-18 07:23 · Score: 1

better yet,

<pre>
while STDIN.getline() =~ qr{ ^ (<xml>) $ } {
    my XML::Lazy $lazy = $1;
    if $lazy =~ qr{ <node('sometag')> } {
        print "there"
    } else {
        print "not there"
    }
}
</pre>

--
It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
Re:It's about tools, libraries by CynicTheHedgehog · 2003-03-18 07:25 · Score: 1

There is. It's called XML Path, or XPath for short. The following string, for example, would return a list of text nodes for all of the children of a specific parent:

rootElement/parent[@name='Jane']/child/text()

I can also return the parents with children named "John":

rootElement/parent[child[@name='John']]

Or the name of the parents with children named "John":

rootElement/parent[child[@name='John']]/@name

And you get the idea. You can do a lot more with it as well, including number and string comparisons, sorting, and a number of other function doo-dads. It's really sweet. If you use Java give Apache Xalan a shot.
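A hedged Python sketch of similar queries. It uses the limited XPath subset in the standard library's ElementTree rather than a full XPath engine such as Xalan, so the nested predicate is rewritten as a step down to the child and back up with "..", and the attribute is read in Python instead of with /@name. The sample document and function names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data matching the shape of the paths above.
DOC = """
<rootElement>
  <parent name="Jane">
    <child name="John">first</child>
    <child name="Mary">second</child>
  </parent>
  <parent name="Bob">
    <child name="Ann">third</child>
  </parent>
</rootElement>
"""

root = ET.fromstring(DOC)


def child_texts(parent_name):
    """Text of every child of the parent with the given name attribute."""
    path = f"./parent[@name='{parent_name}']/child"
    return [child.text for child in root.findall(path)]


def parents_with_child(child_name):
    """Names of parents that have a child with the given name attribute."""
    path = f"./parent/child[@name='{child_name}']/.."
    return [parent.get("name") for parent in root.findall(path)]
```

The paths are relative to the root element here, which is why they start with "./parent" rather than "rootElement/parent".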

But it does have one drawback: you have to use DOM to use it. Theoretically, it should be possible to write a SAX version, but every time you queried the document you'd have to read through the whole file. An ideal solution would be to create an index using SAX and then optimize the XPath expression evaluator to use that index.

But I digress. The point was to say that the XML toolkit supports a variety of solutions, from DOM to SAX to XPath to XSLT that do things quickly and easily (depending on the API you use). If he doesn't want to use DOM (and in this case he shouldn't) he needs to use some kind of Perl SAX API...the regex stuff he's doing is a little over the top in my opinion.
Re:It's about tools, libraries by TheLink · 2003-03-18 08:03 · Score: 1

You are suggesting storing the data as HTML?

e.g. convert plaintext to safe HTML and store it. Convert HTML to safe HTML and store it.

That is OK as long as you will only ever want to output HTML, and either don't mind lower fidelity to the original data or can convert everything to HTML.

Scoop (kuro5hin) seems to do it fine, plus it even italicizes stuff surrounded by underscores when you pick autoformat.

I don't know why this problem hasn't been fixed in Slashdot for YEARS.
--
- Too many replies beneath your current threshold
Re:It's about tools, libraries by Ed+Avis · 2003-03-18 08:11 · Score: 1

I'm not particularly proposing storing data as HTML, just saying that you don't need to change the storage format to fix the stupid quoting problems. The fix is simpler than that. Just rip out whatever code currently makes it its business to remove < characters, and replace it with non-broken code that HTML-escapes them properly. Whether this happens on input or output doesn't really matter.

--
-- Ed Avis ed@membled.com
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 08:19 · Score: 0

If you knew anything about the standard computer science definition of regular expressions vs. the Perl implementation of them, you would know that regular expressions in Perl are more powerful than the standard computer science ones, precisely because they *can* count (ever seen {1, 3} in Perl?). Perl regexps require a PDA to compute. Similarly, if you look at, say, a YACC file, it is more powerful than a PDA, even though its goal is to parse CFGs. It is equivalent to a Turing machine in power, due to states (i.e. it is not context-free).
Re:It's about tools, libraries by rodentia · 2003-03-18 08:31 · Score: 1

You get stuck in the tools you know, I guess. I don't know what the basii are for Tim's assertions in this article. I'm a self-taught programmer via: typesetting; SGML systems scripting, macro writing; some coursework in Java; XML'99/Xtreme Markup and some lucky breaks. I took the markup wonk path. Voila, a 40 yr old newbie developer.

Frankly, I don't know why XSL is so disregarded, else regarded with fear and awe. I am some kind of master esoteric in our staff 'cause I can write recursive templates. I manage the print and presentation tier of a large web application largely with XSL, against some extremely large instances and it is frankly easy. Use a two-stage transformation, manage grouping and re-parenting in the first, adding hooks the second transform will use to fire garbage collection (new output target, fo:page-sequence, et. al.) if memory usage requires any management at all. Cache the stylesheets, hand the transformers a SAX stream and let them manage their own data structures.

And it is maintenance-level programming to extend Saxon or Xalan to handle your regexp, if that's what you gotta have. The first Java class I ever wrote that wasn't an exercise was a Xalan extension.

If XML is hard, I don't wanna be easy.

--
illegitimii non ingravare
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 08:49 · Score: 0

I remember that guy!!! I used to love watching Tennessee Tuxedo and his walrus pal, Chomsky!
Re:It's about tools, libraries by pyrrho · 2003-03-18 10:11 · Score: 1

but what the heck does he mean by "callbacks are non-idiomatic"? I believe that callbacks are an idiom.

And one many of us are quite used to! Using SAX you don't have to load the entire document.

--
-pyrrho
Re:It's about tools, libraries by warrax_666 · 2003-03-18 10:53 · Score: 1

Yes, it IS like enumerating all possible XML files

... and the fact that there are at least exponentially many possibilities does not bother you in the slightest?

--
HAND.
Re:It's about tools, libraries by Polo · 2003-03-18 10:59 · Score: 1

Actually, Perl and XML are really easy if you use the XML::Simple package.

<addr> <name>anonymous coward</name> <phone> <mobile>555-1212</mobile> <home>800-555-1212</home> </phone> </addr>

is parsed by XML::Simple and you can access the elements like this: $r->{addr}->{name} or $r->{addr}->{phone}->{mobile};
If you have duplicate keys, you can turn on an option and access them with $r->{addr}->[0]->{name}->[0] and so forth. I believe there's even an option in the middle of the two.
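A rough Python analogue of the same convenience, as a sketch only. It is not a port of XML::Simple: `to_dict` and `parse` are hypothetical helpers that ignore attributes and mixed content, and repeated tags are collected into a list.

```python
import xml.etree.ElementTree as ET


def to_dict(element):
    """Leaf elements become strings; other elements become dicts keyed by child tag."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = to_dict(child)
        if child.tag in result:
            # A repeated tag: collect its values in a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def parse(text):
    root = ET.fromstring(text)
    return {root.tag: to_dict(root)}
```

With the address document above, `parse(doc)["addr"]["phone"]["mobile"]` gives the mobile number as a string.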
Re:It's about tools, libraries by protonman · 2003-03-18 11:06 · Score: 1

Of course not. I never said I was a developer!

The GG*P (heh) claimed "It's just CS!" and said regexps were *theoretically* unfit to parse XML...

I just went on and on and on to show that albeit impractical, cumbersome (and probably stupid), *theoretically* regexps can parse all XML data files you can throw at them...

--
The man of knowledge must be able not only to love his enemies but also to hate his friends.
Re:It's about tools, libraries by Anonymous Coward · 2003-03-18 11:27 · Score: 0

You've obviously missed the point. We're talking about streams here. There's nothing TO map. You have a stream of data coming in, could be a couple of records, could be billions. What are you gonna do with your binding now, huh?!
Re:It's about tools, libraries by fymidos · 2003-03-18 18:15 · Score: 1

Most Perl programmers and scripters I know are willing to sacrifice simplicity, readability, flexibility, and overall quality for a few negligible performance boosts.

Could you *be* more mistaken ??? The only performance a perl/shell scripter cares about is the speed of the development process!

Simplicity is the actual goal of a script. It is of course kind of hard to indent one-liners (regarding the readability), flexibility is Perl's middle name, and what really matters is the quality of the parts; there is no such thing as "overall quality"!!

--
Washington bullets will simply be known as the "Bulle
Re:It's about tools, libraries by rp · 2003-03-19 00:20 · Score: 1

First: XML is about tools and libraries.

Second: XML is a bracketed language, and therefore, much *easier* to parse than arbitrary regular languages, let alone context-free ones. Parsing is the activity of recognizing structure in input that doesn't explicitly encode it. This can be costly because it may require nondeterminism. In an XML document, the structure is already marked up explicitly. The structure can simply be read off deterministically.

Your comment does apply, not to the task of parsing XML in general, but to the problem of wanting to quickly determine if given content, available in a document, matches a certain pattern. This can be done with sublinear algorithms. Note that XML was not intended to be used for that purpose: it's an interchange format, not a data storage format.

Third: nothing prevents you from writing an XML document scanner that uses all the known tricks and algorithms of sublinear string matching on regular (flat) XML, even when it still handles non-regular XML (trees of arbitrary depth) correctly. So your comment doesn't point out an inherent drawback of XML, although it does apply to standard off-the-shelf XML processors.

In short, you have a point, but the conclusion is nonsense.
Re:It's about tools, libraries by Ben+Hutchings · 2003-03-19 01:59 · Score: 1

And almost all regular expression state machines are much more complicated than the state machine you need for an XML parser.

The tokenising can be done with a very simple state machine, but the parsing requires a stack.
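A small Python sketch of that split, under stated simplifications: a regular expression does the tokenising and an explicit stack does the nesting check. The pattern ignores comments, CDATA sections, processing instructions and attribute values containing ">", so it is an illustration and not a conforming XML tokeniser.

```python
import re

# Groups: optional "/" of an end tag, the element name, optional "/" of an empty-element tag.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def balanced(text):
    """Return True if every start tag is closed by a matching end tag in order."""
    stack = []
    for closing, name, self_closing in TOKEN.findall(text):
        if self_closing:
            continue  # <x/> opens and closes at once
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack
```

The regular expression alone cannot tell the well-nested input from the crossed one; the stack is what makes that distinction.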
Re:It's about tools, libraries by Anonymous._.Coward · 2003-03-19 03:58 · Score: 1

They're still DOM underneath, with all the disadvantages.

No they're not. These tools compile a schema to generate classes and then build instances of those classes (the objects) using the XML values to initialise them. JAXB, castor and SNAQue have nothing to do with DOM.

Binding tools let you, as a programmer, write code over the XML using the same semantic model that the data represents. You never see nodes, children, attributes. You see the real world view. If it's XML about books, ISBNs and prices, your code works with books, ISBNs and prices, with associated types such as string, int and float.
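A hand-written Python sketch of what code on top of a binding looks like, using the books example. Real binding tools generate the class from a schema; here the `Book` class, the `bind_books` function and the sample layout (an isbn attribute plus title and price children) are all assumptions written by hand. For brevity the sketch parses with ElementTree, which says nothing about how JAXB, Castor or SNAQue work internally.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Book:
    isbn: str
    title: str
    price: float


def bind_books(xml_text):
    """Turn each <book> element into a typed Book object."""
    root = ET.fromstring(xml_text)
    return [
        Book(
            isbn=book.get("isbn"),
            title=book.findtext("title"),
            price=float(book.findtext("price")),
        )
        for book in root.findall("book")
    ]
```

Calling code then works with `book.price` as a float and never touches nodes or attributes.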

--
take a triptonica to subthunk
Re:It's about tools, libraries by Anonymous._.Coward · 2003-03-19 04:00 · Score: 1

You bind to records one at a time as they arrive. Processing starts right away and finishes when your stream ends. If the stream doesn't end, so what? You just keep parsing on the fly.
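A sketch of record-at-a-time processing in Python, assuming the standard library's `iterparse` as the incremental parser. The generator name `records` and the record layout are illustrative. Each record is handed to the caller as soon as its end tag has been read, and its children are then discarded.

```python
import io
import xml.etree.ElementTree as ET


def records(stream, tag):
    """Yield one dict per <tag> element as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the children that were just consumed


stream = io.BytesIO(b"<log><rec><id>1</id></rec><rec><id>2</id></rec></log>")
for record in records(stream, "rec"):
    print(record)
```

One caveat: the emptied record elements stay attached to the root, so a truly endless stream would also need to detach them from their parent.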

None of the tools I mentioned do that. An extension of one will. Soon.

--
take a triptonica to subthunk
Re:It's about tools, libraries by Anonymous Coward · 2003-03-19 10:25 · Score: 0

so you parse (up to) the whole document every time you want to find a node.
Re:It's about tools, libraries by TekPolitik · 2003-03-28 10:16 · Score: 1

There's a number of tools and libraries (with Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.

About two years ago I was looking at all of the publicly available code to parse XML. Not only did I look at the API or callback facility used (all the ones I found had extraordinarily bad APIs and/or extraordinarily bad callback architectures), I looked at the code itself to check its conformance to the XML standard. Not one that I found conformed.

In the end I wrote a new one from scratch, based on all of the relevant standards documents (and with the standards documents basically forming the design spec). It ended up being a hell of a lot simpler to use than any of the publicly available code, and a hell of a lot more flexible.

Plus, I understood XML much better when I was done.

Now you might suggest that by doing this I have forgone the community testing involved in public source code XML parsers. However in two years, there has been not a single bug detected in the parser I wrote. I attribute this largely to the fact that the XML specification and the other specifications it depends on are so thorough and have so many established test cases that if you exercise a little discipline, and make sure you implement as per the spec, it's very difficult to get it wrong.

I tend to agree. by NetDanzr · 2003-03-18 01:20 · Score: 3, Funny

The last book on XML I read and understood was XML for Dummies.

Alas, XML by Joe+the+Lesser · 2003-03-18 01:21 · Score: 0

While I do not think XML can be called 'difficult', I certainly would not want to undertake a large project based on it, especially given its formlessness. (Yes, it can prove extremely useful for support, but I cringe at using it as a structural base.) Languages like Java are/have become very intuitive, with great community support, and a Java developer can usually understand a Java program he has never seen quickly. I have yet to see the xml come out of the dark ages, and until it decides to define exactly what it is or what it wants to be, I don't think it will.

--
"I only speak the truth"
Karma: null(Mostly affected by an unassigned variable)

Re:Alas, XML by 6hill · 2003-03-18 01:37 · Score: 2, Insightful
I have yet to see the xml come out of the dark ages, and until it decides to define exactly what it is or what it wants to be, I don't think it will.

D'oh. What is the nature of the alphabet? To provide a common set of basic symbols from which to build the contents of a natural language.
XML is a meta-language; it is specifically designed so that you, the user/code monkey/designer can define exactly what it is in terms of your projects. Unlike Java or other programming languages, XML is as free from in-built semantics as possible (i.e. "formless" as you put it) because it was meant to be that way! It's not a programming language, it's an alphabet.
As for the uses of XML, I see a few things where it would be and is of great use:
- storing representation-free data (i.e. same data could be imported into several programs that would then draw a graph, present a table, or devise a representational dance based on it)
- an easily interpreted configuration/etc. language building blocks; readable by humans, operable by machines, structured by definition
- protocol languages in the lieu of SOAP
And then there's the usual suspects: multichannel publishing, information sharing a la Amazon Associates, etc. XML bends to all these shapes, that's what makes it so beautiful.

Wonder if... by Cnik70 · 2003-03-18 01:22 · Score: 0, Offtopic

I wonder if there is an XML standard to describe internet trolls...

--
-Cnik

Maybe he should have read Knuth by thogard · 2003-03-18 01:23 · Score: 4, Insightful

XML parsing (just like the TeX language) has a problem that when there are problems in the input files, the situation diverges into two different cases, one requires an infinite memory and the other infinite time to deal gracefully with errors.

None of this would have ever been needed had CS been taught properly. There are other concepts to describe how files are to be organized. Some of the systems date from the 1950's. BNF (which seems to work very well for programmers to describe file formats to other programmers) dates from the early 1960's. What was needed is a BNF type grammar that is machine readable.

Would XML have ever taken off if the web used something sane and not a hacked version of a nasty text formatting system from decades ago?

Re:Maybe he should have read Knuth by JimDabell · 2003-03-18 01:36 · Score: 1

XML parsing (just like the TeX language) has a problem that when there are problems in the input files, the situation diverges into two different cases, one requires an infinite memory and the other infinite time to deal gracefully with errors.

Huh? The specification is very clear. If there is a problem in the input stream, there is one, and only one thing to be done: terminal failure.
Re:Maybe he should have read Knuth by Anonymous Coward · 2003-03-18 01:39 · Score: 0

> What was needed is a BNF type grammar that is machine readable.

Like Yaml (www.yaml.org)?
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 01:40 · Score: 5, Informative

XML parsing (just like the TeX language) has a problem that when there are problems in the input files, the situation diverges into two different cases, one requires an infinite memory and the other infinite time to deal gracefully with errors.

WTF? Perhaps you could explain more about these two cases. As far as I know, general XML parsers such as Expat do not require unlimited memory to parse any finite input document, nor do they require infinite time.

The Document Type Definition (DTD) system is equivalent to a BNF grammar for XML documents. It's not quite as flexible as a full BNF because it enforces that elements are correctly nested, but I don't see this as a bad thing.

And yes, DTDs are machine readable. Other grammars for XML documents such as DSD, XML Schema or Relax-NG are also machine readable.

Just as with BNF grammars and flex(1), you can take a DTD and generate an efficient parser from it using FleXML.

Comparisons with TeX aren't really appropriate because TeX is a Turing-complete language, and so impossible to parse automatically in 100% of cases (unless you want to allow that your program will sometimes fail to terminate, ie hang, on particular input files). I don't know what you mean by your subject line 'Maybe he should have read Knuth'...

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Anonymous Coward · 2003-03-18 01:51 · Score: 0

XML, you stupid shit, isn't a programming language at all. It is merely a way of wrapping human-readable texts with some human-encoded machine readable markup so that you don't need AI to figure out the structure of a human readable text.
Re:Maybe he should have read Knuth by ignipotentis · 2003-03-18 02:38 · Score: 1

Maybe you should reread that yourself. The point being made was that even for a finite document, as the amount of information increases (i.e. the depth of the tree) you can generally expect your memory usage to continue to escalate. If a database was ever developed using this scheme, it could easily run most (read: all but a special few) machines in existence out of memory very quickly, depending on the tree search algorithm used. For example, the breadth-first search algorithm is capable of generating 1MB/sec in analyzing nodes... Big deal, you say... in 24 hours, if the search is still running... you're using 86GB. Mind you, this search could be just to find the color of a dog.

--
Don't waste time... procrastinate now!
Re:Maybe he should have read Knuth by Sique · 2003-03-18 02:42 · Score: 2, Informative

Comparisons with TeX aren't really appropriate because TeX is a Turing-complete language, and so impossible to parse automatically in 100% of cases (unless you want to allow that your program will sometimes fail to terminate, ie hang, on particular input files). I don't know what you mean by your subject line 'Maybe he should have read Knuth'...

Maybe you should read Knuth also... There are two different things: One is the grammar and the other one is the language. You can write a turing complete language in a regular grammar (Chomsky Type 3), completely parseable with regexp (think: (([linenumber] ((INC|DEC) [register])|JMZ [linenumber])[newline])*). You can also write a primitive-recursive language using a free grammar (Chomsky Type 0) (think: your average english book about primitive-recursive languages), which is unparseable within finite time and memory.

So TeX is a Turing-complete language written in a Chomsky Type 1 grammar (it should be LL2, but I am not sure). XML by itself is a Turing-incomplete way to describe Chomsky Type 2 grammars.

--
.sig: Sique *sigh*
Re:Maybe he should have read Knuth by Anonymous Coward · 2003-03-18 02:42 · Score: 0

Hello? If you don't put a closing XML tag then the parser will run to "infinity". Of course the end of file will terminate it, but still, in theory this requires infinite resources.

Say you have a 1 TB XML data file and all you want is some header information. Well if the entire data set is enclosed in a tag, then the WHOLE 1 TB data file must be read into memory, or somehow indexed.

Sorry, you lose, be sure to play again soon.
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 02:55 · Score: 1

TeX's grammar is odd, because you can redefine character classes and how to parse particular tokens. There are whole chapters in The TeXbook about how TeX processes input characters and the various mechanisms you can use to alter its behaviour.

So the grammar really is bound up with the language. You can't parse TeX code without also evaluating it, as far as I know. I didn't express myself clearly by saying 'TeX is a Turing-complete language', after all, C is Turing-complete but you can write code to parse C. But there are some languages like TeX and Perl where you can redefine bits of grammar on the fly, and the full power of the language is available to do this. So you cannot in general parse code in these languages without risking non-termination.

I don't know what Chomsky's Type 1 grammars are (you described the others in your message but not Type 1), but I'm pretty sure TeX's grammar is not decidable. To see whether a TeX document is syntactically valid you sometimes have to execute it. So there isn't really a line between 'syntactically correct' and the meaning of the program.

All this AFAIK, I am not a hardcore TeX hacker.

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 02:57 · Score: 1

Well of course, if you have a big document it will use memory. I'm not arguing with the proposition that XML is memory-hungry (at least for some parsers and some documents). I was disputing the original poster's claim that parsing an XML document can take _infinite_ memory or _infinite_ time, which is clearly not the case.

(It's possible the original poster did not mean to imply this, his statement was not very clear, talking about 'handling errors'.)

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 03:00 · Score: 1

Hello? If you don't put a closing XML tag then the parser will run to "infinity". Of course the end of file will terminate it, but still, in theory this requires infinite resources.

I don't know what you mean by 'in theory'. A finite input file requires finite resources. Period.

An infinite input file could require infinite memory to parse it. So what?

Say you have a 1 TB XML data file and all you want is some header information. Well if the entire data set is enclosed in a tag, then the WHOLE 1 TB data file must be read into memory, or somehow indexed.

Not so, it depends on the parser you are using. Some implementations such as DOM will try to read the whole file. But equally you can use a token-based parser such as SAX, read tokens from the file (start tag, end tag, content, attributes) until you get the information you want, then stop processing.
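A sketch of the stop-early approach in Python, assuming the standard library's `XMLPullParser` as the token-based parser. Input is fed in chunks, and feeding stops as soon as the wanted element has been seen, so the rest of the file is never read. `first_text` and the chunk generator are illustrative names.

```python
import xml.etree.ElementTree as ET


def first_text(chunks, tag):
    """Feed chunks until the first <tag> closes; return (its text, chunks consumed)."""
    parser = ET.XMLPullParser(events=("end",))
    consumed = 0
    for chunk in chunks:
        parser.feed(chunk)
        consumed += 1
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                return elem.text, consumed
    return None, consumed


def big_file():
    """Stand-in for a huge file: a small header followed by many rows."""
    yield "<file><header><title>Report</title></header><data>"
    for _ in range(1000):
        yield "<row/>"
    yield "</data></file>"


print(first_text(big_file(), "title"))
```

Because `big_file` is a generator, the rows after the header are never even produced once the title has been found.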

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Anonymous Coward · 2003-03-18 04:22 · Score: 0

The majority of Americans -- the ones who never elected you -- are not fooled by your weapons of mass distraction. We know what the real issues are that affect our daily lives -- and none of them begin with I or end in Q

Oh, I can think of at least one thing that begins with "I" and ends in "Q" that (the lack of) is affecting your daily life....
Re:Maybe he should have read Knuth by Minna+Kirai · 2003-03-18 04:59 · Score: 2, Informative

I don't know what you mean by 'in theory'. A finite input file requires finite resources. Period.

He probably means "taken to the limit": a way of characterizing the performance of a system. How does it fail when faced with an overwhelming amount of work? (It's like O-notation, which assumes the problem size is infinite to eliminate lower-order effects from the description.)

An infinite input file could require infinite memory to parse it. So what?

The intention probably was to point out that a program which extracts from a non-XML database can be written to use constant memory, regardless of file size (or log memory, to be pedantic). Whereas with XML, the memory used increases as long as the file size does.

(There are tricks which can reduce the memory use, but they usually come down to making assumptions about the formatting of the file, which can lead to skipping over malformed XML chunks)

(I'm not espousing those views, just attempting to translate for you)
Re:Maybe he should have read Knuth by 21mhz · 2003-03-18 05:31 · Score: 1

Whereas with XML, the memory used increases as long as the file size does.

You probably ought to say "might increase".

(There are tricks which can reduce the memory use, but they usually come down to making assumptions about the formatting of the file, which can lead to skipping over malformed XML chunks)

Trick #1: impose a (configurable) limit on element depth. XML documents with nesting deeper than, say, 2048, have no reason to exist anyway, even if well-formed. Problem solved.
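
A minimal sketch of such a depth limit, written in Python with the standard `xml.sax` module. The names `DepthGuard`, `DepthLimitExceeded` and `within_limit` are made up for illustration; 2048 is just the figure mentioned above.

```python
import xml.sax

MAX_DEPTH = 2048  # the figure from the post; meant to be configurable


class DepthLimitExceeded(Exception):
    """Hypothetical error type for documents nested too deeply."""


class DepthGuard(xml.sax.ContentHandler):
    """Tracks the current element depth and refuses to go past a limit."""

    def __init__(self, max_depth=MAX_DEPTH):
        super().__init__()
        self.max_depth = max_depth
        self.depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth > self.max_depth:
            raise DepthLimitExceeded(
                f"<{name}> is nested deeper than {self.max_depth}"
            )

    def endElement(self, name):
        self.depth -= 1


def within_limit(xml_bytes, max_depth=MAX_DEPTH):
    """Return True if the document parses without exceeding max_depth."""
    try:
        xml.sax.parseString(xml_bytes, DepthGuard(max_depth))
    except DepthLimitExceeded:
        return False
    return True
```

This only bounds what the handler itself keeps track of. The underlying parser has its own bookkeeping for open elements, so treat it as an early-rejection check rather than a hard memory guarantee.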

--
My exception safety is -fno-exceptions.
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 05:34 · Score: 1

But a program to read from XML could also be written to use constant memory, if the XML format does not require arbitrary nesting. For example, if you have the simple DTD

<!ELEMENT people (person*)>
<!ELEMENT person (#PCDATA)>

then files conforming to this DTD will not have nested elements beyond a certain fixed depth (in this case, 2) and so there is a fixed amount of memory which will suffice to parse any file conforming to this DTD.
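
As a sketch of what that looks like in practice (Python, standard `xml.etree.ElementTree`), the loop below streams a people/person file and throws each record away once it has been looked at. The function name and the "longest name" statistic are invented for the example; memory use is bounded by the largest single person record, assuming the input really does conform to the DTD.

```python
import io
import xml.etree.ElementTree as ET


def count_people(stream):
    """Count <person> records and find the longest name, one record at a time."""
    count = 0
    longest = ""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of <people>
    for event, elem in context:
        if event == "end" and elem.tag == "person":
            count += 1
            name = (elem.text or "").strip()
            if len(name) > len(longest):
                longest = name
            # Drop the finished record so the tree under <people> never grows.
            root.clear()
    return count, longest


doc = (b"<people><person>Ann</person><person>Bartholomew</person>"
       b"<person>Cy</person></people>")
print(count_people(io.BytesIO(doc)))
```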

If you used a different format, one which allowed block structure nested to an arbitrary depth, then as the input files get bigger and bigger they may also get deeper and deeper, so more and more memory is required to keep track of the current depth. But that is a property of the data model, not XML itself. Parsing a pathological C program fragment such as 'if (1) { if (1) { if (1) { ...' has the same problem.

The memory required is a property of the data model you use, it doesn't particularly depend on the syntax used to represent that data model, unless you use a deliberately stupid syntax.

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Minna+Kirai · 2003-03-18 06:00 · Score: 1

But a program to read from XML could also be written to use constant memory,

The part which bothers me in practice, is that it can't take constant time. To reach the Nth entry of an XML file, you've got to read through everything that comes before it. (Technically, you should read the rest of the file too, but a 2x factor doesn't matter in the long run).

This has been a problem for some popular applications. For instance, /. just reported on the new Zaurus 5600 PDA, a more powerful followup to the 5500 model. The prior version used XML to store the PIM data, but the newer (faster) model won't do this, as it's simply not scalable to more than 1000 entries.

(Technically, the old Zaurus data format might not've been XML either, since no DTDs were ever published)
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 06:51 · Score: 1

To reach the Nth entry of an XML file, you've got to read through everything that comes before it.

Hmm, I can see this is a problem. You wouldn't have to fully parse everything that came before but you would have to tokenize at least.

Some kind of extension to XML which let you say 'I expect every 'foo' element to begin at 2000 byte offsets' might partly solve this. It wouldn't make XML suitable as an on-disk format for heavy database apps, but it might help in the middle ground where you want the advantages of XML in parsing and validation, but also want to avoid reading in the whole file for every access. Pad with null bytes between elements and your data will still be a valid XML document.
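
A rough sketch of the fixed-offset idea in Python. Everything here (the 64-byte slot width, the `build` and `nth_person` helpers, the people/person layout) is an assumption for illustration. One deliberate change from the suggestion above: the padding is spaces rather than null bytes, because NUL is not a legal character in XML 1.0, while whitespace between elements is harmless.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SLOT = 64                 # hypothetical fixed slot width, in bytes
HEADER = b"<people>\n"
FOOTER = b"</people>\n"


def build(names):
    """Write one <person> per fixed-width slot, padded with spaces."""
    slots = []
    for name in names:
        record = f"<person>{escape(name)}</person>".encode("utf-8")
        if len(record) > SLOT - 1:
            raise ValueError(f"record for {name!r} does not fit in {SLOT} bytes")
        slots.append(record.ljust(SLOT - 1) + b"\n")
    return HEADER + b"".join(slots) + FOOTER


def nth_person(stream, n):
    """Seek straight to record n and parse only that slot."""
    stream.seek(len(HEADER) + n * SLOT)
    slot = stream.read(SLOT)
    return ET.fromstring(slot).text


data = build(["Ann", "Bob & Co", "Cy"])
print(nth_person(io.BytesIO(data), 1))
```

The whole file is still an ordinary well-formed document, so a normal parser can read it too. The cost is wasted space and a hard cap on record size, and nothing stops another tool from rewriting the file without the padding.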

Alternatively, prohibiting the wacky quoted data sections (the kind that end with ]]!> or something and are hardly ever used), along with making sure that > and < cannot appear unquoted in attribute values (maybe this is already the case), would let you textually scan the file for the string '<foo' to quickly find 'foo' elements. And maybe now I see the point of the original article, that XML cannot be easily scanned with regular expressions. But with a few tweaks perhaps it could be.

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Minna+Kirai · 2003-03-18 07:26 · Score: 3, Insightful

I think the root of that difficulty comes from using XML to solve two different problems. One problem is data transmission between systems, which XML was designed for, and handles adequately. When receiving a data chunk from an external source who might not be trustworthy, a safety-conscious program really has to read the whole thing and verify it complies with the format. Skipping over some sections to reach the part you're interested in isn't allowed.

But, for data storage within an application (or a set of tightly coupled systems that trust each other to function correctly), XML is less advisable. Traditional (SQL) databases, or hand-rolled file formats, may be a better solution when high speed and scalability are needed.

JoelOnSoftware has a long article on why XML is suboptimal for the latter use.
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 08:22 · Score: 1

But with today's hardware, it's actually quite rare that you will need an 'efficient' file format at all.

Consider how the Unix idea of storing as much as possible in human-readable text files with variable line lengths has displaced binary formats (or fixed-size records) for many applications. Text files are a lot slower to read and less 'scalable' in many ways but so what? Hardware is fast enough.

Of course there are still many cases where you do need to optimize the file format for speed, since as hardware gets faster data sets get (much) bigger. In these cases perhaps an SQL database or a more lightweight database like Berkeley DB would be appropriate. But in most cases, not handcrafting your own binary format.

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by MegaFur · 2003-03-18 10:14 · Score: 1

Dude, you can't even *spell* X-M-L correctly. Why should we trust that you know what you're talking about?

Perhaps someone should have "tuaght" you how to spell.

--
Furry cows moo and decompress.
Re:Maybe he should have read Knuth by SoupIsGoodFood_42 · 2003-03-18 10:56 · Score: 1

Hello? If you don't put a closing XML tag then the parser will run to "infinity". Of course the end of file will terminate it, but still, in theory this requires infinite resources.
Infinitely? Unless your data is infinite in size. In which case there is no computer technology that will ever be able to do anything with it.
Say you have a 1 TB XML data file and all you want is some header information. Well if the entire data set is enclosed in a tag, then the WHOLE 1 TB data file must be read into memory, or somehow indexed.
1TB? Just what would someone be doing with an XML file that size? Have you even considered that XML is not meant for such huge datasets (at least not at this time)?
Sorry, you lose, be sure to play again soon.
Sorry. But you lose, because you obviously don't know the rules or the objective of the game.
Re:Maybe he should have read Knuth by J.+Random+Software · 2003-03-18 13:25 · Score: 1

<foo isn't allowed in an attribute value, but it could appear in marked sections, comments, or processing instructions (which aren't too hard to ignore), or in a processing entity declaration in the prolog (which has to be fully parsed to deal with), or in a few other places if you read the external DTD subset. It also won't do what you want if the element you want is in a namespace (the correct prefix, if any, is unpredictable) or if the document uses default namespaces (you might have found someone else's foo element from a different namespace).
Re:Maybe he should have read Knuth by fymidos · 2003-03-18 18:33 · Score: 1

Not so, there are two worlds, mathematics and the "real" world. You probably only need a finite amount of fuel and time to travel to another galaxy. But the more fuel you load, the more fuel you need, the more you travel, the more time it will take as the other galaxy is moving away from you, and, as you probably know, it cannot be done.

You see an input file that takes up all the space on your system is always a 'finite' file but most certainly cannot be parsed if it has an opening tag that doesn't close.

--
Washington bullets will simply be known as the "Bulle
Re:Maybe he should have read Knuth by fymidos · 2003-03-18 18:39 · Score: 1

You understand of course that if you need a computer program to read it you will need a computer program to write it. In my exp, computer programs don't quite understand why 2048 is the number to stop nesting. Sometimes they are just not reasonable :)

--
Washington bullets will simply be known as the "Bulle
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 21:05 · Score: 1

Of course a file cannot be parsed if it has an opening tag that cannot be closed. XML requires all tags to be closed. So the file would not be well-formed XML. I don't see the fact that such a file is not parsable as a problem, as long as the XML parser gives you a meaningful error message ('the tag is opened on line 10 but not closed'), and most do.

You'd only run out of memory if you had an input file that looked like <a><a><a><a>...

for many megabytes. But even that is only allowable if your XML file format (DTD or whatever) permits elements to nest inside other elements. Whether this is true depends on the application. As I mentioned in another post, the same problem applies equally to any language that allows constructs to be nested to arbitrary depth, including C source and Lisp sexps. It's not particularly a limitation of XML.

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by Ed+Avis · 2003-03-18 22:46 · Score: 1

I don't think the namespaces are problematic (you just have to look for /<.+:foo/, right?) but it might be nice for things like processing instructions to be removed, and for comments to disallow the < character. Assuming that the advantage of being able to jump to a random part of the file and quickly find particular elements outweighs the advantage of being able to put unescaped < characters in comments.

Hmm, I ought to have a look at the various XML-lite proposals and see how many of them have this property.

--
-- Ed Avis ed@membled.com
Re:Maybe he should have read Knuth by J.+Random+Software · 2003-03-19 05:58 · Score: 2, Informative

It's fairly common to comment out markup when hand-editing, since <![IGNORE[...]]> can't be used within a document. Skipping non-markup in the document should be just a matter of matching the Perl regex
( [^<]+ | <!--.*?--> | <\?.*?\?> | <!\[CDATA\[.*?\]\]> )* <foo
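
A sketch of the same idea in Python, done as a tokenizing scan rather than one big anchored match: comments, processing instructions and CDATA sections are swallowed whole, so a `<foo` inside them is never reported. The function name is made up, and this assumes no DTD internal subset and no namespace prefixes, which is exactly the caveat raised earlier in the thread.

```python
import re

# Alternatives are tried in order at each position, so a comment, PI or
# CDATA section is consumed as a single token before <foo gets a chance.
TOKEN = re.compile(
    r"<!--.*?-->"                  # comment
    r"|<\?.*?\?>"                  # processing instruction
    r"|<!\[CDATA\[.*?\]\]>"        # CDATA section
    r"|(?P<foo><foo(?=[\s/>]))",   # the start tag we are looking for
    re.DOTALL,
)


def foo_offsets(text):
    """Return the offsets of real <foo ...> start tags in text."""
    return [m.start() for m in TOKEN.finditer(text)
            if m.group("foo") is not None]


sample = ("<root><!-- <foo> --><![CDATA[<foo>]]><?pi <foo ?>"
          "<foo a='1'/><foobar/><foo/></root>")
print([sample[i:i + 5] for i in foo_offsets(sample)])
```

An unterminated comment or CDATA section will defeat it, so it is a quick locator for trusted files, not a replacement for a parser.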

If someone else defines a foo element in a different namespace, I don't see how you can do anything other than ignore it--it's almost certainly not what you were looking for, and you have no idea what it might mean.

Re:xml by Pyromage · 2003-03-18 01:23 · Score: 4, Informative

XML isn't intended for web pages. That's what you missed:

Its biggest use right now is data interchange. Moving bits between one magic widget and another. And for that, HTML sucks. It just can't represent arbitrary data. Programming languages (C++, Java) are for instructions, not data.

XML fits in perfectly where it's at use-wise. Tim Bray is talking about programming for it: The available interfaces are very counter-intuitive, and that's what Bray's getting at.

This does not bode well by fudgefactor7 · 2003-03-18 01:23 · Score: 1, Insightful

When an author says his work was not well done, that should be a sure fire red-flag that perhaps the whole thing should be aborted like an unviable fetus.

Re:This does not bode well by JimDabell · 2003-03-18 01:32 · Score: 5, Insightful
Did you actually read the article?

I can sum it up very easily:
- Callbacks irritate him.
- It's not always practical to build a tree in-memory.
He's looking for a nicer api for processing XML, he's not looking to replace XML entirely.
Re:This does not bode well by ChimChim · 2003-03-18 01:36 · Score: 2, Insightful

He never said his work, XML, is not well done. What he said was that the programming languages, APIs, and Environments haven't made the task of processing XML easy enough. XML itself is sound, or as sound as many alternatives.

The thing is, back in the day when people wore onions on their belts, programmers had to be convinced that UNIX's "file is a bag of bytes" form of data access was better than the more direct/powerful/convenient methods they'd been used to, like raw access to the drive. But programmers aren't users, and what's great for users, or has benefits beyond the realm of CS will always complicate things for the programmer. However, the more complicated things are for programmers, the longer it will take to build systems and get usable products. So Tim Bray is basically saying that XML has succeeded in the data-interchange model, but is failing to also make programmers' lives easier, which is also important.
Re:This does not bode well by Random+Walk · 2003-03-18 01:39 · Score: 3, Insightful

After reading the article, I would say he tries to use XML for something it is not very suitable for, and argues that in this case the available libraries are not useful (surprise ...).
XML is not a stream - it has a hierarchical tree structure, and IMHO is not useful for anything that (a) by its very nature is a continuous stream of data (say, a log file), or (b) wants to be processed as a stream (because it's big, and would require too much memory to be handled as a single data structure).
The problem seems to be that XML is good for portability and standardization, and therefore is abused for things it's not well suited for (the well-known 'if all you have is a hammer, every problem looks like a nail' syndrome).
Re:This does not bode well by blancolioni · 2003-03-18 02:06 · Score: 1

Callbacks irritate him.

I've used XML Ada successfully, and it uses inheritance instead of callbacks. Maybe that would irritate him less.

It's not always practical to build a tree in-memory.

I'm not sure what that means. At some point you have to grab the information you want, and that may or may not mean building a tree. But it's up to the programmer. The last time I parsed a big XML file was to convert the public Cyc knowledge base into a different format, no tree required.

I'm still not sure what advantage XML has over Lisp. I'm near certain that the hype and giddiness is overdone.
Re:This does not bode well by JimDabell · 2003-03-18 02:22 · Score: 1

It's not always practical to build a tree in-memory.

I'm not sure what that means. At some point you have to grab the information you want, and that may or may not mean building a tree.

Sorry, I guess I could have been clearer. The author feels that there are really only two widespread, implemented options for parsing XML "properly" - callback-style, and in-memory-tree-style. One was inappropriate, and he didn't like the other, so he fell back on regexps. I have to say, I sympathise, but there's no way I would chuck away a proper parser for a few regexps.

I'm still not sure what advantage XML has over Lisp. I'm near certain that the hype and giddiness is overdone.

Oh, XML is definitely overhyped, even its most staunch supporters would agree with you there. I did read a very good article not too long ago about the benefits of XML over LISP, but google is being uncooperative right now. I think that the main reason is more to do with the human aspect of it - there's something human-friendlier about XML than LISP.
Re:This does not bode well by Citizen+of+Earth · 2003-03-18 02:57 · Score: 1

Callbacks irritate him.

A proper low-level API should work like open/read/close fetching a 'token' at a time. Callbacks are so painful to manage that I have to wonder why programmers have been so masochistic as to continue to tolerate SAX-type APIs. expat was such a pain in the ass that when faced with the task of processing XML, we wrote our own parser.
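
To make the open/read/close shape concrete, here is a small sketch using `XMLPullParser` from Python's standard `xml.etree.ElementTree`. The `tokens` wrapper is a made-up name; the point is that the caller feeds data and asks for events when it wants them, and no callbacks are registered anywhere.

```python
import xml.etree.ElementTree as ET


def tokens(chunks):
    """Yield (event, tag) pairs as raw chunks of the document arrive."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)                        # "read" some more input
        for event, elem in parser.read_events():  # pull whatever is ready
            yield event, elem.tag
    parser.close()
    # Events still queued when the parser is closed can be read afterwards.
    for event, elem in parser.read_events():
        yield event, elem.tag


for event, tag in tokens([b"<a><b>", b"x</b>", b"</a>"]):
    print(event, tag)
```

One caveat: this particular parser also builds element objects behind the scenes, so it gives you the pull-style control flow but not automatically the low memory footprint.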
Re:This does not bode well by sdcharle · 2003-03-18 03:06 · Score: 1

Eh, I don't think so. It's quite common for someone to look back on something they've created, and notice all the flaws and things they could have done better.
Re:This does not bode well by blancolioni · 2003-03-18 04:08 · Score: 1

Sorry, I guess I could have been clearer.

Ha. I could have read the article.

I have now. I'm surprised to hear that the only options for XML parsing are callbacks and full tree creation. Even if you don't use Ada, you might find the XML Ada stuff interesting, and it looks like an easy paradigm to convert to the OO language du jour.

there's something human-friendlier about XML than LISP.

It's that parenthecitis thing again. I can see that, but it seems that XML is being used for a lot of things that are read and written only by machines. Save formats, for example.
Re:This does not bode well by jefp · 2003-03-18 04:34 · Score: 1
- Callbacks irritate him.
- It's not always practical to build a tree in-memory.
Yes. When I was handed an XML problem my reaction was the same. SAX is inconvenient and DOM is too slow for large files. However instead of whinging about it, I wrote my own parser API, which doesn't fit into either of those categories. It's very simple to use, and fairly fast. Basically, it's an iterator which returns an object for each span of text in the file. The spans come with a stack of attributes showing what XML elements they are enclosed in. That's about it. I hope to release it as freeware soon. I still need a catchy three-letter acronym for it, tho.
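
This is not jefp's parser, which isn't shown here, but a guess at what that kind of API might look like, sketched in Python on top of the standard `xml.dom.pulldom` module. The name `text_spans` is invented, and the stack carries only element names, not their attributes.

```python
from xml.dom import pulldom


def text_spans(xml_text):
    """Yield (enclosing_element_names, text) for each run of text."""
    stack = []    # names of the currently open elements, outermost first
    pending = []  # character chunks not yet emitted
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.CHARACTERS:
            # The parser may deliver one run of text in several pieces.
            pending.append(node.data)
            continue
        if pending:
            yield tuple(stack), "".join(pending)
            pending = []
        if event == pulldom.START_ELEMENT:
            stack.append(node.tagName)
        elif event == pulldom.END_ELEMENT:
            stack.pop()
    if pending:
        yield tuple(stack), "".join(pending)


for where, text in text_spans("<doc><p>Hello <b>world</b></p></doc>"):
    print("/".join(where), repr(text))
```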
Re:This does not bode well by JimDabell · 2003-03-18 04:38 · Score: 1

How about XIP: XML Iterating Parser?
Re:This does not bode well by jefp · 2003-03-18 04:43 · Score: 1

Pronounced 'zip'. That's pretty good.
Re:This does not bode well by Anonymous Coward · 2003-03-18 05:32 · Score: 0

I agree with your conclusion.
He should really look into the XML-pull API, which does neither..
Re:This does not bode well by kataklyst · 2003-03-18 16:18 · Score: 1

Sounds like a good idea.

It might be nice to include a method on the object your iterator returns to test if the stack matches a given XPath. That'd make your code really simple if you just wanted a few things out of a complex document. I guess a full XPath implementation would be a bit of a chore, but you could get most of the gain with just a small subset of the spec.
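
A small sketch of that "subset of the spec" idea in Python. It is not XPath proper: the hypothetical `stack_matches` helper understands only `/a/b/c` (match the whole stack), a single leading `//` (match the innermost elements), and `*` as a wildcard step.

```python
def stack_matches(stack, path):
    """Test a stack of element names against a tiny XPath-like pattern."""
    if path.startswith("//"):
        anchored, rest = False, path[2:]
    elif path.startswith("/"):
        anchored, rest = True, path[1:]
    else:
        raise ValueError("path must start with / or //")
    steps = rest.split("/")
    if "" in steps:
        raise ValueError("empty step; only a leading // is supported")
    if anchored and len(steps) != len(stack):
        return False
    if len(steps) > len(stack):
        return False
    # Compare the pattern against the innermost len(steps) elements.
    tail = stack[len(stack) - len(steps):]
    return all(step in ("*", name) for step, name in zip(steps, tail))


print(stack_matches(("doc", "p", "b"), "//p/b"))
```

Predicates, attributes and axes are all left out, which is where most of the chore in a full implementation would be.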

Re:I agree to tend. by Anonymous Coward · 2003-03-18 01:26 · Score: 0

You should try reading O'Reilly's upcoming "Undocumented XML Hacks". From what I can tell, it will be really insightful.

Re:xml by Anonymous Coward · 2003-03-18 01:26 · Score: 0

Yeah, we don't need no steekin' standard! We can design our own protocols, we can design our own configuration language, we have all the free time we need to think of those things! They're *fun* to do!

Re:xml by CynicTheHedgehog · 2003-03-18 01:27 · Score: 2, Informative

When you're writing an application and you have to decide what format messages should be written in, or what type of file configuration data should be stored in, most people say, "Why, XML, of course. That way we're guaranteed that it is extensible, transformable, and readable by anyone who would ever need to read it." Granted, there are lots of other document formats in which that is the case, but they are not industry standard. As long as there is a schema, everyone will accept it. And if it's not in the format that they would like, they are free to run it through an XSL transformation. Easy as pie.

XML is not hard, but it is a discipline. It requires a lot of reading and a fair amount of practice, but once you have it down, that's it. And from now on, your document storage design decisions (barring any space/memory constraints) are made for you.

Re:I can't believe it !! by Anonymous Coward · 2003-03-18 01:28 · Score: 0

Perhaps you should go out and buy some duct tape then? Hmm?

Not worth it by AppyPappy · 2003-03-18 01:28 · Score: 1

I've been a programmer for 22 years and XML has never interested me. I've seen the "great new thing" come and go over the years. APL, PL/1, OS/2, etc. XML looks amazingly like the old "great new things" that went onto the heap.

Note: PHP and Linux look a lot like the things that DIDN'T go on the heap. Simple to understand, easy to use, powerful. If a non-programmer can grasp it easily, it usually doesn't go on the heap.

--

If you aren't part of the solution, there is good money to be made prolonging the problem

Re:Not worth it by Anonymous Coward · 2003-03-18 01:48 · Score: 0

If a non-programmer can grasp it easily, it usually doesn't go on the heap.

Sorry, how does that apply to Linux?
Re:Not worth it by LizardKing · 2003-03-18 01:56 · Score: 1

PHP and Linux look a lot like the things that DIDN'T go on the heap. Simple to understand, easy to use, powerful.

For PHP, add to that: resource hungry and many of the apps written with it are notoriously insecure. For Linux, add: increasingly over engineered and poorly documented. It is simple to use and powerful, but the learning curve's steep for Joe Sixpack.

If a non-programmer can grasp it easily, it usually doesn't go on the heap.

Your implication being that if it can be understood by the great unwashed, then it must be good. Where does that leave Unix versus Windows then? What about COBOL and VB? The majority of well informed people would argue that Windows, COBOL and VB are dubious technologies ... yet they prosper because they are relatively simple to use.

Frankly, if I interviewed you for a job at my firm and you trotted out the "real programmer" and "great new things are shit" mantra I'd show you the door double quick.

Chris
Re:Not worth it by Xformer · 2003-03-18 02:25 · Score: 1

For PHP, add to that: resource hungry...

It's a web scripting language... what'd you expect?

...and many of the apps written with it are notoriously insecure.

That's not exactly the fault of the language, now is it? Sounds more like a problem with the people using it.

--
All I want is a kind word, a warm bed and unlimited power.
Re:Not worth it by Anonymous Coward · 2003-03-18 02:46 · Score: 0

Good. Wouldn't want to work for someone with an advanced case of technophilia, anyway.

Use what works.
Re:Not worth it by Anonymous Coward · 2003-03-18 02:59 · Score: 0

If you interviewed me, I'd shoot you and be awarded a stockpile of medals from the UN, recognizing my service to humanity.

"For PHP, add to that: resource hungry and many of the apps written with it are notoriously insecure."

If you don't know how to program, stay the hell out of the editor. *shrug*

"For Linux, add: increasingly over engineered and poorly documented. It is simple to use and powerful, but the learning curve's steep for Joe Sixpack."

Joe Sixpack. I love that phrase, especially when people like you have no idea what it refers to. Joe Sixpack isn't going to use a computer for much more than sending e-mail with the caps-lock key pressed down and browsing for porn.

K-Mail makes the former easy, Mozilla makes the latter much more efficient with the ability to block pop-ups and advertisements.

Oh, but I know it's so hard to type in a username and password to log into a computer and be whisked right into KDE. Let's ignore the fact that Joe Sixpack would feel proud in his ability and requirement to do so, when his friends don't have to. He's a computer genius, eh?

"Your implication being that if it can be understood by the great unwashed, then it must be good."

You imply that software development is the sole domain of programmers.

You work in a little podunk place, don't you?

"The majority of well informed people would argue that Windows, COBOL and VB are dubious technologies ... yet they prosper because they are relatively simple to use."

Windows is relatively simple? Please. Go down to your local help desk and see how 'simple' it is to use.

Cobol's simple? I must laugh, yet my shock prevents me from doing so.

As for VB being dubious.. You certainly aren't well informed. VB is excellent for the rapid development of small applications on Microsoft-based platforms.

"...at my firm..."

The Gods help your firm.
Re:Not worth it by AppyPappy · 2003-03-18 03:19 · Score: 1

but the learning curve's steep for Joe Sixpack.

I have to disagree. Most people can use Linux once it is installed, especially Mandrake and the like. They can't tell the difference between it and MS Windows. You click, stuff comes up. They're happy. You have email, you have a spreadsheet, you have a word processor, you have a browser. That's all you need. What you DON'T have is the ability to screw up your machine with email viruses and screensavers from Hell.

--
If you aren't part of the solution, there is good money to be made prolonging the problem
Re:Not worth it by LizardKing · 2003-03-18 03:26 · Score: 1

That's not exactly the fault of the language, now is it?

Well, there have been a number of posts on BugTraq describing holes in PHP itself, but it seems like a new hole in apps like PHPNuke appears every week.

Chris
Re:Not worth it by LizardKing · 2003-03-18 03:36 · Score: 1

I have to disagree. Most people can use Linux once it is installed, especially Mandrake and the like

True, but to exploit the full potential of a Unix like operating system then you need to learn the command line. I agree that people who just want web surfing, word processing and e-mailing capabilities would find Linux no harder than Windows. What's harder is for former Windows programmers to grasp concepts like pipes, "everything's a file" and the single rooted filesystem (amongst other things). Once they've got their heads around these concepts though, then things go swimmingly.

Chris
Re:Not worth it by Xformer · 2003-03-18 04:14 · Score: 1

The number of BugTraq posts for holes in PHP itself doesn't begin to equal the number in applications that are built on it. That just goes back to my earlier statement.

One could rewrite PHP web applications in a number of other languages, and they'd still have just as many chances of containing security holes.

--
All I want is a kind word, a warm bed and unlimited power.
Re:Not worth it by Anonymous Coward · 2003-03-18 05:55 · Score: 0

I think you misunderstood him. XML is the newest
stupid trick/fad we all have to bow down to.

It is tiresome to see these come and go and when
they come someone always thinks they are the greatest
thing since ice cream. When they go, no one
mentions them.
Re:Not worth it by AppyPappy · 2003-03-18 06:13 · Score: 1

{ maynard g krebbs } PROMPT!! { /maynard g krebbs }

--
If you aren't part of the solution, there is good money to be made prolonging the problem
Re:Not worth it by TheLink · 2003-03-18 06:53 · Score: 1

Actually PHP has to take a fair bit of the blame.

Track vars, magic quotes... Sheesh.

By the time you disable all the insecurely or badly designed features, you end up with something that doesn't look like PHP. And you start wondering why you are using PHP and not some other language.

PHP has a disproportionate number of badly designed features. They look great to newbies, but they come back and bite them.

Which is why PHP Nuke and relatives are in such a mess. They're programs done the classic PHP way, and PHP makes it easy for programmers to slip down that slope.

Some newbie thinks a Magic PHP feature saves them a lot of work compared to some lame old language, later someone turns off the crap feature and adds addslashes one by one in appropriate places. Then later, someone has to remove the addslashes and use a proper DB interface.

You should filter inputs so your program can cope with them, and you should filter each output accordingly so the programs you are sending data to will handle them properly, and data doesn't get corrupted or lost.
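
The same principle, sketched in Python rather than PHP since the idea is not language-specific. The table, column and function names are invented for the example. Data goes to the database through placeholders (the "proper DB interface" route) and is escaped for HTML only at the point where it is written into a page.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")


def save_comment(author, body):
    # Placeholders let the driver handle quoting; no addslashes-style
    # rewriting of the data, so what is stored is exactly what was sent.
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)", (author, body)
    )


def render_comments():
    # Escape for the output format (HTML here) at output time.
    rows = conn.execute("SELECT author, body FROM comments ORDER BY rowid")
    return "".join(
        f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"
        for author, body in rows
    )


save_comment("O'Reilly", "<script>x</script>")
print(render_comments())
```

The stored value keeps its apostrophe and angle brackets untouched; only the rendered copy is altered, and only in the way HTML requires.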

The fact that they create features like magic quotes gives me a bad impression of the PHP language designers.

Link.
--
- Too many replies beneath your current threshold
Re:Not worth it by Anonymous Coward · 2003-03-18 10:46 · Score: 0

Come on, you forgot Java! :-)
Re:Not worth it by Anonymous Coward · 2003-03-18 11:55 · Score: 0

What's harder is for former Windows programmers to grasp concepts like pipes, "everything's a file" and the single rooted filesystem (amongst other things).

From experience, that's trivial - none of them are particularly hard concepts to get your head around. After all, a crude form of pipes existed even in DOS, and is the single-rooted filesystem really that much more of a step from "My Computer", which acts much like a virtual root from a UI perspective?

Re:xml by BFKrew · 2003-03-18 01:29 · Score: 2, Informative

On the web, a big problem is that the content of the page is mixed in with the formatting. So, this content cannot be displayed easily on a PDA, phone or even across different browsers to an extent.

Separating the content from how it is displayed makes it easier to display it in pretty much any format. By taking a single XML document you could create a page that looks great on Mozilla, great on IE, a WAP enabled phone, Opera, Microwave, Fridge - whatever!

XML is NOT a programming language. It is more like a way of describing data and one MAJOR benefit in my opinion is that it is human as well as machine readable. I can ask my 'pointy haired boss' to make an amendment to an XML document and he will pretty much be able to read it quite easily.

It has plenty of uses such as a way of sharing data. There is no reason, for example, why an XML source could not be used in other webpages, as an input source for a database, or even as a way of getting output from your C++ program into my Java app, my ASP.NET page or even another C++ program!

XML by Anonymous Coward · 2003-03-18 01:29 · Score: 0

XML is ASCII with tags and content between the tags...
This sounds simple, but experienced programmers (unlike that one in the topic) do not NEED to go deeper into the theory since XML programming *IS* pretty stereo-type.

Think of a XML file like a filesystem where tags are directories and content are files.. and you'll know how to get ANY job done with only a handful of functions..
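
To illustrate the filesystem analogy, a short Python sketch with the standard `xml.etree.ElementTree`. The config document and the `read_path` helper are made up; the path strings are the subset of XPath that ElementTree's `find` supports.

```python
import xml.etree.ElementTree as ET

config = ET.fromstring("""
<config>
  <db>
    <host>localhost</host>
    <port>5432</port>
  </db>
  <log><level>debug</level></log>
</config>
""")


def read_path(root, path):
    """Return the text at a directory-like path, or None if it is missing."""
    node = root.find(path)
    return None if node is None else (node.text or "").strip()


print(read_path(config, "db/host"))                  # like `cat db/host`
print([child.tag for child in config.find("db")])    # like `ls db`
```

The analogy breaks down a little with attributes, mixed content and repeated sibling names, which a directory tree has no equivalent for.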

But now the inventor himself flamed XML.. so what should I care about it..

Re:xml by Uller-RM · 2003-03-18 01:31 · Score: 2, Insightful

Since you apparently know nothing about XML, try reading the article. You'll learn something new, and you won't have to talk out your ass on this topic.

XML's not a language -- it's a grammar, a guide of sorts, for hierarchical data storage. You design file formats that conform to XML. The goal is that it's easy to read that file format in any language or platform (given a XML processor/parser for that platform), since your data is stored in plain human-readable UTF8-encoded text.

Might as well poke fun at the rest of your idiocy -- as it happens, HTML 4 is pretty close to being XML-conformant, and the W3C's now pushing XHTML which is fully conformant.

Granted, a lot of people treat XML as another buzzword, the way that OOP once was. It's not a magic bullet -- it's just a guide to making cross-platform file formats, and it works pretty well for that.

Hysteric? by Anonymous Coward · 2003-03-18 01:32 · Score: 0

Don't take life too serious... or you might become a terrorist. :))))

Iron guys by buddha42 · 2003-03-18 01:32 · Score: 1

I would think that loading the whole thing into memory wouldn't be a problem for the 'to the iron' guys he mentions. The best use I can think of there are configuration files, in which case you want the whole thing anyway, and you usually only load it once at startup.

Re:Iron guys by Minna+Kirai · 2003-03-18 10:33 · Score: 1

A programmer who writes 'to the iron' probably has a need to do so- either his hardware is very limited / slow (11 mhz embedded CPU), or his performance requirements are stringent and intense.

In either case, the application might not have the freedom to allow an unknown amount of RAM to be allocated when reading config settings.

The problem isn't with XML... by Anonymous Coward · 2003-03-18 01:32 · Score: 0, Troll

The problem is that a vocal minority either haven't taken the time to understand how XML works, or aren't trying to use it appropriately.

I'm not saying XML as it is today is perfect, but it's broadened my skill as a web developer and has allowed me to help my company do things it was only imagining 2 years ago.

[ Reply to This ]

Re:The problem isn't with XML by MrWa (Score:1) 10:10 PM March 17th, 2003

Re:The problem isn't with XML... by Anonymous Coward · 2003-03-18 02:54 · Score: 0

Oh, genius. Yeah, cause hiding goat.cx links by pretending they are forum components is really insightful.

here's the thing by Anonymous Coward · 2003-03-18 01:33 · Score: 0

you know how you php/perl/python weenies make fun of "HTML Programmers?" That's how real programmers feel about you.

Re:xml by Covener · 2003-03-18 01:33 · Score: 1

XML isn't intended for web pages. That's what you missed:

It's biggest use right now is data interchange. Moving bits between one magic widget and another. And for that, HTML sucks. It just can't represent arbitrary data. Programming languages (C++, Java) are for instructions, not data.

XML fits in perfectly where it's at use-wise. Tim Bray is talking about programming for it: The available interfaces are very counter-intuitive, and that's what Bray's getting at.

Sounds more like his web/flash stuff are clients to the data, and he's trying to avoid doing transformations to accommodate them ahead of time.

Re:Let me get this straight... by Anonymous Coward · 2003-03-18 01:33 · Score: 0

Naw, whiners like that don't vote. They can't be bothered to vote. They'd rather let a judge do their rearranging.

Short summary by Anonymous Coward · 2003-03-18 01:33 · Score: 5, Informative

Tim Bray thinks that callback-based XML APIs are a bit awkward to use. He would prefer to use something like a pull parser (see for example http://www.xmlpull.org for examples in Java) to the current Perl XML APIs.

And he would probably want to be able to parse parts of documents ("XML Fragments"), rather than whole documents.

I agree with his views (not using Perl too much, though). But this is *not* the end of XML or anything. Tim just has some thoughts about how the XML API could be better in Perl. Not very exciting, perhaps...
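
For anyone who hasn't seen the pull style, here is a minimal sketch. It is in Python rather than Perl or Java (my choice for illustration, not something from Bray's piece) and uses the standard library's xml.dom.pulldom. The sample document and the titles() helper are made up. The point is only that the caller's loop drives the parser, instead of the parser calling back into your code.

```python
from xml.dom import pulldom

# Made-up sample document, for illustration only.
SAMPLE = (
    "<feed>"
    "<entry><title>First</title></entry>"
    "<entry><title>Second</title></entry>"
    "</feed>"
)

def titles(xml_text):
    """Pull events one at a time; our loop decides what to look at."""
    events = pulldom.parseString(xml_text)
    found = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            # Ask the parser to build just this one small subtree.
            events.expandNode(node)
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            found.append(text)
    return found

print(titles(SAMPLE))
```

Only the expanded title subtrees are ever held as DOM nodes; the rest of the document streams past.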

What? by Dr.+Evil · 2003-03-18 01:34 · Score: 1

You mean BNF is for humans!?

He is right, I think. by expro · 2003-03-18 01:34 · Score: 3, Interesting

Among other things ...

(1) They need to eliminate the doctype can of worms. Unfortunately, this cries out for an alternative solution for character entities.

(2) Namespaces need to be simplified and better integrated into the core of the language. Expanding on this, there need to be much better mechanisms for modularizing parts of the markup so that it isn't necessary to parse and hold everything in memory to make sense of it.

(3) There needs to be clean-up and standardization of element id's and references, integrating it with (1) and (2).

Do others have more? Should this be done compatibly with XML?

I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

Re:He is right, I think. by kalidasa · 2003-03-18 02:03 · Score: 4, Insightful

1. Doctype is necessary. Perhaps you've never tried handling a very complex text (a big DOCBOOK text or a big TEI text). You need to know what kind of text you're dealing with, and there's no way to come up with one universal solution for all kinds of texts. The only character entities needed are the handful of named entities that are part of the standard: &lt; &gt; &amp; etc. The rest can be handled by Unicode (including the PUA) and transcoding (if you are using an ISO 8859 encoding and you need a character outside that encoding, then you need to rethink the encoding you've chosen to use. UTF-8 is your friend). Entities really are good for more complex units (strings, etc.), rather than single characters. What character entities have to do with DOCTYPE is beyond me.
2. True
3. Standardize element IDs? Element IDs are part of the text, not part of the structure. They're simply a way of simplifying the difficulty of accessing random parts of text.
I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

So you're saying we need a meta-meta-language? The *MLs are a standard for arbitrary abstract data (and text) models (because not all texts are hierarchical like DBs).

I think the problem here is that DB programmers (I'm excepting Bray from this) are overusing XML for very simple DB tasks that it wasn't intended for. If you're just doing a 40 field, 30,000 record flat DB, XML is NOT the solution. But it is the best solution for complex non-hierarchical data (i.e., books, etc.).
As for Bray, I don't think he's saying XML itself (the markup standard alone) is too hard, that it should be abandoned. I think he's saying we haven't come up with simple enough ways of accessing XML data through APIs. But of course that wouldn't be a spicy enough meatball for the Taco.
Re:He is right, I think. by expro · 2003-03-18 02:30 · Score: 1

I know that the DOCTYPE is essential for certain processing environments today. Having available schema is a good thing. But there are big problems with the syntactical way the DTD and internal subset are supported in XML. It would be much better if the schema were completely switchable and validation and normalization were loosely-coupled operations on an abstract data model rather than syntactic operations at parse time. I would have no problem with DOCTYPE if it were a separate module that you could choose to support or not support. That is why, for example, SOAP had to subset XML -- because a vicious user can define a few parameter entities and blow the server out of memory with syntax.

While a machine processor and abstract model can easily live without a bunch of character entities, standards such as MathML rely on them heavily to make it readable in raw form for users.

IDs as supplied with DTDs have a variety of problems. If doctypes went away, you have no IDs anyway. The concept of references seems to transcend many lesser considerations of a schema. It is, really, a structural question in most uses, even if not in HTML.

I agree that XML is no DB. It is a format useful for transferring things between object models such as DBs. HTML should have an abstract data model before it is a syntax. That would tell us, for example, what is the same between the HTML syntax rendition and the XHTML syntax rendition.
Re:He is right, I think. by Bazzargh · 2003-03-18 07:20 · Score: 1

I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

You mean XMI? (XMI = the XML serialization of the OMG MOF, which is the meta-metamodel for the UML metamodel? The MOF is its own metamodel so you stop there BTW)

Oooh yeah, bring the complexity baby! If you thought working with XML was bad just take a look at that stuff. And weep. It's XML Schema on steroids.

Re:xml by The+Apostrophe+Guy · 2003-03-18 01:36 · Score: 0

"It is biggest use right now is data interchange."

ahem...

Re: of course there is!! by borgdows · 2003-03-18 01:37 · Score: 0, Troll

In SOVIET RUSSIA, XML standardizes YOU!! Let's bomb the french! Anyway, XML is for loosers!

SGML, then? by Anonymous Coward · 2003-03-18 01:38 · Score: 0

Personally, I don't want to go back. At least XML is a bit more regular.

Re:of course there is! (sorry for the prev post) by borgdows · 2003-03-18 01:39 · Score: 0, Flamebait

In SOVIET RUSSIA, XML standardizes YOU!!
Let's bomb the french!
Anyway, XML is for loosers!

His idiom. by palad1 · 2003-03-18 01:40 · Score: 5, Insightful

He's basically stating that he'd like other coders to write more code the way he sees fit.
[quote]
while () {
next if (XX);
if (X|||X)
{ $divert = 'head'; }
elsif (XX)
{ &proc_jpeg($1); }
# and so on...
}
[/quote]

Repeat after me: I will never leave parsing XML up to a regexp especially if my xml may contain CDATA and Comment sections. I will never...

Unless you are 100% certain the file you are parsing is directly under your control, ie: no comments, no cdatas, params always in the same order, same indentation, same bloody encoding [pardon my french], well, you will just have to access the data using some kind of DOM or abstract tree representation.
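
A small sketch of what goes wrong, in Python for brevity (the quoted loop is Perl; the document, the pattern and both helper names below are made up for illustration). The naive pattern happily matches tags inside a comment and inside a CDATA section, while a real parser does not.

```python
import re
import xml.etree.ElementTree as ET

# Made-up document: one real <img>, one in a comment, one inside CDATA.
DOC = """<photos>
  <!-- <img src="old.jpg"/> is retired -->
  <img src="new.jpg"/>
  <note><![CDATA[write <img src="fake.jpg"/> to embed]]></note>
</photos>"""

def images_by_regex(text):
    # Naive pattern match on the raw text, in the spirit of the quoted loop.
    return re.findall(r'<img src="([^"]+)"', text)

def images_by_parser(text):
    # The parser drops comments and treats CDATA content as plain text.
    return [img.get("src") for img in ET.fromstring(text).iter("img")]

print(images_by_regex(DOC))
print(images_by_parser(DOC))
```

The regexp reports three images; the parser reports the one that is actually an element.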

I don't think he thinks no one uses XML, he seems to deplore the fact that some people don't get it at all and resort to heavy duty tools for trivial tasks [thus justifying his example above].

Basically XML is quite simple, but that's not the matter, the problem is that XML bundles ACTUAL DATA, it's all about the complexity of those data, not the API used to access it [although writing a DOM implementation is a real pain]

Re:His idiom. by Anonymous Coward · 2003-03-18 03:41 · Score: 0

[pardon my french]

I'm sure you meant: [pardon my freedom]
;)
Re:His idiom. by Anonymous Coward · 2003-03-18 04:08 · Score: 0

"...order, same indentation, same bloody encoding [pardon my french], well, you just will have ..."

Don't you mean pardon your freedom? ;)
Re:His idiom. by Anonymous Coward · 2003-03-18 07:08 · Score: 0

well, actually this idiom looks very similar to what we used to use when we'd pipe the ESIS output from the SGMLS parser into our perl programs. ESIS output is a normalized format that allows you to write event driven programs based on element starts and ends. It looks something like:

ELEMENT1(
content
)ELEMENT1

Re:xml by mystery_boy_x · 2003-03-18 01:41 · Score: 1, Troll

XML was never intended to be a replacement for HTML or anything else.

XML is fundamentally very simple and easy to understand. It is only DOM and other such atrocities that make it hideous. DOM is a prime example of how to take a technology designed for simplicity and flexibility and turn it into a hideous morass.

--
I am not a lawyer but my sister is, so don't mess with me

Re:Let me get this straight... by Saddam+Hussein · 2003-03-18 01:41 · Score: 1

No, I plan to kill him.

XML is good by Ender+Ryan · 2003-03-18 01:42 · Score: 4, Interesting

I don't understand why so many people complain about XML so much. It's really quite useful for storing arbitrary data. We have several hundred thousand text-based documents where I work, and it has been a total nightmare, until I converted the whole thing (well, I'm not done yet...) to XML.

The documents are generally displayed as HTML on the web, but they're also read by a couple different programs for different purposes. When I first started here, it was mostly a mess of poorly hand-written HTML, but thankfully there were *only* about 20k documents at the time.

I was charged with the task of writing said programs to read these damn files. Unfortunately, they weren't all marked up the same...

Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.

Yay for XML! :)

So, to sum up, XML is doing what it was meant to do, no less. Unfortunately, it's also probably doing a bit more as well, XSL anyone? Yeck, why not just have a stand XML scripting language, why the need for the language to be valid XML itself?

--
Sticking feathers up your butt does not make you a chicken - Tyler Durden

Re:XML is good by osgeek · 2003-03-18 02:36 · Score: 1

Why learn two markup formats when you can learn one? I find DTDs to be an unnecessary bit of mental parsing, when XSL is so much more straightforward.

--
Why are you letting these clowns ruin our country?
Re:XML is good by regen · 2003-03-18 04:13 · Score: 1

So, to sum up, XML is doing what it was meant to do, no less. Unfortunately, it's also probably doing a bit more as well, XSL anyone? Yeck, why not just have a stand XML scripting language, why the need for the language to be valid XML itself?

So that you can write XSL that operates on other XSL. This is also the advantage of having schemas (XSD) be valid XML. I've written a bunch of XSL which transforms XSL and XSD.

For example, let's say that you have several different files that have related layouts. You can write a master schema and stylesheets to transform this master schema into the schemas for the different files. Likewise, if you would like to perform similar transforms on these different files, you can write a stylesheet to transform a master stylesheet into a stylesheet for each of the different files.

This solves the problem of keeping a set of related schemas and stylesheets in sync. You change the master stylesheet or schema and then transform the master into the various different versions.

--
The Economics of Website Security
Re:XML is good by Twylite · 2003-03-18 04:18 · Score: 1

XML falls far short of its goals. See my critique for a far more detailed analysis.

The short of it is that XML contains multiple redundant ways to store data, and implicit within XML is processing of data. This makes it an encoding/format that is prone to implementation errors and security concerns. It took years for the major vendors/creators of XML parsers to achieve interoperability.

A good grammar that meets the goals of XML should have a single, clear structure and be inert (no implicit processing).

The intent of XML is good; the execution is bad. XML could be greatly simplified without losing any of its power other than human readability, which is a goal of questionable virtue as it is.

--
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
Re:XML is good by Ender+Ryan · 2003-03-18 04:45 · Score: 1

The link you posted doesn't work.
XML could be greatly simplified without losing any of its power other than human readability
But that is PART of the intent. I think XML would lose a lot of its utility without human readability.

--
Sticking feathers up your butt does not make you a chicken - Tyler Durden
Re:XML is good by mcelrath · 2003-03-18 05:14 · Score: 1

It's quite simple really. Taking a look at my 650k galeon bookmarks file (stored in XBEL, an XML syntax for bookmarks), and using the utility xmldiff (a python script that attempts to find differences in xml), I obtain the following results:
$ time diff bookmarks.xbel bookmarks.xbel.old > /dev/null
0.073u 0.042s 0:00.12 91.6% 0+0k 0+0io 0pf+0w
$ time xmldiff bookmarks.xbel bookmarks.xbel.old > /dev/null
(I got impatient, top tells me 7:28 and counting -- if I remember correctly it takes more than 1/2 hour)
That is, parsing XML is more than 5000 times SLOWER than the line-by-line comparison. It is inappropriate for most purposes simply because the resources required to parse any decent-sized XML file are prohibitive.
-- Bob

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
Re:XML is good by ProfKyne · 2003-03-18 05:27 · Score: 1

why not just have a stand XML scripting language

You mean like Jelly?

--
"First you gotta do the truffle shuffle."
Re:XML is good by 21mhz · 2003-03-18 05:56 · Score: 1

That is, parsing XML is more than 5000 times SLOWER than the line-by-line comparison. It is inappropriate for most purposes simply because the resources required to parse any decent-sized XML file are prohibitive.

Thank you for this deep conclusion. You have just compared a highly optimized utility written in C with a Python script of dubious maturity. These utilities are doing different things, by the way, so they are not interchangeable. You did not provide any consideration of algorithms and optimizations used by both. You then presented the results as an all-out defeat of XML as a format. For anyone who accepted your, erm, logic, I have a bridge to sell.

--
My exception safety is -fno-exceptions.
Re:XML is good by mcelrath · 2003-03-18 07:07 · Score: 1

I am simply pointing out with an example that parsing XML is O(n^3) while diff is O(n). On the XML, that's one 'n' to find the end of the tag, another 'n' to find the closing tag, and another 'n' for the number of tags in the document. Ok now go argue that some of my n's are bigger than others, and reduce it to O(n^2), but that is the best you will do. O(n^2) is nothing to be proud of, especially for a data STORAGE format.
Show me a better xmldiff and I'll test it. Also see my other responses in this thread.
-- Bob

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
Re:XML is good by 21mhz · 2003-03-18 10:09 · Score: 1
I am simply pointing out with an example that parsing XML is O(n^3) while diff is O(n).
1. Xmldiff is a bad example for parsing.
2. Just because you can't come up with an O(n) parser doesn't mean it's impossible. Look at the expat sources at some time, and read up on stack automata.
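
To make the stack-automaton point concrete, here is a toy sketch in Python (my choice of language; this is NOT expat's code, and the well_nested name is made up). It checks tag nesting in a single left-to-right pass. It deliberately ignores comments, CDATA and attribute values containing '>', so it is an illustration of the linear-time idea only, not a parser.

```python
def well_nested(text):
    """One pass over the text; a stack remembers which elements are open."""
    stack = []
    pos = 0
    while True:
        start = text.find("<", pos)
        if start == -1:
            return not stack          # done: nothing may be left open
        end = text.find(">", start)
        if end == -1:
            return False              # unterminated tag
        tag = text[start + 1:end]
        pos = end + 1                 # never look at these characters again
        if tag.endswith("/"):
            continue                  # <c/> opens and closes at once
        if tag.startswith("/"):
            # End tag: must match the most recently opened element.
            if not stack or stack.pop() != tag[1:].strip():
                return False
        else:
            parts = tag.split()
            if not parts:
                return False          # "<>" is not a tag
            stack.append(parts[0])    # start tag: push its name

print(well_nested("<a><b>x</b><c/></a>"))
print(well_nested("<b>this<i> sort </b>of</i>"))
```

The scan position only ever moves forward and each tag costs one push or one pop, so the work grows linearly with the input.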
--
My exception safety is -fno-exceptions.

Re:of course there is! (sorry for the prev post) by borgdows · 2003-03-18 01:42 · Score: 3, Funny

arggh!!! fuck'in XML tags!! lol

<?xml version="1.0" encoding="bork">
<troll>
<sovietrussiathing>In SOVIET RUSSIA, XML standardizes YOU!!</sovietrussiathing>
<offtopic>Let's bomb the french!</offtopic>
<flamebait>Anyway, XML is for loosers!</flamebait>
</troll>

WTF? by samael · 2003-03-18 01:43 · Score: 4, Informative

XML isn't a replacement for Java or C++. Neither is HTML. You're looking at three separate areas there.
HTML is a page description language.
C++ and Java are data processing languages.
XML is a data description language.

You can certainly describe a page using XML, and I see no reason why you couldn't construct a programming language using XML syntax, but how on earth are you going to store data in C++ or Java?

--
My Journal

Re:WTF? by Anonymous Coward · 2003-03-18 01:50 · Score: 0

In an array
Re:WTF? by Anonymous Coward · 2003-03-18 03:33 · Score: 0

HTML ain't no darn page description language fool.
Re:WTF? by _xeno_ · 2003-03-18 08:04 · Score: 1

You can certainly describe a page using XML, and I see no reason why you couldn't construct a programming language using XML syntax, but how on earth are you going to store data in C++ or Java?
Like this: XPM file format.
Be afraid. Be very afraid.

--
You are in a maze of twisty little relative jumps, all alike.

-1: Irrelevant by Anonymous Coward · 2003-03-18 01:43 · Score: 0

Why is IDNRTA an excuse? This is 100% irrelevant. He's talking about XML being hard to access in programs, not it being hard to type in.

Re:-1: Irrelevant by Max+Romantschuk · 2003-03-18 02:36 · Score: 1

Why is IDNRTA an excuse? This is 100% irrelevant.

You're absolutely right. Then again, if you had read my reply to my own comment you would have noticed I realized my mistake :)

--
.: Max Romantschuk :: http://max.romantschuk.fi/

Re:XML by Anonymous Coward · 2003-03-18 01:43 · Score: 0

XML is a tree structured language. Like Lisp SEXPs, only much more hyped and more of a pain to type. It's not simply ASCII with tags. If it was, then there would not be bad-nesting and stuff like that, it would be a true markup language, and I'd be able to do [b]this[i] sort [/b]of[/i] thing.

XML is NOT A MARKUP LANGUAGE. It's Lisp-reinvented-badly. Again. Sigh. Only this time, it's not other-scripting-language-becomes-lisp, it's other-data-format-becomes-lisp.
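
A rough illustration of the correspondence, sketched in Python with the standard library's ElementTree (the to_sexp and is_tree helpers are made up, and attributes are ignored to keep it short). Properly nested XML maps straight onto an S-expression; overlapping tags are rejected by the parser before there is any tree to map.

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Render an element as (tag child ...), with text content quoted."""
    parts = [elem.tag]
    if elem.text and elem.text.strip():
        parts.append('"%s"' % elem.text.strip())
    for child in elem:
        parts.append(to_sexp(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append('"%s"' % child.tail.strip())
    return "(" + " ".join(parts) + ")"

def is_tree(text):
    """True if the text parses as XML, i.e. the tags nest properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(to_sexp(ET.fromstring("<p>some <b>bold <i>and italic</i></b> text</p>")))
print(is_tree("<p><b>this<i> sort </b>of</i></p>"))
```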

JAXB by Hellvetica · 2003-03-18 01:44 · Score: 1, Interesting

Java Programmers: Take a look at the Java Architecture for XML Binding (JAXB), available in the Java Web Services Developer Pack V 1.1 (see article here). From my basic understanding of it, it "binds" XML to a set of Java content classes, saving you the time and effort of traversing a DOM tree or dealing with SAX. I have yet to use it, but it looks perfect for my application, which uses an XML-based configuration file.
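
To show the general idea of data binding without claiming anything about JAXB's actual API (which I haven't used), here is a hand-rolled sketch in Python. The ServerConfig class, the bind() helper and the config document are all hypothetical. The class definition drives the mapping, so application code never walks a tree or handles SAX events itself.

```python
from dataclasses import dataclass
from typing import get_type_hints
import xml.etree.ElementTree as ET

@dataclass
class ServerConfig:
    # Hypothetical configuration class; one child element per field.
    host: str
    port: int
    debug: bool

# Made-up configuration file contents.
CONFIG_XML = """<config>
  <host>example.org</host>
  <port>8080</port>
  <debug>true</debug>
</config>"""

def bind(cls, xml_text):
    """Fill an instance of cls from same-named child elements."""
    root = ET.fromstring(xml_text)
    values = {}
    for name, typ in get_type_hints(cls).items():
        raw = root.findtext(name)
        if raw is None:
            raise ValueError("missing element: " + name)
        raw = raw.strip()
        # bool("false") would be True, so booleans get their own rule.
        values[name] = (raw.lower() == "true") if typ is bool else typ(raw)
    return cls(**values)

cfg = bind(ServerConfig, CONFIG_XML)
print(cfg)
```

A real binding tool generates the classes from a schema and handles nesting, lists and validation; this sketch covers flat documents only.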

Actually, I'd be interested if anybody here has used this yet? Is it ready for prime time?

Re:JAXB by elsegundo · 2003-03-18 02:18 · Score: 1

I am currently using JAXB on a project. I needed to pull data out of a relational database and put it into an XML form. JAXB does this very well, and makes creating the XML much easier than doing it by hand, i.e. stringBuffer.append("<element>"). The one thing about JAXB that is a pain is creating the XML schema. First, I tried to write a DTD and then use a conversion tool to write the schema, but too much got lost in translation. Of course, I was using a freeware converter (dtd2xs). XMLSpy may do the conversion better. So I had to hand-correct the converted schema. Once the schema is done, JAXB is really an excellent tool. YMMV....

--

The revolution will be televised. Blackout restrictions apply.
Re:JAXB by Anonymous Coward · 2003-03-18 02:26 · Score: 0

Yeah, I've used both JaxB and a similar tool, Castor.

Both are extremely easy to use. The Java programmer deals with a normal looking Java object and can produce a valid XML document from it to send to an HTML programmer or wherever (both programmers are me in my project).

Of course, now you have to learn XML Schema if you are responsible for writing the schema. It's unnecessarily complex but still easy enough. I wrote my first (fairly complex) schema in a few hours and had Jaxb compiling it just fine.

XML would be a lot simpler if they had left attributes out of the spec. There is nothing you can do with an attribute that you couldn't do with a nested element (if the spec had decided to go with elements only). Why double the complexity of the spec (and associated tools) with attributes?
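
A quick sketch of the equivalence I mean, in Python with ElementTree (the attrs_to_elements helper and the sample document are made up for illustration). Every attribute is rewritten as a leading child element and nothing is lost for simple data like this. The assumption is that no element already has a child with the same name as one of its attributes.

```python
import xml.etree.ElementTree as ET

def attrs_to_elements(elem):
    """Return a copy of elem with each attribute turned into a child element."""
    out = ET.Element(elem.tag)
    out.text, out.tail = elem.text, elem.tail
    for name, value in elem.attrib.items():
        ET.SubElement(out, name).text = value
    for child in elem:
        out.append(attrs_to_elements(child))
    return out

src = ET.fromstring('<book id="42" lang="en"><title>XML</title></book>')
converted = ET.tostring(attrs_to_elements(src), encoding="unicode")
print(converted)
```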
Re:JAXB by Y-Leen · 2003-03-18 05:08 · Score: 1

> now you have to learn XML Schema

There are other data binding tools that don't require a schema. There's one we looked at from Strathclyde University in the UK that binds to XML and generates objects based on the programmer's classes. here
Re:JAXB by Anonymous._.Coward · 2003-03-18 05:12 · Score: 1

Yep, I've been using that technology for the last few weeks and it's pretty cool. Takes a while to get used to their mapping from Java classes to XML but when you do the rest is simple.

And when the XML changes, the system rebinds just to the data you want! Adding new tags doesn't break your old code. Sweet!

--
take a triptonica to subthunk

Oh Yeah? by Anonymous Coward · 2003-03-18 01:44 · Score: 0

Then please tell me what language I should be using to clone Japanese people !

"Load into memory" vs. "Callbacks" by itsallinthemind · 2003-03-18 01:44 · Score: 4, Informative

Say what you will about Microsoft - and many of you have - but they really got it right with their XmlReader class in .NET. It streams the document like SAX (the "callback" interface Tim mentions in his comments), but allows the programmer to cursor over the document manually rather than having to handle everything in thrown event handlers (which I agree can be a real headache, especially in highly variable or deeply nested documents.)
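
To illustrate the headache with the callback style (this is Python's xml.sax, not .NET, and the handler class and sample feed are made up), note how much bookkeeping the handler needs just to tell an entry's title apart from the feed's title. The parser never tells you where you are, so you track it yourself across three separate methods.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of <title> elements that sit directly in <entry>."""

    def __init__(self):
        super().__init__()
        self.path = []      # our own stack of open elements
        self.titles = []
        self._buf = None    # text pieces of the title being read, if any

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path[-2:] == ["entry", "title"]:
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if self._buf is not None and name == "title":
            self.titles.append("".join(self._buf))
            self._buf = None
        self.path.pop()

def entry_titles(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles

# Made-up sample: the feed has its own title, which must be skipped.
FEED = (
    "<feed><title>Feed</title>"
    "<entry><title>First</title></entry>"
    "<entry><title>Second</title></entry></feed>"
)
print(entry_titles(FEED))
```

With a cursor-style reader the same logic sits in one loop and the state lives in ordinary local variables.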

XML is just one of the tools in our collective toolbox. Use it where it helps you solve a problem. Don't bother if it doesn't.

Re:"Load into memory" vs. "Callbacks" by Anonymous Coward · 2003-03-18 02:54 · Score: 0

Now, if this was actually well documented, it would perhaps be an alternative to our current strategy of "load everything into RAM, and buy more RAM if it fails".
Re:"Load into memory" vs. "Callbacks" by cdmitri · 2003-03-19 03:14 · Score: 1

I agree. I've used both methods to read the same XML file and I wish Java had an equivalent of .NET XmlReader, it makes parsing simple XML easier since the code is less fragmented.

XML is not a programming language... by borgdows · 2003-03-18 01:45 · Score: 2, Insightful

... it's a convenient format to store and retrieve hierarchical information, that's all.

Yea can be hard by Anonymous Coward · 2003-03-18 01:45 · Score: 1, Interesting

Writing an XML document is easy. I looked at a sample document and was able to produce xml documents without reading any books on the subject.

Parsing is another issue. Last night I spent some time parsing XML data in Perl that was being retrieved from a daemon I wrote in C. Producing the XML output was easy. Parsing it in Perl was hard. I think maybe the author is talking about the lack of really good, easy to use libraries (abstractions) for parsing XML data. I'm a believer that a lot of work in the backend produces ease of use in the front end. In other words, I'd like to parse XML data with ease in just a few lines of code in the application. All the work will be done in the library. XML::Parser proves that this is just not the case.

XML: bad implementation of a good idea by g4dget · 2003-03-18 01:47 · Score: 4, Interesting

I have to agree that XML has serious problems.

Now, I have to say: a universal syntax for tree-structured data is very useful: experience since the 1970s with one such universal syntax, Lisp, has shown that. It is unfortunate that XML is about the worst imaginable implementation of that idea. XML combines being a nuisance to type with having comparatively complex semantics and lots of redundant features.

What is ironic is that the same "real world programmers" who wax ecstatic about XML also condemn Lisp as too complicated and too difficult to read. The universal syntax that XML aspires to, Lisp syntax delivered many decades ago. It's just that prejudice and ignorance caused people to re-invent the wheel (and in square form, too) in the form of XML.

I am pretty torn between whether XML is a blessing or a curse. We really need something like it, but XML is so bad that it may not even live up to the level of "poorly designed industry standard but better than nothing".

Re:XML: bad implementation of a good idea by Ed+Avis · 2003-03-18 03:54 · Score: 1

Could you name the 'complex semantics' and 'redundant features' of XML? It seems pretty simple to me. There is a small amount of leftover SGML crud like entities (which probably aren't necessary except to escape the < and > characters), but most of it has been removed.

Anyway, XML doesn't really have semantics, it's just syntax. The closest thing I can think of is the rule that the ordering of attributes in the same element doesn't matter.

--
-- Ed Avis ed@membled.com
Re:XML: bad implementation of a good idea by g4dget · 2003-03-18 09:12 · Score: 1

Could you name the 'complex semantics' and 'redundant features' of XML?
Sure, just look here, which lists many of the problems quite well.

ANSI X.12 by wardk · 2003-03-18 01:47 · Score: 1

IMHO

Solved this problem using flat files long long ago, of course it has to be re-invented using lots of extra data, aka "tags".

Wrote my first EDI parser in COBOL (110, 810, 850 btw, first production EDI program at Boeing...1990), and it wasn't that difficult. We had implementation docs that laid things out logically, and you could grok the files without having to view tag hell. XML compared to X12, X12 wins.

XML isn't any better, but it's sold lots of new fancy parsing software, and it looks like html...cool

I could go on, but the water's boiling....

In related news... by arvindn · 2003-03-18 01:48 · Score: 3, Funny

It's now official. C++ creator admits it was all a hoax! Read on for the details of the stunning scoop...

On the 1st of January, 2003, Bjarne Stroustrup gave an interview to the IEEE's 'Computer' magazine.

Naturally, the editors thought he would be giving a retrospective view of twelve years of object-oriented design, using the language he created.

By the end of the interview, the interviewer got more than he had bargained for and, subsequently, the editor decided to suppress its contents, 'for the good of the industry' but, as with many of these things, there was a leak.

Here is a complete transcript of what was said, unedited, and unrehearsed, so it isn't as neat as planned interviews.

Interviewer: Well, it's been a few years since you changed the world of software design, how does it feel, looking back?

Stroustrup: Actually, I was thinking about those days, just before you arrived. Do you remember? Everyone was writing 'C' and, the trouble was, they were pretty damn good at it. Universities got pretty good at teaching it, too. They were turning out competent - I stress the word 'competent' - graduates at a phenomenal rate. That's what caused the problem.

Interviewer: Problem?

Stroustrup: Yes, problem. Remember when everyone wrote Cobol?

Interviewer: Of course, I did too

Stroustrup: Well, in the beginning, these guys were like demi-gods. Their salaries were high, and they were treated like royalty.

Interviewer: Those were the days, eh?

Stroustrup: Right. So what happened? IBM got sick of it, and invested millions in training programmers, till they were a dime a dozen.

Interviewer: That's why I got out. Salaries dropped within a year, to the point where being a journalist actually paid better.

Stroustrup: Exactly. Well, the same happened with 'C' programmers.

Interviewer: I see, but what's the point?

Stroustrup: Well, one day, when I was sitting in my office, I thought of this little scheme, which would redress the balance a little. I thought 'I wonder what would happen, if there were a language so complicated, so difficult to learn, that nobody would ever be able to swamp the market with programmers?' Actually, I got some of the ideas from X10, you know, X windows. That was such a bitch of a graphics system, that it only just ran on those Sun 3/60 things. They had all the ingredients for what I wanted. A really ridiculously complex syntax, obscure functions, and pseudo-OO structure. Even now, nobody writes raw X-windows code. Motif is the only way to go if you want to retain your sanity.

Interviewer: You're kidding...?

Stroustrup: Not a bit of it. In fact, there was another problem. Unix was written in 'C', which meant that any 'C' programmer could very easily become a systems programmer. Remember what a mainframe systems programmer used to earn?

Interviewer: You bet I do, that's what I used to do.

Stroustrup: OK, so this new language had to divorce itself from Unix, by hiding all the system calls that bound the two together so nicely. This would enable guys who only knew about DOS to earn a decent living too.

Interviewer: I don't believe you said that...

Stroustrup: Well, it's been long enough, now, and I believe most people have figured out for themselves that C++ is a waste of time but, I must say, it's taken them a lot longer than I thought it would.

Interviewer: So how exactly did you do it?

Stroustrup: It was only supposed to be a joke, I never thought people would take the book seriously. Anyone with half a brain can see that object-oriented programming is counter-intuitive, illogical and inefficient.

Interviewer: What?

Stroustrup: And as for 're-useable code' - when did you ever hear of a company re-using its code?

Interviewer: Well, never, actually, but...

Stroustrup: There you are then. Mind you, a few tried, in the early days. There was this Oregon company - Mentor Graphi

Re:In related news... by BeerSlurpy · 2003-03-18 02:03 · Score: 1

Comedy gold. The difficulty of being a good programmer really has been a godsend, salary-wise. People still offer me upwards of 90k on a regular basis.

If I were a VB programmer right now, I'd be fucked.
Re:In related news... by Oswald · 2003-03-18 03:02 · Score: 0, Offtopic

I'm gonna metamod everyday until I get a chance to whack the idiot who modded this "offtopic". Of course it's offtopic; it's also priceless. Thanks for the laughs, arvindn.
Re:In related news... by Rich0 · 2003-03-18 05:21 · Score: 2, Funny

Actually, I've always liked this story. My favorite lines:

We stopped when we got a clean compile on the following syntax:
for(;P("\n"),R--;P("|"))for(e=C;e--;P("_"+(*u++/8)%2))P("|"+(*u/4)%2);
At one time, we joked about selling this to the Soviets to set their computer science progress back 20 or more years.

Still good for some things by krygny · 2003-03-18 01:49 · Score: 2, Interesting

The hype and promise of XML has gone too far. It's a boon for document type data. Semantic content like documentation, on-line content, even spreadsheets and email. (e.g., why isn't there a standard address book format based on XML that any application on any platform can use interchangeably?)

But using XML to build relational databases is slipping a round peg into a square hole. You need something to putty the corners.

--
Research shows that 67% of those who use the term "research shows", are just making shit up.

Oh please! by gwappo · 2003-03-18 01:50 · Score: 5, Interesting

It's annoying when posters get presumptuous. The people complaining in the article are by all means elite programmers; proclaiming XML is okay because "programming *is* a hard task" is nonsense and in the same league as "HLLs are for wussies, real men code in assembly" and other crap.

The criticism of XML is accurate, correct, valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing-work and 10% business-solution. That 90% plumbing-work leaves opportunity for _a lot of bugs_ to be created and for any solution using XML to become a resource-hog.

Having a standard interchange format like XML is a fun-thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

Re:Oh please! by mysticgoat · 2003-03-18 03:00 · Score: 1

Thank you for saying this so well.
Re:Oh please! by jallen02 · 2003-03-18 03:00 · Score: 3, Interesting

Isn't interfacing with a library by definition "plumbing" though?

I did find the SAX API (in Java) a little tedious to work with for maybe a few days, but after I got used to the idiom it was pretty straightforward. The interfacing with the library was not really a lot of "extra" code. Most of my SAX parsing code spends its time in a content handler firing off events based on the XML it is processing.

I still cleanly separated the XML interfacing from the server. Once the plumbing is set up, my server doesn't even have to know it is there for the most part. And I rarely have to deal with the interfacing to the library after the initial separation. I either go below the parser level via filter streams or above it, but the XML parser just does its job.

It is a tough question to answer, but doesn't having a certain level of configurability necessitate some level of complexity? I think C# does a decent job at keeping the XML processing simpler while still giving the configurability, but to tap into that configurability there is still complexity involved. I think that the problem is easy to identify and the solution will take many more brain cycles to find :)

Jeremy
Re:Oh please! by grammar+fascist · 2003-03-18 05:25 · Score: 1

The author specifically had a problem with SAX because of the callback structure. Generally, to use it well, you have to write a wrapper around a SAX parser.

Tell me: should parsing a file be as difficult as doing asynchronous I/O?

The author also had a problem with DOM, because any program using it becomes a big, fat memory hog if it has to parse very large XML files.

That's the problem. There is no easy stream-based solution for parsing XML like there is for almost any other format.
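For comparison, here is a rough sketch of what a pull-style, stream-based loop over XML can look like. It uses Python's standard `xml.etree.ElementTree.iterparse`, which is an assumption brought in purely for illustration (it is not something the article or this post refers to), and the `<feed>`/`<item>`/`<title>` names are made up. The caller writes an ordinary loop instead of registering callbacks, and finished elements are cleared so the whole tree is not kept around.

```python
import io
import xml.etree.ElementTree as ET

def iter_titles(stream):
    """Yield the text of each <title> element as soon as it has been parsed."""
    # "end" events fire when an element's closing tag has been read.
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            yield elem.text
        elif elem.tag == "item":
            # The <item> is complete; drop its children to limit memory use.
            elem.clear()

# Hypothetical sample document, held in memory here for brevity.
SAMPLE = b"<feed><item><title>one</title></item><item><title>two</title></item></feed>"

for title in iter_titles(io.BytesIO(SAMPLE)):
    print(title)
```

Whether this counts as "easy" is still a matter of taste, but the control flow stays in the caller's hands rather than in a handler class.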

--
I got my Linux laptop at System76.
Re:Oh please! by Anonymous Coward · 2003-03-18 11:36 · Score: 0

No, the problem with DOM is that it doesn't provide anything. There is no document, no object, and no model. Unless your entire data model consists of node, parent, and children[]. Sheesh, name=value is a more complex data structure. The HTML DOM is a valuable and innovative object model. All of a sudden, with Javascript and CSS, the traditional GUI became obsolete. Now, almost every GUI *is* a variation on the HTML tag format. But XML on its own is worthless. The XML DOM on its own is NO MORE COMPLEX than get(). If DOM built an object-oriented model of a document, then you'd have something. Realistically, you can't do this without code generation, which makes XML useless as an object model.
Re:Oh please! by mesocyclone · 2003-03-18 18:10 · Score: 1

Good grief. All you need to do is extend the library. We use SAX and then build up a hash-table to let us find all nodes and attributes from a string. We add accessors to the normal DOM routines like "find the first node with path /a/b/c" and return its value, otherwise return this default value.

Once you do a few of those, you can handle most of simple XML with most of the work focused on the data (i.e. the business use) rather than screwing around with the access methods, etc.
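A minimal sketch of the kind of accessor described above, for anyone who wants to picture it. This is not the poster's code: it uses Python's standard `xml.etree.ElementTree` instead of a SAX-built hash table, and the function name `value_at` and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

def value_at(root, path, default=None):
    """Return the text of the first node at an absolute path like /a/b/c,
    or `default` if no such node (or no text) exists."""
    parts = [p for p in path.split("/") if p]
    # The first path segment must name the root element itself.
    if not parts or parts[0] != root.tag:
        return default
    relative = "/".join(parts[1:])
    node = root.find(relative) if relative else root
    if node is None or node.text is None:
        return default
    return node.text

# Hypothetical document with two candidate nodes; the first one wins.
doc = ET.fromstring("<a><b><c>42</c></b><b><c>43</c></b></a>")

print(value_at(doc, "/a/b/c", "none"))   # first match
print(value_at(doc, "/a/b/x", "none"))   # falls back to the default
```

With a handful of helpers like this, the calling code reads as "give me this value or that default" and never touches the node API directly.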

Oh, and to generate simple XML, just use your language's equivalent of printf.

For complex XML, we use the standard DOM with some methods added again to make it easier.

I'll bet thousands of folks do similar things.

XML is really handy. For the sort of work we do (transaction processing) it is the greatest thing since sliced bread.

--
The only good weather is bad weather.

too hard by PhilipMatarese · 2003-03-18 01:52 · Score: 5, Funny

Admitting something is too hard is too hard for programmers.

Now I'll go read the article.

SSAX by Anonymous Coward · 2003-03-18 01:53 · Score: 1, Informative

Try the SSAX XML parser- has the streaminess of SAX, the objectiness of DOM.

Also neatly illustrates the essential equivalence of XML to a small subset of Lisp.
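To make the Lisp comparison concrete, here is a small sketch that prints an XML fragment as an S-expression. It is written in Python rather than Scheme, it is only loosely modeled on SXML-style notation, and it is not SSAX's actual API or output; the function name `to_sexpr` is made up, and quoting of special characters inside strings is ignored.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element, its attributes, text and children as one S-expression."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({name} "{value}")' for name, value in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(f'"{elem.text.strip()}"')
    for child in elem:
        parts.append(to_sexpr(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"

print(to_sexpr(ET.fromstring('<p class="x">Hello <b>world</b></p>')))
# prints: (p (@ (class "x")) "Hello" (b "world"))
```

Element nesting maps directly onto list nesting, which is the whole point of the comparison.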

Hahahah finallly something I know a lot about. by BeerSlurpy · 2003-03-18 01:53 · Score: 4, Interesting

We use XML heavily in a project I'm working on at my company. Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation. Naturally we also make heavy use of DTD and SAX. Lots of XML related technologies.

I can tell you now that XML is a Bad Thing. It strives to excel at too many things at once, and becomes inefficient and complex as a result.

XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with. Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form. XML document tree traversal = 10000x more complex than getting column data out of a ResultSet... Unfortunately it is also a billion times slower to parse XML than it is to perform a medium complexity database query.

The real problem is that XML only partly addresses the problems that relational databases solved years ago (organizing data and making it accessible), but it does it without any of the efficiency benefits of a well designed database server. In my opinion, 90+ percent of the places where XML is being used today would be better served by using columns in a relational database table to store object fields. You get indexing, you get universal, simple and efficient searching, and you get speed.

XML has too many faults to really list in one short post. The truth of the matter is that it tries to do too many things and DOESNT DO ANYTHING WELL. Sort of like if someone tries to be skilled in all musical instruments but ends up being, at best, mediocre in a few of them.

Re:Hahahah finallly something I know a lot about. by kalidasa · 2003-03-18 02:07 · Score: 5, Insightful

If you're working with data that can be meaningfully represented with columns, you're using the wrong damned tool. XML is for complex structured data, which it does fine. It is not for tables. Don't blame the tool, blame the idiot who thought that XML was a good way to do DBs.
Re:Hahahah finallly something I know a lot about. by dentar · 2003-03-18 02:09 · Score: 1

Wish I had mod points today... oh well

--
-- I am. Therefore, I think!
Re:Hahahah finallly something I know a lot about. by rbolkey · 2003-03-18 02:24 · Score: 1

Well, I'm not too sure if XML strives to do too much, or whether people try to make XML do too much. But I think the latter is more likely. XML makes a really nice interchange format for moving data/objects between applications. I don't think I would use XML files in place of an RDBMS for storage, but having the RDBMS send me my query results in XML would be nice ( ... ). And that would be what it is really designed for ... data interchange.
Re:Hahahah finallly something I know a lot about. by BeerSlurpy · 2003-03-18 02:37 · Score: 1

If you're communicating within a single language (usually the case) there are libraries for object serialization readily available. Java even has it built in to the core libraries.

If you're communicating _directly_ with programs written in other languages (C talking to Java) then any programmer worth his salt can make a communications protocol to allow the two applications to work together. It takes less time to do that than to debug XSLT or DTD problems. And no, none of the problems of a simpler protocol are removed by virtue of talking in XML vs some arbitrary protocol designed for the situation at hand.

Most programs (and most programming languages) end up interfacing at the database level anyway. C and Java both have the ability to execute SQL queries quickly on a db server, thus this is the logical place for them to exchange data. Making a database column contain an XML document (in a big text field for example) only increases the complexity of the programs.

Real world experience has never caused programmers to fall in love with XML. 90 percent of the posters today are quoting the XML marketing horseshit, so take all the positive replies with a grain of salt.

Also, has anyone here ever heard of CORBA or RMI? I hear it's hot shit for having applications talk to one another without using a database. Sounds crazy eh?
Re:Hahahah finallly something I know a lot about. by sporty · 2003-03-18 02:57 · Score: 1

Also, has anyone here ever heard of CORBA or RMI? I hear it's hot shit for having applications talk to one another without using a database. Sounds crazy eh?

Er? To use XML, you don't need a database. You can have a function on a server that uses either XML-RPC, SOAP or whatever you use to get XML over the line, to get an XML response.

m'thinks you are a tad confused.

Btw, XHTML is a form of XML. So is standard HTML. It's a little malformed (quotations? and self closing tags).

--
-
ping -f 255.255.255.255 # if only
Re:Hahahah finallly something I know a lot about. by Eric+Savage · 2003-03-18 03:10 · Score: 2, Insightful

XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.

This is true if you are parsing your own data, but what about parsing third party data? I did that for years and every day was full of dealing with corruption, misformatted files, or formats that varied from the documentation because some new guy was making them on the other end.

True, these problems can happen with XML but they are much easier to spot. Send me a file and a DTD/Schema and I can tell you in a second if any future files are bad.

My view of XML is that what it does really well is transfer data. As far as storing data, well I only consider it when a database isn't available.

--

This is not the greatest sig in the world, this is just a tribute.
Re:Hahahah finallly something I know a lot about. by rbolkey · 2003-03-18 03:31 · Score: 1

On the database level, I think you were confusing what I was saying? Don't store XML, but it would be nice (and there are) DBs that return XML result sets. Instead of having a record structure of some sort, you have a record element with fields as child elements instead of as something like an indexed array. No added complexity at all (assuming you're using built-in libraries in languages like Java and C# just as you already would with a database).

RMI is nice for what it does, and if you have complete control over the environment and the future direction of the product, it can be used well. XML interchange just gives you some more flexibility for the future in my eyes, when you don't know what programs in what languages will be using your program.

You can, of course, write your own protocols, but unless you need some serious optimization or something that can't be expressed in XML, why re-invent the wheel each time?
Re:Hahahah finallly something I know a lot about. by Washizu · 2003-03-18 04:22 · Score: 1

"The real problem is that XML only partly addresses the problems that relational database solved years ago (organizing and data accessable), but it does it without any of the efficiency benefits of a well designed database server."

You're definitely using XML for the wrong reasons if a database sounds like a better alternative. XML is mainly for sending data from one program to another or file formats for when a database is overkill. You want other programs as far from your Database as possible.

If another program requests something from your DB you get the data, generate an XML file, and send it back. A good example of this is the RSS files on most weblogs these days. They query the news database and generate an XML document that you can use to syndicate their headlines. Here is Slashdot's.
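A rough sketch of that flow, with a hard-coded list of rows standing in for the database query. The function name `rows_to_rss`, the sample headlines and the example.org links are all hypothetical, and the output is a bare-bones RSS-like document, not a complete or validated feed.

```python
import xml.etree.ElementTree as ET

def rows_to_rss(channel_title, rows):
    """Build a minimal RSS-style document from (title, link) rows."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for title, link in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    # The serializer escapes characters such as & and < in text for us.
    return ET.tostring(rss, encoding="unicode")

# Stand-in for the result of a query against the news database.
rows = [
    ("XML is too hard", "http://example.org/1"),
    ("Q&A with a parser", "http://example.org/2"),
]
feed = rows_to_rss("Example headlines", rows)
print(feed)
```

The consuming program only ever sees the generated document and never connects to the database itself.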

--
OddManIn: A Game of guns and game theory.
Re:Hahahah finallly something I know a lot about. by Arandir · 2003-03-18 06:56 · Score: 2, Insightful

Executive Summary: XML is not an RDBMS, which makes it damn hard to use this XML screwdriver to hammer in RDBMS nails.

Your main problem is that you think a tree should be a table. I think you need to get off of your RDBMS religion and realize that there's a whole world of data out there that is perfectly capable of being used without being shoved in a table first.

--
A Government Is a Body of People, Usually Notably Ungoverned
Re:Hahahah finallly something I know a lot about. by Arandir · 2003-03-18 07:11 · Score: 1

Most programs (and most programming languages) end up interfacing at the database level anyway.

What the fsck? Most database programs end up interfacing at the database level; most other kinds of programs do not. I'm working on a complete system at my work that encompasses the bare OS all the way up to very specialized end user applications. Maybe 10% of the several thousand programs involved interface at the database level.

Also, has anyone here ever heard of CORBA or RMI? I hear it's hot shit for having applications talk to one another without using a database. Sounds crazy eh?

Yeah, it does sound crazy. Which is why most programs in the system I am talking about communicate with IPC, RPC, XML and good old TCP/IP. There's also a good deal of Dicom talk as well. How does this possibly work without a database? Because most of these programs don't use databases!

"Dear imaging software, I need frame 56 of the realtime capture you just did. Could you please create a JPEG of it and store it in MySQL so I can query for it on my end? Your's truly, the review workstation."

--
A Government Is a Body of People, Usually Notably Ungoverned
Re:Hahahah finallly something I know a lot about. by leandrod · 2003-03-18 10:56 · Score: 1

> If you're working with data that can be meaningfully represented with columns, you're using the wrong damned tool.

All data can be represented in the relational model. Only, the relational model uses attributes, not columns. Columns are a corruption of the model.

> XML is for complex structured data, which it does fine.

Yes, it does finely send us back thirty-something years to hierarchical data.

> It is not for tables.

See above. The relational model is about relations and relvars, not tables. Tables are a corruption. See the pattern? SQL is evil, relational is good.

> blame the idiot who thought that XML was a good way to do DBs.

Agreed. But note that complex structured data should be in relational (not SQL) databases too.

--
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Re:Hahahah finallly something I know a lot about. by SoupIsGoodFood_42 · 2003-03-18 11:17 · Score: 1

Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation.
Perhaps the problem lies here, and not in XML? Don't blame XML just because someone took it on as a buzzword and treated it like the holy grail answer to all computer problems. It's a tool; there are places where you should use it, and places where you shouldn't.

XML - I don't mind by Sir+Runcible+Spoon · 2003-03-18 01:54 · Score: 1

XML I don't mind, although the include mechanism is tricky.
DTD is a little strange.
Schemas are very strange.
XSLT is beyond a joke, the idea is sound but the syntax is bizarre (try looping from 1 to 10).
XML databases, nowhere near as good as a proper RDBMS, but better than sticking XML in your RDBMS.

We did once use XDR as a middleware, but this we replaced with XML. It is 100 times bigger and 100 times slower. Use XML as an import/export format, don't try to use it as a middleware.

One good thing, it doesn't crash when both ends don't agree on the content of the data and the bytes are not aligned. Instead it just silently ignores the data. (Did I say this was a good thing?).

Re:XML - I don't mind by itsallinthemind · 2003-03-18 03:04 · Score: 1

The important thing to remember about XSLT is that it is a declarative language, not an iterative one. In other words, when you create an XSLT stylesheet, you're not specifying a list of things you want to happen in a specific order - instead, you're defining how you want the result to look. Many programmers (especially the scripting types) are trying to use XSLT as a replacement for iterative languages, and getting frustrated when they can't get the results they want easily.

Personally, I use XSLT for simple presentation tasks - that's it. If I need to do pivots, aggregation, or anything else that's relatively complex, I write a little bit of code in an iterative language to take care of it for me. Yes, I lose some platform portability, but on the flip side the code I create isn't completely inscrutable.
Re:XML - I don't mind by Sir+Runcible+Spoon · 2003-03-18 21:13 · Score: 1

Fair enough, but I find myself repeating elements a variable number of times while just trying to present data as HTML. The most problematic is the display of styled tables suitable for Netscape 4.7 (don't ask me why, it's a company thing). The borders and the padding can only be consistently done by using empty cells (with a clear one pixel gif) and setting a background colour. When it comes to showing a horizontal line across the table you need to insert:

borderCell + n*(3*paddingCell + borderCell)

(1 + n*4)*borderCell

borderCell + n*(3*paddingCell + borderCell)

Where n is the number of data items across the table.

It's not my choice to use XSLT. The company bought the idea that if we use XSLT all we have to do is substitute a different stylesheet for each skin or target platform. Lucky there is only one target platform, because when it comes to this sort of thing I insert JavaScript to do the iteration when it finally hits the browser.

I get very irritated with XSLT. Either it isn't powerful enough to finish the job or the features for doing these things are too well hidden. It is awful to debug (worse even than parsing XML). And the verboseness of the syntax hides what's going on.

XML is a MARKUP language by kahei · 2003-03-18 01:54 · Score: 3, Insightful

...and for doing generic markup in a relatively simple way, it's good.

For storing arbitrary data, and use as a message format (as in SOAP), it's not so good because it has markup-like features, such as the distinction between attributes and elements and the distinction between text and element nodes. (The latter in particular is a huge pain, I wish people would agree to only use text nodes in leaf elements.)

This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements.

However, it's the standard, so we might as well just shut up and use it.

My opinions have no special importance but it *is* important to remember that XML is a markup format that is being used mostly for things other than markup.

--
Whence? Hence. Whither? Thither.

Re:XML is a MARKUP language by nicodaemos · 2003-03-18 06:20 · Score: 1

XML is a MARKUP language
Oh is that what the M in XML stands for? I always thought it was eXtensible Motherf*cking Language .... my bad.

Let's ban the use of XML in public. by Boss,+Pointy+Haired · 2003-03-18 01:55 · Score: 1

That would sort out a lot of the mess.

The use of "XML" in any kind of media article, press release, or from the mouth of anybody not immediately involved in software development should be banned.

I can't help thinking that most of XML's image problems are a direct result of the dot.com fiasco (itself, primarily a media induced f*** up) where anything remotely "interweb" was blown out of all understanding and proportion.

In fact, whilst we're at it, why don't we just abolish technical journalists. By definition they don't know what they're talking about, otherwise they'd be doing it.

Go on. Off you go.

Re:And .....? by Anonymous Coward · 2003-03-18 01:56 · Score: 0

Troll or broke? lol

marketing hype/uncritical technophilia detector by moregan · 2003-03-18 02:00 · Score: 1

For any sentence, substitute "tab-delimited" for "XML" and see whether what's being said still makes sense.

similar problem with MathML by e**(i+pi)-1 · 2003-03-18 02:00 · Score: 5, Insightful

It might be too late to correct some things in XML. The good thing about XML is that, whatever emerges in the future, it will always be possible to convert old documents into any new form using simple tools. The critics do have a point: unlike LaTeX or HTML, which can easily be written by hand, XML can become too bloated to be authored directly by humans. MathML has a similar problem:

LaTeX: $x^5+3x-9=0$

MathML:
<mrow>
<mrow>
<msup> <mi>x</mi> <mn>5</mn> </msup>
<mo>+</mo>
<mrow> <mn>3</mn> <mo>⁢</mo> <mi>x</mi> </mrow>
<mo>-</mo>
<mn>9</mn>
</mrow>
<mo>=</mo>
<mn>0</mn>
</mrow>

You can write complicated formulas in LaTeX directly, but it is almost impossible to do so in MathML, where one has to rely on tools to generate it (i.e. export it with Mathematica or TeX -> MathML converters). Wouldn't it be nice if browsers understood a basic version of LaTeX? (That this is possible has been shown with IBM's techexplorer plugin.)

Re:similar problem with MathML by jonathan_ingram · 2003-03-18 02:58 · Score: 1

I note that the LaTeX syntax is the default way of representing mathematical formulae in TEI Lite (xml-ish e-text encoding specification which is probably going to be adopted by Project Gutenberg in the near future).

--
-- Help Digitise the Public Domain at DP.
Re:similar problem with MathML by WetCat · 2003-03-18 04:28 · Score: 1

You can write complicated formulas in Latex directly but it is
almost impossible to do so in MathML

Bah! WHY DO YOU NEED (or WANT) to do it?
MathML is a MACHINE readable language, not
human readable.
You are supposed to use XML->HTML translators to read that and translators from readable formats to XML to write it.
Re:similar problem with MathML by metasyntactic · 2003-03-18 08:19 · Score: 2, Insightful

One thing that you seem to forget is that XML is useful for putting down the structure of the object in question, while leaving the presentation up to some third-party app.

The XML snippet is indeed more verbose; however, it carries much more semantic meaning than your LaTeX snippet, which is just pure text.

How is this useful? Well assume that I'm blind and I use applications that speak text to me. I'll end up with:

"dollar-sign x caret 5 ..."

Whereas with MathML my text-to-speech agent can actually say:

"x to the fifth plus 3 x minus nine equals zero".

I write LaTeX a lot, and it's a joy to write expressions that will end up looking great. However, I know that when I do so, I'm leaving the mathematical world for the one of fascinating typesetting.

You say that XML can become too bloated to be edited by humans. On that point you are 100% correct. However, remember that one of the tenets of XML is that it should be possible, but not necessarily fun or easy, to hand code input, as stated by the W3C. All that's required is that the format be human-legible and reasonably clear. If you find writing MathML too difficult (something which would not surprise me at all), then I suggest you work on a tool that converts LaTeX to MathML. Hell, I'll even help you with it. But given my experience with LaTeX I am extremely wary, as I have no idea how that complicated beast works and I would imagine it would be quite difficult to infer a lot of the mathematical semantics from most LaTeX snippets.

I just can picture it! by palad1 · 2003-03-18 02:00 · Score: 1

Let's use RegeXPath!

while(<>){
if(X/*/body/(?h([1-4]))X){
echo "found header $header\n";
}
}

These were the specs, gentlemen, start your hacking!

Re:I just can picture it! by Anonymous Coward · 2003-03-18 02:03 · Score: 0

while(<>)

french...

hard to parse .. by ciupman · 2003-03-18 02:02 · Score: 1

I made an XML parser in half a dozen lines in a functional language (Haskell); it was not hard. But nowadays there's no need for a programmer to build his own parser. Why reinvent the wheel if there are already good libraries to parse the thing? Gnome has it .. Java has it .. Why waste your time doing another parser?

--
I fuse with Mercer every single day...

K.I.S.S. by xyote · 2003-03-18 02:03 · Score: 1

I haven't messed with XML yet but I did look at some books on XML. And that is where there is an indication of the problem. These books are way too damn thick. Thick books mean way too complicated interfaces.

What's going on? I think the problem started because pure XML is semantic free. It's just syntax. Semantics are added on with other layers of software. And therein lies the problem. Anybody could add these layers. And it was sort of a race. So anybody who spent a lot of time trying to design a simple intuitive api lost out to those who rush out half baked, over complicated and inconsistent apis.

When I design an api, I put a huge amount of thought into it. How do you fit it into and exploit the current abstraction model? Are the features absolutely necessary? Is it intuitive, i.e. are the semantics straightforward and understandable?

I am one of the few people who've actually simplified api's on 2nd releases. If I'd been in charge of Java, Swing would have never happened. I'd have taken the AWT and simplified and fixed that up.

Fight bloatware! Put minimalists in charge of api design.

Re:K.I.S.S. by Frans+Faase · 2003-03-18 02:57 · Score: 1

I would like to add to this that way too many APIs are designed from an implementer's point of view, instead of from a user's point of view. An easy way to recognize APIs designed from an implementer's point of view is that the user will have to write many very similar looking sequences of calls, and that bad things happen when you call the primitives in the wrong order.

Re:of course there is! (sorry for the prev post) by Anonymous Coward · 2003-03-18 02:04 · Score: 0

(troll
(sovietrussiathing "In SOVIET RUSSIA, XML standardizes YOU!!")
(offtopic "Let's bomb the french!")
(flamebait "Anyway, XML is for loosers!"))

XML is only the beginning by ojQj · 2003-03-18 02:06 · Score: 1

XML is a great idea. Of course it's not the right tool for every task, but it does have a lot of advantages (which other posters will gladly enumerate I'm sure).

Unfortunately XML alone doesn't guarantee data interchangeability between programs. And XML Schema doesn't do it either. Knowing whether or not Tag1 can be in Tag2 doesn't tell you what Tag1 or Tag2 mean or if they correspond to a data structure that you need or can use. For that you need data modeling.

For data modeling in XML I've looked at a huge number of languages: RDF, ISO STEP 28, and XMI were my favorites (though in my opinion XMI first starts getting interesting with ver. 2.0, which isn't even finished yet). Each has a few advantages and disadvantages. And of course there are a lot more than just these. But the problem is that these are all very young standards, and APIs which would make them useful are not abundant.

So maybe the author's right that XML is not yet good enough, but I think a lot of progress is being made.

reusable loop structure by Anonymous Coward · 2003-03-18 02:09 · Score: 1, Insightful

I see the article's gripe as another instance of an increasingly common problem: in all common languages, complicated loop structures aren't reusable. In the article, he wants to have a library (the XML parser) provide an efficient method for iteration over the tree structure in his XML file, and he rightly notices that the language doesn't support that very well.

There are 2 basic ways to reuse a loop in languages such as Perl or Java or C. Way number one is to use callbacks: package up the loop body in a function and pass it into the library. As the author notes, this is syntactically annoying. It can also be inefficient: compilers usually can't optimize out the function call, so if the amount of work per iteration is small there can be a lot of overhead.

Way number two is to use iterator-like syntax (a la Java iterators): provide a function which returns you the next object in line and then write a simple for-style loop. This is syntactically somewhat less annoying, but still subjects you to some overhead.

The closest I've seen to a solution to this problem is compile-time computation such as templates in C++ or macros in LISP. These have not been particularly popular for people to use (probably because they're hard to use), and they're not available in many common languages. Does anyone know any better answers?
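One candidate answer, offered as a sketch and not as a definitive solution: generators (coroutine-style iterators), as found in Python. The library writes the traversal once as a plain recursive function, and the caller gets an ordinary for-loop with its own loop body. The `Node` class and `walk` function below are invented for the example, and this does not remove the per-iteration overhead the post worries about; it mainly addresses the syntactic annoyance.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def walk(node, depth=0):
    """Yield (depth, node) pairs in document order.

    The traversal logic lives here once; callers supply the loop body."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)

# Hypothetical tree standing in for a parsed document.
tree = Node("html", [Node("head"), Node("body", [Node("h1"), Node("p")])])

# The caller's loop: no callback object, no hand-written iterator state.
for depth, node in walk(tree):
    print("  " * depth + node.name)
```

The same `walk` can be reused with any loop body, including early `break`, which is awkward to express with plain callbacks.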

Re:reusable loop structure by Anonymous Coward · 2003-03-18 02:26 · Score: 0

Well, Lisp is quite Common. Da-dum-tishshsh

Eating Your Own DTDogfood by Minix · 2003-03-18 02:10 · Score: 1

XML is a kinder, gentler SGML. I'm sure it has its place. I even like it, and like what it stands for. But ...

Why is there no standard XML DTD to express DTDs?

alternate rendering of question: I understand that XML was trying to keep as close as possible to SGML but ... if either language is a good choice for representing structured data, and a DTD is itself structured data, why is XML not a good choice for representing DTDs?

--
"There are four boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order." Ed Howdershelt

Re:Eating Your Own DTDogfood by Habbie · 2003-03-29 06:19 · Score: 1

What you are looking for is XML Schema

Convoluted Model by Anonymous Coward · 2003-03-18 02:10 · Score: 0

One of the biggest problems with XML is that its information model is quite complex; far more complex than what the average problem needs. When looking for an alternative, you should look for solutions which have addressed this fundamental problem.

Those who like Python should look at YAML

Huh? by Sparky69 · 2003-03-18 02:13 · Score: 2, Interesting

What was his argument again? Reading the whole thing into memory is too slow? Ok, agreed, hence SAX. When you're a Perl programmer everything is a regular expression. Look, Perl was the first language I learned. I'm all for Perl; it's wonderful, poetic and fun. And it handles XML perfectly.

Are you telling me that using relational databases is easier than XML? That you can just sit down and start doing it without reading some books or at least a couple of online tutorials? That's nonsense.

The benefits of XML outweigh its shortcomings IMHO. Especially Schema validation. I love knowing that I don't have to rewrite the same goddamn code to make sure my input is sane! I make a schema for it and voila. Yes, the schema spec is big. But have you read the full SQL spec? Of course not. You use a nice little subset and get your work done. Same with the schema spec. I use about 4 tags for 90% of the documents I need to create.

So let's summarize XML in a couple of rules (there is one caveat, see below):

1. Every element is in between angle brackets.
2. Close every tag you open in the reverse order (like a stack, but this is far too complicated a subject for people programming, there are NO stacks in computers....right).

Does anyone force you to use XML? Of course not. That's a weak argument but it's true. XML gives you the choice to not reinvent a structured data format.

I'm not a programming guru by anyone's hallucination. I've been working with XML for a while now (3 years) and it's been terrific. Yes, you have to learn some stuff, and yes, some of the APIs are a bit terse, but show me something that isn't. What I've come to realize is that if you want to move forward you do have to change. Programmers bitch and whine about how end users don't want to change their UI. Well, this sounds like programmers that don't want to move their brains a little and stop seeing things as regular expressions and start seeing them as XML.

Stop trying to reinvent the wheel every time you need to parse a document and move up an abstraction. And it strikes me as odd that one of the co-creators doesn't seem to "get it". The whole point of making a standardized format is so that you can abstract the parsing, transformation and validation functionality.

Just my 2 cents CAD.

Andrew

XML Problems by withak53 · 2003-03-18 02:13 · Score: 1

The largest problem I've seen with XML is that the content people who often create it do not have the technical knowledge to properly do so.

It's even worse when they've had a little bit of training and try to mark up the data themselves, and then don't validate the document.

But it's still young. Easy-to-use tools are being developed that will let technical people be technical and content people be content people.

Processing arbitrary data is error-prone.... by MrBandersnatch · 2003-03-18 02:14 · Score: 1

During the process of setting up ongoing, for the first time in a year or more I wrote a bunch of code to process arbitrary incoming XML, and I found it irritating, time-consuming, and error-prone.
The first reason he states is that processing arbitrary XML data is time-consuming and error-prone?!!? Well, WTF does he expect? Parsing and transforming XML documents is a JOY compared to working with unstructured or non-validated data. I used to have to parse very large full-text databases (10-500,000 records) in a broad variety of formats and write handlers for each and every format...it was a NIGHTMARE!!!

The standardisation on XML means that a limited set of standard tools can be used to do what used to take several-hundred thousand lines of bespoke code...FAR *LESS* error prone. These days if I need to chop up and transform a large dataset I can write a simple parsing routine (SAX, regex or bespoke logic), throw the extracted record to a parser for validation, then transform it to the destination format with XSLT. That used to require one-hell-of-a-lot of bespoke code to perform the same task which was a real maintenance problem.

About the only valid point that gives any credence to the assertion that "XML is too hard for programmers" was given by one of the articles he cites. The article states that the proliferation of standards and tools (e.g. XSLT, XPath, DOM, SAX, JAXP, XQuery, XForms etc. etc.), which overall combine to make up what many consider to be the XML standard, makes XML difficult for programmers to learn.

XML as a standard by itself, however, is very easy to learn and the benefits are ENORMOUS when compared to the disadvantages of not having any such standard!! If the standard hadn't been made we might have been left with SGML...and there's a rant that I would HAPPILY join in with!!

XML Rules! by Anonymous Coward · 2003-03-18 02:14 · Score: 0

I'm in sheer horror at the number of you that "have never tried XML", or think it seems "too confusing".

Go back to your legacy languages and methods. XML solves hundreds of problems for my company on a daily basis. It is the well from which the hope of web services springs.

Let's not be scared of something just because we don't understand it. I teach a class to complete newbies every week. It's 4 hours, and no one has left it without understanding XML. Buy a book and quit your fucking whining!

XML is simple and powerful... by parabyte · 2003-03-18 02:16 · Score: 1

...which is the reason why it gained so much acceptance in almost every community. It has hit a nerve, and it is great to express almost everything you want in a human readable, persistent cross platform data structure.

The problems with XML are in areas of the standard that are complex *and* rarely used, as with every software system. Problems start with the correct handling of entity references, and the correct implementation of xml-schema has not yet been achieved by any implementation I have tested recently. Even worse, it is almost impossible to write an xsd for a complex case that will validate correctly on a second xml processor, even if it works perfectly with the chosen first processor.

And I think there will not be a conforming svg or smil browser in the next ten years because these specs are too complex to be understood by different programmers in the same way.

XML is great, but some guy at w3c went way too far too fast, making standards that are too complex to be properly understood by mere mortals who need to think about more than just XML.

p.

--
Without order, nothing can exist. Without chaos, nothing can be created.

XML parsing models by HalfFlat · 2003-03-18 02:17 · Score: 3, Informative

If I understand it correctly, the author is lamenting that neither of the standard ways of parsing XML in a scripting language fits the straightforward model of scanning for something relevant and then acting upon it, where the two models are: 1) read in the whole file and make a tree (takes up too much memory, is slow, etc.); or 2) use a callback interface.

The style of perl script he was seeking was a simple loop model:
while (<>) {
    next if /ignorable/;
    if (/thing-one/) { ... }
    elsif (/thing-two/) { ... }
    ...
}
To me the thing that distinguishes this the most from the provided XML parsing interfaces is that it has a minimal amount of state.

So isn't what is needed a corresponding structure to the while (<>) above that iterates over the tree-nodes of the XML-encoded data structure, in a depth-first preorder traversal (to avoid having to build the whole tree first)? One could imagine a parser object that scans through the XML file returning nodes (and their parent history) while maintaining an absolute minimum of state. If one wanted to build an in-memory representation of a subtree given a node, then one can always do so when one finds the node one wants.

Such an interface wouldn't be good for integrity verification or the like, but for the sort of application the author was talking about, it would seem ideal. Much less flexible than the normal models, sure, but much easier to work with when the problem fits this sort of description. Perhaps I'm underestimating the difficulty of the task, but it doesn't sound too hard to write, given that it is doing so much less than the fully-featured XML parsing interfaces.
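
For what it's worth, here is a rough Python sketch of that kind of interface, built on the standard library's xml.etree.ElementTree.iterparse. The name iter_nodes and the sample feed are made up for illustration. It yields each element together with its "parent history" in document order, and the only state it keeps itself is the stack of open tag names. One caveat: at the moment an element is yielded only its tag and attributes are reliable, since its text and children may not have been read yet.

```python
import io
import xml.etree.ElementTree as ET

def iter_nodes(source):
    """Yield (path, element) for every element in document (preorder) order.

    `path` is the tuple of tag names from the root down to this element.
    """
    path = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            yield tuple(path), elem
        else:
            path.pop()
            # Drop the finished element's text, attributes and children
            # so large documents do not pile up in memory.
            elem.clear()

# Hypothetical input, scanned in the "loop and skip" style.
doc = io.BytesIO(b"<feed><entry id='1'><title>Hi</title></entry><entry id='2'/></feed>")
for path, elem in iter_nodes(doc):
    if path[-1] != "entry":
        continue  # the "next if /ignorable/" step
    print("/".join(path), elem.get("id"))
```

The consuming loop reads much like the line-oriented one: skip what you don't care about, act on what you do.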

The other problem is the awkwardness of using XML in O-O languages, as addressed in the piece Tim Bray links to in his article. Though I haven't used this particular program, this seems to be the problem that FleXML is trying to address. When you don't need all of the flexibility that XML can provide, but instead have a fixed schema that your XML representation follows, why not have your parser automatically built to read it? People have used lex/flex for scanning text files for decades --- in these days of XML Schema, it should be even easier. If FleXML lives up to its promise, it will be. Has anyone here used FleXML and is willing to comment on how well it addresses these sorts of problems?

Re:xml by FireAtWill · 2003-03-18 02:17 · Score: 2, Interesting

I've been working on EDI applications for many years now. I view XML as another attempt to solve the same problem as the ANSI X12 standards. The problem is, 'that problem' was never *the* problem.

In the old days (in my industry), there was a COBOL oriented file structure called the National Standard Format (NSF). It was typically documented as a set of maybe 10-20 hierarchical record formats. The mechanics for reading the files were immediately obvious. The problem was understanding what needed to be done with the data. Of course, there was often a need for a new data element and it got shoved into some filler field, resulting in the National Standard Format becoming the Nearly Similar Format.

To resolve this issue, the industry jumped on the ANSI X12 bandwagon. ANSI X12, like XML, offered a flexible, platform-independent standard for representing hierarchical data structures.

Platform-independent means that it's equally difficult to use on all platforms. The 10 pages or so of NSF COBOL record layouts were replaced by a couple of binders worth of standards. One for X12 and one containing the various industry-specific transaction sets. Expensive tools emerged to read the new files and cram them back into the familiar and more workable structures.

'Flexible standards' turned out to be an oxymoron. There are so many options that it is extremely difficult to anticipate what sort of odd interpretations you'll be forced to deal with. And deal with them we must, because the Feds have mandated the way in which we must exchange data (HIPAA).

And still we find ourselves needing extra pieces of data for specific trading partners that we put into places that are beyond the standard.

I'd rather use XML than ANSI X12, but I'd rather not use either. They add much complexity and infernal flexibility in order to 'solve' what used to be a trivial task - agreeing on a data format.

If we want something truly useful, we'd forget about markup languages and specify an open database format similar to Access that actually has value beyond the narrow problem being addressed.

Syntax != Meaning. by Anonymous Coward · 2003-03-18 02:18 · Score: 0

Programmers would still puzzle over the meaning of an XMLized resolv.conf file. The flat file is so intuitive in this case. XML may be the replacement for "the comma delimited flat file" (btw, that's "tab delimited flat file" for you Windows freaks), but that doesn't mean it's not overkill. Sometimes a byte is a byte and sometimes a line of information separated by a field separator is all you need.

Re:Syntax != Meaning. by Luke-Jr · 2003-03-18 06:49 · Score: 1

Shouldn't this be the other way around? I haven't seen any *nix apps that use comma delimited file formats yet, but I've seen plenty of Windoze ones... On the other hand, there are plenty of tab/space delimited files in *nix...

--
Luke-Jr

Proposed interface to satisfy his requirements by Gollum · 2003-03-18 02:19 · Score: 1

Tim seems to be asking for an iterative approach to reading XML. One approach (that doesn't check for "well-formedness") might be:

my $xml=XML::Iterator->new();
while(%tag=$xml->getnexttag()) {
    print "Tag type = ",$tag{"type"},"\n";
    print "Tag attrs are : ", join(", ", keys %{$tag{"attributes"}}),"\n";
}

The hash that is returned could contain all the information that can be determined from the XML doc (and maybe the DTD as well), such as type, etc.
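
Something close to this can be sketched in Python with the standard library's xml.dom.pulldom, which hands out parse events one at a time. The function name next_tags, the dict keys and the sample order document are assumptions made for this illustration, not part of any existing iterator module. Unlike the proposal above, the underlying parser does check well-formedness and will raise on broken input.

```python
from xml.dom import pulldom

def next_tags(xml_text):
    """Yield one dict per start tag, pulled from the parser on demand."""
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            yield {
                "type": node.tagName,
                "attributes": dict(node.attributes.items()),
            }

# Hypothetical document; the loop mirrors the Perl-style one above.
for tag in next_tags('<order id="7"><item sku="A1" qty="2"/><note/></order>'):
    print("Tag type =", tag["type"])
    print("Tag attrs are :", ", ".join(tag["attributes"]))
```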

Perl suggestion by skillet-thief · 2003-03-18 02:20 · Score: 2, Insightful

I don't know what's going on in Perl 6, but it seems like Perl needs some kind of built-in way of running through an xml file by tags, in a way similar to the standard line by line file reading operator. Rather than grabbing a single line at a time, or having to slurp in the whole file before whacking it up, you should be able to pass a regex to the input operator so that it will stop when it gets to the end of a chunk of text defined by an end tag.

Obviously, there are ways of getting around this by using a line-by-line approach, but I'm pretty sure that if such a thing existed and was easy to implement, it would get used a lot and would make Perl far more xml friendly.

--

Congratulations! Now we are the Evil Empire

Re:Perl suggestion by _xeno_ · 2003-03-18 07:46 · Score: 1

If I understand what you're suggesting, that won't work (necessarily) to parse XML. Take the following XML:
<a> _ __ ___b __ _ </a& g t;

("_" used to get around Slashcode's broken <ecode> tag that refuses to allow indentation. To get back at me, ecode then fucked up the closing </a> tag. Unless it fixes it on post, which would be nice but annoying.) I'm going to invent the "$a = <FILE>=~/regex/" notation to do what you're suggesting.
The regex is evaluated on the input stream until a match is found, and then everything up to and including the match is returned. Grouping is allowed, so your $1's will be set.
So you start by doing a:
<FILE>=~/<(\w+)>/
to get the first tag. (Astute observers will note this does not handle attributes, empty tags, or all legal tag characters. But it's an example.)
Now that you have the first tag, you want to find the end tag. $1 has the tag name, so you can easily do:
<FILE>=~/<\/$1>/
So far, so good: you have the text for a's children.
But now, you repeat the first regex to get "b" as the tag. The second step will net you:
__ ___b __
Whereas, what you really wanted was:
__ ___b __ _
There's no way to parse XML with regular expressions. As the name suggests, a regular expression can only parse a regular grammar. XML is a non-regular grammar. (A previous poster stated it was a non-regular context-free grammar. I think that's right, but am too lazy to try and figure it out on my own.)
Bottom line: you really can't parse XML using regular expressions.
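
A small Python illustration of the failure, using a made-up document in which the same tag name is nested. The "find a start tag, then scan to the first matching end tag" pattern stops at the inner end tag, while a real parser (here the standard library's ElementTree) keeps track of nesting. This is only a demonstration of the one problem described above, not a proof about every regex dialect.

```python
import re
import xml.etree.ElementTree as ET

doc = "<a>outer <a>inner</a> tail</a>"

# Regex approach: capture the tag name, then scan lazily to the first </name>.
m = re.search(r"<(\w+)>(.*?)</\1>", doc, re.S)
regex_body = m.group(2)

# Parser approach: nesting is tracked, so the whole outer element is seen.
root = ET.fromstring(doc)
parsed_body = "".join(root.itertext())

print(repr(regex_body))   # 'outer <a>inner'   (cut off at the inner end tag)
print(repr(parsed_body))  # 'outer inner tail' (all text of the outer element)
```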

--
You are in a maze of twisty little relative jumps, all alike.
Re:Perl suggestion by skillet-thief · 2003-03-18 09:17 · Score: 1

Thanks for the detailed explanation. You obviously have a point, in that the complexity of XML is tough to deal with using regexes.

However, if the particular problem that you mention is really the problem, then there might be hope for my idea ;-)...

If the input stream can be parsed for regexes on the way in, then it could be taught to count the number of nested tags so that it would end at the right level of nesting.

If that was the only problem, then it seems like it could be surmounted. The real obstacle would be if this thing just kept getting more and more complex in order to deal with all kinds of other complications.
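
Roughly what that could look like, as a Python sketch: a regex finds the individual tags, and a plain counter outside the regex tracks the nesting depth. The function name element_span is invented for this example, and the tag pattern is deliberately naive. It ignores comments, CDATA sections, processing instructions and a ">" inside attribute values, which is exactly the sort of complication that would keep piling up.

```python
import re

# Naive tag matcher: optional "/", a name, anything up to ">", optional "/" before it.
TAG = re.compile(r"<(/?)(\w+)[^>]*?(/?)>")

def element_span(text, start=0):
    """Return (tag, begin, end) for the first element at or after `start`,
    finding its end tag by counting nesting depth."""
    depth = 0
    first = None
    for m in TAG.finditer(text, start):
        closing, name, selfclosing = m.group(1), m.group(2), m.group(3)
        if first is None:
            if closing:
                raise ValueError("end tag before any start tag")
            first = m
            if selfclosing:
                return name, m.start(), m.end()
        if selfclosing:
            continue  # <x/> opens and closes at once, depth is unchanged
        depth += -1 if closing else 1
        if depth == 0:
            return first.group(2), first.start(), m.end()
    raise ValueError("unbalanced tags")

doc = "<a>outer <a>inner</a> tail</a><b/>"
tag, begin, end = element_span(doc)
print(tag, repr(doc[begin:end]))
```

The counter is the part a regular expression cannot do on its own; once it is there, the nested same-name case comes out at the right end tag.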

--
Congratulations! Now we are the Evil Empire

This guy must be dum then! by Idimmu+Xul · 2003-03-18 02:20 · Score: 1

I managed to get XML handling working fine using libxml++ for my project. It was easy, quick and painless, and that was with using source code examples as well!

For it to have been any easier, it would have most likely required magic!

--
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!

C#?!? by ultrabot · 2003-03-18 02:22 · Score: 1

From the article:

The O-O factory, now chiefly represented by Java and C#, where the Big Company Programmers building Big Systems on Big Iron live.

So someone is actually using C#? In a big company, building big systems? And most surprisingly, on big iron?!?

--
Save your wrists today - switch to Dvorak

What is there to understand? by Anonymous Coward · 2003-03-18 02:23 · Score: 0

This is a serious question. What is there to understand about XML that cannot be explained in about half a sheet of A4?

Re:What is there to understand? by pyrrho · 2003-03-18 10:56 · Score: 1

It is the confusion about XML, the hype, and all the metalanguages created using XML that takes up so much extra paper. XML itself could be explained in that space, but how could you leave out XSLT? And don't you want to talk about RSS and SMIL and SOAP, etc. etc.

That is, some have heard many false promises regarding XML (which they should not have believed, imho, anyway, but still). XML does nothing for you, in the same way that the Hierarchical File System does nothing for you. The HFS allows you to use directories to organize things, but as we know, most file systems are still abysmally disorganized. Damn HFS!

XML allows you to have forward, backward, and sideways compatibility in file formats, it allows you, via XSLT, a path for just-in-time compatibility. But it doesn't actually have any way to promote these things... it just makes them possible.

--
-pyrrho

I agree, of course... by alispguru · 2003-03-18 02:27 · Score: 4, Insightful

Given my .sig, how could I disagree?

XML got one thing right over unadorned S-expressions - document packaging, specifically versioning and character-set labeling. XML inherited this from SGML, and it's one of the few things it took from there that was actually worth keeping.

For a good laugh, read the Origin and Goals section of the XML spec. Of the ten goals for XML listed there:

XML shall be straightforwardly usable over the Internet.

XML shall support a wide variety of applications.

XML shall be compatible with SGML.

It shall be easy to write programs which process XML documents.

The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

XML documents should be human-legible and reasonably clear.

The XML design should be prepared quickly.

The design of XML shall be formal and concise.

XML documents shall be easy to create.

Terseness in XML markup is of minimal importance.

I'd say two of them were met, but were bad ideas (SGML compatibility, terseness unimportant), and five of them were completely missed (ease of use, human legibility, quickly designed, formal and concise, ease of creation).

Thirty per cent is a failing grade, folks...

--

To a Lisp hacker, XML is S-expressions in drag.

Re:I agree, of course... by Elbows · 2003-03-18 02:45 · Score: 1

One other nice thing about XML is that opening tags are matched with closing tags. If you leave off a closing paren in Lisp, the parser will give you an error but it can't pinpoint where you screwed up. But an XML parser can spot which closing tag is missing, which means you don't have to hunt for it yourself.
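
As a quick Python illustration with the standard library's ElementTree (the helper name first_error and the sample item document are made up here): the parser's exception carries a line and column. Note that the position is where the mismatch was detected, i.e. at the wrong closing tag, which is usually near but not always exactly where the tag was left out.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

# <stores> is never closed, so the parser objects when it reaches </item>.
broken = "<item>\n  <part-no>123456</part-no>\n  <stores>3 7 9\n</item>"
print(first_error(broken))
```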

Also, one of the major ideas of XML is to separate code from data, as opposed to Lisp where code and data are the same thing. Similar syntax, different philosophy, I guess.
Re:I agree, of course... by Twylite · 2003-03-18 04:14 · Score: 2, Interesting

Shameless self-plug, but I have a critique of XML's failure to meet its goals on my home page. You may find it interesting.

--
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
Re:I agree, of course... by g4dget · 2003-03-18 09:32 · Score: 2, Insightful

One other nice thing about XML is that opening tags are matched with closing tags. If you leave off a closing paren in Lisp, the parser will give you an error but it can't pinpoint where you screwed up. But an XML parser can spot which closing tag is missing, which means you don't have to hunt for it yourself.
That would be a valid argument if XML were designed to be regularly input by humans. But XML is so cumbersome otherwise that almost all of it will be either machine generated or edited in special editors. And balancing closing tags is easy in Lisp if you use a special editor.
Also, most versions of Lisp give you two separate, equivalent pairs of parens that you can use for checking. So, you write:
[item (part-no 123456) (available 5) (stores 3 7 9)]
And checks can be incorporated into the definition of specific constructs. So, you could have:
(item (part-no 123456) (available 5) (stores 3 7 9) enditem)
Or, you could make this an optional part of the syntax, allowing people to close a list starting with "x" with "/x", but not requiring it:
(item (part-no 123456) (available 5) (stores 3 7 9) /item)
Also, one of the major ideas of XML is to separate code from data, as opposed to Lisp where code and data are the same thing. Similar syntax, different philosophy, I guess.
Lisp programs separate code from data all the time, just like well-written programs in any other language. It's just that on those occasions when you do have to deal with code, you can do so using the same syntax as you use for data. In different words, separating code from data does not require for code and data to have different syntax.
The fact that several web standards use incompatible syntax (DTD, CSS, etc.) is actually a big problem. And the fact that almost no web code is written in XML syntax means that all those scripts are inaccessible to XML parsers and easy automatic analysis. Just imagine how nice it would be if the stuff inside the JavaScript tags could be analyzed and indexed with a bit more confidence.
Re:I agree, of course... by pyrrho · 2003-03-18 11:25 · Score: 1

>But XML is so cumbersome otherwise that almost all of it will be either machine generated or edited in special editors.

good thing machines never corrupt files or we would want to validate machine output as well.

--
-pyrrho
Re:I agree, of course... by g4dget · 2003-03-18 16:33 · Score: 1

Well, that indeed is a good thing. Because if machines did corrupt files, I'd worry first about databases and executable files, neither of which contains much "validation". In fact, we decided long ago to leave that kind of validation to disk controllers, file systems, and TCP, and to ensure syntactic correctness of machine-generated files by using parsing and generation tools.
Re:I agree, of course... by pyrrho · 2003-03-18 17:34 · Score: 1

a disk controller will not address corruption because of a bug in your transformations.

--
-pyrrho
Re:I agree, of course... by g4dget · 2003-03-18 18:54 · Score: 1

No, but neither will matching closing tags, because most likely, you will have library code that generates matching closing tags automatically. So, if you have a bad transformation, you'll just get bad output in which all tags still match.
Re:I agree, of course... by pyrrho · 2003-03-18 19:17 · Score: 1

ok, you win this time g4dget!

--
-pyrrho

What XML can't do.... by twoslice · 2003-03-18 02:31 · Score: 1

"Computer - Holodeck program number 5"

(7 of 9 nekkid, he he he...)

--

From excellent karma to terrible karma with a single +5 funny post...

Re:xml by Uller-RM · 2003-03-18 02:33 · Score: 1

Very true. The article mentioned a similar problem with SOAP, using XML to encode parameters for RPC calls and transmitting them using HTTP. Utter tripe, but it'll probably become accepted by force since Microsoft uses it in .Net :\

Part of it is also the fact that XML's strengths are in hierarchical data. If I write something that works with tabular data, you know what I'm gonna use? CSV. Simple and works, and if it ain't broke, don't fix it. *grin*

(My biggest beef with XML actually came once I tried to write my own processor -- writing a fully compliant parser prevents cheap stupid recursive descent parsing; you have to use LR or LALR. And that's after you've put together the required code to handle UTF-8 and the other myriad encodings... I'm tempted to write a program that takes as input an XML schema and target encoding, and outputs a table implementing an NFA/DFA that blindly parses a data file and just chokes fatally on any errors.)

Re:He's not the only one! by Anonymous Coward · 2003-03-18 02:33 · Score: 0

Ha ha, didn't see the redirect.php? Fag.

what the hell are you talkin` about? by Ender+Ryan · 2003-03-18 02:35 · Score: 2, Insightful

It strives to excel at too many things at once, and becomes inefficient and complex as a result.

I agree with this, to an extent. If you don't like/need all the fluff, don't use it. XML is only as complicated and inefficient as you want it to be.

XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.

It's not just about writing parsers for a single program. What happens when you have several programs that read the same type of file? What if said file-type is somewhat complex. XML keeps things simpler and easier for these cases.

Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form.

What on earth are you talking about? YOU define the format of your XML data. If it doesn't need to be complicated, don't complicate it!

XML document tree traversal = 10000x more complex than getting column data out of a ResultSet...

Again, what? Keep the XML simple, and it will be just as easy.

Unfortunately it is also a billion times slower to parse XML than it is to perform a medium complexity database query.

Then XML isn't the proper solution for your problem. Just because some dipshit tries to force XML to do things it isn't optimized for doesn't make XML any less useful.

*snip* the rest of your comments comparing XML to relational databases.

XML files are not high performance databases... Use the right tool for the job, and you will be much happier.

It sounds to me like XML isn't your problem. Your problem is the "genius" at your company that needs to be beat over the head with a clue stick. If I were you, I'd be sure to beat him hard.

--
Sticking feathers up your butt does not make you a chicken - Tyler Durden

WSDL - ugh! by TheSync · 2003-03-18 02:36 · Score: 1

OK, I don't find XML a challenge, but there is really a sharp learning curve in trying to describe even a simple Web service in WSDL.

Of course, if you write it in C#, it will make the WSDL for you, but writing WSDL descriptions of "legacy" Web services is quite painful.

WIMPS by Bill,+Shooter+of+Bul · 2003-03-18 02:37 · Score: 1

This is why Physics Majors will always make fun of computer science majors. Heaven forbid, any of you think.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.

Stay on topic - problem isn't XML standard by cdthompso1 · 2003-03-18 02:37 · Score: 5, Interesting

Tim Bray's article, if you didn't read it, is right on the money. The last paragraph basically states that XML is the best alternative to the data interchange problem because it provides a consistent format. Some of you guys who are rounding up the mob and lighting buildings on fire calling for book burnings and the downfall of all XML have to read the article! You're not in agreement with Tim when you say, "Sure, I think XML sucks, too."

So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry!) Just imagine the chaos that will ensue once MS Office saves all documents in true XML.

My take on the problem Tim's really talking about: inconsistency and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a language domain, because vendors want to put their own spin on everything. The alphabet soup that results confuses the hell out of people. This has even happened in the open source world, where I can do a Google search on "php xml parsing" and read articles on no fewer than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract and store the data in our database," you need a standard approach. Not to stifle thought and innovation: yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each author offering his opinion with a slight variation of how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about, and not necessarily the one that time has proven to be the most efficient.

Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?

Re:xml by expro · 2003-03-18 02:39 · Score: 1

XML isn't intended for web pages. That's what you missed:

Clearly it IS intended for web pages. The only future of HTML at W3C is XML-based. The only modular form of markup today that allows combination of web standards in a web page is XML.

An idea... by Fnkmaster · 2003-03-18 02:42 · Score: 1

Okay, we all know DOM parsing is a hugely inefficient memory hog of a way to get data out of an XML file. This guy states, perhaps without lots of evidence, that SAX/event based parsing is hard to use.

Now I don't really agree. I've written my share of SAX parsing code ranging from simple to shockingly complex. The real problem is that as your problem becomes more complex, the state machine you build in your SAX parser is going to get more and more outrageous. Lots of booleans, or integers, or other miscellaneous state flags sitting around. It tends to make code that's unreadable by anybody except the author, and even then, I don't know if I could sit down and read some of the massive SAX-based Java-from-XML code generators I built a few years back without serious headaches. No, please don't tell me I'm a bad programmer, I am not interested in hearing unfounded criticism. My code is well structured and well documented, as much so as it could be, given that the structure of an event oriented parser is just plain convoluted.

Obviously if you are doing something really simple, none of this likely matters. And if you are doing something non-time or non-resource critical, you can generally get away with using DOM/tree-based parsing.

But it would be nice to have an alternative syntax that describes what you are ACTUALLY looking for when you are parsing a document. Something more readable than a big ole' event/callback state-machine mess. Alternative syntax (and semantics) for stream-based XML parsing. And that's what this guy is proposing, though his proposal is a bit strange, since it sort of looks like an oversimplified version of an event-callback parser, but maybe I just need to see a more complete concept or prototype of it than that one example. As for me, what I'd like to see is some "state-machine"ish way of describing what you are actually doing in the event parsing, that is compact and hopefully readable in a logical, linear way, so you don't have chunks of code in different methods all over the place flipping little state flags manually. But perhaps in the end any system ends up reducing to a variant of the existing event/callback parsing model, and you just can't gain any syntactic simplicity without major loss of expressivity. I just haven't thought about it enough.
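
One possible shape for this, sketched in Python on top of the standard library's xml.sax: keep a single stack of open element names inside the handler, and let the caller supply a table mapping element paths to callbacks, in place of hand-flipped state flags. The class name PathDispatcher, the rule table and the library/book document are all invented for this illustration. It is still an event/callback parser underneath, so it may only be moving the mess around.

```python
import xml.sax

class PathDispatcher(xml.sax.ContentHandler):
    """Call on_text[path](text) whenever an element at that path closes."""

    def __init__(self, on_text):
        super().__init__()
        self.on_text = on_text   # maps "root/child/..." to a callback
        self.path = []           # names of the currently open elements
        self.chunks = []         # text pieces collected per open element

    def startElement(self, name, attrs):
        self.path.append(name)
        self.chunks.append([])

    def characters(self, content):
        if self.chunks:
            self.chunks[-1].append(content)

    def endElement(self, name):
        text = "".join(self.chunks.pop())
        handler = self.on_text.get("/".join(self.path))
        if handler:
            handler(text.strip())
        self.path.pop()

# The "what am I ACTUALLY looking for" part lives in one table.
titles, authors = [], []
rules = {
    "library/book/title": titles.append,
    "library/book/author/name": authors.append,
}
xml.sax.parseString(
    b"<library><book><title>Dune</title>"
    b"<author><name>Herbert</name></author></book></library>",
    PathDispatcher(rules),
)
print(titles, authors)
```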

Re:An idea... by 21mhz · 2003-03-18 06:12 · Score: 1

But it would be nice to have an alternative syntax that describes what you are ACTUALLY looking for when you are parsing a document.

Ever read about XMLPull?

Cheers.

--
My exception safety is -fno-exceptions.

C doesn't have it. by torpor · 2003-03-18 02:47 · Score: 2, Insightful

Really.

There's *still* nothing out there that can take my structs, parse them out to XML, then load them back again when needed, seamlessly.

The embedded sphere - where XML is *USEFUL*, and where *C* is *ALSO USEFUL* - has no chance with XML right now.

It's either libexpat and a monster callback module, or bust.

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

Re:xml by frisket · 2003-03-18 02:49 · Score: 2

>>XML isn't intended for web pages

Wrong. This is precisely what XML was intended for. Go and read the Spec.

Where we went wrong was in using XML for spreadsheet/database-style rectangular data, for which it was never designed, and for which it is grotesquely unsuited.

Meta XML by 4of12 · 2003-03-18 02:52 · Score: 1

As a twist on this, I know people who use XML to describe the syntax of configuration text files that are mostly just full of

name=value

specifications. The text files themselves are left as short, easy to edit by humans, but the computer learns the syntax from the XML.

What would be nice is an emacs mode for automatically shifting between "simple text file mode" and "fully packed in XML air bubbles mode". The former might have fancy highlighting, electric indentation, etc. based on the underlying XML. The latter could show you all the gory detail, such as dates split up into microscopic elements that can be checked exhaustively in the XML Way for validity.

<date> <year> 2003 </year> <monthnum> 3 </monthnum> <daynum> 18 </daynum> </date>

--
"Provided by the management for your protection."

Re:Meta XML by Ed+Avis · 2003-03-18 03:02 · Score: 1

Yes, I don't see why editing XML documents should always require you to look at the element names and characters. That view is often useful, of course, but sometimes a more concise representation of the document could be better for editing. So elements might be shown as green boxes while elements are shown as red boxes. Or whatever.

--
-- Ed Avis ed@membled.com
Re:Meta XML by rabidcow · 2003-03-18 03:27 · Score: 3, Informative

This is bad XML design.

This would be better:
<date year=2003 month=3 day=18/>

I used to think XML was just horribly bloaty and ugly, now I think it's more like VB in that it's easy to make something that's very poorly designed.
Re:Meta XML by copec · 2003-03-18 04:01 · Score: 1

That's very insightful. This should be moderated up.
Re:Meta XML by Anonymous Coward · 2003-03-18 04:54 · Score: 0

Green and red? Using the <blink> tag perhaps? :P
Re:Meta XML by ftobin · 2003-03-18 05:21 · Score: 0, Troll

First, you forgot to quote your attributes, so what you specified is not well-formed XML. Second, most people would argue that you should not be putting 'real' data in attributes, but rather in elements.
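The well-formedness point is easy to check with any conforming parser. A small sketch in Python (the is_well_formed helper is just a name picked for this example):

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

unquoted = "<date year=2003 month=3 day=18/>"
quoted = '<date year="2003" month="3" day="18"/>'
results = (is_well_formed(unquoted), is_well_formed(quoted))
```

The unquoted version is rejected and the quoted one is accepted.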
Re:Meta XML by 4of12 · 2003-03-18 06:07 · Score: 1

So element attributes are the way to go?

I was just regurgitating a lesson I saw a couple of years ago where they recommended the bloated date, and said it was preferable to the less descriptive
<date> 2003/03/18 </date>

If you're right about the power of element attributes, then most UNIX config files are only a short distance away from valid XML if the name-value pairs are turned into attributes and the equal sign becomes white space.
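A quick sketch of how short that distance is, in Python. The config_to_xml helper and the "config" element name are assumptions made for the example, and it does not check that the names are legal XML attribute names:

```python
import xml.etree.ElementTree as ET

def config_to_xml(text, root_tag="config"):
    """Turn name=value lines into attributes on a single element."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value line; ignored in this sketch
        root.set(name.strip(), value.strip())
    return ET.tostring(root, encoding="unicode")

xml_text = config_to_xml("# demo\nport=8080\nhost = example.org\n")
```

The serializer takes care of quoting and escaping the values, which is the part people tend to get wrong by hand.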

--
"Provided by the management for your protection."
Re:Meta XML by rabidcow · 2003-03-18 06:37 · Score: 1

So element attributes are the way to go?

Well, honestly I don't know. XML using separate tags for everything is bloated and ugly. XML using attributes is much less so.

I personally would prefer the less descriptive method if there's unambiguous resolution. No one would split a floating point number into a separately labelled integer and fractional part because there's a simple, universal standard for expressing them together.
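To make the comparison concrete, here are the three encodings from this thread parsed to the same value. This is only a sketch in Python; the function names are made up, and the attribute form is shown with the quotes it needs to be well-formed:

```python
from datetime import date, datetime
import xml.etree.ElementTree as ET

def from_elements(text):
    """One element per component, as in the 'bloated' form."""
    d = ET.fromstring(text)
    return date(int(d.findtext("year")),
                int(d.findtext("monthnum")),
                int(d.findtext("daynum")))

def from_attributes(text):
    """One attribute per component."""
    d = ET.fromstring(text)
    return date(int(d.get("year")), int(d.get("month")), int(d.get("day")))

def from_text(text):
    """A single string in a fixed, agreed format."""
    d = ET.fromstring(text)
    return datetime.strptime(d.text.strip(), "%Y/%m/%d").date()

a = from_elements("<date> <year> 2003 </year> <monthnum> 3 </monthnum> "
                  "<daynum> 18 </daynum> </date>")
b = from_attributes('<date year="2003" month="3" day="18"/>')
c = from_text("<date> 2003/03/18 </date>")
```

All three yield the same date; the single-string form just moves the "schema" into the format string.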
Re:Meta XML by rabidcow · 2003-03-18 06:51 · Score: 1

Second, most people would argue that you should not be putting 'real' data in attributes, but rather in elements.

Then I would argue that XML is bloated and ugly.

A small amount of bloat is ok, sure, but putting each piece of that in a separate element makes it huge. Even after compression you'll have a significant increase in size.

Also, you won't be able to easily read it manually because the 'real' data is lost in the noise of tags. (how often do you surf the web looking at HTML source?)
Re:Meta XML by ftobin · 2003-03-18 07:22 · Score: 1

Oh, I'm certainly not disagreeing with you about the size or readability of XML. I just wanted to point out that if you're going to do XML, at least do XML right. Personally I don't think parsing is that difficult...that's what we have tools like bison and flex for.
Re:Meta XML by Anonymous Coward · 2003-03-18 11:54 · Score: 0

Bravo, now you have no data. Just meta-data. So now you have an empty document. What is the point?

Ok. Then what are the alternatives? by Anonymous Coward · 2003-03-18 02:55 · Score: 0

There aren't any.

XML is bad like Democracy is bad by Washizu · 2003-03-18 03:00 · Score: 4, Insightful

XML is bad like Democracy is bad. It's just better than the alternatives.

I had a problem at work when we switched from AutoCAD to Solidworks. Our manufacturing software couldn't read the new BOM files, which were Excel's .xls. Without ever looking at our system's BOM files before I wrote a program that read the .xls and built a proper XML BOM file our system could read. If our system wasn't using XML, who knows how long it would have taken me to figure out the intricacies of a proprietary file format.

--
OddManIn: A Game of guns and game theory.

Code vs. data by alispguru · 2003-03-18 03:03 · Score: 1

Also, one of the major ideas of XML is to separate code from data, as opposed to Lisp where code and data are the same thing. Similar syntax, different philosophy, I guess.

Early in the history of AI, there was a lot of argument about procedural versus declarative knowledge representation - whether it was better/more powerful to represent knowledge as code or data structures. The consensus they finally came to was that it really doesn't matter - any sufficiently complex declarative knowledge representation becomes something you can embed procedures in, and procedural systems need to structure their code (or else you can't reason about it) so much that it starts to look declarative.

The Lisp 'code is data' philosophy is just the acceptance of this consensus.

--

To a Lisp hacker, XML is S-expressions in drag.

Ease of SAX by porter235 · 2003-03-18 03:04 · Score: 1

For those who are afraid of SAX, may I suggest you take a look at a collection of articles available at xml.com

High-Performance XML Parsing With SAX
Transforming XML With SAX Filters
- There are others too. Easy to use and does pretty much what he wants.

Re:of course there is! (sorry for the prev post) by Anonymous Coward · 2003-03-18 03:09 · Score: 0

Your document is not well-formed!

<?xml version="1.0" encoding="bork">

should be:

<?xml version="1.0" encoding="bork"?>

Hey Ned Flanders!! by Anonymous Coward · 2003-03-18 03:11 · Score: 0

RTFA = Read The Fucking Article.

XPath makes XML bearable.... by tcopeland · 2003-03-18 03:11 · Score: 1

....here's a document:
<foo>
<bar>
baz
</bar>
</foo>

Here's the XPath expression to get all "bar" nodes:

/foo/bar

Nice and concise.
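For anyone who wants to try it, a minimal sketch in Python. The standard library's ElementTree only implements a subset of XPath and resolves paths relative to the element you call it on, so with the root being "foo" the expression /foo/bar becomes ./bar here:

```python
import xml.etree.ElementTree as ET

DOC = "<foo>\n<bar>\nbaz\n</bar>\n</foo>"

root = ET.fromstring(DOC)          # root is the <foo> element
bars = root.findall("./bar")       # relative form of /foo/bar
texts = [b.text.strip() for b in bars]
```

A full XPath engine would take the absolute expression as written; that needs a third-party library.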

Over on the PMD project we're replacing many of our Java rules (find empty catch blocks, empty if statements, etc) with XPath expressions. For example, here's the XPath expression that finds empty if statements:

//IfStatement/Statement/Block[count(*) = 0]

Sweet, eh? Props to Dan Sheppard who came up with this excellent technique.
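A rough Python sketch of the same idea. The XML below is a made-up stand-in for an AST, not PMD's real output. ElementTree's XPath subset has no count(), so the "no children" test is done in Python:

```python
import xml.etree.ElementTree as ET

# Hypothetical AST dump; element and attribute names are invented.
AST = """<CompilationUnit>
  <IfStatement line="3"><Statement><Block/></Statement></IfStatement>
  <IfStatement line="7"><Statement><Block><Statement/></Block></Statement></IfStatement>
  <Method><IfStatement line="12"><Statement><Block></Block></Statement></IfStatement></Method>
</CompilationUnit>"""

def empty_if_lines(xml_text):
    """Line numbers of if statements whose block has no child elements."""
    root = ET.fromstring(xml_text)
    hits = []
    for if_stmt in root.iter("IfStatement"):          # like //IfStatement
        for block in if_stmt.findall("./Statement/Block"):
            if len(block) == 0:                        # like [count(*) = 0]
                hits.append(int(if_stmt.get("line")))
    return hits

lines = empty_if_lines(AST)
```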

Tom

--
The Army reading list

Re:XPath makes XML bearable.... by Anonymous Coward · 2003-03-18 05:19 · Score: 0

So you've reinvented the hierarchical database? wow. People moved to relational for a reason, ya know.
Re:XPath makes XML bearable.... by tcopeland · 2003-03-18 05:42 · Score: 1

PMD isn't a database... it's a static code analysis tool.

Yours,

Tom

--
The Army reading list

RFC822 by semanticgap · 2003-03-18 03:16 · Score: 2, Insightful

Before XML there was (and still is) RFC822 which describes how headers are formatted in e-mail, HTTP and a slew of other protocols.

I've been down the route where I tried to use XML where something as simple as "key: value" would do, and before I knew it, my program became a bloat relying on third-party XML libs, the config files were only marginally human-readable and a lot of time was wasted thinking about virtues of DOM vs SAX. In the end I learned that using XML for sake of XML isn't worth it.
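For comparison, reading "key: value" config with nothing but the standard library, sketched in Python. The header names and the load_config helper are invented for the example:

```python
from email.parser import HeaderParser

# Hypothetical config in RFC822-style header syntax.
CONFIG = "Server-Name: example.org\nListen-Port: 8080\nLog-Level: debug\n"

def load_config(text):
    """Parse 'Key: value' lines into a dict with lowercased keys."""
    msg = HeaderParser().parsestr(text)
    return {name.lower(): value for name, value in msg.items()}

config = load_config(CONFIG)
```

Everything comes back as a string, so type conversion is still up to the program, but there is no tree, no callbacks, and no DOM vs SAX decision to make.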

I think XML is OK if used appropriately - for example I think XML is perfect for something like storing word processing documents. But the idea that every config file and every bit of network traffic should be XML is stupid IMHO.

--
grisha.org

XPATH by boatboy · 2003-03-18 03:17 · Score: 1

((//PROGRAMMER[@KnowsXML='true']/@Skillz) >
(//PROGRAMMER[@KnowsXML='false']/@Skillz))

Now what's so hard about that?

XML is good for by Anonymous Coward · 2003-03-18 03:24 · Score: 0

describing datafiles with a determined structure. E.g. fixed length files, delimited files. I use XML in this way to configure my DB import/export program.
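Something along these lines, presumably. A minimal sketch in Python where the layout vocabulary (record, field, start, length) is made up for the example and not taken from any real import tool:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout description for a fixed-length record.
LAYOUT = """<record>
  <field name="id" start="0" length="4"/>
  <field name="name" start="4" length="10"/>
  <field name="qty" start="14" length="3"/>
</record>"""

def parse_line(layout_xml, line):
    """Slice one fixed-length line according to the XML layout."""
    row = {}
    for f in ET.fromstring(layout_xml).findall("field"):
        start = int(f.get("start"))
        length = int(f.get("length"))
        row[f.get("name")] = line[start:start + length].strip()
    return row

sample = "0042" + "Widget".ljust(10) + "007"
row = parse_line(LAYOUT, sample)
```

The data file stays compact; only the description of it is XML.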

not hard... just slow. by AssFace · 2003-03-18 03:26 · Score: 1

I like using XML for certain things. I just find that it is really only useful for tasks that don't need to be fast and/or small. Working with XML can be annoying if you try to squeeze it into something it isn't ideal for - like trying to race an elephant against a cheetah.

--

There are some odd things afoot now, in the Villa Straylight.

REXML by leoboiko · 2003-03-18 03:27 · Score: 1

I have to agree, most XML APIs are incredibly counterintuitive. The only one I like so far is Ruby's REXML (based on Electric XML for Java). If you like Ruby, and you're drowning in useless SAX/DOM code, give REXML a try.

--
Prescriptive grammar:linguistics :: alchemy:chemistry. Stop being a nazi and learn some science.

Since when is XML a "programming language" by callipygian-showsyst · 2003-03-18 03:27 · Score: 1

XML may be too tough for self proclaimed "dot com era" web monkeys, but it's an implementation--and simplification--of 30+ year old SGML concepts.

We use XML extensively as file formats; we have good DTD, and almost all the features of XML. I don't know what the big deal is; I could teach it to any experienced Computer Scientist in a few hours.

Maybe the folks who think XML is "too hard" aren't hiring well! You know what you get when you simplify a programming language because your hires are too stupid to understand C++? You get Java, a crippled language that's all hype and no substance.

--
Best Buy can have you arrested

Re:Since when is XML a "programming language" by crustBro · 2003-03-18 05:05 · Score: 1

Java, a crippled language that's all hype and no substance Bitter C++ programmer?

--
Entropy sucks.
Re:Since when is XML a "programming language" by callipygian-showsyst · 2003-03-18 16:30 · Score: 1

No, I'm not bitter.
It's just because, unlike you, I don't support child pornography, and sex with minors
(One inventor of Java, however, apparently does!)

--
Best Buy can have you arrested

Maybe it's just me... by Gibble · 2003-03-18 03:30 · Score: 1, Insightful

But wasn't the entire point of XML data exchange? You use XSLT to transform incoming data into the format your software wants. Your software doesn't NEED to be able to read an XML format, but it's a lot easier to knock off an XSLT file to transform incoming data to work with your app than to code your app to handle more than one file type.

You create another XSLT for outbound data to transform your proprietary format to XML so it can be consumed by another application, company, etc.

XML isn't made to be used as the be all, end all of file formats, it's made to be a simple, yet robust, generic format for transporting data between disparate systems running on any OS, in any programming language.

The other advantage is that XML is self-describing. I can glance at an XML file, see what all the data is, and write an XSLT to get what I need out of the XML for my application a lot more easily than I could by glancing at a flat text file for the same information.

And considering there is an XML implementation for nearly every language out there that can be had for free, why are people bothering to write their own parsers? What a waste of time.

--
Gibble: Descriptive of an emotional state in which one's mind is scrabbling for some purchase on reality

but the plumbing work.... by oliverthered · 2003-03-18 03:37 · Score: 1

How many projects have you worked on when the interfaces change as frequently as the business requirements?

the plumbing may be hard and buggy, but it's easy to test (it's easy to produce manual input and output tests and use diff), and normally only has to be done once.

--
thank God the internet isn't a human right.

Re:but the plumbing work.... by Anonymous Coward · 2003-03-18 11:39 · Score: 0

How many projects have you worked on when the business requirements changed as much as the data?

Don't answer that. But your interfaces don't mean a thing unless they're generic, and then they don't do a thing. XML doesn't mean a thing unless it's a fixed format, which is no harder to parse than a flat file or binary data, both of which have advantages XML doesn't.
Re:but the plumbing work.... by oliverthered · 2003-03-18 22:53 · Score: 1

There are plenty of free parsers for XML out there.

One of the hardest things I've found with XML is writing the data out in an XMLable way, the internal data structure and the XML data structure can be quite different.

--
thank God the internet isn't a human right.

plist by hendridm · 2003-03-18 03:41 · Score: 1

> Without XML, what would you normally do?

How about Apple's plist format.
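For the curious, Python ships a reader and writer for it. A small sketch using the standard library's plistlib (the settings dict is made up); note that the default output is the XML flavor of plist, so this is less of an escape from XML than it might sound:

```python
import plistlib

# Hypothetical settings, invented for this example.
settings = {
    "domain": "localhost",
    "port": 8080,
    "debug": True,
    "paths": ["/var/www", "/tmp"],
}

data = plistlib.dumps(settings)    # bytes, XML plist by default
restored = plistlib.loads(data)    # back to plain Python types
```

What you get for free is typed values (strings, integers, booleans, arrays, dicts) without writing any parsing code.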

Python iterators solve this problem by Internet+Dog · 2003-03-18 03:43 · Score: 1

He is letting Perl get in the way of writing clear code. Several Python packages are available for processing the stream as it arrives. The following example from an article by Uche Ogbuji, "Simple XML Processing With elementtree" [1], shows something akin to his Perl examples coded using a Python iterator approach. Note that the example has no regular expressions for recognizing XML syntax. That layer is abstracted out of the object processing. The article is worth a read if you want to see how easy XML programming can be.

import sys
from elementtree.ElementTree import ElementTree
root = ElementTree(file=sys.argv[1])
#Create an iterator
iter = root.getiterator()
#Iterate
for element in iter:
    #First the element tag name
    print "Element:", element.tag
    #Next the attributes (available on the instance itself using
    #the Python dictionary protocol)
    if element.keys():
        print "\tAttributes:"
        for name, value in element.items():
            print "\t\tName: '%s', Value: '%s'"%(name, value)
    #Next the child elements and text
    print "\tChildren:"
    #Text that precedes all child elements (may be None)
    if element.text:
        text = element.text
        text = len(text) > 40 and text[:40] + "..." or text
        print "\t\tText:", repr(text)
    if element.getchildren():
        #Can also use: "for child in element.getchildren():"
        for child in element:
            #Child element tag name
            print "\t\tElement", child.tag
            #The "tail" on each child element consists of the text
            #that comes after it in the parent element content, but
            #before its next sibling.
            if child.tail:
                text = child.tail
                text = len(text) > 40 and text[:40] + "..." or text
                print "\t\tText:", repr(text)

[1] http://www.xml.com/pub/a/2003/02/12/py-xml.html
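A note for readers on a current Python: the example above targets the standalone elementtree package and uses print statements. A rough equivalent against the standard library's xml.etree.ElementTree is sketched here; it returns lines instead of printing them, and the describe and shorten helpers are names picked for this sketch, not part of the article. getiterator() and getchildren() were removed in Python 3.9, so it uses iter() and plain iteration instead:

```python
import xml.etree.ElementTree as ET

def shorten(text, limit=40):
    """Truncate long text the way the original example does."""
    return text[:limit] + "..." if len(text) > limit else text

def describe(root):
    """Walk the tree in document order and report tags, attributes and text."""
    lines = []
    for element in root.iter():
        lines.append("Element: %s" % element.tag)
        for name, value in element.items():
            lines.append("\tAttribute: %s=%r" % (name, value))
        # Text that precedes all child elements (may be None)
        if element.text and element.text.strip():
            lines.append("\tText: %r" % shorten(element.text.strip()))
        for child in element:
            lines.append("\tChild: %s" % child.tag)
            # The "tail" is the text after the child, before its next sibling.
            if child.tail and child.tail.strip():
                lines.append("\tTail: %r" % shorten(child.tail.strip()))
    return lines

report = describe(ET.fromstring('<a x="1">hi<b/>there</a>'))
```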

It goes like this by Anonymous Coward · 2003-03-18 03:57 · Score: 0

Programmer A: "I have a neat, simple idea. You see, if we do [A] then things become simple."

Programmer B: "Cool! That makes [B] and [C] much easier! Thanks!"

Programmer A: "You're welcome!"

Programmer B: "So, it may as well make [B] and [C] part of your standard, too, since they are also great ideas."

Programmer A: "Well, erm..."

Programmer C: "Yeah, and since [D], [E], and [F] are also possible, we should incorporate that as well. You DO want to be a team player, don't you?"

Programmer A: "Yeah, but, well..."

Programmer B: "Yes, and to ignore [B][C][D][E][F] is just plain ignorant."

Programmer A: sigh

In this case A = XML, BCDEF = XSL, XSLT, XPath, XHTML, XSD, XML Schema, yadda yadda yadda

...but a markup language by jjga · 2003-03-18 04:05 · Score: 1

Of course not. It is a markup language, hence its name.

Why not abreviate the element close tag? by Zaiff+Urgulbunger · 2003-03-18 04:11 · Score: 1

I've only just thought of this so before that "oh no, we can't do that 'coz" moment hits me, why do I have to close with ?

Surely if XML dictates that the tags must be balanced, then why can't I close anything with </> ?

That would at least reduce the size of XML files a little (or a lot in some cases).

A simple question - just begging for someone to point out the bleddin' obvious flaw... if there is one!

Re:Why not abreviate the element close tag? by Gibble · 2003-03-18 05:47 · Score: 0

For readability and it makes parsing an XML file easier.

Try and find the bug if you forget to close a tag with , but if all your closing tags are labelled finding the bug is simple.

--
Gibble: Descriptive of an emotional state in which one's mind is scrabbling for some purchase on reality
Re:Why not abreviate the element close tag? by Zaiff+Urgulbunger · 2003-03-18 06:34 · Score: 1

For readability and it makes parsing an XML file easier.

But does it?

Try and find the bug if you forget to close a tag with , but if all your closing tags are labelled finding the bug is simple.

I'll agree with that. But that's only an issue if you're manually editing the XML using a text editor, so why doesn't the standard make it optional? Use the long form if you're creating it manually for easier debugging, but allow the short form if you're confident it will still be valid XML (as in balanced tags).

XML in PHP by horza · 2003-03-18 04:13 · Score: 1

A lot of people do find XML quite scary, which is why a module was created that can be compiled into mod_php which has a non-threatening interface. It's quick, Open Source, and free to use for any purpose. A quick example of opening a config file, changing a value, and saving it again:

$xmldoc = xml_load("myconfig.xml");
xml_setelementvalue($xmldoc, "server.httpd.domain", "localhost");
xml_output($xmldoc, "myconfig.xml");

There, that's not too scary is it? ;-)
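The same dotted-path idea is only a few lines in other languages too. A sketch in Python on top of the standard library; set_element_value is a hypothetical helper written for this example (it is not a port of the PHP module), and it assumes the path is relative to the document's root element:

```python
import xml.etree.ElementTree as ET

def set_element_value(root, dotted_path, value):
    """Walk (creating as needed) child elements named in the path, set text."""
    node = root
    for name in dotted_path.split("."):
        child = node.find(name)
        if child is None:
            child = ET.SubElement(node, name)  # create missing levels
        node = child
    node.text = value
    return node

root = ET.fromstring(
    "<config><server><httpd><domain>example.org</domain></httpd></server></config>"
)
set_element_value(root, "server.httpd.domain", "localhost")
set_element_value(root, "server.httpd.port", "8080")
output = ET.tostring(root, encoding="unicode")
```

Writing the result back to a file would be ET.ElementTree(root).write("myconfig.xml").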

Phillip.

--
Property for sale in Nice, France

@#$%! XML by QuackQuack · 2003-03-18 04:16 · Score: 1

I've never tried to use XML in a program, but from a user's standpoint, I hate it.

I've seen many programs start using XML config files. I often have need to automate things, which sometimes involves writing scripts that alter a config file. If the config file is of the VAR=Value type, this is pretty easy to do, but if it's XML, you can either try to fudge it, or have to parse the &#$# thing, which makes the task more complex than it needs to be.

I've also seen a movement to replace EDI with XML. Now EDI standards already do everything that XML is supposed to do, but EDI is compact, and relatively easy to debug. XML explodes the size of these files from a few kilobytes to several meg!

--
By reading this sig, you agree to the terms of my sig license.

StAX by Anonymous Coward · 2003-03-18 04:22 · Score: 0

See also StAX, a stack-based API for XML.

StAX extends the SAX api providing support for the delegation of XML sub-trees to sub-tree specific handlers. It's great at pulling out useful bits from long XML streams, and is used extensively in bioinformatics applications.
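The general idea of handing sub-trees to tag-specific handlers can be sketched in a few lines of Python. To be clear, this is not the StAX API, just an illustration of the delegation pattern using the standard library's iterparse; the document and handler are made up:

```python
import io
import xml.etree.ElementTree as ET

def dispatch(stream, handlers):
    """Call handlers[tag](element) for each completed sub-tree of interest."""
    results = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        handler = handlers.get(elem.tag)
        if handler is not None:
            results.append(handler(elem))
            elem.clear()  # the sub-tree has been consumed; free it
    return results

# Hypothetical sequence database dump.
DOC = (b"<db><comment>ignored</comment>"
       b"<seq id='s1'><residues>ACGT</residues></seq>"
       b"<seq id='s2'><residues>TTGA</residues></seq></db>")

found = dispatch(io.BytesIO(DOC),
                 {"seq": lambda e: (e.get("id"), e.findtext("residues"))})
```

Everything outside the registered tags is parsed but otherwise ignored, which is the "pull out the useful bits" part.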

Replace fgets by Spazmania · 2003-03-18 04:26 · Score: 1

Sounds to me like what you really need to do is replace fgets with fgetxml. fgets stops at the end of line. fgetxml would stop at the end of the next tag instead (i.e. stop at ">" instead of LF.) End-of-line has no meaning in an XML file, so why process it with line-oriented I/O?

Or has this already been tried?
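A naive version is easy to sketch, here in Python rather than C. The fgetxml name is taken from the post; the implementation is only an illustration:

```python
import io

def fgetxml(stream):
    """Read from a text stream up to and including the next '>'.

    Returns '' at end of input, like a line reader returning an empty line.
    """
    buf = []
    while True:
        ch = stream.read(1)
        if ch == "":
            break
        buf.append(ch)
        if ch == ">":
            break
    return "".join(buf)

stream = io.StringIO("<a>hi<b/></a>")
chunks = []
while True:
    chunk = fgetxml(stream)
    if not chunk:
        break
    chunks.append(chunk)
```

The catch is that '>' is also legal inside comments, CDATA sections and attribute values, so a reader like this gives you chunks, not tags. Past that point you are writing a tokenizer, which is what the existing parsers already are.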

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.

XML in a (very tiny) nutshell by tupshin · 2003-03-18 04:26 · Score: 1

1) DO use XML for your configuration files if your configuration options aren't easily encompassed by a simple name,value pair model.
2) DO use XML for data interchange wherever the producer is not necessarily the consumer.
3) DON'T try to perform "calculations" on large XML data sets (this seems to be the pitfall that Tim Bray is falling into).
4) DO convert large datasets to a relational structure if that is a more natural form for manipulating them. (SQL is a QUERY language, XML is not).
5) DON'T dismiss the benefits of DTD/Schema validation too readily.
6) DON'T assume that somebody editing XML data has to understand or even be aware of the XML model/underpinnings. Give them a *view* of the data appropriate to the task.
7) DON'T be complacent about the SAX vs. DOM (and never the twain shall meet) dilemma. Check out Ruby's REXML and Perl's XML::Twig and be happy.
http://www.germane-software.com/software/rexml_stable/
http://www.xmltwig.com/xmltwig/
8) DON'T misunderstand XSL. If you don't understand why it's a declarative language, don't try to use it for arbitrary information manipulation.
9) DON'T dismiss too readily the value of named closing tags for validation/editing sanity (you know who you are, you s-exp people).
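The advice above about moving large datasets into a relational structure can be sketched in Python with the standard library's sqlite3. The sales document, table layout and totals_by_region helper are all invented for the example:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data set.
DOC = """<sales>
  <sale region="north" amount="120.50"/>
  <sale region="south" amount="80.00"/>
  <sale region="north" amount="30.25"/>
</sales>"""

def totals_by_region(xml_text):
    """Load the XML once, then let SQL do the calculation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (region TEXT, amount REAL)")
    rows = [(s.get("region"), float(s.get("amount")))
            for s in ET.fromstring(xml_text).iter("sale")]
    conn.executemany("INSERT INTO sale VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sale GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result

totals = totals_by_region(DOC)
```

XML stays the interchange format; the querying happens where querying is cheap.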

It takes more than a set of tools by apankrat · 2003-03-18 04:30 · Score: 3, Insightful

> However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

It takes more than a set of good tools for a technology to become 'a widespread success'. A clear justification why XML is better than existing standard marshalling techniques would be a good starting point. ASN.1 DER, simple container LSB serialization and others.

I'm probably beating the dead horse here but XML has at least two properties rendering it useless for any performance-aware application:

(a) unlike, say, TLV it does not allow efficiently skipping parts of the data you don't need or aren't aware of. I.e. in order to skip the section, you need to read and parse it first.

(b) XML is a lazy man's ASN.1 DER. It's all there in much more compact and elegant form. The only 'drawback' in the eyes of the XML crowd is that it's binary. Sure, everyone knows that encoding numbers as strings is a definite way to improve upon the performance and scalability of everything from network protocols (SOAP, BXXP, UPNP) to basic document processing. Right on.
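To illustrate the skipping argument in (a), here is a minimal tag-length-value sketch in Python. The wire layout (1-byte tag, 4-byte big-endian length) is made up for the example and is not ASN.1 DER:

```python
import struct

HEADER = ">BI"                        # 1-byte tag, 4-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)  # 5 bytes, no padding

def tlv_pack(tag, payload):
    """Prefix the payload with its tag and length."""
    return struct.pack(HEADER, tag, len(payload)) + payload

def tlv_find(buf, wanted_tag):
    """Return the payload for wanted_tag, jumping over everything else."""
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        tag, length = struct.unpack_from(HEADER, buf, offset)
        offset += HEADER_SIZE
        if tag == wanted_tag:
            return buf[offset:offset + length]
        offset += length              # skip without looking at the payload
    return None

message = tlv_pack(1, b"x" * 1000) + tlv_pack(2, b"hello")
payload = tlv_find(message, 2)
```

The 1000-byte record is skipped with one addition. With XML, the equivalent section has to be scanned byte by byte to find where it ends.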

The bottom line is that XML has probably reached its acceptance limits. Whoever accepted XML for granted or stuck with it or is not willing to learn about alternatives will keep on whining about tools being sucky. That's life, but OTOH it's only the small part of it.

--
3.243F6A8885A308D313

XML: Really, really, REALLY fat person size by Anonymous Coward · 2003-03-18 04:32 · Score: 0

NT

XML Technology by hackus · 2003-03-18 04:43 · Score: 1

XML Technology has issues, but, more importantly, many of them are caused by the application community themselves, for the following reasons:

1) XML is designed for large organizations. It is not really designed for small businesses.

Large businesses generate MORE data. Large amounts of data, and its meaning, is something that scales well with XML. However, XML doesn't scale down well to the small.

Small businesses usually cannot afford the people, or the time required to meta data EVERYTHING they do.

It is also, probably impractical.

2) XML, because of its attempt to make data portable, requires a great deal of work to maintain. Companies have to build "standards" dictionaries in their organizations, to organize the definition of data, and someone is charged with making sure reuse of that dictionary is put to good use.

Otherwise your XML definitions becomes a junk pile, sort of like a Lotus Notes database of unimaginable complexity and abysmal organization.

Also, because of the complexity of the problem (meta data representation of your data), it was hoped that STANDARDS could be used to organize these dictionaries industry wide.

Primarily so you don't have to make your own.

This has become a pipe dream however, because XML usually attempts to define the DIFFERENCES between two organizations' data, and as a result an industry-specific standard that everyone could use becomes a standard that everyone modifies to suit their own needs in their respective industries. (i.e. You had tags for energy industry, dairy, etc.)

Which is what should happen. Right?

Technology should allow you to make your business process UNIQUE, not commoditize it.

But in short, industry-led XML dictionary standards have not been as widely applicable to business problems as many thought they would be.

That's OK, though, really, as even if your business partner doesn't use the same tag, or the same tag to represent equivalent information, at least you can build database relations to do the conversion for you. With very little writing of code. (See my message below)

3) Most companies fall short, even when they build XML applications, in that they make the mistake of not building a general purpose object framework to support their XML dictionary.

Oh, many companies I talk with THINK they are, but they really are not building a general purpose object framework. This creates the typical problems associated with building large amounts of code whenever they have to implement their XML data dictionaries in applications.

XML requirements/specifications are particularly harsh on organizations that have software "coders" that do not understand the following:

a) Object Inheritance (i.e. it is understood...but POORLY so.)
b) Poor understanding of Functional Decomposition of objects. (i.e. Programmers have a poor grasp of how to combine objects.)
c) Poor understanding or NONE AT ALL of Pattern construction in software design. (i.e. Factory patterns, Singleton patterns, iterator patterns)

Without COMPLETE understanding of the above, an XML framework will quickly degrade into a "junk framework" of static functions and structured design principles, decreasing reuse to almost ZERO.

There is something about XML that causes enormous coding issues if you attempt to solve business software problems with structured design techniques. That something is due to the data. Since the data is abstract to begin with, people writing software attempt to describe the data as discrete in functional method definitions. For example, writing a method to translate a specific tag. That is the wrong way to approach it, and normally you should be using PATTERNS, more specifically, Factory Patterns to create/define XML tag actions and values.
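One way to read that advice, sketched in Python: a registry acts as the factory, mapping tag names to classes, so supporting a new tag means adding a class rather than another branch in a translate function. The tag names and classes here are hypothetical:

```python
import xml.etree.ElementTree as ET

HANDLERS = {}  # tag name -> class that knows how to interpret that tag

def handles(tag):
    """Class decorator that registers a handler class for a tag."""
    def register(cls):
        HANDLERS[tag] = cls
        return cls
    return register

@handles("price")
class Price:
    def __init__(self, elem):
        self.value = float(elem.text)

@handles("name")
class Name:
    def __init__(self, elem):
        self.value = elem.text.strip()

def build(elem):
    """Factory: return the object for this element, or None if unknown."""
    cls = HANDLERS.get(elem.tag)
    return cls(elem) if cls else None

item = ET.fromstring("<item><name> Widget </name><price>9.50</price><x/></item>")
objects = [build(child) for child in item]
```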

Otherwise, you spend WAY TOO MUCH TIME writing code.

Structured design is too feature poor to handle an XML framework of any sort of usefulness.

--
Got Geometrodynamics? Awe, too hard to figure out? Too bad.

XML Digester Anyone? by Anonymous Coward · 2003-03-18 04:44 · Score: 0

Check out the XML Digester from the Apache Jakarta project, it makes parsing xml and populating data objects from xml very easy. The problem is just as someone else stated - people are using sledgehammers (SAX and DOM parsers) to swat at flies. Perhaps the Digester should be ported to Perl?

David Charboneau

Java XML Parsing by SurfTheWorld · 2003-03-18 04:46 · Score: 3, Interesting

Let's decompose the XML parsing "problem" (if one actually exists) into smaller components that we can reasonably discuss. XML parsing is too broad a topic to intelligently discuss, but if you limit it to XML parsing in Java you suddenly have a topic small enough to be manageable. So let's discuss XML parsing in Java.

When XML was first introduced, there were no standard libraries in the JDK to facilitate parsing. What's more, the few projects out there varied wildly in how you actually used their DOM tree or SAX callback mechanism. This isn't necessarily a Bad Thing (tm), it's the same problem every emerging technology faces: immature tools. This is basic biology - lots of competing implementations (life forms), each struggling for community (resources).

So, time goes by, and eventually a handful of implementations emerge dominant. Some dominate due to performance, and some dominate because of ease of use of the API. The victors in this game then sometimes go through a merging process of their own, where the performance victors lend technology to ease of use API victors. After a lot of merging (and flames usually), one or two projects emerge out of the XML kingdom as the dominant players. In my opinion, in the world of Java these are Xalan (Xerces) and Dom4J.

During the maturation process, Sun comes along and looks at the technology and says "Wow this XML stuff is really here to stay. What implementations are out there, and what similarities exist between them? How can we facilitate growth of these projects?" They realize that certain classes (like org.xml.sax.InputSource) are common entities in both projects (even if the class InputSource doesn't exist), and they standardize it. For a reference to all of the XML standards implemented in the JDK, do a search on java.sun.com for JAXP, JAXM, and JAXB (just to name a few).

At this point, the XML projects come back and work in support so that they can be "JAXP compatible" (again this is part of the biological process of evolution). This ensures that the projects work well with whatever Sun ships in the JDK.

In the end (which is really where we are now) you end up with a pluggable architecture, where the JDK provides some common functionality or interfaces that are implemented by open source projects.

Java XML parsing was damn hard back in the day - you had to marry your code to a specific project. But these days with the standardization that has taken place (thanks Sun!), as long as you write code that makes use of the JAXP specification you can plug in any JAXP-compliant parser into your app and things *should* work.
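The pluggable idea is not Java-specific. As an illustration (in Python, only to keep to one language for examples; this is not JAXP), the standard library's xml.sax works the same way: application code is written against the ContentHandler interface, and make_parser() hands back whichever driver is available. The TagCounter class and count_tags helper are made up for the sketch:

```python
import io
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Application code: depends only on the SAX handler interface."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_tags(data, parser=None):
    """Count start tags; any SAX-compliant parser can be passed in."""
    parser = parser or xml.sax.make_parser()  # default driver
    handler = TagCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.counts

counts = count_tags(b"<a><b/><b/></a>")
```

Swapping the parser means passing a different object to count_tags; TagCounter does not change.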

The difficult problem is getting other entities (Application Servers for example) to get up-to-date with the standards. WebLogic 6.1 comes with a non-JAXP compliant parser, and thus doesn't work with the latest JDK, Xalan, etc.

--
Do it for da shorties

Programming *is* a hard task by Martin+Spamer · 2003-03-18 04:46 · Score: 1

The problem is not XML as such, but programming parsers is hard, really hard. It is one of the most difficult programming tasks in computer science, more difficult than Graphics, Compression, even Crypto. So all the effort of the parser developer goes into getting the S/W right and not making the APIs simple to use.

Re:Programming *is* a hard task by Anonymous Coward · 2003-03-18 10:09 · Score: 0

The problem is not XML as such, but programming parsers is hard, really hard.

Actually parsers are relatively easy to code if you have a complete grammar to work from.
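As a small illustration of that claim, here is a recursive-descent evaluator in Python for a toy grammar (integer arithmetic, nothing to do with XML). Each grammar rule maps onto one method; the grammar and all names are chosen for this sketch:

```python
import re

# Grammar:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division in this toy
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError("expected a number")
        return int(tok)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value

answer = evaluate("2 + 3 * (4 - 1)")
```

A real XML parser has far more rules (entities, encodings, namespaces), but the structure is the same kind of thing.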
Re:Programming *is* a hard task by Alan+Shutko · 2003-03-18 10:33 · Score: 1

The problem is not XML as such, but programming parsers is hard, really hard.

Um, no, it's not. Parsing languages which you define has been basically understood for years. It's trickier to parse XML than it used to be, but only because XML has grown, not because the task is inherently difficult.

Yep, we're basically in agreement... by alispguru · 2003-03-18 04:53 · Score: 1

Though I'd say you're far too nice regarding goal 7...

If I were designing a better XML, here are the things I'd try:

* Dump attributes. The semantic difference between text/data and attribute/metadata makes some sense in SGML, but is hopelessly bogus for XML. Make everything elements.

* Replace closing labeled tags with a generic "close-element" tag like </>. This should get you back the terseness you give up by making attributes into elements.

This would turn:

<foo bar="baz"><mumble>grumble</mumble></foo>...

into

<foo><bar>baz</><mumble>grumble</></>...

which is close enough for my taste to:

(foo (bar "baz") (mumble "grumble"))
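The mapping is mechanical enough to sketch in a few lines of Python. The to_sexp helper is written for this example; it treats attributes as child elements, as proposed above, does no escaping of quotes inside text, and ignores text that trails a child element:

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Render an element as (tag children...), attributes first."""
    parts = [elem.tag]
    for name, value in elem.attrib.items():
        parts.append('(%s "%s")' % (name, value))   # attribute as element
    if elem.text and elem.text.strip():
        parts.append('"%s"' % elem.text.strip())
    for child in elem:
        parts.append(to_sexp(child))
    return "(" + " ".join(parts) + ")"

sexp = to_sexp(ET.fromstring('<foo bar="baz"><mumble>grumble</mumble></foo>'))
```

Going the other way loses the attribute/element distinction, which is the point of the proposal.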

--

To a Lisp hacker, XML is S-expressions in drag.

Re:Yep, we're basically in agreement... by mrmag00 · 2003-03-18 07:40 · Score: 1

I'm completely ignorant on this subject, but that gets amazingly confusing and loses readability real quick.

complex statements would become hell to deal with by a human.

XML is a bad database by crustBro · 2003-03-18 04:56 · Score: 1

And that about sums it up. We use XML only when we're integrating with some external system that talks via XML. Internally, we always put data into a database rather than an XML document.

--
Entropy sucks.

Complex solution to a complex problem by 3247 · 2003-03-18 05:04 · Score: 1

The problem with XML, as stated by the article (Yes, I've read it), is that parsers are complex to interface with. This of course, is due to the flexibility XML provides:

* You can have a complete hierarchical structure, which is far beyond what typical name/value pairs (or relational tables) provide. If you need a structured config file, a non-XML parser and the interface to it will get complicated, too.

* XML parsers can parse any XML, not only the subset defined by a single DTD, so the parser has to return everything as it does not know what is just structure and what is the important data.

It might be a good idea to create a tool that takes a DTD and a binding between the XML structure and the in-programme data (such as structures, arrays, objects) and creates the necessary parser and interface.
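A very reduced version of that binding idea might look like the sketch below. It is an assumption-laden illustration in Python: the binding table is written by hand rather than generated from a DTD, and the Server type, SERVER_BINDING, and bind are invented names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Server:
    # Hypothetical in-program record type.
    host: str
    port: int

# Hand-written binding: child element name -> converter for its text.
# A real tool would derive this from the DTD or schema instead.
SERVER_BINDING = {"host": str, "port": int}

def bind(elem, cls, binding):
    """Build a cls instance from the child elements listed in the binding."""
    values = {}
    for name, convert in binding.items():
        child = elem.find(name)
        if child is None:
            raise ValueError("missing element <{}>".format(name))
        values[name] = convert(child.text)
    return cls(**values)

config = ET.fromstring(
    "<config>"
    "<server><host>a.example</host><port>8080</port></server>"
    "<server><host>b.example</host><port>8081</port></server>"
    "</config>"
)
servers = [bind(e, Server, SERVER_BINDING) for e in config.findall("server")]
print(servers)
```

The application code then only ever sees Server objects; the element names live in one table instead of being scattered through the parsing logic.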

--
Claus

But what subset are you using? Can't call it XML by Ars-Fartsica · 2003-03-18 05:05 · Score: 1

That's the problem. If you say you are using XML, then you are using the XML recommendation described by the W3C. Anything else is...not XML.

This is the whole point. If you are trying to address the standard, you are dealing with a very complex set of details.

There is an alternative... by oren · 2003-03-18 05:13 · Score: 1

See www.yaml.org. YAML is a project that evolved from SML-DEV. SML-DEV attempted to define a subset of XML that would be both useful and simple enough to avoid XML's biggest headaches.

After much wrangling (this was about the same time XML came up with the namespaces rules that blew up any chance for a reasonable data model for XML), the best we could come up with was Common-XML (http://www.simonstl.com/articles/cxmlspec.txt). While it does avoid some of XML's built-in boobytraps, and I'd strongly recommend any XML user to read it, it doesn't solve the inherent problem - XML is not a good match for common programming data structures, and at the same time *data* XML files are not very human readable.

It isn't XML's fault, really; XML is a great mark-up language. However, it sucks as a data serialization language, for the above reasons. So, figuring one should use the right tool for the right job, two of us SML-DEV people (Clark and myself) decided to give up on XML compatibility and try to design a data serialization language from scratch. We immediately combined efforts with Brian, the author of Perl's Data::Denter (and Inline::C).

The result is YAML (YAML Ain't Markup Language). After almost two years of working on it, the spec has stabilized and is as good as frozen (it is in "last call" and we plan on announcing a release candidate in April), there is healthy participation in the mailing list, implementations in Perl and Ruby, and active work on additional languages.

YAML is great for data serialization, configuration files, messaging, etc. Take a peek - you might like what you see. (OK, this is a shameless plug for my open source project. That's a valid use for Karma if I've seen any...)

Really! by GoatEnigma · 2003-03-18 05:20 · Score: 1

And with XML, you create a text file and put in whatever tags you like that day....

XML is hard by mugnyte · 2003-03-18 05:25 · Score: 1

Saying anything is "hard" is not really a challenge to a language. Languages per se do not solve problems alone, they convert an algorithm into text. The grammar changes, but it's not a big deal.

The pre-packaged libraries/modules/class objects you incorporate can encapsulate the complexity. I think this is what this guy needs. But since he acknowledges that they already exist and there is still no standard answer, it's because...

XML has been made so flexible that the "standard" is a large set. Much larger than, say, a programming language grammar. Everything starts with simplicity, and then Need/Desire expand it. So, we get C -> csh -> C++ -> Java -> C# etc. as the needs for programs and platforms become re-prioritized.

XML, on the other hand, imposes a model, but then leaves a grammar open to its use. There's elegance there. However, the need to process data in bulk from many parsers must go away. I'm ignoring the Much Sadness(tm) rant about callbacks. Callbacks are simply another programming model, and I consider it elegant when used correctly.

If any particular XML file is too large for processing, even OS's learned that runtime libraries were a happy addition. Break it into multiple files, for example.

Complexity be damned. It brings out the innovation in us.

mug

always a by Anonymous Coward · 2003-03-18 05:34 · Score: 0

>>' XML has always a divided response among

always a long way to go until submissions are proofread reliably.

Scary but not surprising by Anonymous Coward · 2003-03-18 05:38 · Score: 0

One of the co-creators of XML not only doesn't like it, but

1) Doesn't know what it's for. He's still looking for a reason to use it, but concluded that since he invented it, we all might as well use it anyway

2) Doesn't understand how it is parsed. He states that writing an XML parser "isn't that hard" and gives as evidence that many people have done so (not him, of course).

3) Realises that the best thing about XML is that it makes searching regexps easier. It's almost as if it was some kind of "markup" or something, to help you find sections in a document.

4) Is naive, not just about programming (take a look at his code samples) but about many other things (see his rants on business and truth). His attitude seems best described as "hopeful optimism combined with wishful thinking and oversimplification."

How many XML books and articles have you read that start out:

Wouldn't it be great if there was a uniform data format so that we wouldn't have to do anything. We could just smile at a computer and it would tell us how to hook up with that hot chick over in marketing.

And then progresses to:

Look, it's easy. If we put these tags around words -- it's just like HTML! -- then we know what they mean. 49.99 becomes the <price> of the <book> (in currency=USD), of course!

And then we're let in on the secret:

Well, XML is actually a lot more arcane and complex than HTML. It is based on SGML, which was written in the 1970s--ooh! Aren't you sick of how many HTML weenies there are now? You can't even seriously demand <rate segment="hour" currency="USD">60</rate> for webpages anymore, damn frontpage!

And finally, we're left to speculate that what if XML were really like HTML, and you had links, and tags could have meaning, and heck, maybe it will even search itself someday.

Then the second wave of books came, telling us how to use a parser. But of course never telling us what it's for and why we'd want to parse something. Wave 2.5 had a minor footnote saying "Don't use DOM on really large documents. It turns out that you actually have to use physical computer resources, like memory, when parsing XML."

And now we're in the third wave, which started out simply enough with "look, we can replace flat files with XML and still retrieve the data!" No more properties files, or reading by line. Now, instead of escaping whitespace, you can escape parentheses, quotations, apostrophes, ampersands, question marks, exclamation points, semicolons (and colons), and I'm sure I'm missing some, but who cares... it looks just like HTML! So now, the weenies who were afraid to edit config files can now edit config files that look just like HTML!

The other, more diabolical part of this wave is the illusion that those promises about linking, searching, and so forth have reached fruition. Look, all you have to do is brute force. It's almost as easy as regexp, and you don't have to remember all those silly little regexp characters that look like binary files -- you can use XML instead!

XML::Twig by amoe · 2003-03-18 05:47 · Score: 1

Don't know about the solution in other languages, but Tim should give XML::Twig a try. Memory efficient tree parsing, and a joy to use if you're used to thinking in Perl. You can get over your fear of callback-based APIs by using anonymous subroutines. The only thing it doesn't do is standards - which Tim seems to discard anyway. So go get it from CPAN, and be happy.

--
You look beautiful! Incidentally, my favourite artist is Picasso.

ridiculous by Ender+Ryan · 2003-03-18 05:49 · Score: 1

I have written many programs utilizing LibXML for parsing XML files of several megs, all of which run several times in under a second. For fun I tested LibXML with much larger XML files and it performed quite well.

Your `xmldiff' example is ludicrous. It is most definitely not 5000 times slower; xmldiff is obviously doing something extraordinarily stupid. Furthermore, diff is simply reading the files simultaneously byte by byte and comparing them, while xmldiff has to do much more processing because it's comparing XML, not the raw data. That's like comparing diff to a C++ compiler.

Galeon handles its own bookmarks file in the blink of an eye.

XML does not take so much more processing time than parsing any equally complex text data.

XML isn't meant to replace RDBMSs, RDBMSs aren't meant to replace flat text lists, etc. Comparing different tools for doing the wrong job is just ridiculously silly.

Use the right friggin tool for the job... jeez...

--
Sticking feathers up your butt does not make you a chicken - Tyler Durden

Re:ridiculous by mcelrath · 2003-03-18 05:55 · Score: 1

Fine, point me to a better xmldiff utility.
But no parsing of a 650k file should take 45 minutes (it's up to 45 minutes CPU now).
In other projects like this I have directly compared regex parsing and tag-based parsing. You are right, doing it "right" takes about a second for most reasonable tasks. Using regexes to accomplish the same thing I can do most tasks in 0.01 seconds.
Again, parsing XML is a CPU-intensive task, thereby making it useless for anything which requires moving a large amount of data. It comes down to having to parse the entire tree to get at any tiny piece of data. (exactly as the author of the original article suggests)
-- Bob

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
Re:ridiculous by Ender+Ryan · 2003-03-18 06:13 · Score: 1

Again, parsing XML is a CPU-intensive task, thereby making it useless for anything which requires moving a large amount of data. It comes down to having to parse the entire tree to get at any tiny piece of data. (exactly as the author of the original article suggests)
I agree. XML does not lend itself well to handling the job of a database. XML is great for documents, configuration data more complex than key/value pairs, etc.
I think you are overstating the CPU-intensity of parsing XML. A good parser will handle pretty large documents before it starts to get slow. I have seen plenty of programs that handle fairly large (>10MB) documents as well as or faster than comparable binary formats.
But I do think that people who talk about XML databases, etc. are smoking some good crack :)

--
Sticking feathers up your butt does not make you a chicken - Tyler Durden
Re:ridiculous by mcelrath · 2003-03-18 07:03 · Score: 1

It is an O(n^3) solution in search of a problem.
Another horrible example is XML-RPC. You want RPC to be fast, dammit.
And I still want an xmldiff that runs in a reasonable amount of time. I want to share my bookmarks among my computers.
10MB is not "large" by today's standards. And I simply don't believe you on that one. Any reasonable binary format will have some kind of internal structure (like an index of pointers) that will allow accessing the data without parsing the entire document. Only comparing parsing-the-entire-document to parsing-the-entire-document does XML come out close (and it will not be even because of the overhead of parsing the tags themselves, which a binary format does not have to do). For all other possible tasks it is slower (because all other tasks require parsing the entire document). A solution which can only reasonably handle data up to 10MB on modern hardware should not be considered a success...
-- Bob

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
Re:ridiculous by Ender+Ryan · 2003-03-19 01:12 · Score: 1

And I simply don't believe you on that one. Any reasonable binary format will have some kind of internal structure (like an index of pointers) that will allow accessing the data without parsing the entire document.
No no NO! You are not thinking! I have looked at a couple XML document formats, and many of them use multiple files, zipped up into one file. This could easily contain some type of index, and I would bet many do. This would allow a program to only parse the pieces of the document it needs, after looking it up in the index file. This is quite simple and easy to implement.
Is this going to be slower than a _well designed_ binary format? Of course. But the key here is "well designed". Most simply aren't... And it's not going to be slower enough to be a problem, especially considering that there are high performance libraries for unzipping files on the fly and many XML parsers that are very well optimized.
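As a rough illustration of the zipped-parts-plus-index layout described above: the sketch below is Python with the standard zipfile module, and the member names (index.xml, part1.xml, part2.xml) and element names are invented for the example rather than taken from any real document format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def build_archive():
    """Create an in-memory zip holding an index plus one XML file per chapter."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.xml",
                    '<index>'
                    '<entry key="intro" file="part1.xml"/>'
                    '<entry key="results" file="part2.xml"/>'
                    '</index>')
        zf.writestr("part1.xml", "<chapter><title>Intro</title></chapter>")
        zf.writestr("part2.xml", "<chapter><title>Results</title></chapter>")
    buf.seek(0)
    return buf

def load_part(archive, key):
    """Parse the small index, then parse only the member it points to."""
    with zipfile.ZipFile(archive) as zf:
        index = ET.fromstring(zf.read("index.xml"))
        for entry in index.findall("entry"):
            if entry.get("key") == key:
                return ET.fromstring(zf.read(entry.get("file")))
    raise KeyError(key)

chapter = load_part(build_archive(), "results")
print(chapter.findtext("title"))  # Results
```

Only index.xml and the requested member are decompressed and parsed; the other parts are never touched.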

--
Sticking feathers up your butt does not make you a chicken - Tyler Durden
Re:ridiculous by mcelrath · 2003-03-19 05:32 · Score: 1

Do you have any pointers as to the computational difficulty of parsing XML? Someone else provided a pointer to expat, and while it has impressive benchmarks, it is really meaningless. That it can parse a small xml file in 0.01s is not impressive if it goes like 0.01*N^3 for a file of N bytes.
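One way to check a growth-rate worry like this empirically is to time a parser on inputs of doubling size: times that roughly double suggest linear behaviour, while eightfold jumps would suggest something cubic. The sketch below is Python, uses the standard library's ElementTree rather than xmldiff or a hand-driven expat, and the document shape is invented; actual timings depend on the machine, so none are shown.

```python
import time
import xml.etree.ElementTree as ET

def make_doc(n):
    """Build a synthetic document with n small records (made-up shape)."""
    rows = "".join('<item id="{}">value</item>'.format(i) for i in range(n))
    return "<items>" + rows + "</items>"

def time_parse(n):
    """Return (element count, seconds) for parsing an n-record document."""
    text = make_doc(n)
    start = time.perf_counter()
    root = ET.fromstring(text)
    elapsed = time.perf_counter() - start
    return len(root), elapsed

# Double the input size each step and compare how the elapsed time grows.
for n in (10_000, 20_000, 40_000):
    count, seconds = time_parse(n)
    print(n, count, round(seconds, 4))
```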
For the record, xmldiff is STILL running (that's > 22 hrs). "Extraordinarily stupid" is right. But nonetheless I want to do diffing of xml. The only other tools I can find are proprietary. This can't be that hard.
Also, galeon does not handle its bookmarks file in "a blink of an eye", it often sits for several seconds while the interface hangs, munging its bookmarks.
Also be careful with your comparisons. Giving XML an advantage by comparing it to a poorly designed binary format is not a reasonable comparison.
I have not seen the case you mention, multiple XML files zipped up, one of them an index. But that's an interesting idea.
-- Bob

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.

YAML is one by whytheluckystiff · 2003-03-18 05:52 · Score: 1

YAML: http://www.yaml.org/
YAML Cookbook: http://yaml4r.sf.net/cookbook

Take a look at YAML. If you've done XML work, you'll see a million great uses for it. It's very simple to learn, rather speedy to parse, and gaining implementations in the Ruby, Python and Perl communities.

Re:YAML is one by Colonel+Panic · 2003-03-18 06:55 · Score: 1

YAML is great. Easy to read _and_ write. It's what XML could have been. Also, it seems to me that perhaps it's more compact than XML (and certainly it can be compressed to much smaller sizes) which is important for various RPC and distributed object schemes.

ps. fuck namespaces by Anonymous Coward · 2003-03-18 05:56 · Score: 0

XML does not have namespaces. XML has a reserved character, the colon (:), that can be used in place of an underscore in an element name. XML parsers look for this colon, and treat any tag foo:bar as if it were a separate tag from bar alone. Or even foo. Never mind that foobar, foo_bar, fooBar and FoOBar are all different tags as well. There is no namespace in XML!!!
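For what it's worth, a namespace-aware parser does somewhat more than keep the colon: following the separate Namespaces in XML recommendation, it replaces each prefix with the URI it was bound to. A small Python sketch with ElementTree shows the effect; the example.com URI is a placeholder made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Two different prefixes bound to the same (made-up) namespace URI,
# plus one unprefixed element.
doc = ET.fromstring(
    '<root xmlns:foo="http://example.com/ns-a" xmlns:f="http://example.com/ns-a">'
    '<foo:bar/><f:bar/><bar/>'
    '</root>'
)
# ElementTree reports names as {uri}local, so the prefix itself disappears.
tags = [child.tag for child in doc]
print(tags)
# ['{http://example.com/ns-a}bar', '{http://example.com/ns-a}bar', 'bar']
```

So foo:bar and f:bar come out as the same name, while plain bar stays distinct, which is not what a simple "colon is just another character" reading would give.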

GREAT! by Pope+Raymond+Lama · 2003-03-18 06:00 · Score: 1

"XML Co-Creator says XML Is Too Hard For Programmers"

Just GREAT. And now, what to do with the f****
xhtml standard which totally f***** up web page
authoring?

Would the author post a "mea culpa" to W3C, and
ask for a withdrawal of xhtml? I guess not. And
if that stuff catches on, there goes hand-crafted
html down the toilet.

Or would any of you "It is easy to spend a few days
getting used to this Java library to interface
with XML" types like to go typing

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

instead of
<html>

???

--
-><- no .sig is good sig.

Re:GREAT! by Gibble · 2003-03-18 07:12 · Score: 0

You do realize the difference between xhtml and html is virtually nothing but a properly formatted html document with matching opening and closing tags.

--
Gibble: Descriptive of an emotional state in which one's mind is scrabbling for some purchase on reality

Newline delimited text is bad too! by gammoth · 2003-03-18 06:04 · Score: 1

This is amazing because I found parsing newline delimited text to be "irritating, time-consuming, and error-prone" as well!

Isn't it intriguing how we reached the same conclusion using such profoundly different technologies? Gee, the more things change, the more things stay the same...

VB is rarely the right tool for the job by Gerry+Gleason · 2003-03-18 06:04 · Score: 1

It's nice that you and your company see nothing wrong with writing enterprise applications in a non-portable language, but it doesn't make it a good engineering decision. The biggest weakness is that you are using a proprietary language and environment, so the system is likely going to need to be rewritten in a more portable technology within 5-10 years, if not sooner.

And if you think MS is too entrenched to worry about them going away, just take a look at IBM. They used to own the industry, now they are just a big player; no longer the trend-setter. Even if MS manages to adapt, VB is unlikely to be a stable platform that can be relied on to run your enterprise apps without continually adapting to the changes they make.

Re:VB is rarely the right tool for the job by EastCoastSurfer · 2003-03-18 07:37 · Score: 1

It's nice that you and your company see nothing wrong with writing enterprise applications in a non-portable language, but it doesn't make it a good engineering decision.

Good engineering decisions do not always equal good business decisions. For the parent poster's original application, portability may not even be an issue. As long as everyone understands that writing something in VB means non-portable, where is the problem? Always assuming C/C++ is the best language for a job is just as bad as always assuming VB can do it. When you sit down with the requirements in front of the person paying for the project, decisions must be made and priorities and expectations set. Portability was probably just something that is not needed right now and would only have added cost and time to the project.

The biggest weakness is that you are using a proprietary language and environment, so the system is likely going to need to be rewritten in a more portable technology within 5-10 years, if not sooner.

When the time comes you revisit your original decisions and act on them then. If the app is still functioning and in use 5-10 years from now, the first question is: why change it? If it needs more features that the current framework cannot support (this can happen no matter what language you originally wrote it in), then you look at the business case for those features. Perhaps portability is now crucial and the business case supports it. With the things (usage patterns, design issues, etc.) you learned from doing it in VB, it will be that much easier to move it to a more portable language.

Document format by mivok · 2003-03-18 06:09 · Score: 1

One of the things that XML (to me) seems really suited for is what a lot of hype made it out to be in its early days: a document format for web pages.

Nowadays everyone claims that XML should be machine processed, and rarely required to be written by humans, but I would love to just be able to type something like:

<blogpost>
<name>mivok</name>
<date year="2003" month="3" day="18" />
<subject>XML sucks</subject>
<body>
Blah Blah
</body>
</blogpost>

and then have a CSS file apply styles to each of the tags as required, and just display it.

As it is now, with a bit of XSL to convert <tagname> to <div class="tagname"> and wrap the document with surrounding html tags, and using .name { style: whatever; } instead of name { style: whatever; } (i.e. stick a dot in front to change from a tagname selector to a class selector) in the stylesheets, pretty much the same effect can be achieved.
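The tagname-to-class rewrite is simple enough to show without XSL. The following is a sketch of the same idea in Python with ElementTree, not the poster's stylesheet; to_div is an invented name, and attributes on the original elements (such as the date element's) are simply dropped here.

```python
import xml.etree.ElementTree as ET

def to_div(elem):
    """Return a <div class="tagname"> copy of elem, recursing into children."""
    div = ET.Element("div", {"class": elem.tag})
    # Carry over the text before the first child and after the closing tag.
    div.text = elem.text
    div.tail = elem.tail
    for child in elem:
        div.append(to_div(child))
    return div

post = ET.fromstring(
    "<blogpost><name>mivok</name><subject>XML sucks</subject></blogpost>"
)
html = ET.tostring(to_div(post), encoding="unicode")
print(html)
# <div class="blogpost"><div class="name">mivok</div><div class="subject">XML sucks</div></div>
```

A stylesheet rule such as .subject { ... } then applies to what was the <subject> element.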

The advantage of this is that all documents are validated as a requirement of being displayed: no more invalid html, because it's not possible. (Yes, I know about xhtml, and it's very nice, at least xhtml 2 is, but if browsers were forced to choke on invalid data, then in the process of testing web page display (people do test their pages, don't they?) they would discover an error and correct it.)

And of course it's then easy to convert the data into any other xml based format with a couple more stylesheets.

My only problem is that I've not found a decent stylesheet parser that will just take a file, run it through one or more stylesheets, and display it, that will run over cgi, and not require any weird libraries to be installed, or XXX version of php that my isp doesn't happen to run, and allow me to say something like 'yeah, but before you transform it, just include this xml file here for a header and footer'.
But then again, I haven't looked amazingly hard.

Boring.... by shadowpuppy · 2003-03-18 06:10 · Score: 1

This really isn't all that hard a problem. Mangle the SAX processor to build a DOM tree for each record and when you get a closing tag call a doSomething function. Put it in a library and now all you have to do is write the proper doSomething functions for your tasks.
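A sketch of that per-record approach, assuming Python and the standard library's iterparse in place of a hand-modified SAX processor; for_each_record and the rec/log element names are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

def for_each_record(source, record_tag, do_something):
    """Stream through source, handing each complete record subtree to a callback."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            do_something(elem)   # elem is a small, fully built tree for one record
            root.clear()         # drop processed records so memory stays flat

data = io.BytesIO(b"<log><rec id='1'>a</rec><rec id='2'>b</rec></log>")
seen = []
for_each_record(data, "rec", lambda rec: seen.append((rec.get("id"), rec.text)))
print(seen)  # [('1', 'a'), ('2', 'b')]
```

Each task then only needs its own do_something function, and the whole file is never held in memory at once.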

Or you could just url encode your XML record and put one per line. It's not very elegant or standard. But it'll work and should be dirt easy to implement.

Hooray! by tedgyz · 2003-03-18 06:13 · Score: 1

I thought I was the only one that thought XML sucks! As everyone has pointed out, the file format is not the problem. It is the APIs to parse them that are painful.

I work in a company that was dizzy with XML love. Some misguided tech leads evangelized it as the solution to all our problems. Anyone with half a brain knows that it doesn't solve problems. XML just provides a data format.

XML is just a file format that gives you a regular syntax and saves you from the chores of parsing. DTDs give you semantics, but they are not part of XML. They must be created by you, for your project.

I was so frustrated with Java SAX parsers, I wrote some Java classes that load the XML as a big String and then use String operations to get and set certain tags. I became a happy programmer. This of course only works on small files, but for my situation it was sufficient.

--
"No matter where you go, there you are." -- Buckaroo Banzai

Re:Hooray! by Anonymous Coward · 2003-03-18 11:36 · Score: 0

Use the DOM apis, much better than the JAX shit.

Java Properties is a nice alternative by tedgyz · 2003-03-18 06:15 · Score: 1

I have found the standard Java Properties file format and API solves 90% of the problems that XML zealots would claim XML is good for solving.

--
"No matter where you go, there you are." -- Buckaroo Banzai

Not really a joke anymore by duck_prime · 2003-03-18 06:16 · Score: 2, Funny

It's now official. C++ creator admits it was all a hoax!

In a stunning move, C++ creator Stroustrup identifies the fine line between a ridiculous self-parody of over-engineering, and soul-destroying evil, and pole-vaults over it.

Repeat after me:
You don't overload whitespace.
You don't overload whitespace.
You don't overload whitespace.
You don't overload whitespace.

JAXB to the rescue! by hieronymouSteve · 2003-03-18 06:17 · Score: 1

Parsing a tree structure of any kind can sometimes be difficult because of the looping and recursion that is often required. I have written lots of code that has parsed XML structures with DOM parsers in Java. I don't think it is very difficult, but for the beginner it could be somewhat daunting.

Lately, I have been using XML schema and JAXB to greatly simplify my life. JAXB can take a schema (grammar) and create objects that represent the elements and attributes in an XML document. Once these objects are created, and this is pretty darn easy, the XML can be unmarshalled into the objects with a couple lines of code. Then, navigating the XML document is simply like traversing any other document object model. There are no tag names to remember! No run-time errors that are hard to track down. Plus, you can always run the newly created classes through JavaDoc to get the API for the classes that JAXB creates and passes to you.

Now I no longer write parsing code at all. My code space is more centered around the domain model instead of utilities. Try it, you might like it ;)

Re:JAXB to the rescue! by ktorn · 2003-03-18 13:11 · Score: 1

Where are mod points when you need them?

Thanks for the post anyway, it probably saved me many hours of not-so-fun code writing, as I'm just getting into Java-XML programming.

Re:The API is XPath by tungwaiyip · 2003-03-18 06:30 · Score: 1

The API that you describe exists. It is XPath. The next post of a C# example illustrates a possible use of the XPath API.

XPath is especially great for getting a single value. It eliminates the need to walk a DOM tree or use callbacks. However, it does not help in the more general case of stream processing an XML file.

XML processing is hard, but XSLT is lot harder! by tungwaiyip · 2003-03-18 06:37 · Score: 1

Hard to diagnose. Very hard to visualize the XSLT processing. And then there is the XMLized scripting language you have to learn. I wonder how many people use XSLT vs printf() for generating XML.

Re:Is Open Source the answer? by Dastardly · 2003-03-18 06:46 · Score: 1

Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.

I agree, I was waiting for someone from document land to chime in here. I think the big complaint seems to be applying XML to something other than documents.

For me the big advantage of XML is as a simplification of SGML. The parsers are simpler. XSL and XSL:FO are significantly easier than DSSSL and FOSI (at least to me). Basically, XML has all the advantages of SGML while getting rid of a lot of the complications.

So, for everyone bitching about XML, please bitch about it possibly being shoehorned into applications it isn't designed for. And, if you don't like it as a document format, try writing a Perl parser for SGML.

Dastardly

What about YAML? by Colonel+Panic · 2003-03-18 06:59 · Score: 1

I'm surprised to see only one posting about YAML. It seems to have several advantages over XML:
1) it's easy for humans to read and write
(and for that matter, if you're programmatically generating YAML it's at least as easy to generate as XML)
2) It's more compact than XML which is important for serialization in an RPC or distributed object scheme. I suspect that YAML also compresses a lot more than XML, too (compress->serialize->decompress)

Re:The API is XPath by Ed+Avis · 2003-03-18 07:04 · Score: 2, Interesting

But XPath, at least its implementation in current languages, takes a string as its path. If you specify an element which doesn't exist in the XML then this error will not be caught until run time. Whereas if the compiler knew about the grammar of the XML file it could tell you immediately 'there cannot be an element at this level' or 'no such attribute'. You could even hit Tab in your editor to see what the available subelements are at the current point in the tree.

Also, knowing the grammar (DTD or XML Schema or whatever) of the XML will help generate more efficient code, better than an XPath implementation could be because the general XPath has to work with all possible XML files, not just those restricted to a certain grammar.

It's like the difference between the putative code

int x = a.b[6]->c["hello"];

which is checked at compile time and compiles down into efficient code, and

int x = tree_query("a/b 6/c 'hello");

which walks some data structure at run time. It's better if the language can help you with the data structures.

--
-- Ed Avis ed@membled.com

WHY XML TRULY IS SHIT by Rwfresh2002 · 2003-03-18 07:31 · Score: 0, Flamebait

If you don't know WHY you need the data you plan on handling, or how to HANDLE the data you need, then XML WILL do nothing but complicate things. If you do know what data you need and how you are going to handle it, then XML is useless. Take the time to understand your data path instead of wasting time building ambiguous data structures. I don't understand the idea of building applications to capture unknown data sources to do things with the data. And if you do know what the data is, then skip the XML parsing bloat and DO IT. You don't need to be a super genius to intuitively know that XML is shit. If you can make your project work without it (all projects can), save yourself the hassle.

Not hard just misused by awol · 2003-03-18 07:52 · Score: 1

XML is not hard, it is just a big piece of toilet for many (several?) of the tasks for which it is being proffered. A fundamental example is that it is increasingly being touted as a messaging grammar, which is just bollox, it is bytey (ie bloated on the wire), expensive to encode, expensive to parse and it ain't a grammar which means that the touted benefits of "meta"information fail to materialise.

The thing that gets me is the whole "ties" disparate systems together crap. One system talks about the "colour of objects" and the other talks about the "hue of items" and there ain't nothing about XML that helps with mapping the fact that "colour = hue" and "object = item" other than a programmer and XML adds _zero_ value to that process, XSL is _just_ a toy version of a compiler compiler. Use a real grammar to solve real grammatical problems.

end rant.

--
"The first thing to do when you find yourself in a hole is stop digging."

No, you'd use xpath silly. by semios · 2003-03-18 07:56 · Score: 1

No, you'd use xpath silly. For instance, I made myself a simple XSLT and shell script that uses Xalan to easily get access to different XML elements.

$ xpath
usage: xpath [-sx] match-path file.xml [output file]
  -s value-of
  -x copy-of
$ xpath -s "/users/user[@name='shane']/@passwd" /etc/passwd.xml
foo
$ xpath -x "/users/user[@name='shane']" /etc/passwd.xml
<user name="shane" passwd="foo" home-dir="/home/shane"/>
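The same two lookups can be approximated without Xalan. This is a sketch in Python, where ElementTree supports only a limited subset of XPath: attribute predicates work, but a trailing /@passwd step does not, so the attribute is read with get(). The second user entry is made-up sample data.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<users>'
    '<user name="shane" passwd="foo" home-dir="/home/shane"/>'
    '<user name="alex" passwd="bar" home-dir="/home/alex"/>'  # hypothetical extra entry
    '</users>'
)

# Roughly: xpath -s "/users/user[@name='shane']/@passwd"
user = doc.find("user[@name='shane']")
print(user.get("passwd"))  # foo

# Roughly: xpath -x "/users/user[@name='shane']"
print(ET.tostring(user, encoding="unicode"))
```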

Plagiarising hoaxes? by ChaosDiscord · 2003-03-18 08:39 · Score: 1

1. C++ is clearly off topic.

2. This is a hoax. Worse, this is an old hoax. Someone has just filed off the original dates provided and changed them to 2003. The first mention I can find is this 1998 post which might represent the original version.

3. By failing to link to any original source or otherwise provide attribution (other than the incorrect claim that it's from an interview with IEEE's Computer), you are at best infringing on someone else's copyright, or at worst misrepresenting the work as your own.

Three strikes, you're out.

The article does humorously point out some of C++'s shortcomings, but to just repost it here now is wrong.

--
Search 2010 Gen Con events

jingoistic by Anonymous Coward · 2003-03-18 08:44 · Score: 0

I know what type of calendar you got for christmas.

Sometimes a motorcycle is better than a bus by Anonymous Coward · 2003-03-18 08:44 · Score: 0

This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements

That's why I use small parsers like SmallXMLParser in Java when speed and simplicity are most important. It's free and open source.

Push/pull low-memory hybrid - plug for my XMLIO by Stele · 2003-03-18 08:52 · Score: 1

Okay, it's C/C++ only, and non-validating, but my XMLIO processor was designed to avoid his concerns. It's a push/pull hybrid (you tell it what you want, so you pull only what you need, but it can push elements/data at you and automatically pack into your data-structures).

It's designed not to even bother parsing sections of the XML stream you aren't interested in, which makes it perfect for low-memory situations (it is used by various cell-phone and test equipment manufacturers).

My project is using XML. by sirrube · 2003-03-18 09:43 · Score: 1

I am working on a project right now in C++ that has to send and read back an XML file. I don't really see many issues with managing the data once I parse the XML file using MSXMLDom objects and place the data I need into the classes that manipulate it. The DTD I am currently working with is overdesigned to the point of stupidity, but with a bit of foresight I managed to avoid writing a lot of code by capitalizing on the inherent properties of an XML node. As for doing this in Perl or some other language, I am not sure how much time it takes to generate the structures to manage a DTD, but in my experience it took only a day and a half to create all the code and classes I needed to read and manage 60 different elements, not counting the attributes, enumerations, and lists that each element may have. To me this is a fair trade-off when you weigh the readability of the XML file against managing a TXT file or some other kind of container scheme. Don't get me wrong, I'm just as lazy as the next programmer, and I am kinda pissed that it took a day and a half away from me when I could have been surfing pron or /.

Re:Not worth it-How many licks does it take... by Anonymous Coward · 2003-03-18 10:04 · Score: 0

Your argument could apply equally well to Windows, you know?

Sounds like he wants to 's/regexp/xpath/' in Perl by Anonymous Coward · 2003-03-18 10:07 · Score: 0

Perl was designed to be a powerful text processing tool. In Perl, the core operations and expressiveness for text processing have been elevated to first-class language features, whereas XML support is provided via a library (module) and is much less mature.

XPath is an expression language designed to express structures and patterns in XML, similar to the way regexps were designed to describe patterns in unstructured text.

In his prescribed solution for the "Scripting Basket", I think XPath is precisely the Perl enhancement for which he was grasping.
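To make the regexp/XPath analogy concrete, here is a minimal sketch in Python rather than Perl, using the limited XPath subset that the standard library's ElementTree supports. The sample document and its element names are invented for illustration; they are not from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this example.
DOC = """
<orders>
  <order id="1"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>
  <order id="2"><item sku="A1" qty="5"/></order>
</orders>
"""

root = ET.fromstring(DOC)

# One path expression stands in for a hand-written tree walk:
# select every <item> with sku="A1", at any depth below the root.
a1_items = root.findall(".//item[@sku='A1']")
quantities = [int(item.get("qty")) for item in a1_items]
print(quantities)
```

The path string plays the role a regexp plays for flat text: the pattern is data, and the library does the traversal.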

I should add that I sympathize with his complaint about being forced into either a streaming or an in-memory parsing model. For processing chunks of large, relatively flat data, it might be ideal to catch an event for each of the larger subtrees, and then be able to fetch all of its children for walking or other direct manipulation.
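For what it's worth, that hybrid (an event per subtree, then tree-style access to its children) can be sketched with `iterparse` from Python's standard ElementTree module. The `<log>`/`<record>` layout and the `bytes_per_user` helper below are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical flat data file: many small <record> subtrees.
DOC = b"""<log>
  <record id="1"><user>ann</user><bytes>120</bytes></record>
  <record id="2"><user>bob</user><bytes>300</bytes></record>
  <record id="3"><user>ann</user><bytes>80</bytes></record>
</log>"""

def bytes_per_user(stream):
    """Stream through the file, handling one complete <record> at a time."""
    totals = {}
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "record":
            continue  # only react once a whole record subtree has been read
        # Inside the event, the subtree behaves like a small in-memory tree.
        user = elem.findtext("user")
        totals[user] = totals.get(user, 0) + int(elem.findtext("bytes"))
        elem.clear()  # release this record's children before moving on
    return totals

totals = bytes_per_user(io.BytesIO(DOC))
print(totals)
```

Only one record's children are held at a time, so memory use depends on the size of a record rather than the size of the file.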

Article - Said it years ago... Nobody listened by Anonymous Coward · 2003-03-18 10:33 · Score: 0

XML Journal published my article (renamed).

http://seapod.org/writing/markup-madness.html

Seems to be the case in practice as well.

Steve Klingsporn
steve at buzzlabs dot com

Tools: Castor, JAXB and Pull Parsers by kupci · 2003-03-18 11:24 · Score: 1

If we take his basic question, "XML is too hard for beginners", well, you're correct, there are lots of tools & libraries. Here are a few: http://castor.exolab.org/ From the website: "It's basically the shortest path between Java objects, XML documents and SQL tables. Castor provides Java to XML binding, Java to SQL persistence, and then some more." Pretty amazing, I say. Sun's JAXB is similar. As for pull parsers, Dennis Sosnoski has some interesting articles at IBM developerWorks: http://www-106.ibm.com/developerworks/xml/library/x-injava/#6
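The idea behind binding tools like these, mapping between objects and XML so application code never touches the parser, can be sketched by hand. This is not Castor's or JAXB's API; it is a hand-rolled Python illustration, and the `Book` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Book:
    # Hypothetical record type used only for this illustration.
    isbn: str
    title: str
    pages: int

def book_from_xml(text):
    """Unmarshal: XML text -> Book object."""
    elem = ET.fromstring(text)
    return Book(
        isbn=elem.get("isbn"),
        title=elem.findtext("title"),
        pages=int(elem.findtext("pages")),
    )

def book_to_xml(book):
    """Marshal: Book object -> XML text."""
    elem = ET.Element("book", isbn=book.isbn)
    ET.SubElement(elem, "title").text = book.title
    ET.SubElement(elem, "pages").text = str(book.pages)
    return ET.tostring(elem, encoding="unicode")

xml_text = '<book isbn="0-00-000000-0"><title>Example</title><pages>321</pages></book>'
book = book_from_xml(xml_text)
print(book)
print(book_to_xml(book))
```

A real binding framework generates this mapping from a schema or from the class definitions instead of making you write it per type.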

plagiarism? by pwarf · 2003-03-18 11:45 · Score: 2, Insightful

I am not the author of the post you responded to, but I felt compelled to comment.

Plagiarism, in the most commonly used sense, is taking credit for someone else's words or ideas. Since he posted as an anonymous coward, he is unable to take credit. Therefore, he didn't commit plagiarism in the usual sense.

He deserves the lesser charge of failure to cite. As long as we are throwing out accusations, I would accuse you of libel http://dictionary.reference.com/search?q=libel , but since he's an AC, I can't claim that it damages his reputation. Hmm, never mind. :)

Re:The API is XPath by Anonymous Coward · 2003-03-18 11:50 · Score: 0

XPath eliminates the need to walk a DOM tree?

That's hirarious.

Re:xml by SoupIsGoodFood_42 · 2003-03-18 11:52 · Score: 1

Separating the content from how it is displayed makes it easier to display it in pretty much any format. By taking a single XML document you could create a page that looks great on Mozilla, great on IE, a WAP enabled phone, Opera, Microwave, Fridge - whatever!

Exactly. It should be noted that this is also the purpose of CSS. Of course it depends on the complexity of the page. If your page is complex and requires quite different content (maybe fewer menu items for PDAs or something), you may need to go XML + XSL -> XHTML + CSS. Otherwise you can just go XHTML + CSS.

Americanization/cultural censor of parent comment by Anonymous Coward · 2003-03-18 12:38 · Score: 0

while()

freedom...

Well, it might be easier if.... by Anonymous Coward · 2003-03-18 13:01 · Score: 0

...people stop referring to it as a language. It's not.

And any "programmers" that have trouble with XML probably aren't worth hiring or keeping around, anyway: it's a sure sign of a poseur. Businesses need to broom these idiots out of their payrolls, and let us real programmers get on with our work.

This mirrors what I've heard VB/Coldfusion dummies whine about C, Perl, and Java - it's "too hard". If it's too hard, pay me more, let me do it, and have the company fire your dumb ass.

Re:xml by cicho · 2003-03-18 15:46 · Score: 2, Insightful

Moderators on crack, the parent is not a troll, he's just about right.

Read any introductory article on XML, or the first chapter of a book - it's so plain and simple and inviting and looks like a great idea. By page 50 of the book you're crawling through a dense pile of industrial trash. A book on XML I bought lists over thirty classes in the OpenXML implementation - over THIRTY classes, that's hundreds of methods; do I want to dig into this just to read and write a simple file of records? Where simple and robust alternatives exist? Hell, no.

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan

Re:Let me get this straight... by ShortRound · 2003-03-18 16:33 · Score: 1

It's kind of funny how once we're occupying Iraq, something like 80% of the active army will be in use and we won't have the resources to deal with any new threats.

or any friends to help us.

That's funny isn't it?

Um, that IS a joke by dozer · 2003-03-18 16:48 · Score: 1

That's an old april fool's joke.

Repeat after me: Before I freak out...
I will always check the date.
I will always check the date.
I will always check the date.
I will always check the date...

I agree.... by Anonymous Coward · 2003-03-18 17:57 · Score: 0

I also am not the author of the post you responded to, but I felt compelled to comment.

Plagiarism, in the most commonly used sense, is taking credit for someone else's words or ideas. Since he posted as an anonymous coward, he is unable to take credit. Therefore, he didn't commit plagiarism in the usual sense.

He deserves the lesser charge of failure to cite. As long as we are throwing out accusations, I would accuse you of libel http://dictionary.reference.com/search?q=libel , but since he's an AC, I can't claim that it damages his reputation. Hmm, never mind. :)

As usual... by Anonymous Coward · 2003-03-19 05:15 · Score: 0

Be sensible, keep things simple, use a DB when it makes sense, use XML when it makes sense.

If I see another XML standard (take X-Links for example) that makes something simple like rocket science I'll go mad.

being the IT guy in an apt building by perfessor+multigeek · 2003-03-21 05:20 · Score: 1

Oh, without question, it has potential. But I have a rude question to ask. Have you ever run an IT department? Running infrastructure for non-techies reliably means, "You should put in X moronic thing. My fourteen year old nephew, who's really smart, saw it on television and he says that it's *much better* than what you're proposing."
How things *could* be bears only the most trivial resemblance to how they end up when every little budget item has to be approved by people who say that "the floppy on my hard drive must have a virus".
I was making about the same amount that you were (>$60K) with great bennies and plenty of opportunity for advancement and I wouldn't go back to doing IT management without a commitment of $20K or more to be spent at my sole discretion per year and an assistant I would choose and train whose schedule was entirely mine to decide. Users are idiots. Or at least enough of them are to make a job like that look to me like the next closest thing to purgatory.

Rustin

--
Data is the lever, rigor the fulcrum, brains the force that drives it all.

Re:being the IT guy in an apt building by crazyphilman · 2003-03-21 08:34 · Score: 1

Well, actually, you're building a straw man based on your experience in corporate tech support, which is a totally invalid and inappropriate way to think about this. It shows that you didn't understand the situation I was describing at all.

Furthermore, knowing that I'm a programmer/analyst, it IS rude of you to ask, "Have you ever run an IT department" because you're not asking a question, you're making a statement: You're saying, "You don't know what you're talking about". That's rude even for Slashdot.

Most of this post just seems very arrogant and egotistical to me... Oh, you wouldn't take an IT management job unless they fall all over themselves kissing your butt with staff and money? Fine. I'll take fries with that, thanks, and hurry up with that coke or I'll snitch on you to your sixteen year old supervisor. Sheesh... Come back to reality! Even during the boom, tech support guys were considered a dime a dozen. You're not REALLY that impressed with yourself, are you?

But, leaving your attitude aside, let's address the point you tried to make. In setting up an apartment network, assuming we have a can-do attitude instead of your fuck-you attitude, what real challenges would there be?

First of all, you'd have to maintain a central server to act as a firewall and router for the complex. That would be completely under your control, and it would not allow physical access for anyone but you and the building manager. You would probably either set it up as a Wi-Fi hotspot (much more likely) or as the center of an ethernet LAN, in which case you would run cable to each of the apartments and give each one a wall jack. Your responsibility to each tenant would begin and end at providing that wall jack. Their PC is *THEIR* problem. If you use Wi-Fi you don't even have to run cable, or visit an apartment. Just give each tenant an information sheet telling them what settings to put on their Wi-Fi software, so they can get hooked up.

Note that individual tech support can be PURCHASED, but it doesn't come with the apartment.

Does this show you how silly your complaint is? Now, quit grandstanding about how terribly hard a tech-support job is and come back down to earth. Or, if you hate the idea so much, *don't do it*! Go sell flowers in the airport or something. I hear the krishnas are hiring...

--
Farewell! It's been a fine buncha years!

apt IT guy reality check by perfessor+multigeek · 2003-03-21 10:19 · Score: 1

My. Testy, aren't we?

Well, first of all, you posited a case where "Another possibility is, we could be able to trade LAN admin skills for free rent, building-manager style. Apartment complexes might start building up their own hotspots and such, and they'll need someone to handle the tech support. Handymen at complexes get free rent, so does the super, why not the tech guy?"
Hm. Handyman. Building Manager. If you really think that any position that can be compared to those will be free of luser bullshit then may I suggest an antidote.

"Can-do attitude?" If you really are so naive as to think that the right attitude is all that is needed to end up with a well architected system, then along with your ignorant, superior I Am A Real Geek, You Are A Mere Peon trash-talking, well, I'll take that as a big unambiguous "no" to my question. You clearly have no clue, let alone experience getting budget allocations, departmental approvals, and, hardest of all, the continuing support of people who see computers as magic boxes where anything that isn't what they want is YOUR FAULT.
"I erased my hard drive with Norton and now all my files are gone. Fix it."
"I installed unapproved, bootleg, security software, lost the manual, and forgot the password. Fix it."
"Somebody I don't like has a better computer than me. Fix it."
"We refused your budget allocations for five years running and now we can't use the current software or cool new web sites. Fix it. Oh, and don't spend any money or change any configurations or reduce any other existing capability to do it."

Silly?
Dime a dozen?
Arrogant and egotistical?

Fuck you and the government job you rode in on. I don't know quite how you turned a reasonable question asked with careful commentary on its sensitive nature into some penis measuring contest but, well, you clearly don't know shit about actual operations work, let alone operations management.
I ask you again, what experience do you have actually running a support department? How many users, let alone department heads have you ever had to report to re support? What is the largest organization or project supporting non-techies that you have ever been responsible for?

I don't want to hear about your filling in for a month or two once answering phone calls. Have you ever in your life been the senior person, the person at whose desk the buck stops, for any sort of operations? Any support of non-techies at all?

You actually think that your server would be some sort of sanctum sanctorum? Whoever owned that building would most likely have keys, passwords, and overrides to everything you did. And when the owner chose a service provider whose bandwidth fell apart at key times you would get to kiss the ass of every influential tenant who felt like berating somebody.
You actually think that, as an employee of the building, you could just give tenants "an information sheet" for wireless and then be free from blame when *they* fucked it up? Yeah, right.

Look, I don't know much about you as a programmer. You clearly don't know shit about me.[1] But you have made it mighty clear that you don't understand what a senior tech support job is. I made a point of specifying that I personally would not take that sort of job. Why you so emphatically are displaying a stick up your ass the size of the federal deficit doesn't even interest me very much.

You want to show me how wrong I am? Go for it, baby. There are plenty of buildings these days that include "digital services" in the rent. Find me people holding the sorts of jobs we're talking about and get *them* to agree that I'm building a strawman. Until then, well, when discussing a subject that's already been declared fraught, try not to get snippy with people who know far more than you do. Sometimes we bite back.

[1] I'll give you a big hint: there's a reason that I could take your exposed conduit proposal

--
Data is the lever, rigor the fulcrum, brains the force that drives it all.

Re:apt IT guy reality check by crazyphilman · 2003-03-21 18:20 · Score: 1

My, my my! NOW who's testy??? Heh heh heh... You gotta lay off on the coffee, pal. Temper, temper...

However, risking another massive, diarrhea of the keyboard flame from you, let me make a couple of salient points:

You started this by mouthing off, telling me I didn't know what I was talking about. This is rudeness personified, and I said so. I'll say it again: you were rude. You even SAID you were being rude, while being rude, so you knew you were being rude and did it anyway. So I find it a little bit curious that now you're all up in arms that I took exception to your comments.

You wanna be the crusty, bitter old man who tells the young guy that he's got to get his head out of the clouds and stop dreaming, and get a job, and so on, you go ahead. I'll be the young guy who completely ignores you and has a great time doing so.

Your corporate experience, AGAIN, and I'll keep saying this until it sinks in, has NO BEARING on whether it's possible to enjoy a gig working with an apartment complex. The two are apples and oranges. FIRST OF ALL, I live in a rural area where life is just a little calmer and slower than it (apparently) is in whatever hell-hole you're living in. Second, apartment complexes are VERY DIFFERENT here from what you're apparently used to. Things are, like, casual, dude. The people are NICE. The landlords don't fuck with people here. Christ, where do you live, NYC, L.A, Boston, something like that? Here's a hint: GET OUT BEFORE YOU GET ANY WORSE. It's poisoning you. Seriously.

You've got a totally negative attitude, you don't think anything is going to work, you're all doom and gloom, you think you know everything... So, OF COURSE you think it won't work. OF COURSE you think it'll be hell. OF COURSE you'll look for every possible unfortunate consequence; that's all people like you EVER see. Who cares? It isn't an accurate view of the world; it's just a dark one. You wanna be trapped in that doom and gloom bullshit all your life, go ahead. Count me out.

I'd also like to point out that if I'd known I'd start a class war by reminding you how little weight a tech support resume carries these days, I'd have kept the truth to myself and let you preserve your illusions. Man, they must be deep and closely guarded.

As far as most of this post goes, it's really just a temper tantrum, isn't it? You just completely lost your shit, over a couple of harmless little baiting comments. How sad and immature.

By the way, about the three things you took exception to, i.e.

"Silly?
Dime a dozen?
Arrogant and egotistical?"

You've done a great job of PROVING EACH ONE in this long-winded, overly nasty, foolish reply.

Again, give up the coffee, and move someplace a little less hectic. You're going to have a heart attack, a stroke, or both, if you keep up this way.

--
Farewell! It's been a fine buncha years!

FLAME by Martin+Spamer · 2003-03-25 06:51 · Score: 1

The problem is not XML as such, but programming parsers is hard, really hard.

Um, no, it's not. Parsing languages which you define has been basically understood for years. It's trickier to parse XML than it used to be, but only because XML has grown, not because the task is inherently difficult.

<flame>
Really ?

You are either very smart or very stupid, and my money is on the latter, since I stated that programming parsers is difficult and you contradicted me by stating that parsers are well understood.

Well, I understand parsers pretty well, well enough to understand the distinction between LL, LR, LALR(0), and LALR(1) parsers. However, I also understand them well enough to know that most programmers cannot deal with anything other than a simple recursive descent parser.

Programming parsers is difficult, so difficult that it is actually at the edge of what is possible for human programmers, which is why parser generators such as yacc and bison are necessary.

However, if you really are as smart as you make out, feel free to point out the obvious ambiguity in my statements above and prove me wrong!
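For readers who haven't met the term, a "simple recursive descent parser" has one function per grammar rule. Here is a rough sketch in Python for a toy arithmetic grammar; the grammar, the `Parser` class, and `evaluate` are all made up for illustration and say nothing about how yacc or bison work internally.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    # Toy grammar, one method per rule:
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUMBER | '(' expr ')'
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value

print(evaluate("2 + 3 * (4 - 1)"))
```

Operator precedence falls out of which rule calls which, and that is about as far as the hand-written approach goes comfortably; grammars that need more lookahead are where generators earn their keep.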

</flame>

brainfuck! by Anonymous Coward · 2003-03-28 03:52 · Score: 0

brainfuck!
brainfuck!
brainfuck!

Anonymous Coward-- exposed! by CausticPuppy · 2003-03-28 04:47 · Score: 1

Damn, and all this time I've been thinking that this "Anonymous Coward" person was one of the most brilliant (and most prolific) posters on this discussion board. Now you've demonstrated that he's nothing but a fraud. I think I've lost all of my respect for Anonymous Coward.

--
-CausticPuppy "Of all the people I know, you're certainly one of them." -Somebody I don't know

Re:Commas suck by Hognoxious · 2003-04-01 06:18 · Score: 1

Comma delimited works fine, until you order some parts with description like "Bolt, Hex, Brass, 30mm, fixing for the use of", and all the other fields get shunted across, and the PHB is wondering why it doesn't work.

And yes, I've seen that happen.

One advantage of the comma is that it's easier to hardcode as a literal in your program than a tab, and on this occasion that was why it had been done that way.

Then there's European number formats...
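The shunted-fields failure is easy to reproduce. A small sketch in Python, using the part description above; the part number and quantity are invented. Joining and splitting on a bare comma breaks the row, while a writer that quotes fields (here the standard `csv` module) round-trips it.

```python
import csv
import io

# Part number and quantity are made up; the description is the one quoted above.
row = ["P-1001", "Bolt, Hex, Brass, 30mm, fixing for the use of", "12"]

# Naive approach: hardcode the comma and hope no field contains one.
naive_line = ",".join(row)
naive_fields = naive_line.split(",")

# Quoting-aware approach: the writer wraps the description in quotes,
# and the reader honours those quotes when splitting.
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))

print(len(row), len(naive_fields))
print(parsed == row)
```

The same quoting covers the European case where the decimal separator is itself a comma, provided both ends of the exchange agree on the dialect.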

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Last Post! by alpg · 2003-04-01 19:14 · Score: 0

Mr. Jones related an incident from "some time back" when IBM Canada
Ltd. of Markham, Ont., ordered some parts from a new supplier in Japan. The
company noted in its order that acceptable quality allowed for 1.5 per cent
defects (a fairly high standard in North America at the time).
The Japanese sent the order, with a few parts packaged separately in
plastic. The accompanying letter said: "We don't know why you want 1.5 per
cent defective parts, but for your convenience, we've packed them separately."
-- Excerpted from an article in The (Toronto) Globe and Mail

- this post brought to you by the Automated Last Post Generator...

562 comments