Slashdot Mirror


Tim Bray on the Birth of XML, 10 Years Later

lazyguyuk writes "Tim Bray posts a lengthy blog on the birth of XML, formalized as 1.0 in Feb 1998. 'XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It's really long. The title was originally Good Luck and Internet Plumbing but the filename was "XML-People" and I decided I liked that better. I never got around to publishing it, so why not now?'"

48 of 260 comments

  1. Classic by Gothmolly · · Score: 5, Funny

    Young Buck: Hey, we have a data exchange problem between two systems, let's use XML!
    Greybeard: Ok, but now you have 2 problems.

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Classic by smittyoneeach · · Score: 5, Insightful

      In defense of XML, the parsing problem is handled.
      Best wishes on solving the semantic snarls.
      XML, like all good approaches, handles mechanism, not policy.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    2. Re:Classic by fireboy1919 · · Score: 3, Interesting

      In defense of XML, the parsing problem is handled.

      To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...

      I don't really care about the XML format. Personally, I'd be happier if it were stored in binary. The thing I like is the DOM tree as a data construct, XPath as a means of addressing, and XQuery as a means of getting parts out of it. (XSLT is okay, but from my experience, it's a lot clearer to represent a transformation as a series of productions than it is to use XSLT...perhaps a production-oriented approach that used XPath addressing?)

      With those, you've got a good mechanism for serializing, reading, and deserializing objects, classes, and all manner of other things.

      There are only a few problems with this:
      1) Non-ancestor relationships and references (i.e., having the same node as multiple locations in the XML document) are not covered by XML, but are possible with objects.

      2) Attributes in XML have no obvious mapping to objects...so what do you do with them?

      I wish we could use something like XML (in that it could use DTDs as schemas, and had support for DOM methods along with XQuery and XPath), but with a more efficient format (binary), and with the ability to encode references.

      That would be just about perfect.

      --
      Mod me down and I will become more powerful than you can possibly imagine!
    3. Re:Classic by oyenstikker · · Score: 4, Insightful

      There are only a few problems with this:
      1) Non-ancestor relationships and references (i.e., having the same node as multiple locations in the XML document) are not covered by XML, but are possible with objects.
      You can with refids and keys.
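
A minimal sketch of the id/reference idea, in Python with the standard library's ElementTree. The document, the `customer-ref` attribute name, and the `resolve` helper are all invented for illustration; ElementTree does not enforce DTD ID/IDREF types, so the index is built by hand here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <order> elements point at the same <customer>
# through a "customer-ref" attribute instead of nesting a copy of it.
DOC = """
<shop>
  <customer id="c1"><name>Ada</name></customer>
  <order number="1" customer-ref="c1"/>
  <order number="2" customer-ref="c1"/>
</shop>
"""

root = ET.fromstring(DOC)

# Build an id -> element index once, then resolve references through it.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve(order):
    """Return the element that an order's customer-ref attribute points at."""
    return by_id[order.get("customer-ref")]

orders = root.findall("order")
shared = [resolve(o) for o in orders]
```

Both orders resolve to the very same element object, which is the "same node in multiple locations" case from the quoted objection.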

      but with a more efficient format (binary)
      It is wonderful to be able to easily read and edit the data in a text editor. If you want it more compact for storage and transmission, compress it. I understand that a binary format could lead to more efficient processing and parsing, but I think the benefits of readable text outweigh the efficiency.
      --
      The masses are the crack whores of religion.
    4. Re:Classic by Flambergius · · Score: 3, Insightful

      To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...

      XML doesn't handle parsing. XML makes parsing easier; in fact so easy that parsing XML isn't a problem anymore.

      For an expert, I think XML and regex are complementary techniques. For anyone other than an expert, regexes are way too brittle. Ordinary people need to be able to operate on their data; it can't require voodoo. (Not that XML in all its arcane applications is anything close to plain English, but it's much better than custom data formats and regex.)

      --
      Computers are useless. They can only give you answers - Pablo Picasso
    5. Re:Classic by shutdown+-p+now · · Score: 2, Informative

      To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...
      God, no... another Perl hacker...

      Regex are not a solution to everything, and most certainly not to writing fast parsers!

      (Not that XML is easy to parse fast, but that's another story. You still don't write a JSON parser using regex.)

  2. XML and Interfaces by PIPBoy3000 · · Score: 2, Insightful

    I realize the XML is used for a lot of things, but whenever my fellow developers learn that the vendor is shipping us some interface in XML, the groans are audible. About half the time, their XML format isn't quite standard, and we've got to dig around for utilities to try and work with it (or write something custom). I'd say the vast majority of our interfaces are good ol' delimited text files.

    For other purposes, XML is great and very readable, but I'm not sure it makes sense to use it everywhere.

    1. Re:XML and Interfaces by MBCook · · Score: 4, Informative
      Here are some of the "fun" things I have run across in other people's (almost certainly custom) XML interpreters/producers:
      • Tags must be upper case
      • Tags can't be upper case
      • You must put line breaks between elements
      • There can't be any whitespace between elements
      • It's important to URL-encode the XML before it gets sent from them to me
      • You don't need CDATA blocks, just put the ampersands and >s right in there, it'll be OK
      • Your XML should all be inside a CDATA block in container XML
      • No tags can self-close
      • Self closed tags need a space between the slash and bracket
      • Self closed tags can't have a space between the slash and bracket

      That's just what I can think of off the top of my head. We've seen quite a bit of crazy stuff. If everyone would just use one of the already written XML producers or parsers (the big ones, the ones that work) life would be much easier around here from time to time.
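
To illustrate the point about using an existing producer, here is a small Python sketch with the standard library's ElementTree. The element and attribute names are made up. The library, not the caller, decides how to escape ampersands and angle brackets and how to write empty elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element and attribute names are made up.
item = ET.Element("item", {"vendor": "Smith & Sons"})
item.text = "price < 10 & weight > 2"
ET.SubElement(item, "empty")

# The serializer escapes &, < and > and decides how to write empty elements.
xml_text = ET.tostring(item, encoding="unicode")

# Any conforming parser reads it back unchanged.
parsed = ET.fromstring(xml_text)
```

None of the rules in the list above (case, whitespace, self-closing style) need to be agreed on separately when both sides use a conforming library.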

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  3. Re:10 Years and still waiting by CRCulver · · Score: 4, Informative

    Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side. By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output. See e.g. O'Reilly's XSLT Cookbook for dozens of very real-world examples (it's probably in your library).

    That's just one example of how XML technology has made coding easier. Others I'm sure will point out others.

    If you aren't a developer, then I'm not sure XML was supposed to directly revolutionize your end-user experience.

  4. Java and XML, bad tastes that are worse together by Omnifarious · · Score: 4, Insightful

    I've recently taken a job at a primarily Java shop. After seeing XML used and abused for ant, maven and various other things I've grown even more disenchanted with it. And now I've also gotten the chance to see that not only does Java represent a poor trade off between the annoyances of a strongly typed language and the speed of a dynamic interpreted one, it has a horrible mess of dependency issues that nobody really solves besides.

    I'm much more hopeful about technologies like Thrift and/or D-Bus than I ever was about such abysmal abominations as SOAP, or the only slightly better XML-RPC.

    The Java XML world seems like this little closed ecology of mutual masturbators who all come up with more Java and XML 'solutions' to problems that never existed before they started using Java and XML.

    I see the value of XML for long-lived documents that don't spend a lot of their life on the wire. And possibly for config files, though IMHO it is too ugly and unreadable for those. But as a general tool for Internet plumbing it's awful.

  5. Re:IVE BEEN WAITING SINCE 1998 by halivar · · Score: 3, Funny

    Looks like you're going to have to wait a little longer. Try holding your breath, this time.

  6. Oblig by mariuszbi · · Score: 5, Funny

    XML is like violence.. when it doesn't work, use some more!

  7. Java and XML - Addendum by Omnifarious · · Score: 3, Insightful

    And, of course, my post is incomplete without a reference to my little rant on why CORBA and other forms of RPC are bad. Both Thrift and D-Bus are pretty close to the ideal solution I describe later. They focus on message content over semantics and are extremely easy to parse. SOAP and XML-RPC fail on both of those counts. They are about semantics (you are making a remote function call that does some specific thing, not sending a hunk of data that has some particular content) over content and they are a huge pain to parse.

    1. Re:Java and XML - Addendum by cjonslashdot · · Score: 3, Interesting

      CORBA uses IDL for interface definition. Therefore, you don't even have to write code to parse it: the parsing code is generated automatically. So the arguments about parsing are non-issues. With regard to content, one can define content in IDL very easily. I have not used the APIs you refer to (e.g., Thrift), so I cannot comment on those. I will say this though: when I used to write apps 10 years ago using CORBA, it took me so little time to throw a system-to-system interface together that I almost didn't even think about it. The same with EJB, except that persistent EJBs were flawed and so EJBs lost credibility even though the API model (similar to CORBA) was (and is) extremely easy to use. Then people started wanting to communicate across firewalls, and OMG didn't get its act together and make IIOP capable of traversing firewalls before people got hooked on hand-coding HTTP messaging, which then led to XML messages and SOAP and Web services. The right answer was to fix the OMG spec for pushing IIOP through firewalls in a standard way. Nowadays, whenever I have to create an inter-system interface and the options involve SOAP or Web services or some other XML-based interface, I groan and it takes me ten times as long to get the interface built and reliable. That is not progress. I will look at Thrift and the other API that you mention, and we may disagree on some things (e.g., the value of type safety), but I agree with you that XML-based messaging has been a huge, huge step backwards.

    2. Re:Java and XML - Addendum by Omnifarious · · Score: 4, Insightful

      CORBA is a minor pain to parse. From what I could tell you could just sit down with a spec and code up your own parser for ye-old random language in a day or two. But that's not my major issue with it.

      My major issue with it was that it promotes designing distributed systems that focus on the semantic roles of the participants instead of the data moving around. In fact it discourages programmers using it from even thinking of what they're doing as sending messages to some system many milliseconds away. Among other evils this leads to all kinds of interesting issues with threading and concurrency that didn't even have to exist.

  8. YAML and JSON by goombah99 · · Score: 2, Insightful
    I'm perpetually surprised every time I see a new implementation of XML. For example, Macintosh plists, many of which replace older ad hoc Unix configs, are in XML. Why oh why do people use XML for data-centric, quasi-human-readable configuration files when YAML is the ideal solution for this? And for web usage, where Perl, Python, and Ruby abound, why would people not use YAML, since it's so easy to parse with just regular expressions, and because you don't have to instantiate the entire multi-megabyte structured data file just to grep out one record? And in the day of JavaScript and Web 2.0, there's JSON. So why does this ponderous, obsolete dinosaur XML persist?

    Perhaps I'm being too negative here. I sound like a troll. But really folks, do yourself and the rest of us a favor and read up on JSON and YAML. You'll see I'm being only too kind and generous to YAML.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:YAML and JSON by ral8158 · · Score: 2, Insightful

      Actually, OS X uses plists because XML, which is more widely known than YAML and much easier to learn, is built directly into the Cocoa API.

      Using an XML file basically consists of the following code:
      NSError *xmlError = nil;
      NSURL *url = [NSURL URLWithString:@"Put your URL here"];
      NSXMLDocument *doc = [[NSXMLDocument alloc] initWithContentsOfURL:url options:NSXMLDocumentTidyXML error:&xmlError];
      // Handle errors around here: doc is nil and xmlError is set if loading failed

      Then you can basically do anything with your doc object. You can insert a child at a certain index, you can ask it for the root element, you can set the DTD to something else, you can apply XSLT to get transformed XML or HTML markup back, you can validate it against its DTD, you can delete children at a certain index, etc. All of these actions take one line. It's a really beautiful, simple system. YAML is... not.

    2. Re:YAML and JSON by tjansen · · Score: 2, Informative

      As you say, YAML is a specialized markup-language (data-centric, almost human-readable) and not a good choice for many use-cases (document-centric languages like XHTML and DocBook, combining languages with XML namespaces). In other words, it can not replace XML, it's just another syntax to learn. It needs a completely new infrastructure: new parsers, new editors, new schema description language, new translation languages and so on. Is that really worth it, only to make editing files with a simple text editor easier?

    3. Re:YAML and JSON by fireboy1919 · · Score: 2, Insightful

      In other words, it can not replace XML

      That's pretty much completely wrong. YAML's functionality is a superset of XML's while being easier to read & understand (because the *basic* usage of it is exactly the same as XML's, but with a simpler syntax). It just hasn't been adopted anywhere except configuration because that's the easiest niche to move into.

      it's just another syntax to learn.

      That's a stupid thing to say. Anybody that can't learn the syntax of either XML or YAML in less than five minutes shouldn't be working with either of them. They're both ridiculously simple to understand.

      It needs a completely new infrastructure: new parsers, new editors, new schema description language, new translation languages and so on.

      That is true, and probably the reason we won't be moving to YAML for quite a while.

      --
      Mod me down and I will become more powerful than you can possibly imagine!
    4. Re:YAML and JSON by cliveholloway · · Score: 5, Funny

      <reply xmlns="Slashdot:Comment">
          <paragraph>
              <sentence>What?</sentence>
              <sentence>Are you telling me that this isn't the preferred way of presenting data?</sentence>
              <sentence>Honestly, this &amp; SOAP are two technologies that have made my life so much more &quot;interesting&quot; as a developer.</sentence>
              <sentence>Fucking XML...</sentence>
          </paragraph>
      </reply>

      --
      -- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
    5. Re:YAML and JSON by goombah99 · · Score: 2, Interesting

      Actually, after looking at that reference card, YAML is much more complex than I thought it was.. compared to that, XML is simple (provided you ignore all that outdated crap like DTD/Doctype, processing instructions) and just use elements, attributes and built-in entities.
      Well good for you, for actually looking. But as you say about XML, most of the time you only use the base elements in YAML too. In YAML those are "-" for arrays, ":" for hashes, and "|" for block quotes. YAML streamlines things even further by getting rid of close-tags, and it mostly dispenses with attributes being special data that have to live in the tag, and just merges them all into the payload area, putting all data and attributes on equal footing.

      Here's another document to look at that's a great 1-page introduction to YAML in action.

      But sure I agree YAML does not have a lot of pre-written stuff out there for exploiting it. My original lament was that XML is the default choice when it's a poor choice. For Configuration files, and document headers, and simple output from most programs YAML makes a far superior choice both in human readability and for fast parsing.

      --
      Some drink at the fountain of knowledge. Others just gargle.
    6. Re:YAML and JSON by msuarezalvarez · · Score: 2, Insightful

      I've never understood why people complain about XML as you do.

      Are you generating XML by hand in your applications? Are you not parsing it using some standard library into an abstract tree or using a standard library to transform XML documents into sequences of events, in exactly the same way lex tokenizes a string of characters? Are you generating it by concatenating strings?

      SOAP is complicated, but that has nothing to do with XML.

      XML does exactly one thing: it allows you to pretend that data is provided to you in the form of an abstract data structure instead of as a sequence of bytes, taking care of encoding issues, namespacing, and what not---assuming, of course, that you are using proper tools. How is that bad?
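
The two styles mentioned above (parse into a tree, or receive a sequence of events) can be sketched in Python with the standard library. The document and element names are invented for the example; this is an illustration, not a recommendation of one style over the other.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input; the element names are invented for the example.
DOC = "<people><person>Joe</person><person>Mary</person></people>"

# 1. Tree style: parse everything, then walk the resulting structure.
tree_names = [p.text for p in ET.fromstring(DOC).findall("person")]

# 2. Event style: react to elements as the parser finishes them, so the
#    whole document never has to be held in memory at once.
event_names = []
source = io.BytesIO(DOC.encode("utf-8"))
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "person":
        event_names.append(elem.text)
        elem.clear()  # release the element once it has been handled
```

Either way, the application code sees elements and text, never raw bytes or escaping rules.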

    7. Re:YAML and JSON by bytesex · · Score: 2, Interesting

      It's bad because people ARE generating XML by hand, which, according to the spec, they should be able to do, making a lot of syntactical mistakes in the process (to which it is prone). Plus, it's terrible to read. It's also bad because on the machine side, it takes a lot of effort (CPU cycles, parser-programmer effort) to decipher. In other words, it's the worst of both worlds. It's the Visual Basic of formats: you can really only use it with GUI tools, but you can't really do what you really want to do with it in the way you want to do it.

      --
      Religion is what happens when nature strikes and groupthink goes wrong.
  9. Re:Java and XML, bad tastes that are worse togethe by MBCook · · Score: 2, Interesting

    I do a lot of Java and XML. I don't know what you're using for a library, but I'd suggest JDOM.

    As for the abuses for Maven and Ant... yeah. I'll agree. There are a lot of things that seem to use XML just because they can. I know there is some theory behind why they use them (machine readable, blah blah blah) but for most things it's just a giant pain for the complexity you get. Maybe if you were trying to build Windows with Ant.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  10. Re:10 Years and still waiting by Coelacanth · · Score: 2, Interesting

    Excellent point, and I'll take it one step further. When coupled with XSLT and other WS-* standards, you have an extremely flexible way to connect otherwise absurdly different applications (See Sun's OpenESB and JBI standard).

    The hatred for XML, I think, stems from frequent, ugly misuse. Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong. Just because it's ASCII doesn't mean it's human-compatible.

  11. Re:10 Years and still waiting by CRCulver · · Score: 3, Informative

    Just like LaTeX! Reinvention is a wonderful thing.

    LaTeX is restricted to certain types of print output. It emphatically cannot output HTML easily. Just look at the umpteen thousand threads on comp.text.tex where someone complains that latex2html can't handle anything more than a handful of quasi-default LaTeX packages. Plus, Unicode support in LaTeX has been shoehorned in and is still incomplete (though xetex is making strides), while at least XML was designed around Unicode. And then there is the fact that XML encourages semantic markup, while LaTeX contains non-semantic tags like \textit.
  12. Re:Java and XML, bad tastes that are worse togethe by GodfatherofSoul · · Score: 5, Interesting

    Yay! Nothing like the combination of XML and Java to bring out the haters. Incompetent use of a language/API doesn't equate to a bad language/API. I can show you plenty of crappy C/C++ code freely browsable in some open source libraries. Does that mean C++ sucks? Hell no.

    My experience with Java+XML you ask? OFX servers for financial institutions. Without name dropping, check out the list of banks, brokerages, tax services, and credit card providers (Quicken) out there successfully serving up client data. I guess we're all circle jerking while you're downloading your account information into Quicken or Money.

    Some good uses for XML:

    • Ephemeral representations of atomic, structured data; usually for transport.
    • Config files. More verbose and the syntax is far better at keeping you from fat fingering a setting and blowing up your app. If you can't clearly read XML, you need glasses.

    Some bad uses for XML:

    • High volume, rapid response data streams; like say an on-line multiplayer game (though I've never benchmarked this)
    • Unbounded data streams; e.g. streaming media
    • Databases

    I have to admit, I'm clueless about your Java dependency issues. The only way I can see that ever happening is if you're dumping all of your classes into the default top-level package; and that's major user error if you are.

    --
    I swear to God...I swear to God! That is NOT how you treat your human!
  13. Here, let me fix that for you ... by trolltalk.com · · Score: 4, Insightful

    If everyone would just use one of the already written XML producers or parsers (the big ones, the ones that work) life would be much easier around here from time to time.
    If everyone just went back to using simple delimited ASCII text, life would be much easier around here.

    1. Re:Here, let me fix that for you ... by kyz · · Score: 5, Insightful

      Ever tried parsing CSV?

      I have, and I can tell you that it's a waste of time.

      It amazes me how something that looks so simple can have so many corner cases, and how they can be solved so differently by different implementations.

      CSV is fine if you want to store data that has no quote marks, commas, carriage returns or linefeeds. For everything else, please use a better specified format, preferably one that has a formal definition. Like XML, for example.
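
A short Python sketch of the corner cases named above, using the standard `csv` module. The sample row is hypothetical. It contrasts naive comma splitting with a library that applies quoting rules on both ends.

```python
import csv
import io

# Hypothetical row containing every character the comment warns about.
row = ['He said "hi"', "a,b", "line one\nline two"]

# Naive approach: join and split on commas. The field boundaries are lost.
naive = ",".join(row).split(",")

# Library approach: the csv module quotes on output and unquotes on input.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
round_tripped = next(csv.reader(io.StringIO(buf.getvalue(), newline="")))
```

The naive version yields four fields from a three-field row. The library version round-trips, but only because writer and reader agree on the same quoting dialect, which is exactly what differs between implementations.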

      --
      Does my bum look big in this?
    2. Re:Here, let me fix that for you ... by trolltalk.com · · Score: 2, Informative

      "Ever tried parsing CSV?"

      All the time. It's not that hard. Also, if you're worried about such things as quoting, etc., you can always use fixed-width fields - makes indexing, looking up, and modifying values REAL FAST. Compare that to the mess of xml.

    3. Re:Here, let me fix that for you ... by thrillseeker · · Score: 3, Interesting

      I knew we would (d)evolve to punch cards eventually.

    4. Re:Here, let me fix that for you ... by CaptainPinko · · Score: 3, Insightful

      ASCII doesn't even support the letters needed by the majority of the world's languages.

      --
      Your CPU is not doing anything else, at least do something.
    5. Re:Here, let me fix that for you ... by trolltalk.com · · Score: 2, Informative

      If you know how many fields there are in each record, then why did you need a special record delimiter to begin with? Sounds like a design mistake, which isn't surprising since it was ad-hoc...

      Wrong - the special null delimiter is needed only for variable-length (and zero-length) fields and records. For fixed-length fields and records, no delimiter is needed.

      For example: First Name\0x00Last Name\0x00Age\0x00\0x00

      Joe\0x00Blow\0x0042\0x00\0x00
      Mary\0x00Doe\0x0024\0x00\0x00
      \0x00Cowboyneal\0x00\0x00\0x00

      In the above example, Cowboyneal has no first name and no age.

      What's so hard to understand about that? For a fixed-length field/recordset, just include a header ...
      FirstName:10:LastName:10:Age:3\n
      Joe_______Blow_______42\n
      Mary______Doe________24\n
      __________Cowboyneal___\n

      Both are human-readable, both are easy and intuitive to parse out, the second one is self-documenting and fully supports random access, etc (and neither one is new - the first is used on most *nixes, with either a : or | instead of a null, databases have been using the latter format for decades).
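
Both layouts can be read with a few lines of Python. This is a sketch under assumptions: the helper names are invented, `_` stands for the padding character as in the example above, each string holds one record, and the double-delimiter record terminator is left out for brevity.

```python
def parse_header(header):
    """Turn 'FirstName:10:LastName:10:Age:3' into [(name, width), ...]."""
    parts = header.split(":")
    return [(parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2)]

def parse_fixed(line, fields, pad="_"):
    """Slice one fixed-width record into a dict, stripping the padding."""
    record, pos = {}, 0
    for name, width in fields:
        record[name] = line[pos:pos + width].strip(pad)
        pos += width
    return record

def parse_delimited(line, names, sep="\x00"):
    """Split one delimiter-separated record; empty fields stay empty."""
    return dict(zip(names, line.split(sep)))

fields = parse_header("FirstName:10:LastName:10:Age:3")

# Build the sample records programmatically so the widths are exact.
joe_line = "Joe".ljust(10, "_") + "Blow".ljust(10, "_") + "42".rjust(3, "_")
neal_line = "".ljust(10, "_") + "Cowboyneal" + "".ljust(3, "_")
mary_line = "\x00".join(["Mary", "Doe", "24"])

joe = parse_fixed(joe_line, fields)
neal = parse_fixed(neal_line, fields)
mary = parse_delimited(mary_line, ["FirstName", "LastName", "Age"])
```

Note the limits of the sketch: stripping the pad character would corrupt a value that legitimately starts or ends with `_`, and a delimited value can never contain the delimiter itself.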

      By contrast, xml is an abortion. Heck, I'll go further - xml is the ultimate triumph of navel-gazing over real-world experience.

  14. Re:Java and XML, bad tastes that are worse togethe by bckrispi · · Score: 4, Insightful

    I'll take an Ant XML build file over an "is that a tab or a space" Makefile any day...

    --
    Xenon, where's my money? -Borno
  15. Re:10 Years and still waiting by EMN13 · · Score: 3, Informative

    I use it in web development constantly, and have for about 8 years. It's great for documents mostly since it's much easier to process than a home-grown set up.

    You want to transform the document, you can use any of a number of techniques, and trivially guarantee that the resulting document is at least syntactically valid. If you use a home-grown format (or HTML), you'll need to resort to regular expressions, or a custom parser - which works fine up to a point. Regexes are error-prone (it's quite difficult, for instance, to make an untrusted HTML document safe with regexes), and parsing is difficult, and doesn't solve the transformation step very elegantly - whereas XPath and others are absolutely brilliant for quickly distilling the stuff you need from a document.
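
As a rough illustration of path-based extraction, here is a Python sketch using ElementTree, which implements only a subset of XPath. The document structure and names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are invented.
DOC = """
<article>
  <section id="intro"><title>Intro</title><para>Hello</para></section>
  <section id="body"><title>Body</title><para>One</para><para>Two</para></section>
</article>
"""

root = ET.fromstring(DOC)

# All section titles, wherever they are in the tree.
titles = [t.text for t in root.findall(".//section/title")]

# Only the paragraphs of the section whose id attribute is "body".
body_paras = [p.text for p in root.findall(".//section[@id='body']/para")]
```

Each query is one expression over the parsed tree, with no assumptions about whitespace, attribute order, or quoting style in the source text.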

    But on the parsing side... take a look at ANTLR, it's just great :-).

  16. Your comments seem tainted with inexperience. by sidragon.net · · Score: 3, Insightful

    In general, if you have data to be structured and serialized, XML is one way to do it. If you think XML a poor choice, then could you suggest an alternative? Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).

    [N]ot only does Java represent a poor trade off between the annoyances of a strongly typed language and the speed of a dynamic interpreted one ...

    Would you provide evidence aside from personal anecdotes, and possibly consider evidence to the contrary?

    [Java] has a horrible mess of dependency issues that nobody really solves besides.

    Perhaps you meant “modern software” instead. Any complex application these days relies on dozens of libraries and services to perform tasks. Not quite sure where exactly you are having difficulties, so I cannot elaborate further.

    [XML] is too ugly and unreadable ... But as a general tool for Internet plumbing it's awful.

    XML is intended for consumption by machines first, people second. You might also argue that in-memory data structures are ugly and unreadable.

    1. Re:Your comments seem tainted with inexperience. by argent · · Score: 2, Informative

      If you think XML a poor choice, then could you suggest an alternative?

      Depends on the problem you're trying to solve.

      A hell of a lot of the stuff I'm seeing in XML these days would be better off as token-separated self-describing tables (tables where the column names are the first row), or a modestly extended token-separated format like CSV.

      For binary data something derived from Electronic Arts semi-self-describing interchange file format is good, examples in current use are MIDI File Format and Portable Network Graphics...

      For arbitrary self-describing data there's always ASN.1.

      For tagged arbitrary chunks of data descendants of RFC-822 are common.

      For shallow-nested keyword-value data there's Microsoft's INI files.

      And, of course, Lisp S-Expressions do absolutely everything XML does, more compactly, and are easier to parse.

      Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).

      But XML doesn't solve that problem. I've found that the amount of code it takes to extract data from an arbitrary XML file even with an XML parser at hand is not significantly less than the amount of code it takes to parse and extract data from any other self-describing format.

  17. Re:XML was formalized? by Jerf · · Score: 4, Insightful

    Yes. XML was formalized. It is strictly defined and easy to check for compliance (with the right tools). Only a little bit of the definition has passed out of common usage, mostly focused around DTDs.

    If you encounter a file that claims to be XML, but does not meet the XML standard, then it is not the XML standard that is to blame. The claim is wrong and the file is not XML.

    XML is not a fuzzy-wuzzy adjective that can be applied willy-nilly to anything and magically turn it into "XML". It is not a marketing term or English Professor term. It is a rigidly specified engineer term for a document format, and a given document is XML if and only if it meets that format.

    If someone wants to hack together a half-assed parser or emitter of any language, they will. I've seen half-assed XML parsers, I've seen half-assed JSON parsers, I've seen half-assed HTML parsers, I've seen half-assed YAML parsers, I've seen ... you get the idea. If a standard can't solve the problem, you can't count the lack of solution against it.

  18. Re:Java and XML, bad tastes that are worse togethe by fartrader · · Score: 2, Informative

    Java is clearly moving away from the massive over-use of XML in everything from configuration to messaging. From Java 5 onwards, annotations are rapidly becoming the configuration mechanism of choice, where infrastructure configuration is placed in the source code directly, in a way that's significantly less obtrusive than writing code to manage things like persistence and transactions yourself, and significantly easier to follow than placing it in many XML files. Anyone who has migrated from EJB 2.1 to 3.0 for example should be much happier now that the various XML files needed to get it to run are going the way of the dodo. This use of annotations to replace XML is an emerging trend popular in many frameworks, from EE 5 through to Hibernate and Spring. On the messaging side there are a slew of code generation tools and XML-to-POJO (annotation-based) mappings that keep you away from raw XML - yes, it's another layer of abstraction, but it keeps you away from the coding horrors of SAX, DOM, and yes, even the comparative simplicity of JDOM.

  19. Re:Why is editing XHTML "doing it wrong"? by colmore · · Score: 2, Insightful

    xhtml is one very small dialect of xml.

    when you are entering html style markup tags, you are using xml. but xml is a much much larger subject than that. hand editing a website is fine. (if the documents are getting huge, it should be split into smaller files and automated somehow, anyway) hand editing, say, Open Office's xml format or any of the fairly arcane XML formats used for interprocess communication is not.

    XML is sort of designed to be the second best data format for any application. There are a lot of times when something like /etc/passwd is more legible and appropriate. And there are times when the volume of data requires binary. XML is good because it is widely known and when the originating application is lost, the data can still be (with moderate difficulty) understood.

    It's very similar to Java really. It got hyped for a specific web use that didn't really materialize, but its ability to be generic, widely-spoken, and safety-checked means it has found widespread use across the entire computer industry in places that aren't quite as visible to end-users as simple web application or document formats.

    --
    In Capitalist America, bank robs you!
  20. Re:10 Years and still waiting by iamacat · · Score: 3, Insightful

    Here is another obvious rule: If a computer, at any time at all, has to parse or generate XML in large amounts, you are doing it wrong. There is really no need to resend the same string 100000 times, encode multi-megabyte binary data as BASE64 or lose floating point precision by encoding to or from strings. If need be, an efficient binary format can represent the data with an arbitrary schema. Communicating parties can exchange their schemas at runtime and avoid sending attributes that the other end is not going to use.
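
    A minimal sketch of the two overheads mentioned above, using only the Python standard library. The 3 MB payload and the `%.6f` format are arbitrary assumptions for illustration. Note that a writer that emits enough digits (Python's `repr`, for instance) does round-trip floats exactly, so the precision loss depends on how the numbers are formatted rather than on text encoding as such.

```python
import base64
import os

# Base64 maps every 3 input bytes to 4 output characters.
payload = os.urandom(3_000_000)          # hypothetical 3 MB binary blob
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload)   # 4/3 when the length is divisible by 3

# Floats survive a string round trip only if enough digits are written.
value = 0.1 + 0.2                        # 0.30000000000000004
truncated = float("%.6f" % value)        # fixed 6 decimals: information is lost
exact = float(repr(value))               # repr writes enough digits to round-trip

print(f"base64 size ratio: {overhead:.3f}")
print("truncated == value:", truncated == value)
print("exact == value:", exact == value)
```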

  21. Re:10 Years and still waiting by TheRaven64 · · Score: 4, Informative

    Does anyone still use latex2html? All of the TeX users I know who care about HTML output switched to tex4ht years ago. It produces a variety of XML formats, including XHTML (with MathML) and OpenDocument.

    --
    I am TheRaven on Soylent News
  22. Re:Regex by TheRaven64 · · Score: 5, Informative
    You fail Computer Science 101. Regular expressions are exactly as expressive as finite automata. A finite automaton is incapable of solving the matching brackets problem, since that requires a potentially infinite number of states in order to keep track of the number of open brackets in an input stream. Because of this, a regular expression can not be used to parse any XML schema that allows an arbitrary depth of nesting, since parsing such a form would require counting the open and close tags to make sure they match, which is not possible with a regular expression.

    This is why regular expressions are typically used for lexical analysis (tokenisation) not syntactic analysis (parsing).
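
    The counting argument can be sketched in a few lines of Python. Parentheses stand in for open and close tags here, and the names `DEPTH_2` and `balanced` are made up for this illustration. A pattern can always be written for a fixed maximum depth, but it rejects anything nested one level deeper, whereas a counter that can grow without bound handles any depth.

```python
import re

# A fixed pattern can be written for any bounded depth; this one accepts
# properly nested parentheses up to depth 2 and nothing deeper.
DEPTH_2 = re.compile(r"(?:\((?:\(\))*\))*")


def balanced(text: str) -> bool:
    """Check bracket nesting with an unbounded counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
    return depth == 0


shallow = "(()())"   # depth 2
deep = "((()))"      # depth 3

print("pattern accepts shallow:", DEPTH_2.fullmatch(shallow) is not None)
print("pattern accepts deep:", DEPTH_2.fullmatch(deep) is not None)
print("counter accepts deep:", balanced(deep))
```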

    --
    I am TheRaven on Soylent News
  23. Re:Regex by WilliamSChips · · Score: 2, Informative

    No, you cannot with a regex. If you can, it's not really a regex, it's something different.

    --
    Please, for the good of Humanity, vote Obama.
  24. Re:Java and XML, bad tastes that are worse togethe by CoughDropAddict · · Score: 3, Informative

    So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference. Please stop doing that. Tabs and spaces are different characters, even if the language you're using today treats them the same. If you're a VIM user, please learn to use "list" and "listchars."

  25. Re:Regex by the-matt-mobile · · Score: 2, Insightful

    How is this insightful? Yes, from a strictly comp-sci definition of a "regular expression", you are exactly right. But this is not a comp-sci class and this is not a theory lesson! In the real world where real programmers write real (crappy) code, a parser that parses only regular languages is not very useful. All modern regex parsers handle more than just regular expressions - back referencing, depth parsing, lookahead/lookbehind are all common features of modern regex engines that violate the rules of parsing a "regular language" using a simple memory-less DFA/PDA state machine. Real regex parsers use (GASP) *memory* to do their parsing. So, while you wallow in semantics and theory, people out there are doing real (and granted silly) things with regex parsers because they can. For the purpose of this discussion, the original poster is right that it is possible (though incredibly unholy) to determine well-formed-ness of XML via a modern regex parser even though XML is not a regular language.
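
    As a small illustration of one such feature, here is a back reference in Python's standard `re` module. The pattern name `ELEMENT` and the sample strings are invented for this sketch. It only checks that a single, non-nested element closes with the same tag name it opened with; Python's `re` has no recursive patterns, so arbitrary nesting is not covered here (engines such as PCRE add recursion with `(?R)`).

```python
import re

# \1 must repeat whatever group 1 captured, so the pattern "remembers"
# the opening tag name. A strictly regular expression cannot do this
# for arbitrary names.
ELEMENT = re.compile(r"<(\w+)>[^<]*</\1>")

ok = ELEMENT.fullmatch("<title>XML People</title>")
bad = ELEMENT.fullmatch("<title>XML People</author>")
nested = ELEMENT.fullmatch("<a><b>x</b></a>")  # needs recursion, not handled

print("matching tags accepted:", ok is not None)
print("mismatched tags accepted:", bad is not None)
print("nested element accepted:", nested is not None)
```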

  26. Re:10 Years and still waiting by David+Gerard · · Score: 2, Insightful

    It Depends. We have systems that are arranged in a long content chain. One machine sends data to the next machine, maybe by pull, maybe by push. Next machine does ... something ... with it, passes it to next machines. Maybe the developers talk to each other, or remember why their predecessor made the system do that, or maybe they don't. XML is really Just The Thing for the job. And the fact that it can be tweaked by a human (e.g. the sysadmin who has to fix a broken thing) is fantastically useful.

    --
    http://rocknerd.co.uk
  27. Re:Java and XML, bad tastes that are worse togethe by Omnifarious · · Score: 2, Insightful

    The answer to one particular parsing stupidity is not to introduce an altogether different set of parsing stupidities to fix it. XML is not a programming language, and making it into one is a pretty distressing and contorted thing to do.