Slashdot Mirror


Microsoft Releases 'Caller-ID For Email' Specs

gfilion writes "Microsoft has released a draft specification for Caller-ID for email, 'to address the widespread problem of domain spoofing' - the concept is similar to SPF, but uses XML. There's already a Caller-ID to SPF converter in the works. A few weeks ago, Microsoft discussed compatibility between the projects with Meng Weng Wong (SPF's project leader), but most SPF users are against using XML, so nothing has come of it thus far." We recently covered a brief article mentioning Microsoft's anti-spam work, though this is a clearer indication of their intentions. Update: 02/26 21:36 GMT by T: NewsForge is carrying a brief article with FSF counsel Eben Moglen's take on the draft; Moglen says it is "encumbered with unclear and unnecessary patent license claims."

39 of 430 comments

  1. At least by pubjames · · Score: 3, Interesting


    At least this is one area where MS will have a real problem using their monopoly to enforce a closed standard. A solution that doesn't work for people that don't use MS software just isn't going to fly.

    Having done work on (opt-in) HTML newsletters for clients, I know that email clients used are really varied - more varied than web browsers for instance.

    1. Re:At least by pubjames · · Score: 4, Interesting

      RTFA - Microsoft proposes a standard which any vendor can implement and provides a license for its use on the website describing the process. There is nothing client specific about the implementation.

      I did read the article. But MS has a history of breaking standards to create customer "lock-in", and also trumpeting open standards when in fact what they finally implement isn't open at all (Office "XML" for example). What I'm saying is that, in this case it would be difficult for MS to do that because email client software is very varied.

    2. Re:At least by cavemanf16 · · Score: 2, Interesting

      Well, it's really called GnuPG, but you're right, it is the standard that basically states: "the sender's signing key validates against the original key you trusted by signing it with your own key." I've started signing all of my emails in Thunderbird using the help of the Enigmail plugin and encrypting any files I attach in my emails with the help of WinPT. I know this post looks like a giant plug for these "products," but since they're all free, open source software which I have no affiliation with, it's simply me trying to get the word out that there IS a manner in which to get your emails to your friends in a trusted, reliable manner, and hopefully convert a few of your friends and family to using the same method in the future. We wouldn't have to worry about address spoofing if email gpg signing was a de facto standard of every email client! Plus it would be a lot safer and more difficult to circumvent (ultimately) than Yet Another Format for email.

  2. two things by WegianWarrior · · Score: 5, Interesting

    What's to stop a spammer from signing up for a free email account with a false name, blasting out a few thousand messages, dropping the account (it'll be closed anyway by abuse), wiping hands and repeating?

    True, I see how this may help stop some spam, but it also means (if I understood the article correctly) that everyone can find out where I mail from... and in some instances that could be a problem too.

    --
    Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.
    1. Re:two things by zero_offset · · Score: 2, Interesting
      In addition to what that other guy posted (accounts having daily limits), sending mail through those types of systems is generally just too slow to be of interest to dedicated spammers.

      A couple years ago I wrote a bunch of software for very large e-mail runs -- not spamming related, but the lists were in the high hundreds of thousands -- and to successfully blast out hundreds of thousands of e-mails in any reasonable amount of time requires quite a bit of planning, software built for that purpose (our evals showed even the well-known and venerable lsoft offerings perform abysmally for these purposes), not to mention having a fairly hefty chunk of bandwidth at your disposal.

      --

      Slashdot quality declines as the number of hot grits posts decreases. - Provolt's Law, Apr-09-2005

    2. Re:two things by kinnell · · Score: 2, Interesting
      This is why most spammers are still relying on open relays and zombie machines.

      Which begs the question, how does this solution deal with zombie machines, given that these are being used more and more to send spam? It shouldn't be too difficult to set up a trojan remailer which uses the user's email account to forward spam. Wouldn't this be declared as valid, presumably laying the blame on the user?

      --
      If I seem short sighted, it is because I stand on the shoulders of midgets
    3. Re:two things by m00nun1t · · Score: 2, Interesting

      Maybe it's not absolutely perfect. But what protocol is? Here's a list of other protocols that have major problems:
      TCP/IP
      HTTP
      SOAP
      FTP
      SMTP

      If /. was in charge of releasing protocols, the internet would never have happened. There's always someone finding a problem. Well, guess what, there is always a problem.

      Instead of complaining, contribute, find a good place to start with and improve it over time - that is what has happened to all the above protocols.

    4. Re:two things by Snowmit · · Score: 2, Interesting

      True, I see how this may help stop some spam, but it also means (if I understood the article correctly) that everyone can find out where I mail from... and in some instances that could be a problem too.

      That's true in the real world too. They're called postmarks. You may have seen them stamped on your snail letters.

      Don't like it? Then don't send email that complies with the standard and hope that the people receiving are willing to read letters from people who aren't complying. Or use a messageboard. Or a webcafe.

      --
      I have a lot of opinions about Cyborgs and Architects
    5. Re:two things by EJB · · Score: 2, Interesting

      I can see that this can cause problems as a consultant. You're connected to the network of customer A, and have to send an e-mail to customer B.

      You don't necessarily want customer B to know that you also work for customer A.

      - Erwin

    6. Re:two things by dbc · · Score: 2, Interesting

      Corporations are not going to be blocking mail based on a lack of SPF, Caller-ID, or anything.

      ??

      Why do you say that? It doesn't make sense to me. Corps large enough to have 1+ mail admins already are up to their armpits in deployed and operational spam and virus filtering tools. SPF doesn't have much downside for them, only upside. Maybe *tiny* companies where the mail server gets 1 hour a week from some programmer who has been saddled with playing net-admin during his lunch hours will be slow to get this rolling, but it seems to me that companies with actual IT staff will be pretty quick about it.

      Roving users are not an issue for big companies, since the road-warriors need to VPN into the corp net to get to the mail server anyway, so voilà, they are no longer "roving" as far as SPF is concerned.

      Feel free to convince me that I'm wrong. Use data, actual experience, and facts. OK -- I realize that using any of those three is a risk to one's karma. Post AC, if you need to :-)

  3. MSXML experience by RobertB-DC · · Score: 3, Interesting

    I've had the unfortunate experience of attempting to generate XML using Microsoft's MSXML object. What a piece of crap! In an attempt to completely abstract the format, the objects are obfuscated beyond reason. Even the simplest things require ridiculous complexity: just to escape-out special characters requires instantiating a new "entity" element in the middle of the text string element.

    And I still haven't figured out how to make the thing give me a CRLF at the end of each element. No, XML doesn't require the whitespace, but it would have sure made it easier for my clients to read the file!

    But the worst part is that I *succeeded* in using MSXML. Now, if I wanted to go back to just writing a text file (which I do!), I can't -- my code is tangled up in the objects to the point that it would take a complete rewrite.

    That's the simple reason why, every time I hear about Microsoft doing something with XML -- like this proposal to use XML as part of email identification -- I cringe in ph33r.

    --
    Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
    1. Re:MSXML experience by chrisbtoo · · Score: 4, Interesting

      And I still haven't figured out how to make the thing give me a CRLF at the end of each element. No, XML doesn't require the whitespace, but it would have sure made it easier for my clients to read the file!

      Tell me about it. My favourite part is when you try to load one of their MSXML-generated files into their Visual C++ 6.0 product and it bitches about lines being greater than 2048 characters long and how it's going to shove random line breaks in the middle of tags.

      Thanks, MS!

      --
      Registering accounts later than some other chrisb since 1997
    2. Re:MSXML experience by Cereal+Box · · Score: 2, Interesting

      What I meant was that every decent XML parser requires you to handle the XML tree in some manner other than messing with raw text, like the original poster seems to think the optimal way to do things. SAX or DOM -- either way you're going to have to deal with all sorts of objects representing things like nodes, text, etc.
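      A minimal sketch of the two styles mentioned above, using Python's standard library rather than MSXML (an assumption for illustration; the sample document, `body_via_dom`, and `body_via_sax` are hypothetical names). In both styles the parser hands back decoded text through objects or callbacks, so entity escaping such as `&amp;` never has to be handled as raw text.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document; the ampersand is escaped as an entity.
DOC = "<note><to>Alice</to><body>Hello &amp; welcome</body></note>"


def body_via_dom(text):
    """DOM style: build the whole tree, then walk node objects."""
    dom = xml.dom.minidom.parseString(text)
    node = dom.getElementsByTagName("body")[0]
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


class BodyHandler(xml.sax.ContentHandler):
    """SAX style: react to parser events and keep only what is needed."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "body":
            self.in_body = True

    def endElement(self, name):
        if name == "body":
            self.in_body = False

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self.in_body:
            self.chunks.append(content)


def body_via_sax(text):
    handler = BodyHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return "".join(handler.chunks)
```

      Either function returns the body text with the entity already decoded; the DOM version holds the full tree in memory, while the SAX version keeps only the pieces it collects.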

  4. Re:Why not? by leerpm · · Score: 2, Interesting

    Why not have *real* caller-ID for email authentication? Before you can get on my white-list, you have to call a phone number for some sort of challenge-response

    So every person that wants to email you, now has the added burden of phoning some system and following the voice menu options? I think that most people will simply not bother and won't send the email at all.

    Email is a great tool and easy to use. Even existing challenge-response systems have been found to have many problems. Let's not ruin email, by taking away the best parts of it. Any authentication needs to be seamless and the details should be hidden from end-users.

  5. Re:Imagine when Hotmail gets this by liquid-groove · · Score: 2, Interesting

    As part of an overall spam identification and scoring system, the MS standard and the Yahoo proposed standard are both interesting pieces of the puzzle. They are hardly solutions to the spam problem in and of themselves, and unilateral implementation of either protocol as an absolute requirement for acceptance of incoming communication by either Hotmail or Yahoo would likely be met with a voracious subscriber backlash, which would result in the decision being reversed within hours.

  6. Because it would not work... by Matthias+Wiesmann · · Score: 4, Interesting
    Why not have *real* caller-ID for email authentication? Before you can get on my white-list, you have to call a phone number for some sort of challenge-response. Caller-ID could be part of this.
    I really don't see the point of including the phone in the system. Processing voice calls is complex and expensive and has no advantage over online processing. Either the thing is done manually, and would be damn expensive, or it is automated and would have no advantage over doing it over ip.

    Did you consider that e-mail is used outside the US? I am certainly not going to pay for a trans-atlantic call each time I want to send an e-mail to a new guy in the US. What about people that don't speak English? What about people who don't have a phone, or don't have a number on a system that supports caller id? With the advent of IP phones, this would become more and more common.

  7. Spoofing SPF? by mmerlin · · Score: 3, Interesting

    I guess the Joe-Jobbers will be hard at work trying to find all the ways of spoofing SPF.

    Zombie writers will be in even greater demand from the spam factories.

    Apart from spammers using zombified users' email accounts, are there any other possible ways around SPF?

    Having read the executive summary and skimmed a few pages, the general precepts make sense.

    At the very least, the transitional phase of mass implementation of SASL or similar (which IMO should be mandatory for mail servers anyway) is a Good_Thing_(tm)

    Granted it will take a lot of time and effort for the second phase to be reached, but anything which cuts down on spam gets my vote!

    --

    smile, it makes everyone else wonder what you're up to :-)
  8. microsoft.com already doing this by ergonal · · Score: 4, Interesting

    Not sure if this is mentioned in the .doc, but _ep.microsoft.com already appears to be doing this:

    _ep.microsoft.com. 1H IN TXT "<ep xmlns='http://ms.net/1' testing='true'><out><m>" "<mx/><a>213.199.128.160</a><a>213.199.128.145</a><a>207.46.71.29</a><a>194.121.59.20</a><a>157.60.216.10</a><a>131.107.3.116</a><a>131.107.3.117</a><a>131.107.3.100</a>" "</m></out></ep>"
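
    A rough sketch of how a receiver might read such a record, assuming the `<a>` elements are literal addresses of permitted outbound servers (an inference from the record itself, not from the draft). The function names are hypothetical, and the `<mx/>` element is ignored because resolving it would need a real DNS lookup.

```python
import xml.etree.ElementTree as ET

# The quoted TXT strings from the record, concatenated, with the spaces
# inserted by the forum software removed.
RECORD = (
    "<ep xmlns='http://ms.net/1' testing='true'><out><m>"
    "<mx/><a>213.199.128.160</a><a>213.199.128.145</a>"
    "<a>207.46.71.29</a><a>194.121.59.20</a><a>157.60.216.10</a>"
    "<a>131.107.3.116</a><a>131.107.3.117</a><a>131.107.3.100</a>"
    "</m></out></ep>"
)

# Prefix map for the namespace declared on the <ep> element.
NS = {"ep": "http://ms.net/1"}


def listed_addresses(record):
    """Return the set of literal addresses found in <a> elements."""
    root = ET.fromstring(record)
    return {
        a.text.strip()
        for a in root.findall("./ep:out/ep:m/ep:a", NS)
        if a.text
    }


def is_listed(record, client_ip):
    """Hypothetical check: is the connecting IP one of the listed ones?"""
    return client_ip in listed_addresses(record)
```

    For example, `is_listed(RECORD, "131.107.3.116")` is true, while an address that does not appear in the record is rejected by this simplified check.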

  9. Good idea by broothal · · Score: 4, Interesting

    This is a good idea, and we (tinw) have discussed this many times before, and various implementations already exist (that is - verifying the sender domain, not the specific MS implementation).

    Now, what bothers me is this line:

    Microsoft believes that it has patent rights (patent(s) and/or pending applications(s))

    Given the latest stories on how easy it is to patent everything "over there", I am pretty sure MS will be granted this patent. Now I don't know about you, but this geek ain't licensing nothing from MS.

  10. Damn advertising-like clause again by rjw57 · · Score: 4, Interesting

    In the license Microsoft grant implementers there is the following nasty clause:

    If you distribute, license or sell a Licensed Implementation, this license is conditioned upon you requiring that the following notice be prominently displayed in all copies and derivative works of your source code and in copies of the documentation and licenses associated with your Licensed Implementation:
    "This product may incorporate intellectual property owned by Microsoft Corporation. If you would like a license from Microsoft, you need to contact Microsoft directly."


    Isn't this incompatible with the GPL?

    --
    Rich
  11. Re:If Microsoft cared about SPAM... by no+soup+for+you · · Score: 3, Interesting
    If Microsoft cared about SPAM......allowed a user to disable the javascript popup function in the browser

    I think that's a pretty expansive definition of SPAM. Does everything annoying become SPAM? I see popups as advertising (and something that mozilla effectively killed for me), and SPAM as fraud.

    --
    If you blog it...
  12. Re:XML... in its place. by Tinidril · · Score: 2, Interesting

    Sorry, I don't care what tools are available, parsing a comma delimited file when the records are reasonably simple in structure will always be easier. XML is really only useful when the data resists structure.

    Documents are really the only place where I can see XML adding any benefit. ( Unless more bits in the stream are considered benefit. )

    --
    XML is the best data format; unless your data needs to be read or written by a human or a computer.
  13. Re:XML... in its place. by wfberg · · Score: 5, Interesting

    Sort of. You don't REALLY need a DTD - you only need one if you are validating the XML. XML can still be used as a generic ad-hoc hierarchical data format... of course you'd only want to do so because by now XML parsers are pretty ubiquitous and it makes it as good a choice as P-lists, or any other ad-hoc format.

    Assuming you don't have a DTD, you don't have a specification of what's in the files syntactically, let alone semantically. Maybe you can reverse engineer most of this (the tag "name" is likely to contain a name, etc.) but there will always be freakish exceptions and ambiguities that even DTDs and XML-Schemas don't address.

    And the overhead of using XML is enormous.. All those possible encodings, character sets, namespaces, etc. S-expressions are really much, much nicer if you just want to parse without a formal syntax specification. And they've been around "forever".

    Most irksome though, are so-called "XML databases".. Argh! I suppose the people who think that's a good idea also love "CSV databases" or "XLS databases"..

    --
    SCO employee? Check out the bounty
  14. Re:MSXML by pandrijeczko · · Score: 3, Interesting
    To be perfectly honest, if MS used their own proprietary XML extensions, I don't see how it would work anyway.

    It's a fact of life that MS Exchange lives in corporate environments but ISPs and everyone else use sendmail (or a sendmail derivative) for mail routing over the Internet.

    It's actually in MS's interests to work with sendmail on an open protocol to do spam filtering properly (whatever that protocol is ultimately).

    Remember that TCP/IP is an open standard and MS supports TCP/IP open protocols like FTP, HTTP, POP3, SMTP, etc. already in their products so this is no different.

    --
    Gentoo Linux - another day, another USE flag.
  15. We can stop Zombies too... by aug24 · · Score: 2, Interesting
    Take a look at the spf faq, section starting "What about the cracked, open-proxy DSL machines that are spam sources today?"

    The skinny is: while spf on its own can't prevent zombies from sending mail, if the upstream host routes port 25 through its own servers it can control this.

    For example, my upstream hosts, Nildram, block all port 25 traffic outbound and inbound unless and until they have checked your (static) ip for open-relay-ness and then put you on a whitelist.

    If all ISPs were like that, and spf were to become widely adopted, spam would be toast.

    J.

    --
    You're only jealous cos the little penguins are talking to me.
  16. Re:Why we shouldn't use XML here... by doofusclam · · Score: 5, Interesting

    Oh Pleeeeeze yourself.

    I ain't bashing Microsoft and I don't spell it with a '$' either. I've spent the last 14 years programming using their tools and operating systems, so quit with thinking I'm an OSS zealot.

    So read my comment again - I'm not bashing them, and at least they're doing something about spam. But for such a simple datastream, with the throughput needed, it seems unnecessary to bloat it (cpu and memory wise) by having to use an XML parser, regardless of which evil/non evil company designed it.

    Would YOU like your mail to be delayed because some bright spark decided to go all trendy and use XML in the mail processing rather than something which just does the job?

  17. Summary by dskoll · · Score: 5, Interesting

    Basically, it's a very poor re-implementation of SPF, with all of SPF's disadvantages and none of its advantages.

    Under the MSFT scheme, the TXT records are verbose, likely requiring several records where SPF will probably fit in one. They have a hare-brained scheme to parse Received: headers to get around certain problems. Their scheme is absurdly complex.
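
    A small sketch for comparing the two encodings by size, assuming a simple mapping between the formats (the `v=spf1 mx ip4:... -all` form is real SPF syntax, but treating it as equivalent to the XML record is an assumption, and the helper names are hypothetical). It also splits a record into DNS character-strings, each of which can hold at most 255 bytes.

```python
# Addresses taken from the _ep.microsoft.com record quoted in this thread.
IPS = [
    "213.199.128.160", "213.199.128.145", "207.46.71.29", "194.121.59.20",
    "157.60.216.10", "131.107.3.116", "131.107.3.117", "131.107.3.100",
]


def caller_id_record(ips):
    """Build an XML record shaped like the _ep.microsoft.com example."""
    hosts = "".join(f"<a>{ip}</a>" for ip in ips)
    return f"<ep xmlns='http://ms.net/1'><out><m><mx/>{hosts}</m></out></ep>"


def spf_record(ips):
    """Build a roughly equivalent SPF record (assumed mapping)."""
    return "v=spf1 mx " + " ".join(f"ip4:{ip}" for ip in ips) + " -all"


def txt_strings(record, limit=255):
    """Split a record into DNS character-strings of at most `limit` bytes."""
    data = record.encode("ascii")
    return [data[i:i + limit] for i in range(0, len(data), limit)]


if __name__ == "__main__":
    for label, record in (("Caller-ID", caller_id_record(IPS)),
                          ("SPF", spf_record(IPS))):
        print(label, len(record), "bytes,", len(txt_strings(record)), "string(s)")
```

    Running it prints the byte count and the number of character-strings each form needs for the same address list, which makes the verbosity difference easy to check for any particular domain.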

    And neither SPF nor MSFT's scheme does anything about spam coming from <>, cracked Windoze machines, or "valid" throwaway accounts. They also make forwarding more difficult than it should be.

  18. Website about anti-spam standards by taubz · · Score: 1, Interesting

    Is there a website out there that tracks the different technological solutions to spam, with pro/con explanations?

  19. MS 1, SPF 0 by TA · · Score: 2, Interesting

    Wow. I looked at MS' proposal as well as SPF's, and darn if MS didn't do much better.
    First: SPF's webpage is mostly slogans about how it makes the world better, but you have to dig around a lot to find out how their scheme works. Mostly you'll just find more of the same self-hugging and no real technical info.
    Secondly: MS' scheme seems simple enough, just one addition to DNS (list those mailservers allowed to send mails from your domain), and a very nice, standard-compliant way of handling the mobile-user problem:
    If you're away from home and you're sending from your name12@somefreemail.com account, and you want your From: line to be your standard Me.Myself@my-own-domain.cx, whatever actual account you're sending through, then just make sure that your Sender: is name12@somefreemail.com and you're set. This is a nice alternative if you can't list your freemail ISP's mailserver in your DNS (maybe you don't know its IP address, or it's changing all the time).
    Maybe SPF's scheme is similar, but they sure didn't mention any Sender: header there. Seemed to be some home-cooked up non-standard header, and a lot of talking about forwarding not working etc.
    The only thing I didn't like with MS' scheme is the XML thing, why would you want to put XML in your DNS records? Nothing else in DNS is XML. Oh well.
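
    A minimal sketch of the header choice described in this comment, assuming a simple "Sender: first, then From:" precedence (the draft may consider more headers; `responsible_domain` is a hypothetical name). The domain it returns is the one whose DNS record would then be consulted.

```python
from email import message_from_string
from email.utils import parseaddr


def responsible_domain(raw_message):
    """Return the domain to check: Sender: if present, otherwise From:."""
    msg = message_from_string(raw_message)
    for header in ("Sender", "From"):
        value = msg.get(header)
        if value:
            _, addr = parseaddr(value)
            if "@" in addr:
                return addr.rsplit("@", 1)[1].lower()
    return None


# Hypothetical message from a roaming user, using the addresses above.
ROAMING = (
    "From: Me <Me.Myself@my-own-domain.cx>\n"
    "Sender: name12@somefreemail.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
```

    With the `Sender:` header present, the check is made against somefreemail.com; without it, the check falls back to the domain in `From:`.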

  20. Poor Name by BeBoxer · · Score: 2, Interesting

    Given the effectiveness of caller-id when it comes to the spammers of the phone world, I don't think it's the best model. Basically, caller-id allows anybody who has a PBX connected with digital trunks to the network to forge whatever caller-id information they want. Most telemarketers left it blank. Lots of legit companies send the id information for their main switchboard number, no matter what actual phone line the call is travelling down.

  21. Re:What is a PGP signature? by Greyfox · · Score: 2, Interesting
    That's what I'd do, but it still requires you to get and process an E-mail before you decide if it's rejected or not. Maybe combined with some other solution to reduce the network load it'd provide you a 100% effective filter.

    I haven't seen a mail filter that will bounce E-Mails based on whether or not they're encrypted to your obnoxiously large PGP key that takes 30 seconds to encrypt to on a 2GHz Pentium or signed by someone on your whitelist. I suppose one could be written...

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  22. Re:Pure FUD by swillden · · Score: 3, Interesting

    There is something called copyright law. Microsoft or any other company cannot just go and resell your software on their own terms.

    Unless you grant them a license.

    Which appears to be precisely what their license requires you to do. It's not clear to me precisely what you're licensing to them, maybe it's just any patents you hold on the techniques used, but it doesn't say that. What it says is that you grant them an unlimited license to "make, use, sell, offer to sell, import, and otherwise distribute Licensed Implementations", which certainly sounds like you're giving them permission to do what they like with your software.

    I may be misreading this, but that's what the plain language seems to say. I'd want to get a legal opinion before I'd interpret it any other way.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  23. And that is why by mdfst13 · · Score: 4, Interesting

    And that is why Microsoft is using it I'm sure. They have a bunch of nice GUI tools that parse XML, so anything they do now has to be XML.

    It's the same as the way they do email. If I switch to source edit view, my simple text message (e.g. Got It.) balloons into ten lines of generated HTML gobbledygook. Yes, I really need to specify the font for *each* line...even the ones that are blank.

    I really hope that the standard is not set by MS. Something very simple (this is who can transmit for this domain) could turn into something ugly. I can write SPF declarations by hand. Chances are that their XML declarations will be twenty times as long and will need tools to create them. Yes, the XML parsing tools are ubiquitous, but a simple format doesn't require a parsing interface to feed you info. I see no reason not to make a human readable interface.

  24. Funny licence. by rew · · Score: 2, Interesting

    From my understanding of the licence: If I want to implement a compliant implementation, I can go right ahead. (as long as I promise not to bother MS about patents that I might own on this technology).

    If I then sell or distribute the software I wrote: Fine.

    You however get to pay Microsoft to use my software.

    Oh, and they've included a GPL incompatible advertising clause.

  25. callerid_email.doc is an abomination of verbosity by max+born · · Score: 1, Interesting

    Doesn't sendmail already have a similar feature turned on by default? You have to explicitly enable "accept_unresolvable_domains" in your sendmail.mc file or mail from servers with no reverse lookups will be rejected.

    According to

    bash# for x in $(antiword callerid_email.doc); do echo $x; done|wc -l

    this is a thirteen thousand word document.

    Can someone explain in a sentence or two what's different about what MS is proposing and what sendmail already offers?

  26. Re:Email standard proposal by dangermouse · · Score: 2, Interesting
    For god's sake, go take a networking class.

    MIME isn't damage, MIME is a hack to fix the crippled SMTP message format. Maybe you are only interested in sending ASCII text messages-- and that's very hardcore of you and all-- but the rest of the world is interested in sending pictures, documents, text in languages other than English (well over a hundred, and you're fucking well right the "Standard" should support them), etc., and your underdeveloped message format just can't properly deal on its own. Maybe you should read up on the subject.

    Text itself isn't a drawback-- XML is generally represented as text-- but a message format that is defined only for transmitting text just doesn't cut it now that we're out of Green Terminal Land and into the World Where People Use Computers to Do Stuff.

    And you're missing the point of my remark about XML libraries. The problem is not that parsing email is hard, but that there's no standard for an internal representation of an email message, and if there was it would probably be completely non-interoperable with the rest of the world. XML has the DOM and SAX, among others. This means a whole world of functionality, in the form of libraries and technologies that understand XML via DOM or SAX, is available to the program author. You can transform the message into another format using XSLT, access and modify the message content and headers with XPointer, find references to and merge in external resources with XInclude, extend the message format using namespaces (thereby allowing anyone who doesn't care about your extension to safely ignore it), transform the message (with XSLT) into XHTML and provide rich formatting with CSS (both of which can be found in reusable libraries), and so on and so forth.

    You use XML, you get all of the above essentially for free. You go with some application-specific grammar, and you can either limit your email to plaintext or you can reinvent all of those wheels. But I know how much you reet haxorz hate usability and interoperability... maybe we can hook you all up with some nice teletypes.

  27. Re:XML... in its place. by Tinidril · · Score: 2, Interesting

    Yes, I have worked with real data. Why is it that so many people on slashdot assume that if someone disagrees with them that they must be ignorant?

    By moving from comma-delimited to XML you don't solve the problem, you just move it. What happens if someone includes text in a record that just happens to close your field? I know there are answers to that, but they are not very different from those with comma separated lists.

    BTW: To my knowledge Microsoft is the only developer brain-dead enough to try and solve the comma-in-a-field problem with quotes around the entry. But then again they are the ones who are trying to use XML for everything now, so I guess it fits.

    The correct way to do it is escape them with slashes, which is way less complicated than you make it sound.

    ',' becomes '/,'
    '/' becomes '//'
    NEWLINE becomes '/n'

    That's it! Any other escape sequences would just be for added human readability, and would be needed in XML for the same purpose.
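
    The three rules above can be sketched as follows. This is an illustration under the assumption that '/' is the only escape character and that no other sequences are defined; the function names are hypothetical.

```python
def escape_field(value):
    """Apply the three rules; '/' is escaped first so it is not doubled twice."""
    return value.replace("/", "//").replace(",", "/,").replace("\n", "/n")


def join_record(fields):
    """Encode a list of fields as one comma-delimited line."""
    return ",".join(escape_field(field) for field in fields)


def split_record(line):
    """Decode one line back into fields, honouring the '/' escapes."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "/" and i + 1 < len(line):
            # '/n' is a newline; any other escaped character is literal.
            nxt = line[i + 1]
            current.append("\n" if nxt == "n" else nxt)
            i += 2
        elif ch == ",":
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields
```

    A record such as `["LUSER", "12", "Biff, Jr."]` encodes to a single line with the embedded comma written as `/,`, and `split_record` recovers the original three fields.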

    Your comments really underscore my problem with XML. It claims to fix many problems, but in fact it just makes them more opaque. (Much like OOP, but that's another matter.)

    At least you stayed away from the idiotic notion that I always hear about XML providing a standard format for structuring data. In reality it is no more standard than plain text. Which of these is correct?

    <LUSER><UID>12</UID><NAME>Biff</NAME></LUSER>
    <LUSER UID="12"><NAME>Biff</NAME></LUSER>
    <LUSER UID="12"><NOMBRE>Biff</NOMBRE></LUSER>

    And Isn't this easier to read?

    LUSER,12,Biff

    IMHO: XML is excellent in a DocBook like implementation where the data will not fit into a clean record structure, but for all other implementations that I have seen it is snakeoil. It's more difficult for humans, more difficult for machines, and claims to fix a lot of problems that it just sweeps under the rug.

    BTW: I manage a data retention system (not a relational database) that stores about 50GB/day and has to be kept on local storage for a full month. The data is replicated between two remote locations and backed up daily. If I had to move the data from comma-delimited to XML, our costs would more than double for bandwidth, storage, and labor (switching tapes). That doesn't even include the extra processing that would need to be done to reference the data. I'm not sure my boss would call that "a few bytes".

    --
    XML is the best data format; unless your data needs to be read or written by a human or a computer.
  28. Re:XML... in its place. by jonadab · · Score: 2, Interesting

    Wow, that was weird. It looked fine in preview. Let's try this again...

    The difference is that XML-handling libraries all handle this automagically
    (usually by encoding angle brackets within text data). Yes, it's possible
    to have a library that does other escaping schemes automatically, but
    there's still the issue of human-readability...

    > <LUSER><UID>12</UID><NAME>Biff</NAME></LUSER>
    > <LUSER UID="12"><NAME>Biff</NAME></LUSER>

    These will parse out to the same thing. And yes, if the records are all this
    simple, and all *the same*, XML is unnecessary. But the minute the records
    get even remotely complex, especially if some of the records have bits of
    information that other records don't have, the human readability gets lost
    in a sea of stuff like

    LUSER 7125,Johnson,Biff,G.,,Jr.,,462-3203,44833
    LUSER 6784,Johnston,Maria,,Taylor,,,468-1708,44833

    Then you need a better structure than CSV. Is XML the only option? No.
    But XML has the advantage of being fairly intuitive and strongly resembling
    something (HTML) that everyone and his dog (thinks he sort of) knows.

    <luser id="7125" zip="44833">
    <name last="Johnson" first="Biff" middle="G." suffix="Jr."/>
    <phone>462-3203</phone></luser>
    <luser id="6784" zip="44833">
    <name last="Johnston" first="Maria" maiden="Taylor"/>
    <phone>468-1708</phone></luser>

    Yeah, it's longer. One order of magnitude longer than the CSV, not so much
    longer than some of the other options. In many circumstances, the extra
    length is a good tradeoff. I don't understand the desire to bash XML every
    time it comes up, just because it's a buzzword. Sure, it's a buzzword, and
    using XML doesn't really add inherent value, but it doesn't detract, either.
    I still maintain, it's a perfectly valid choice.
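
    A short sketch of reading records like these with Python's standard library, where each record carries only the attributes it actually has. The `<lusers>` root element and the `load_lusers` name are hypothetical additions, and `<name>` is written as a self-closing element so that the document is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document around the two example records.
DATA = """
<lusers>
  <luser id="7125" zip="44833">
    <name last="Johnson" first="Biff" middle="G." suffix="Jr."/>
    <phone>462-3203</phone></luser>
  <luser id="6784" zip="44833">
    <name last="Johnston" first="Maria" maiden="Taylor"/>
    <phone>468-1708</phone></luser>
</lusers>
"""


def load_lusers(xml_text):
    """Return one dict per <luser>, with only the fields that are present."""
    records = []
    for luser in ET.fromstring(xml_text).findall("luser"):
        record = {
            "id": luser.get("id"),
            "zip": luser.get("zip"),
            "phone": luser.findtext("phone"),
        }
        name = luser.find("name")
        if name is not None:
            # Optional parts (middle, suffix, maiden) appear only when given.
            record.update(name.attrib)
        records.append(record)
    return records
```

    The first record ends up with `middle` and `suffix` keys and the second with `maiden`, without any positional placeholders for the missing parts.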

    > BTW: I manage [stuff]

    That's nice. I'm TCG at a public library. We work daily (as does every
    library) with a format called "MARC Records", and let me tell you, XML
    looks mighty attractive.

    --
    Cut that out, or I will ship you to Norilsk in a box.
  29. Re:XML... in its place. by Tinidril · · Score: 2, Interesting

    The difference is that XML-handling libraries all handle this automagically

    How is that different? I could write a library to parse a CSV in about 10 minutes. Oh wait that is different. How long does it take to write a decent XML library? How many lines create how many bugs?

    I think your points are fair, and not knowing anything about "MARC Records" I can't really comment on how XML would work for it.

    I believe that there are good applications for XML, but my reaction to it comes from the fact that people try to apply it in all sorts of places where it doesn't belong. (Like in a network protocol to validate emails) Bad programming bothers me because it makes bad programs that I may be forced to use at work. If this takes hold I will end up involved in tracking down email problems, and instead of being able to use a simple split command to break down the data I'll have to deal with mountains of useless tags.

    My favorite mis-application of XML was made by Cisco for a network load-balancing device. They built an XML interface for bringing servers in and out of rotation, and it was the only way to automate the process. It never worked right, and even their own tools could never do the job reliably. I don't know how many hours I spent pulling my hair out on that one. A high-school kid should have been able to write that interface in 10 minutes, but using XML it was a nightmare.

    We're probably closer in our thinking than our posts let on. I still don't see a single problem that XML solves for structured data, but for documents it has no equal. In the real world I'm sure there are all sorts of places where the line between structured data and document data is blurry.

    BTW: I love your "Lamejoke Generator".

    --
    XML is the best data format; unless your data needs to be read or written by a human or a computer.