Slashdot Mirror


Tim Bray on Microsoft Office

jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"


  1. Yay Evil Monopoly Of Doom! by Sneftel · · Score: 3, Interesting

    Wow, I was way off when I predicted that Microsoft would further obfuscate their Word format. This seems to be in all respects a Good Thing.

    StarOffice has used XML for their native file formats for some time now; I wonder if this means we'll see an even better-quality translator between the two formats?

    --
    The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
    1. Re:Yay Evil Monopoly Of Doom! by OrangeSpyderMan · · Score: 3, Insightful

      Wow, I was way off when I predicted that Microsoft would further obfuscate their Word format.

      They won't have to. Since they are going the SQL Server way for their filesystem, they can happily give away the hold they have on file formats, since they are going to have a stranglehold on accessing those files. You want an open file system? Here you go (and MS has a lot to gain by doing this - they instantly give Word access to most other data formats) - but don't think anything other than a Microsoft OS will actually be able to access the files - thanks to our new deliciously obfuscated method of storing data on a disk. Reverse engineering kernel level SQL data (with a bit of crypto, for DRM of course, thrown in) will probably be even harder than reverse engineering file formats was. And impossible to do legally (say hi to all those DMCA guys out there.)

      --
      Try NetBSD... safe,straightforward,useful.
    2. Re:Yay Evil Monopoly Of Doom! by Jeremiah+Cornelius · · Score: 4, Insightful
      I don't believe any of this crap is going to happen from MS. Not for a New York second.

      You'll be DMCA'd out of the loop for trying, and the format will validate itself with 'Palladium' features in software, or some such.

      However, the mind reels at the idea of managing PowerPoint and Excel files from emacs!

      --
      "Flyin' in just a sweet place,
      Never been known to fail..."
    3. Re:Yay Evil Monopoly Of Doom! by tonywestonuk · · Score: 5, Insightful

      So, what happens when someone wants to email an XML-enabled Word document? Does it somehow become encrypted on its way out of the database, remain scrambled on its way over the internet, and reassemble itself into nice XML once it arrives on the recipient's computer? Doesn't sound like XML to me?!

    4. Re:Yay Evil Monopoly Of Doom! by jsse · · Score: 5, Funny

      I don't believe any of this crap is going to happen from MS. Not for a New York second.

      Dark-masked B.Gates approaching you:
      "I find your lack of faith....disturbing."

    5. Re:Yay Evil Monopoly Of Doom! by DNS-and-BIND · · Score: 2, Troll

      They'll simply add "features" to XML, enabling a Microsoft extension of the standard. The new MSXML will be copyrighted by Microsoft.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    6. Re:Yay Evil Monopoly Of Doom! by thelen · · Score: 5, Insightful

      Okay, so it'll be harder to mount a windows partition effectively, but this doesn't affect transmission of documents, especially if they're stored in an XML format. As for me, I think it's more valuable to have files that I can read outside of their native filesystem rather than have a readable filesystem filled with unreadable files.

    7. Re:Yay Evil Monopoly Of Doom! by passthecrackpipe · · Score: 5, Insightful
      No you were not. MS routinely uses XML to encapsulate (proprietary) binary data. In the case of the MSOffice file format, this is especially true, but to a lesser extent this also goes for stuff like BizTalk etc (that has a terrible license attached to it). If MS is *really* serious about using open formats, and using XML in their Office suite, they should put their money where their mouth is and join in the OpenOffice File format project. Most of the opensource players are working there already, and the EU is also set to join. I assure you that mature participation of Microsoft would be very welcome.

      Of course, this will never happen. Instead, MS will continue to push their own "open" XML based file formats. Microsoft Kerberos, anyone?

      --
      People who think they know everything are a great annoyance to those of us who do.
    8. Re:Yay Evil Monopoly Of Doom! by Anonymous Coward · · Score: 2, Insightful

      No, and 8-bit binary .DOC files don't just become scrambled either.

      XML doesn't make things easier to parse; you still have to figure out what each element means, just as you would have to figure out 04 07 in a binary file.

    9. Re:Yay Evil Monopoly Of Doom! by JohnFluxx · · Score: 2, Informative

      XML does support encryption of its data...

    10. Re:Yay Evil Monopoly Of Doom! by OrangeSpyderMan · · Score: 3, Interesting

      It will indeed be harder to mount the partition. It may also be harder to use that XML data, since what we may be talking about is XML encapsulation of binary, proprietary, encrypted file formats. Don't necessarily think you're going to receive at the other end a plaintext file with a few tags - what you will receive will have been through a closed kernel "request" to an encrypted database "filesystem", a proprietary DRM system (hardware and software) - and you genuinely believe they're just gonna bang it out as plaintext at the other end?

      --
      Try NetBSD... safe,straightforward,useful.
    11. Re:Yay Evil Monopoly Of Doom! by vandemar · · Score: 2
      Dark-masked B.Gates

      Also known as Darth Fences of Microsith, sworn enemy of the Jedix, and commander of the NTie fighter squadrons.

    12. Re:Yay Evil Monopoly Of Doom! by Perl-Pusher · · Score: 3, Insightful
      There is also the fact that Microsoft loves to put stuff in their EULA. I can also imagine anyone producing a reader for the "encrypted XML" running afoul of the DMCA.

      "Doesn't sound like XML to me?!"

      Sure it is! It's XML with Microsoft Security Extensions!

    13. Re:Yay Evil Monopoly Of Doom! by OrangeSpyderMan · · Score: 2

      Indeed, but then neither were OSes, word-processors, spreadsheets, browsers... etc. Never stopped Microsoft trying to bolt down their implementations of each one as hard as they can. The problem isn't with "dbfs" - it's a damn good idea, it's just that it also happens to be a damn good way of obfuscating the data if you don't want to play fair, and past experience has proved that MS don't. I am more than willing to be proven wrong.

      --
      Try NetBSD... safe,straightforward,useful.
    14. Re:Yay Evil Monopoly Of Doom! by passthecrackpipe · · Score: 2, Funny

      OH NO!!!! An anonymous poster, who I don't know and will never meet has decided to NOT pledge his/her alliance with me on an Internet-based forum. I am shocked! My self-esteem has plummeted! How will I survive this massive blow to my ego!?!

      --
      People who think they know everything are a great annoyance to those of us who do.
    15. Re:Yay Evil Monopoly Of Doom! by Abreu · · Score: 2

      There is already a msxml.dll file in windows... It is used by Internet Explorer to parse xslt documents in a non-standard way, driving me nuts ...and forcing me (again) to program websites for two (sometimes three) platforms instead of one.

      --
      No sig for the moment.
    16. Re:Yay Evil Monopoly Of Doom! by OrangeSpyderMan · · Score: 2

      What makes you think that a file system will just support SQL requests from a client? Because it uses an SQL engine? Christ you guys have learnt nothing from MS's behaviour over the last 10 years. At what stage have they ever modified anything to help 3rd party products integrate? One example, anyone? You think all that data is just gonna be sitting there, freely accessible to anyone who can write an SQL request? Palladium, anyone? The DMCA was designed from the ground up to make it worthwhile for large companies to implement encryption and DRM. What you describe is not a 'dbfs' but a database, and for it to work that way, you'd still need a filesystem to organise data physically on the disk. That is not what this is about - this is about using a db engine (not a db) to organise and access the physical data on the disk. This means the db ain't just gonna be a db, but it's going to have to implement whackloads of fs stuff too, just to write the data to the disk, and the chances of MS not taking this opportunity to obfuscate data are very very slim indeed.

      --
      Try NetBSD... safe,straightforward,useful.
    17. Re:Yay Evil Monopoly Of Doom! by donutello · · Score: 5, Informative

      What a bunch of pseudo-technical garbage!

      I have a Masters in Computer Science with a focus on databases and storage technology and very little of what you said makes any sense to me. There's nothing easier than getting at data stored in SQL. Where I work, we've shipped a few products where we didn't document the schema because it was too complex and we didn't feel we could support it. Within weeks, almost all of our major customers had it reverse-engineered anyway. SQL is very easy to get at!

      kernel level SQL data

      There's no such thing. SQL data is stored in tables. You use queries to get at it. Period.

      Also, your story doesn't make any sense. The article says Office 11 is in Beta already. IIRC, the SQL Server and Palladium stuff in the OS doesn't come until Longhorn. Do you think they will actually release a version of Office which won't work until their next OS (who knows when that will be) is released and adopted? How will they make money off all the people who recently upgraded to Windows XP then?
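
      The claim that an undocumented schema is easy to recover can be illustrated with a minimal sketch. It assumes SQLite (via Python's standard sqlite3 module) as a stand-in, not SQL Server, and the table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pretend this schema shipped without documentation (names are invented).
conn.execute("CREATE TABLE t_doc (id INTEGER PRIMARY KEY, ttl TEXT, body TEXT)")
conn.execute("INSERT INTO t_doc (ttl, body) VALUES ('memo', 'hello')")

# SQLite records every table and its CREATE statement in sqlite_master,
# so anyone with query access can list the schema.
tables = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Once the table name is known, the data is one query away.
rows = conn.execute("SELECT * FROM t_doc").fetchall()

print(tables)
print(rows)
```

      Other database engines expose similar catalog views, though whether a future Windows file store would allow this kind of access is exactly what the thread is arguing about.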

      --
      Mmmm.. Donuts
    18. Re:Yay Evil Monopoly Of Doom! by FatherOfONe · · Score: 4, Insightful

      Not that I totally disagree with your point, but with ".Net" people will be discouraged, or it will be far more difficult to send the actual document. My guess is that some future version of Office will default to "Send the shortcut".

      Now they of course will change Office for the Mac to read from those servers... The data WILL be stored in XML on those servers, so coders will have an easy time with it.

      You bring up an interesting point about paranoid people and Microsoft. I have followed Microsoft fairly closely over the last ~18 years and feel comfortable saying that they have never worked with any "standard" out there. They have ALWAYS developed their own. Can you name an example of any "standard" software technology they have adopted and not changed? A perfect example of this would be ZIP. Why doesn't Microsoft use it instead of CAB files? There are many many more I could use as examples if you would like.

      Microsoft has an internal saying "If it is not ours destroy it".

      My point is this. A company that has for 18 years been trying to lock people in to their technology, will cause some people to be a bit paranoid.

      --
      The more I learn about science, the more my faith in God increases.
    19. Re:Yay Evil Monopoly Of Doom! by sketerpot · · Score: 2, Insightful
      Sure it is! It's XML with Microsoft Security Extensions!

      That reminds me of something that MS has been doing for quite a while now: the file type reported for any HTML files is "Microsoft HTML file" (your system may vary). Will XML become Microsoft XML? I hope not.

      If everything about this really is kosher, though, then everybody give a great big "Thank You!" to MS!

    20. Re:Yay Evil Monopoly Of Doom! by spitzak · · Score: 3, Insightful
      why would Microsoft bother creating an XML file format if it was just an "encapsulation of binary, proprietary, encrypted file formats"? What would be the point? A PR move to say that they use XML?


      YES! Now you are starting to get it!

      I can't think of any reason to adopt an XML format if it wasn't at least a little more open than the binary file formats they've been using.


      How about for a "PR move to say they use XML". In addition it is obvious how to make an XML that is exactly as obscure, by putting the entire contents of the old format into a binary block.
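
      A minimal sketch of that "binary block" trick, assuming a made-up element name (binaryData) and made-up bytes standing in for an old-format file. Nothing here reflects an actual Office format:

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a legacy binary document; the bytes are made up.
legacy_bytes = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x04, 0x07]) + b"opaque payload"

# Wrap the whole blob in a single element. The element name "binaryData"
# is invented for this sketch and is not a real Office tag.
root = ET.Element("document")
blob = ET.SubElement(root, "binaryData", encoding="base64")
blob.text = base64.b64encode(legacy_bytes).decode("ascii")
xml_text = ET.tostring(root, encoding="unicode")

# Any XML parser accepts the result...
parsed = ET.fromstring(xml_text)
recovered = base64.b64decode(parsed.find("binaryData").text)

# ...but all it recovers is the same undocumented bytes.
print(xml_text)
print(recovered == legacy_bytes)
```

      The file is well-formed XML, yet a reader still needs the old binary specification to get anything out of it.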

      Also, how would a "binary, proprietary, encrypted file format" fit into everything else Microsoft is doing with .NET? Wouldn't Microsoft want the content of a document to be open enough so that it could be read and processed by applications using .NET's XML libraries?


      No, of course not. You would only read Word documents with the special "read a Word document" interface. It might use the XML libraries underneath, but big deal. Be assured you will be unable to reconstruct all the contents of the document by any kind of perverted arrangement of calls to the "read a Word document interface". (though not just a complaint about MicroSoft, I think .NET, DCOM, CORBA, KCOP, etc all pervert the idea of "object orientation" by making elaborate communication protocols which are only "object oriented" because they call some part of the protocol an "object". Real object-orientation means there is some commonality of functionality, and the only instances I can think of that really work are the original Unix where everything known then (terminals, printers, tapes, disks) used the same read/write/seek calls, and Plan9 which tries to extend this to networks and file systems).

      Explain to me why Microsoft would want to prevent you from sending your self-generated Word documents to another computer? What possible sense does this make? Is it because they hate their customers and want to piss them off so they won't use Microsoft products any more? Has RedHat paid Microsoft to include technology that will piss off all Windows users?

      Ha ha, very funny. Of course you will be able to send a Word document to another computer. It will still be an unreadable Word document. If they can obfuscate things so that the destination computer also has to be running Windows, all the better. You seem to be under the weird delusion that "other computer" meant "other computer running Windows" when in fact I'm sure every other poster here knew it meant the exact opposite, ie "other computer not controlled by MicroSoft".

    21. Re:Yay Evil Monopoly Of Doom! by Anonymous Coward · · Score: 2, Insightful

      No, of course not. You would only read Word documents with the special "read a Word document" interface. It might use the XML libraries underneath, but big deal. Be assured you will be unable to reconstruct all the contents of the document by any kind of perverted arrangement of calls to the "read a Word document interface".

      I think you are getting near the point.

      A big problem with MS Office right now is that the file formats are such a mess that Nobody can parse the documents without MS Office, and that includes Microsoft.

      If MS wants to get into the content/groupware market, they NEED server-side processing that doesn't rely on running a single-threaded 15MB WINWORD.EXE process.

      Using an XML format allows Microsoft to build a clean C# component implementation of "Read a Word Document", or a "Save SQL Server Data As Excel" without being fucked by their own file formats.

    22. Re:Yay Evil Monopoly Of Doom! by Abreu · · Score: 2

      Thanks for the info, I will investigate

      --
      No sig for the moment.
    23. Re:Yay Evil Monopoly Of Doom! by pizza_milkshake · · Score: 2

      I agree that MS will find some way to make it just as hard, if not harder, for non-MS apps to read from and write to Word docs. I was actually thinking they'd use lawyers -- they'll copyright or trademark the format (they've talked about doing this in regards to .NET services) and try to sue anyone that writes software that uses it. My $.02

    24. Re:Yay Evil Monopoly Of Doom! by spitzak · · Score: 2
      I would agree that their current format is too much of a mess. However I suspect they will not learn from this and will make a .NET interface rather than publishing low-level details of the file format. They will not do this because of some evil plan, but because of an actual misguided belief that they are making things easier with the high-level interface. Eventually the implementation in .NET will become such a mess that they will have to replace it again.

      I do find it shocking how myopic the MicroSoft defenders who post here are. They are convinced that a .NET or VB interface that runs only on Windows somehow makes the file format "open" and thus fail to see anything wrong with these .NET solutions. Sorry, if that was true then the fact that you can run Word on Windows and read the file would also define it as "open".

      "open" means I can interpret the bits without any proprietary software. If they want to provide some convenience routines to make it easier, that is fine, but there should not be a requirement to use these routines.

    25. Re:Yay Evil Monopoly Of Doom! by lostchicken · · Score: 3, Insightful

      XML can be whatever you want it to be. XML does have standards, but just standards for wrapping data with control codes, not what the control codes mean.

      While StarOffice may use an XML word processing format, it won't be what MSFT will use.
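
      To make that concrete, here is a small sketch with two invented vocabularies for the same bold sentence. Neither is a real StarOffice or Microsoft schema; the tag and attribute names are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies describing the same bold sentence.
vendor_a = '<doc><para><bold>Hello</bold></para></doc>'
vendor_b = '<w><p><r fmt="b">Hello</r></p></w>'


def tag_names(xml_text):
    """Return the element names of an XML string in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Both strings are well-formed, so the parser is happy with either one.
names_a = tag_names(vendor_a)
names_b = tag_names(vendor_b)
print(names_a)
print(names_b)

# Nothing in XML itself says that <bold> and <r fmt="b"> mean the same
# thing; a translator has to be told, e.g. through a hand-written mapping.
```

      The parser handles the wrapping; working out what each tag means is still up to whoever writes the translator.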

      --
      -twb
    26. Re:Yay Evil Monopoly Of Doom! by hondo77 · · Score: 2

      Who said Microsoft is interested in using open formats? They're interested in using XML--not necessarily the same thing. What business reason is there for Microsoft to join the OpenOffice Source Project? They're the market leader, everybody else has to worry about working with them, not the other way around.

      --
      I live ze unknown. I love ze unknown. I am ze unknown.
    27. Re:Yay Evil Monopoly Of Doom! by thelen · · Score: 2

      .NET, DCOM, CORBA, KCOP, etc all pervert the idea of "object orientation" by making elaborate communication protocols which are only "object oriented" because they call some part of the protocol an "object". Real object-orientation means there is some commonality of functionality, and the only instances I can think of that really work are the original Unix where everything known then (terminals, printers, tapes, disks) used the same read/write/seek calls, and Plan9 which tries to extend this to networks and file systems).

      I believe you mean "real-object orientation" not "real object-orientation". Object-oriented programming is characterized principally by the conjunction of data and behavior, which the protocols you denigrate adhere to rigorously. In contrast, your idea of object orientation appears to mean a unified manner of interacting with real objects. But grouping common functions is merely sane programming, not a particular paradigm, and certainly not what the rest of the world means by OOP.

    28. Re:Yay Evil Monopoly Of Doom! by jimbolaya · · Score: 2
      Let's not get paranoid here! Regardless of how the low-level bit-by-bit format of the disk is set up, you can be sure that Microsoft will provide a library for, hmm, reading files off the disk. Unless you believe that Microsoft will be the only publisher of software that can read and write to the disk.

      I almost hate to mention a few of the ways you could get files off the Longhorn file system...FTP, HTTP, Samba, e-mail, ISO-whatever-it-is CD-ROM...is anybody really worried about this? Seriously?

      --

      There ain't no rules here; we're trying to accomplish something.

    29. Re:Yay Evil Monopoly Of Doom! by spitzak · · Score: 2

      That's a description of XML itself, not of how Word will use XML to store files.

    30. Re:Yay Evil Monopoly Of Doom! by mbogosian · · Score: 2

      If everything about this really is kosher, though, then everybody give a great big "Thank You!" to MS!

      Let's see, if I remember correctly from high school logic...

      A: everything about this really is kosher
      B: everybody give a great big "Thank You!" to MS!
      Given: A is FALSE

      A -> B val
      F -> T T
      F -> F F

      So as long as we thank them anyway, we still have a true statement. It's our only course of action, so here goes:

      "Thank you sir, may I have another?"
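
      For anyone who wants to check the table, here is a quick sketch of the standard textbook definition of material implication. The function name implies is just a label chosen for this example:

```python
from itertools import product


def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


# Enumerate all four combinations of A and B.
rows = [(a, b, implies(a, b)) for a, b in product([True, False], repeat=2)]
for a, b, val in rows:
    print(a, b, val)
```

      Under this definition both rows with A false evaluate to true, so the statement holds whether or not anyone says thank you.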

    31. Re:Yay Evil Monopoly Of Doom! by Corporate+Drone · · Score: 2
      donutello says:

      I have a Masters in Computer Science... There's no such thing (as "kernel level SQL data"). SQL data is stored in tables. You use queries to get at it. Period.

      I appreciate that you have a Masters in CS. I have a Bachelor's in CS. (What does that prove, btw?)

      Let's add one point that you managed to skip over, in your analysis: SQL data is stored in tables. You need authority to the tables in order to make any sense of the data. You use queries to get at it. Period.

      ok ... so, if you don't have access to the tables without permission from the OS (and you better believe that only a Microsoft OS will have permission), then you have two choices: (1) break the authentication scheme, or (2) translate the raw bits of a proprietary DB.

      so, if I'm MS, I give up my stranglehold on the Word format, replacing it with a stranglehold on the authentication to query via SQL. In other words, you have play by my rules to have the right to use queries to get at data. Period.

      that being the case, what's your point, then?

      --
      mmm... yeah... You see, we're putting the cover sheets on all TPS reports now before they go out...
    32. Re:Yay Evil Monopoly Of Doom! by donutello · · Score: 2

      You're a fucking idiot. I have a Masters with a focus on databases and storage technology.

      And yes, you blooming idiot, you need authority over the tables and presumably you are a frigging sysadmin and therefore automatically have that authority. There isn't a database out there (barring storing encrypted data in the database which is decrypted OUTSIDE the database) which bars an admin from doing anything they want to with the data.

      You obviously understand nothing about databases so butt out.

      And there is still no such thing as Kernel level SQL data. That is just techno-babble used to fool idiots who don't understand what is being talked about - like yourself.

      --
      Mmmm.. Donuts
    33. Re:Yay Evil Monopoly Of Doom! by Corporate+Drone · · Score: 2
      You're a fucking idiot. I have a Masters with a focus on databases and storage technology. And yes, you blooming idiot

      wow... i so get it now. I mean, I was totally oblivious to your point, but then you called me a fucking idiot, and now it's all clear!

      p.s., thanks for clearing up what I missed in grad school. apparently, your school taught that when someone disagrees with you, you wave your degree around vigorously, and then use ad hominem attacks. so, so effective.

      p.p.s., btw, i started re-arguing the point, but ya know what? Forget it. Yeah -- you're right. I concede. Now take your degree and your attitude, and go back to your sandbox, like a good little boy...

      --
      mmm... yeah... You see, we're putting the cover sheets on all TPS reports now before they go out...
  2. Incompatibilities Once Again by robbyjo · · Score: 3, Insightful

    .... I guess it's just MSXML rather than THE standard XML. But we can figure it out with some "intelligent guesswork" now because the file would be human-readable.

    --
    Error 500: Internal sig error
    1. Re:Incompatibilities Once Again by JaredOfEuropa · · Score: 5, Insightful

      It's just like the old SGML module for Word they used to have about 6 years ago. My guess is that there will be some significant drawback to saving documents in XML, such as loss of some formatting information. That would convince users not to save in the XML format... but that isn't the important thing to Microsoft.

      More significantly, there might be small incompatibilities, or ways that Word-created XML documents divert slightly from what is normal and proper in XML. Perhaps Word will make some (intentional) mistakes when reading back XML files generated in other applications, just like Word's old SGML module would choke on many proper SGML documents.

      Make no mistake: the fact that almost everybody is using Office and the associated file formats makes it very hard for a new contender to enter the office suite market. Microsoft must be aware of the power they have over the market with their Office file formats. Think of it: when you exchange files with other businesses, you have two realistic choices of file formats: Office or plaintext. And now Microsoft is introducing compatibility with an open and well-defined markup language, in favour of their proprietary language? I'll believe it when I see it.

      --
      If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
    2. Re:Incompatibilities Once Again by Qrlx · · Score: 5, Insightful

      Think of it: when you exchange files with other businesses, you have two realistic choices of file formats: Office or plaintext

      I think PDF is a viable (growing even) third option. Adobe is "evil" just like MS (remember Sklyarov)... regardless, PDF is nice and it works well, and the files are way smaller than Word docs.

    3. Re:Incompatibilities Once Again by DrXym · · Score: 2

      PDF is fine if you just want to print stuff out, but it contains absolutely none of the information of the original document that allows you to edit it.

    4. Re:Incompatibilities Once Again by arkanes · · Score: 2

      If someone could point me to a free (as in beer or speech, I'm not a zealot), stable (slap Acrobat), fast, and easy to work with application for viewing and editing PDF files, on Windows, I'll be a lot happier about PDFs.

    5. Re:Incompatibilities Once Again by greenhide · · Score: 2

      It's just like the old SGML module for Word they used to have about 6 years ago. My guess is that there will be some significant drawback to saving documents in XML, such as loss of some formatting information. That would convince users not to save in the XML format... but that isn't the important thing to Microsoft.

      Probably not, judging by my experience with documents translated to HTML. Although some features are lost (headers and footers, notably), MS Word has actually been pretty good about using styles and HTML comments to define all the remaining content--including Mail Merge and detailed styles and formatting.

      Most of the features that are lost were because those features don't exist on web pages at all (notably, headers and footers), so there wasn't a sensible reason for putting them into the output.

      But I think we can actually expect this XML format to offer almost all of the formatting of a standard Word document.

      And now Microsoft is introducing compatibility with an open and well-defined markup language, in favour of their proprietary language? I'll believe it when I see it.

      The key is behavior in reading those XML files. HTML is also an open format, and their documents can be saved in HTML format. But Word-specific features and many of their styles won't work except in Microsoft Word. Even if their documents are 100% xml compliant/compatible, that doesn't mean that the information contained in them is particularly useable in a non-Word application.

      --
      Karma: Chevy Kavalierma.
    6. Re:Incompatibilities Once Again by pomakis · · Score: 2
      PDF is fine if you just want to print stuff out, but it contains absolutely none of the information of the original document that allows you to edit it.

      I would say that 95% of the documents floating around on the web, being e-mailed to people, etc., are meant to be "read only", i.e., with no intention to be editable. PDF is a much better format for this purpose.

    7. Re:Incompatibilities Once Again by sir99 · · Score: 2, Informative

      GSView (requires Ghostscript) works pretty well on Windows. It's also free beer/speech, depending on which version you get (old versions get relicensed as GPL when a new version is released). As for editing, I don't know of anything besides acrobat that edits PDF directly.

      --
      The ocean parts and the meteors come down
      Laid out in amber, baby.
    8. Re:Incompatibilities Once Again by Abreu · · Score: 2

      And that's a Good Thing(C) most of the time!

      --
      No sig for the moment.
    9. Re:Incompatibilities Once Again by alfredo · · Score: 2

      One friend on MSN/Hotmail cannot read PDFs in his mail for some reason.

      In OSX PDFs are the way to go. I can turn any document into a PDF by hitting print, then hitting the PDF option.

      GSView is pretty neat too.

      Ignore MS and they will go away.

      --
      My Photostream
    10. Re:Incompatibilities Once Again by King+Babar · · Score: 2
      It's just like the old SGML module for Word they used to have about 6 years ago. My guess is that there will be some significant drawback to saving documents in XML, such as loss of some formatting information.

      Um... You'd better hope that saving to XML loses some formatting information, since that's the whole point of *ML approaches: to separate content from presentation. A more charitable reading of what you say is that the style sheet you need to apply to the XML to render a Word document might be crippled. Could be.

      Frankly, though, I suspect that the *opposite* thing will occur. The style sheets won't be crippled, rather, they will be absolutely wonderful. So good, in fact, that you would not want to do without them. So powerful, that you will want to re-do your entire website in Word XML just to use them, and allow users complete transparency in going to-and-fro. Regular HTML will become scarce on the web. PDF will also be seen as less necessary.

      Of course, there might be a catch or two...like, the style sheets will never be publicly available, and you will not be allowed to use them from a non-MS browser, and the XSLT that allows you to export to html and pdf won't work quite right...

      As always, you have to be very careful about what you ask for, since you might just get it.

      --

      Babar

  3. wicked :) by oo7tushar · · Score: 2

    I've been waiting for this. It's gonna allow me to go to a full Linux system and not have to pay any money...I hope.

    I'm wondering, can MS charge for licences to write tools that parse the XML documents?

    1. Re:wicked :) by Mnemia · · Score: 3, Insightful

      I doubt it. XML is specifically designed around interoperability, and I don't think MS can charge for use of a standard they don't own. That's why I think that they will break standards compatibility somehow.

  4. What will be the default save format? by leandrod · · Score: 5, Insightful

    The most important question, besides whether the MS Word XML format will be documented well enough, is whether it will be the default save format. Most MS Office users simply don't care enough to save MS Word documents in RTF, for example, even if it's more than good enough for the vast majority of documents.

    Not the main issue of the article, but it is unfair to single someone out as the inventor of XML, which is just a streamlined version of SGML, which is an evolution of IBM's GML.

    --
    Leandro Guimarães Faria Corcete DUTRA
    DA, DBA, SysAdmin, Data Modeller
    GNU Project, Debian GNU/Lin
    1. Re:What will be the default save format? by Rinikusu · · Score: 5, Funny

      Stop right there.
      If you continue with that line of reasoning, someone's gonna demand that it be called SGML/XML.

      Grr.

      --
      If you were me, you'd be good lookin'. - six string samurai
    2. Re:What will be the default save format? by reaper20 · · Score: 2

      Not the main issue of the article, but it is unfair to single out someone as the inventor of XML, which is just a streamlined version of SGML, which is itself an evolution of IBM's GML.

      Why not? They list him as a co-inventor, meaning, he didn't do it all himself.

      I wouldn't simplify the comparison between XML and SGML. That's like saying the invention of the printing press was insignificant, since people already had a written language.

      XML makes SGML actually usable, and if this guy helped make it so, then he deserves a little bit of credit.

    3. Re:What will be the default save format? by shird · · Score: 2

      Something like XML would compress quite well though, so they would probably introduce some kind of 'compiled/compressed XML file' and the specs of how to decompress it. Similar to their compiled HTML format for help files. It would require an intermediate parsing step before it could be used in perl scripts etc though.

      --
      I.O.U One Sig.
    4. Re:What will be the default save format? by StefMeister · · Score: 5, Interesting
      According to this article on ZDNet, it will probably NOT be the primary file format:

      To make that happen, Microsoft is turning to what some analysts say is a risky strategy. The company is adopting Extensible Markup Language (XML) as a second file format in all Office applications, to enable better data exchange between the productivity suite and back-end software, such as databases.
      --
      "Son, in a sporting event, it's not whether you win or lose, it's how drunk you get" - Homer J. Simpson
    5. Re:What will be the default save format? by Karellen · · Score: 2

      Don't you mean XML/SGML?

      As TCP/IP (TCP over IP; think 3/5 == three fifths == 3 over 5) is TCP running over IP, and GNU/Linux is the GNU toolchain running over a Linux kernel, surely XML is a document metaformat `running over' the earlier, more complex, SGML document metaformat.

      K.

      --
      Why doesn't the gene pool have a life guard?
    6. Re:What will be the default save format? by greenrd · · Score: 2
      sxw files aren't text format. They're zipped xml, hence compressed. But you're right - M$ == bloatware.
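
The point made in this sub-thread, that a zipped XML file needs a decompression step before a script can read it, can be sketched in Python. This is only a minimal sketch: the member name `content.xml` and the sample markup are assumptions for illustration, not the actual layout of any StarOffice or Office file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical sample markup; real office formats use far richer schemas.
SAMPLE_XML = "<document><p>Hello</p><p>World</p></document>"


def make_archive(xml_text: str) -> bytes:
    """Build an in-memory zip holding one member, 'content.xml' (assumed name)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("content.xml", xml_text)
    return buffer.getvalue()


def read_paragraphs(archive_bytes: bytes) -> list[str]:
    """Unzip the member first, then parse it as ordinary XML."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        xml_bytes = archive.read("content.xml")
    root = ET.fromstring(xml_bytes)
    return [p.text or "" for p in root.iter("p")]


paragraphs = read_paragraphs(make_archive(SAMPLE_XML))
print(paragraphs)
```

The extra step is a single `archive.read` call, so compression by itself is not much of a barrier as long as the container is a standard one.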

    7. Re:What will be the default save format? by evilpenguin · · Score: 2

      I'm just curious what it is about SGML that you think makes it "unusable?" SGML is perfectly usable. I don't think there is anything "easier" about XML for people writing documents. There's quite a bit easier about writing fully compliant parsers for each, but that's a different story. XML just makes some simplifying assumptions, but since every valid XML document is also a valid SGML document, I'm not sure how SGML can be said to be "unusable."

  5. I doubt it. by theLOUDroom · · Score: 3, Insightful

    I really have my doubts about wether Microsoft will allow "any programmer with a Perl script and a bit of intelligence" to muck around with Office documents.
    I'm guessing their XML document format will be just as hard to decyper and the current office formats.

    --
    Life is too short to proofread.
    1. Re:I doubt it. by sql*kitten · · Score: 5, Insightful

      I really have my doubts about wether Microsoft will allow "any programmer with a Perl script and a bit of intelligence" to muck around with Office documents.

      Why not? After all, the high-quality ActiveState port of Perl to Win32 exists because Microsoft paid for it, and you can download it for free. Not only that, but if you want to write your own code to manipulate Office documents, you have been able to do that for years in VBA - all the Office programs expose rich APIs. In fact, they are composed of Objects that you can instantiate and use in your own programs if you want - all MS care about is that there is a licensed copy of Office on the user's machine. One of the easiest ways to do charting is to simply reuse a bit of Excel, for example. From there it's a short hop via COM to any program you want.

      I'm guessing their XML document format will be just as hard to decyper and the current office formats.

      The fact that Office documents have been in a proprietary format in the past is actually unimportant, since the interfaces to the applications (and hence their documents) are well documented (check MSDN or Barnes & Noble if you don't believe me). So the reason that Microsoft are doing this is that they lose nothing and gain from making the platform even more attractive to developers.

    2. Re:I doubt it. by spongman · · Score: 3, Funny

      what are you talking about? you can 'muck about' with office documents right now with whatever language you want, Perl included. You don't need XML to do it.

    3. Re:I doubt it. by jkramar · · Score: 2, Informative
      That's all fine and good if
      1. you don't mind having to buy Office just to modify Office files
      2. you're on Windows.

      Actually, the various APIs are probably there on Macs as well. However, if you're on Linux, then you're stuck. OpenOffice, Abiword, et al. do a reasonable job of reading Office files, but can't quite read everything perfectly, and the fact that Office documents are binary dumps instead of nice, legible XML doesn't help. This is, as I think many readers have realized, a significant advantage that an XML file format would lend. If they carry this out, then developers of other apps, such as competing office suites and member programs, whether free software or not, will have a much easier time reading and correctly interpreting these documents.
      --

      true && more || less
    4. Re:I doubt it. by ianezz · · Score: 5, Interesting
      I'm guessing their XML document format will be just as hard to decyper and the current office formats.

      There are 2 problems with the current format of Microsoft Office file:

      1. Give the correct interpretation to the bytes representing the document content, in order to import the Office document in some other office suite using a different representation.
        This is mostly solved (thanks to years of trial and error).
      2. Give the correct interpretation to the bytes representing the document itself AND all the extra cruft having nothing to do with the document contents that the Microsoft Office suite puts in, in order to generate documents readable by the various versions of the Office suite.
        This is definitely more difficult, as nobody knows Office internals and what form they expect such additional data to take. The StarOffice guys managed to do an acceptable job, at the price of years of trial and error. It's like looking at a dump of your computer's memory, guessing what's code, what's data, what's padding and the meaning of every byte...

      Now, does an XML format simplify things? Well, yes, just as RTF text is easier to manage than a pure binary format. But nothing prevents putting extra cruft in an XML document, so all that changes is that instead of a hex editor you may now use a text editor. Giving a correct interpretation of the tags and attributes is still something only Microsoft can do, unless it publishes the full specifications (present and future: after all, XML is eXtensible, right?)

      Personally, I think that:

      • Microsoft is realizing that the current Office formats are getting out of control, so it wants to get rid of them, because maintaining backwards compatibility is becoming too painful.
      • An XML-based format may be the right answer for Microsoft, in that all the subtleties of parsing binary data simply disappear, while it may still be difficult for everyone else to understand the real meaning of the data. Let's say <obscuretag_42 foobarizer="xyzzy"/>
      • Microsoft was not giving out the specifications of the formats of its Office suite before: should we now suppose it's giving out the DTD/Schema AND a good explanation of how to interpret it? I'd hope the answer is yes, but given the company's precedents...
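
The gap between parsing and understanding described in this comment can be shown with a short Python sketch. The element is the made-up one from the comment; the `<doc>` wrapper is an assumption added so that it forms a complete document.

```python
import xml.etree.ElementTree as ET

# The element below is the made-up example from the comment above,
# wrapped in a hypothetical <doc> root so it forms a complete document.
SAMPLE = '<doc><obscuretag_42 foobarizer="xyzzy"/></doc>'

root = ET.fromstring(SAMPLE)
element = root[0]

# Syntax is fully recoverable: tag name and attributes come out cleanly.
tag_name = element.tag
attributes = dict(element.attrib)
print(tag_name, attributes)

# Semantics are not: nothing in the file says what "foobarizer" controls,
# so a consumer without the vendor's documentation can only guess.
```

A generic parser gets you this far on any well-formed file; what the names mean still has to come from a published schema and documentation.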
    5. Re:I doubt it. by Bartmoss · · Score: 3, Insightful

      I think you are dead on. Plus: a) XML is a great buzzword; b) it makes MS *seem* more "open" and "standards compliant".

    6. Re:I doubt it. by Penguin · · Score: 2, Interesting

      ... in fact, Microsoft has code examples for perl in their Knowledge Base:

      http://support.microsoft.com/default.aspx?scid=kb;en-us;Q214797

      (furthermore I'm impressed that a reply like "They'll probably do something evil..." would be rated as "Insightful")

      --
      - Peter Brodersen; professional nerd
    7. Re:I doubt it. by Anonymous Coward · · Score: 2, Insightful

      "After all, the high-quality ActiveState port of Perl to Win32 exists because Microsoft paid for it"

      That port existed well before the MS involvement in ActiveState.

      Here's the original story on Microsoft's role:

      "6/2/1999 -- Microsoft Corp. and ActiveState Tool Corp. (www.activestate.com) signed a three-year Perl Open Source development and support contract.

      As part of the agreement, ActiveState will add features previously missing from Windows ports of Perl, as well as full support for Unicode on Windows platforms."

      Source: http://www.entmag.com/news/article.asp?EditorialsID=1633

      ActiveState has similar partnerships with many others: http://www.activestate.com/Corporate/Partnerships/

    8. Re:I doubt it. by commodoresloat · · Score: 2
      * Microsoft is realizing that the current Office formats are getting out of control, so it wants to get rid of them, because maintaining backwards compatibility is becoming too painful.

      Since when have they maintained backwards compatibility? I just got a MS Word document yesterday that I couldn't open. I was using MS Word.

    9. Re:I doubt it. by greenrd · · Score: 2
      OK - I call your bluff. Please can you point me to details on how to manipulate an embedded diagram in a MS Word 2000 file using Java on Linux - thanks.

    10. Re:I doubt it. by khuber · · Score: 5, Insightful
      The fact that Office documents have been in a proprietary format in the past is actually unimportant, since the interfaces to the applications (and hence their documents) are well documented

      So you can read Office documents with other programs as long as you have Office and MS dev tools?

      You do see the folly in that, right?

      -Kevin

    11. Re:I doubt it. by Hooya · · Score: 2
      all MS care about is that there is a licensed copy of Office on the user's machine

      and that's exactly why it doesn't mean diddly squat for me. you see, i don't use office. i do program a lot tho and it would be nice to be able to export my output in word/excel etc. for others who do have office. if MS wants me to have a licensed copy of office as well even tho i have no use for it whatsoever, well i'll stick with PDF thank you very much. (yeah PDF is proprietary but the format is open; i don't need acrobat to be able to create PDFs now do i? check out FOP -- they've implemented a decent PDF lib -- in java tho. only thing 'missing' seems to be linearizing the object tree in the PDF for web viewing)

      i really don't care about XML to the extreme where i'd implement XML without any direct payoffs. so unless i can write the XML with a simple script (perl or otherwise) without the COM bindings, it's of no use to me if the lock-in moves from the document viewer to the program level.

      right now, i have a few apps that dump their output in either GIF/PNG/PDF as an option. i'd consider word/excel format if i could do that from the script without the 'proprietary' modules/COM components. otherwise this new event is a no-op. i'll stick with PDF for right now.

    12. Re:I doubt it. by buzzcutbuddha · · Score: 2, Insightful

      Oh I get it! We're beating on Microsoft for not opening up its file formats earlier because WordPerfect and Lotus products are so much more open...oh wait....

    13. Re:I doubt it. by Avumede · · Score: 2

      Office programs expose the API. To get the text out of an MS Word document, even if you have Windows and Office, you have to start up Word, which is really inefficient.

      Many programs that need to parse the documents still must resort to manual methods. If you were writing a program that needed to access the text from these files (a search engine, for example), you would want to crack it yourself. Using Word to do it would almost certainly be slower, and if you use the COM API's, you are restricting your program to only run on Windows. In fact no search engine that I know of uses COM to crack a document.

      The XML will be a vast improvement.
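
As a rough illustration of why an XML format helps the search-engine case above, here is a minimal Python sketch that pulls the indexable text out of a document without launching any word processor. The element names are invented for illustration and are not taken from any real Office schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration
# and are not taken from any real Office schema.
SAMPLE = """<document>
  <heading>Quarterly report</heading>
  <para>Sales rose <bold>sharply</bold> in March.</para>
</document>"""


def extract_text(xml_text: str) -> str:
    """Collect all character data, ignoring markup, and normalize whitespace.

    Text nodes are joined with a space, so a word with markup in the
    middle of it would be split; a real indexer would need to handle that.
    """
    root = ET.fromstring(xml_text)
    return " ".join(" ".join(root.itertext()).split())


indexable = extract_text(SAMPLE)
print(indexable)
```

Nothing here depends on Windows or COM, which is the portability gain the comment is pointing at.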

    14. Re:I doubt it. by sql*kitten · · Score: 2

      Office programs expose the API. To get the text out of an MS Word document, even if you have Windows and Office, you have to start up Word, which is really inefficient.

      True, but you only need to do it once. You can use the API to extract the contents and then store them yourself in any format you want, then run the indexer over that.

    15. Re:I doubt it. by Reckless+Visionary · · Score: 2
      I just got a MS Word document yesterday that I couldn't open. I was using MS Word.

      Oh phooey, some Mac person probably put ".doc" on the end of a Quark file and sent it to you thinking that's how you changed a file type ;-)

      --
      I think I'll stop here.
    16. Re:I doubt it. by tshak · · Score: 2

      Microsoft was not giving out the specifications of the formats of its Office suite before: should we now suppose it's giving out the DTD/Schema AND a good explanation of how to interpret it? I'd hope the answer is yes, but given the company's precedents...


      What precedents? The fact that they've always documented everything very well including API's to get at those Office documents? Sure, they don't document the binary formats, because they give you API's to get at the data. What about the precedents with .NET? Nothing to hide here, full DTD's and Schemas for all config files and for Web Services. What about IIS 6? All config is in well documented XML files. Also, what would be the point of preaching developer ease when they don't document their XML?

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    17. Re:I doubt it. by spongman · · Score: 2

      i didn't say anything about linux, but you can do it with java.

    18. Re:I doubt it. by greenrd · · Score: 2
      Um, if you can't do it on Linux, it's not really Java. Perhaps you don't know the difference between Javascript and Java?

  6. Historical turningpoint? by haeger · · Score: 5, Interesting
    I just thought about someone saying that somewhere, when you look back in history, you can see some historical turning point where things just went wrong or right.

    One small such point is when IBM gave out the specs to their hardware for PC allowing everyone to clone it, while Apple did not.

    This could be such a point. Maybe in 10 years we'll look back at this and ask ourselves "Why the heck did MS XML-enable their Office app, releasing the hold that they had"

    Only time will tell I guess.

    .haeger


    I Play Hattrick

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    1. Re:Historical turningpoint? by Bud · · Score: 2, Flamebait

      MEEP! Wrong! Phoenix Tech was first to license their reverse-engineered BIOS, opening up the PC-clone market.

      http://oufcnt5.open.ac.uk/~richard_lawton/Section%206%20Notes.html
      http://www.pbs.org/cringely/pulpit/pulpit19990930.html

      To address the point in your post, Microsoft has a huge penetration in the Office market and no amount of XML fidgetry is going to kick them out. Rather, they'll love it if a small sub-industry grows up around the MS Office XML standard. Then they will release the Office Document XML standard v1.1, then 1.2, then 2.0 and so on, releasing that information only to "trusted partners". No chance the StarOffice team is going to see the next version before it hits the market.

      THAT's what you learn if you look at history. (Which you apparently didn't do. Duh. You lose...)

      --Bud

    2. Re:Historical turningpoint? by BurritoWarrior · · Score: 4, Informative

      IBM released the framework in which to do so because of the governmental investigation they were under at the time.

      They didn't do it out of the goodness of their hearts, but they did indeed do it. It wasn't the complete BIOS though, so Compaq had two teams...one team looking at the specs, and another (that could never look) building a clean room implementation.

    3. Re:Historical turningpoint? by Gerry+Gleason · · Score: 2
      IBM released the framework in which to do so because of the governmental investigation they were under at the time.

      I don't think there was any connection to the IBM anti-trust case. They had been trying to get into the small and 'home' computer markets for years. The DisplayWriter had some limited success. I don't think they realized that the big market for the machines was business, not the home. If you look at the first offering, this is pretty clear. It had cassette tape support, not floppies nor any hint of a hard disk option. The configuration that was sold most often had most of the ISA slots filled so you could have: floppies, Monochrome text plus printer, extra memory, and ??? (memory fade). The XT wasn't far behind, and it had all of that and a hard disk as a base configuration.

      All the talk at the time was that if IBM had really known what they had, they would have tried a lot harder to control things and lock up the standard. It probably wouldn't have been the success it was if they had, but it's hard to know since it didn't happen that way.

    4. Re:Historical turningpoint? by n-baxley · · Score: 2

      IBM gave out the specs to their hardware for PC allowing everyone to clone it, while Apple did not.

      I would argue that IBM gained from that move. Take a look at how the IBM PC market grew because of that openness. While Apple grew, it was not at the same rate. Granted, IBM's PC division is now not that much to write about, but that's because they didn't keep up. We may look back and see this as a big plus day for MS.

    5. Re:Historical turningpoint? by poot_rootbeer · · Score: 2

      One small such point is when IBM gave out the specs to their hardware for PC allowing everyone to clone it, while Apple did not.

      Bull. If Compaq hadn't reverse-engineered the IBM PC BIOS, there wouldn't have been any Gateways or Dells selling cheap PCs -- you'd be buying them from Big Blue, and paying twice as much.

      The open standards allowed third parties to develop expansion cards and peripherals for the IBM PC, but the same is true in the realm of Macintosh.

  7. *when* ? by Monty+Worm · · Score: 4, Funny
    when the huge universe of MS Office documents becomes available for processing by any programmer

    I beg your pardon? Smelly programmers can keep their hands off my documents. If I wanted you to have them, I'd have emailed them to you as plaintext. I wasn't aware that the Office license meant my documents were common property....

    --
    ... and today's pet project has ... been discarded for lack of time.
  8. The right time for MS by terminal.dk · · Score: 5, Insightful

    MS is trying to time this right.

    Right now they are seeing diminishing sales, possibly a shrinking market share. Most of the Danish public sector is looking to save money using OpenOffice/StarOffice.

    MS needs to increase their compatibility with other options, as they would otherwise force customers to convert every single user away from MS at once, instead of OpenOffice coming in slowly.

    They can also hope, that their format is setting the standard, and the other companies will have to play catch-up rather than the other way around.

  9. imagination by selderrr · · Score: 5, Funny

    ...all sorts of wonderful new things can be invented that you and I can't imagine...

    When will MS ever learn that we don't WANT to imagine how wonderful the MS Office Universe is?

    1. Re:imagination by dbrutus · · Score: 2

      Great software is all well and good but I care about whether it comes complete with lawyers with a bad attitude attached. If so, no thanks unless I can't get out of it.

  10. WTF???? by jericho4.0 · · Score: 3, Informative
    from the article;
    The most important question, besides if the MS Word XML format will be well-documented enough,...

    WTF!? XML shouldn't need to be documented. The whole point is to create a human readable file that is parseable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.

    --
    "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    1. Re:WTF???? by lovebyte · · Score: 4, Insightful

      Have you ever seen some complex XML file? Without documentation it could be as difficult as binary to reverse-engineer!

      --

      I'll do it for cheesy poofs.

    2. Re:WTF???? by Anonymous Coward · · Score: 3, Insightful

      That really depends on your definition of XML and human readable.

      <?xml version="1.0"?>
      <document>
      jMyB38QAAMETWFjs7IQAAQEVkJBNq0jEAAW
      RvbGWTYBAADARUaGlzRG9jdW1lbnQ8nhAAC
      udGrTEAAC8BATwAAADMAv8AAgEABABIAAAA
      </document>

      is valid xml, just like a uuencoded file is valid ASCII and human readable.

      But if other M$ products are any indication it won't be that bad. I parsed some Visio stuff and the data was more or less readable. The drawing data (or previews, didn't care) was still encoded though. I expect it to go a little like M$ html did.
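
The "valid but unreadable" case above is easy to reproduce. A minimal Python sketch, assuming a made-up 16-byte payload in place of whatever binary data an application might store:

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for whatever binary an application stores.
payload = bytes(range(16))
encoded = base64.b64encode(payload).decode("ascii")

# A well-formed document whose only content is an opaque blob.
xml_text = f'<?xml version="1.0"?><document>{encoded}</document>'

root = ET.fromstring(xml_text)  # parses without complaint
blob = base64.b64decode(root.text)

print(root.tag, len(blob))
```

The parser is satisfied and the bytes come back intact, yet a human reading the file learns nothing about what they represent.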

    3. Re:WTF???? by jericho4.0 · · Score: 2

      Yes I have. And that's exactly my point. I understand XML as a product of thinking "OK. We've got the storage, we've got the computing power, let's stop storing our data in binary and make it readable by humans.". If XML is unreadable, even without knowing what program wrote it, it fails to live up to its promise.

      --
      "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    4. Re:WTF???? by anshil · · Score: 2

      That's absolute marketdroid nonsense; just because it is XML doesn't mean it need not be documented. It has to be documented just as well as a binary format. The first step is the DTD, which describes some rules the XML follows, but even with a DTD you need documentation of what means what.

      This all applies to XML-RPC as well. It has to be documented just like traditional RPC. What's the advantage of XML-RPC over RPC? Hell if I know. Besides three times as much data being transferred, I guess the only advantage is a marketing buzzword thrown in, never mind the technical benefits/costs.

      --

      --
      Karma 50, and all I got was this lousy T-Shirt.
    5. Re:WTF???? by WhaDaYaKnow · · Score: 2

      Exactly.

      And even a document that conformed to an XML Schema could be just as hard to reverse-engineer as a binary file.

      We've all seen the obfuscated C contest (or the obfuscated JavaScript scripts in certain webpages).

      At the end of the day all that matter is if the company _really_ wants to document the format or not.

    6. Re:WTF???? by richie2000 · · Score: 2, Redundant
      I parsed some Visio stuff and the data was more or less readable.

      Visio was just recently bought by M$, they obviously haven't had time to corrupt the file format yet.

      --
      Money for nothing, pix for free
    7. Re:WTF???? by anshil · · Score: 2

      That's not true: you cannot shift tags in XML. Well, you can write ASCII text files containing this " text ", but it is NOT valid XML, simple as that.

      You will never get past an XML validation tool like xmllint with this.

      For "your" format, look at what RTF looks like; it's very similar, only a tag starts with \ and groups are delimited with { }.

      The point of using XML is not that the format is so superior to your own formats (well, it ain't bad, although); the point is that you can already use a whole set of tools on this format that you would have to write yourself for your own: a validator (xmllint), a technically exact spec of how data semantics are encoded (the XML DTD), and an already-written parser/reader for the syntax (my favorite is libexpat).
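
The claim that improperly nested tags never get past an XML tool can be checked with Python's standard library, whose ElementTree module parses with expat. This is a minimal sketch; the sample strings are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser (expat, via ElementTree) accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


nested_ok = is_well_formed("<p><b><i>text</i></b></p>")
overlapping = is_well_formed("<p><b><i>text</b></i></p>")
print(nested_ok, overlapping)
```

Note that this only tests well-formedness. Checking a document against a DTD is a separate step that needs a validating tool such as xmllint.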

      --

      --
      Karma 50, and all I got was this lousy T-Shirt.
    8. Re:WTF???? by DGolden · · Score: 4, Insightful

      Here's my pet rant:

      I would say that XML isn't a markup language - a markup language would allow the "bad nesting", since a markup language should be "layers of virtual highlighter pen" applied to an underlying data stream. XML, since it requires "proper nesting", is just Lisp sexps reimplemented, but with terrible syntax. It's yet-another-tree-structured-data-format. Big Wow. A true markup language environment would facilitate part-structured data, like HTML used to be, rather than shoehorning everything into trees.

      Lisp sexps would just say (stuff (things "text"))

      In fact, that's pretty much all there is to lisp syntax right there. The above is (a) a potentially valid lisp program and (b) a valid lisp data structure.

      XML is a data format designed mainly to allow C and Java programmers to use vaguely Lisp-like processing techniques without realising it and/or admitting it to themselves.

      --
      Choice of masters is not freedom.
    9. Re:WTF???? by spongman · · Score: 2

      the bitmaps are stored as encoded data, the drawings are stored as standard VML.

    10. Re:WTF???? by vidarh · · Score: 5, Insightful
      The point of XML is not for it to be human readable, but to allow easy automatic processing of various kinds.

      With XML Schema and DTDs, you can validate various aspects of the data without writing a custom validator.

      With XPath and XPointer you can refer to parts of an XML document without needing to understand what the document contains.

      With XSL you can translate all or parts of the document from one format to the other without your application needing to know the structure, and without needing to understand more of the format than the parts you are extracting.

      With SAX and the DOM you can programmatically traverse and extract information from an XML file without having to write a custom parser.

      With CSS an editor or viewer for instance can use a standard mechanism of applying styles to elements without hardcoding the style attributes for elements anywhere.

      With XML namespaces, you can intersperse data in various formats in the same file, and the components handling each of the vocabularies need not know anything about the other components - an example would be embedding SVG in HTML: The HTML renderer doesn't need to understand any of the SVG tags, only that it should delegate contents with other namespaces to another component. And the SVG renderer couldn't care less about the HTML.

      And this doesn't even touch on the benefits of all the various interchange formats that have been specified on top of these base technologies.

      The importance of XML is that it opens up the doors for building interchangeable components that operate on data without needing any hardcoded application specific knowledge of the data.

      Most of the time, you still have to write some code to tie it all together, but you don't have to build your own parsers, your own document object model, your own styling system, your own way of handling contained data of other types, your own way of transforming data between formats, etc.

      For me as a software developer XML delivered years ago. I use XML technologies daily, and it saves me work.
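
Two of the points in this comment, path-based addressing and namespace-based delegation, can be sketched with Python's standard library. This is only an illustration under stated assumptions: the "report" vocabulary and its `http://example.org/report` namespace are invented, and only the SVG namespace URI is a real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-vocabulary document: an invented "report" vocabulary
# with an embedded SVG fragment. Only the SVG namespace URI is real.
SAMPLE = """<report xmlns="http://example.org/report"
        xmlns:svg="http://www.w3.org/2000/svg">
  <section><title>Intro</title></section>
  <section><title>Results</title>
    <svg:svg><svg:circle r="5"/></svg:svg>
  </section>
</report>"""

NS = {"r": "http://example.org/report", "svg": "http://www.w3.org/2000/svg"}

root = ET.fromstring(SAMPLE)

# ElementTree supports a limited XPath subset; this picks out every title
# without knowing anything else about the report vocabulary.
titles = [t.text for t in root.findall(".//r:title", NS)]

# A component that only understands SVG can find its own elements the same way.
radii = [c.get("r") for c in root.findall(".//svg:circle", NS)]

print(titles, radii)
```

Neither query needed a custom parser or any knowledge of the other vocabulary, which is the kind of component independence the comment describes.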

    11. Re:WTF???? by spongman · · Score: 2
      yeah, but you trivialize XML. you're whining about an insignificant syntactical aspect of XML. from what you've shown your 'language' has no handling of encodings, namespaces or schema-based validation, all of which form the basis of whats really useful about XML.
      Funny thing is it's trivial to convert this to XML, but yet vice-versa isn't necessarily so easy. (because of the above example)
      it would be simple to write an XSLT stylesheet to render any XML document in your language. how is your example not equivalent to:
      <stuff>
      <things>text</things>
      </stuff>
      ?
    12. Re:WTF???? by ianezz · · Score: 3, Insightful
      WTF!? XML shouldn't need to be documented. The whole point is to create a human readable file that is parseable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.

      Uhm, it is also the point of source files in the programming language of your choice, I'd say... and still, you need good comments.

      XML is like Lisp, but with sharp parenthesis.

    13. Re:WTF???? by SlamMan · · Score: 2

      I'd throw a comment about obfuscated perl in here, but I think that's perl's default.

      --
      Mod point free since 2001
    14. Re:WTF???? by statusbar · · Score: 2

      It's time that the Lisp people admit to themselves that they lost and move on.

      Actually, the reverse is true, my friend.

      XML and XSLT are dirty tricks made by bitter lispers on the rest of the computer world! XML is just a way to do "LISP sexps" in a worse syntax. Everyone accepted it because it looked kinda like HTML! They were tricked!

      It is trivial to make a program that converts XML to and from LISP sexps.

      Quite often it is very worthwhile to convert the XML to sexps, do your processing algorithm in lisp, and convert the resulting sexps back into XML.
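
The XML-to-sexp direction discussed in this sub-thread can be sketched in a few lines of Python. This is a deliberately simplified sketch: it ignores attributes, namespaces and text that follows a child element, so it is not a full round-trip converter.

```python
import xml.etree.ElementTree as ET


def to_sexp(element: ET.Element) -> str:
    """Render an element as (tag child...), quoting text; attributes are ignored."""
    parts = [element.tag]
    text = (element.text or "").strip()
    if text:
        parts.append('"' + text.replace('"', '\\"') + '"')
    for child in element:
        parts.append(to_sexp(child))
    return "(" + " ".join(parts) + ")"


sexp = to_sexp(ET.fromstring("<stuff><things>text</things></stuff>"))
print(sexp)
```

For the small example used earlier in the thread the two notations carry the same tree; the features this sketch drops (attributes, namespaces, mixed content) are where a real converter would need more design work.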

      --jeff++

      --
      ipv6 is my vpn
    15. Re:WTF???? by Nailer · · Score: 2


      That really depends on your definition of XML and human readable.


      That's a very good point. If your definition of human readable is illogical and bizarre, that could indeed be XML. Or a pair of pants.

  11. What is the format? by pubjames · · Score: 2

    There's lots of speculation here about MS doing stuff to create lock-in with this new format, but I want to actually see the format. Is there any documentation anywhere about it? Or does someone out there have a document in the new format that we can take a look at? Of course, being XML we should be able to just open it and take a look. That would put an end to all this speculation.

    1. Re:What is the format? by Witchblade · · Score: 2

      A starting place. No way to know really how close they'll stick to what they've done up to this point.

  12. XML takes away Microsoft's main advantage by Zeddicus_Z · · Score: 5, Interesting

    As far as I can tell, one of the major reasons many businesses refuse to change over from Microsoft Office to cheaper options is file compatibility. As our company's IT admin put it recently on the suggestion of using OpenOffice, "I get sent hundreds of Microsoft Word, Excel and Access documents a week. I need to know that I can open and access every single one of those without problems". An example of proprietary file formats helping Microsoft keep the monopoly.

    However, if Microsoft Office documents become "built around an open, internationalized standard", i.e. XML, would this not enable the people behind OpenOffice, StarOffice etc to achieve 100% file compatibility and thus negate Microsoft's largest advantage with Office?

    Of course, this could be yet another Microsoft "embrace and extend" tactic, à la Kerberos. Incorporate the standard in a bastardised form, claim standards compatibility, then pollute it so you must be using Microsoft technology to properly interact with it.

    --
    Janie took my gun...
    1. Re:XML takes away Microsoft's main advantage by jgp · · Score: 2, Insightful

      Have you seen the HTML produced by the current "Save as webpage .." options in Word? shudder. The vast majority of semantics are actually embedded in XML islands hidden inside HTML comments. I see no reason why Microsoft would change their tune now (they'll simply change the DTD from one inappropriate document model to another one IMHO).

      <wordDocument>
      <!-- (document content here) -->
      <nonMicrosoftElement>I'm sorry, you don't appear to have a StandardsEnhanced(tm) word processor.</nonMicrosoftElement>
      </wordDocument> --
    2. Re:XML takes away Microsoft's main advantage by GT_Alias · · Score: 2, Insightful
      I need to know that I can open and access every single one of those without problems

      Interesting point...when people start buying Office 11 and sending you those XML-saved Word documents, you will have no option but to go out and fork over some cash for an upgrade.

      Unlike now, when I can send an Office XP-formatted Word document and older versions can still open it. Of course...older versions can't open newer databases, and that's been a wonderful source of headaches.

    3. Re:XML takes away Microsoft's main advantage by ryanvm · · Score: 2

      Shhhhhhhh - you're going to ruin it you fool.

      Remember, we tell them their plans suck after they implement them (see SDMI).

    4. Re:XML takes away Microsoft's main advantage by Planesdragon · · Score: 2

      Have you seen the HTML produced by the current "Save as webpage .." options in Word? shudder. The vast majority of semantics are actually embedded in XML islands hidden inside HTML comments. I see no reason why Microsoft would change their tune now (they'll simply change the DTD from one inappropriate document model to another one IMHO).

      Those are there so you can "round trip" a file from doc/xls/ppt to ms-htm and back, without losing all of your MS-only formatting.

      MS has a utility, which I use on a regular basis at work, to strip out the HTML for you. It's called HTML Filter.

      (Go to http://www.mvps.org/word/ for more useful bits about Word.)

      XML, unlike HTML, actually can express everything that Office does. MS will use it 99%, possibly adding their 1% extra back into the spec, and let their penetration and familiarity secure their market share.

      Trust me--if MS can get WordPerfect 2004 to use XML to "keep up," they'll be able to beat out their last real competitor in the few markets where it's still entrenched. The biggest problem with MS Word has been roundtrips to WordPerfect; XML can solve that problem if done properly, far better than HTML can.



      I'm sorry, you don't appear to have a StandardsEnhanced(tm) word processor.
      --


      Not going to happen. I could cut and paste some code from an MS Office HTML file right into this slashbox, and you'd be able to read it just fine. (Assuming the lameness filter doesn't get me first.)

  13. HTML from Word by e8johan · · Score: 5, Interesting

    Just look at an HTML file exported from Word2k. I would not call that compatible with any HTML I've ever learned. Most probably the XML file exported from Office 11 will be a Microsoft-specific file, specifying lots of Office-specific ActiveX (aka OLE) info that cannot be emulated. And, hey, they can probably store binary data in XML. The only change is that most competing products will emit files that Word can easily read, i.e. M$ will get the biggest benefits.

    1. Re:HTML from Word by pubjames · · Score: 5, Insightful

      Just look at an HTML file exported from Word2k.

      An excellent point sir. That's a great illustration of how Microsoft approaches 'open' file formats.

      People that think that MS Office is going to move to open, well documented file formats are just plain nuts. But look at many of the comments in this forum - it seems MS has even managed to persuade many Slashdotters that they are going to use open formats. Poor fools.

    2. Re:HTML from Word by superyooser · · Score: 5, Informative
      True. Just a couple days ago, I saved a doc as Web Page in Word (Office XP) hoping that some clipart would be saved in a web-friendly format. (This was originally made in Publisher, NOT by me.) It didn't work; it saved the images as .wmz! For the web?!

      Anyway, there was tons of gibberish in the file, but it displayed fine in IE6. It was a completely blank page in Mozilla! Nothing at all! We always knew the XP didn't stand for cross-platform, but I didn't know it was this bad.

    3. Re:HTML from Word by CommandNotFound · · Score: 2

      it seems MS has even managed to persuade many Slashdotters that they are going to use open formats. Poor fools.

      Yeah, in less than five minutes on this thread I've seen the terms "rich API" and "framework", two of the biggest Microsoft-parrot terms of the last few years. BTW, I think it's safe to say that the word "framework" is on its way to be the official buzzword of 2003.

    4. Re:HTML from Word by jonbrewer · · Score: 2

      "Just look at an HTML file exported from Word2k. I would not call that compatible with any HTML I've ever learned"

      The trick is, does it validate to W3C specs? Last time I checked, though it was a disaster to look at, it did indeed validate.

      I frequently receive Word and Excel documents that need to be presented on the web. Generally I leave them as-is, storing them in a document management system and just serving metadata via the web, but on occasion I do a conversion. Though the HTML output from Word 2k is ugly, it is machine readable (for parsing and cleaning) and perfectly compliant.

    5. Re:HTML from Word by e8johan · · Score: 2

      "Let's also hope that although MS Word may produce bloated XML, it can still read and process well-formed, simple XML, as good as (or better than) it can read and process non-word files, such as HTML or TXT files."

      Wouldn't this just give Word an edge over all other XML-producing word processors? It just keeps the one-way compatibility to M$ products, where they can read what others export to them, but no one can (fully) read what they produce.

    6. Re:HTML from Word by e8johan · · Score: 2

      IMHO:

      1) I think that all alternatives should try to read Word files properly.

      2) I think that all alternatives should support a proper open XML standard that will be truly interchangeable.

      3) I think that the alternatives need to provide viewers for this standard format in an easy-to-install and easy-to-use format for all Word users.

      4) I hope that M$ will not gain more advantages by polluting yet another standard format.

    7. Re:HTML from Word by fzammett · · Score: 2, Insightful

      Yeah, couldn't be that some people actually BELIEVE WHAT THEY WROTE, right??

      Why is it that every OSS zealot has to insist that any point of view contrary to their own is the result of a deranged mind?

      You want to try and convince me that Microsoft is evil and that I should shun absolutely anything coming out of Redmond and that I should embrace the OSS world? Fine, try and convince me. Do it logically and without insulting me. You'll find it's not that hard because I hate Microsoft anyway, but I don't hate every product they produce, in fact I very much like some of them (Win2K, Office in general as two examples).

      BUT DON'T FUCKING DO IT BY TELLING ME I'M A NUTCASE OR A PAID LACKEY OF SOME CORPORATE ENTITY BECAUSE I DON'T CURRENTLY AGREE WITH YOUR WORLD-VIEW!!

      Another group of people acted the way some of you people act... we fought a world war against them...

      --
      If a pion (n-) collides with a proton in the woods & noone is there to hear it, does lamdba decay into the source pa
    8. Re:HTML from Word by tshak · · Score: 2

      Although your point makes for a nice +5 on /., it bears little intelligence. First, XML is not like HTML - it's strict. Second, everything MS has done with XML has been open and has strictly supported standards. Finally, did you even read the article? The whole point of the article is that Tim Bray, one of the leading XML gurus, has commented VERY POSITIVELY based on beta versions that he's seen.

      So, before you anti-MS troll, try reading the article and maybe even thinking.

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    9. Re:HTML from Word by HiThere · · Score: 2

      I'm not sure who was using that first, but I know that it was on the Mac years before it was on the PC. And it was standard in printing before that.

      It's quite plausible that I misunderstand what you mean, but directional quotes have a long history and are considered more readable than non-directional quotes. Imagine, if you will, trying to read an expression that used non-directional parentheses. Ugh! Now I'll agree that this is a more extreme example than the example of quotes, but not as much as is frequently assumed. Part of it is that we have just become habituated to the non-directional quotes.

      And for text processors that render a double quote as two single quotes, I have only one "word": ugh!

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    10. Re:HTML from Word by e8johan · · Score: 2

      Even though I've read the article, I remain a sceptic. It wouldn't be much fun if everyone just agreed with the main article, would it!

    11. Re:HTML from Word by guybarr · · Score: 2

      Yeah, couldn't be that some people actually BELIEVE WHAT THEY WROTE, right??

      Could be, but I did not say \forall pro-Microsoft posts are from hired pens, just that \exists some.

      And MS used this tactic of hired liars to overcome OS/2, so I see no reason why they shouldn't do it again.

      Why is it that every OSS zealot has to insist that any point of view contrary to their own is the result of a deranged mind?

      You find that in my post where, exactly?

      You want to try and convince me that Microsoft is evil and that I should shun absolutely anything coming out of Redmond and that I should embrace the OSS world? Fine, try and convince me.

      I say you should read /. posts with a HUGE grain of salt. /. is a rumors site, and is as vulnerable to outside manipulation (through lobbyists) as any discussion room or conference. This is the meaning of my post.

      I said a short, single sentence, the amount of information you infer from it is quite, ahem, impressive.

      Do it logically and without insulting me.

      Where did I insult anyone in particular? Or specifically you? I didn't say you are a hired pen (I don't know you at all); I said I believe MS hires some (as it did in the past).

      (fzammett said)
      BUT DON'T FUCKING DO IT BY TELLING ME I'M A NUTCASE OR A PAID LACKEY OF SOME CORPORATE ENTITY BECAUSE I DON'T CURRENTLY AGREE WITH YOUR WORLD-VIEW!

      like they write in chess: ?!

      again, you infer quite a lot of info regarding yourself from a small general sentence, then curse and shout and deny what you presume you've read.

      drink a glass of water, cool a bit.

      Another group of people acted the way some of you people act... we fought a world war against them...

      I don't know if this is more funny or alarming.

      1) Who are "these people" that I'm supposedly a part of ?

      2) Hey, you're equating me to the Nazis because I think there are liars and hired-pen on /. .

      no personal insult there ... also, IMHO, not a lot of common sense.

      --
      Working for necessity's mother.
  14. Typical XML-proponent mistake by Baki · · Score: 5, Insightful

    Just because the file format, instead of binary, is "human readable", does not make it more open.

    For "any programmer with a Perl script and a bit of intelligence" it doesn't make a difference if you read bytes (binary) or XML structures.

    As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.

    The fact that XML is human readable is irrelevant, since no human shall read the files, but programs such as Perl scripts shall. For them it makes hardly any difference; it is only marginally easier since you can use an existing XML parser instead of rolling your own (which is no big deal using the right tools such as YACC).

    This 'openness' comes at a good time for Microsoft. They suggest openness in a time that they are criticized and attacked because of file-format lock-in. Many 'advisors' shall be misled, blinded by buzzwords such as XML as they are, and actually believe that this solves the issue.

    1. Re:Typical XML-proponent mistake by smallpaul · · Score: 5, Interesting

      As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.

      Have you ever tried to reverse engineer a binary file format? And have you ever tried to do the same thing with an XML file format? I learned huge chunks of SVG yesterday _without_ opening an SVG book, just by mucking around in an existing SVG file and with an SVG viewer. Of course, Microsoft could do something clearly in violation of the spirit of XML, by making the whole thing one tag full of base64ed text or something. But as long as they use tags in a semi-sane way (which is the whole point, for integration with corporate systems), XML will be a big step forward.
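
      As a rough illustration of that kind of mucking around, here is a sketch that lists every element path in an unfamiliar XML file and how often it occurs. The `tag_paths` helper and the tiny SVG-like sample are made up for the example (the sample omits the real SVG namespace to keep the paths short).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root):
    """Count how often each element path (e.g. /svg/g/rect) occurs."""
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


# Hypothetical sample document; any well-formed XML would do.
SAMPLE = """<svg width="100" height="100">
  <g>
    <rect x="0" y="0" width="10" height="10"/>
    <rect x="20" y="0" width="10" height="10"/>
  </g>
  <circle cx="50" cy="50" r="5"/>
</svg>"""

counts = tag_paths(ET.fromstring(SAMPLE))
for path, n in sorted(counts.items()):
    print(n, path)
```

      A listing like this only gives you the shape of the file; what each element means still has to be guessed or documented.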

    2. Re:Typical XML-proponent mistake by Baki · · Score: 3, Insightful

      One big difference: SVG was designed and is intended to be open and understandable. Office formats, using XML or not, are not. I do not believe MSFT would voluntarily cease their lock-in strategy.

      XML may be easier to reverse engineer, but it need not be; that depends on how complex the DTD/Schema is and whether the designer intended it to be easily understandable. Apart from that, as a purist I don't like reverse engineering, especially not if the subject of reverse engineering is from an uncooperative company known for its dirty tricks.

      A non-XML grammar/syntax, if accompanied by a decent and documented EBNF description of its grammar, is much better to base your program on than an undocumented XML.

  15. Re:Too good to be true by Masa · · Score: 5, Insightful

    Because it doesn't matter if everyone is able to read, modify and generate Office-compatible files. People will use Office products in future. Opening the file formats doesn't change anything.

    XML makes it easy to create programs that will depend on MS Office. So this only makes it easier to create programs which depend on Microsoft products.

  16. Proprietary file formats... by lanalyst · · Score: 2, Insightful

    It seems M$ has done their best over the years to protect their file formats... The implication now is Ballmer's enemy #1 (open office, ximian, koffice, star office, joe's office, etc) will be able to interchange documents seamlessly with M$ Office.

    I don't know about anyone else, but the reason companies hold onto M$ (like grim death) is they receive documents via email in M$ format - de facto proprietary format.

    There has to be an angle here. This can't be construed as a tactic to hold market share.

  17. Re:read through "EULA" in the XML? by McCall · · Score: 2, Interesting
    COULD it make it illegal to "reverse engineer" the document format? I can very easily see that if it could, microsoft could include a clause that explicitly prohibits GPL programs from interpreting the XML...

    No way. What happens when I receive an MS 11 XML Word document on my Linux system via email? I haven't accepted any sort of EULA, and I can start hacking out the DTD straight away. And, I must point out, a complex XML document is close to worthless without that DTD.

    They may prevent MS users from reverse engineering the documents on their MS OSes, and I suppose they could even forbid users emailing their documents to other OSes (EULAs are great, eh?) - but I doubt they will do this; it would cripple Microsoft Office.

    Andrew McCall.
  18. Re:Too good to be true by MrHanky · · Score: 5, Interesting

    Maybe they need a migration path away from the win32-based format they use now. .NET also seems to follow that path. Remember that MS needs access to other platforms than the i386/desktop in the future - mobile devices for instance. Keeping a format that is basically a binary image from a PC is good for locking out competition, but not when you have to start competing with yourself.

  19. Stalling tactics?... by pubjames · · Score: 5, Insightful

    Perhaps these announcements of XML compatible office file formats are just stalling tactics? MS has done it before.

    MS now has a serious competitor in StarOffice/OpenOffice.org. And that competitor has two compelling advantages - it's cheaper/free, and open XML file formats. So when clued-up IT people say to their Pointy-Haired Bosses that they should use StarOffice/OpenOffice.org, PHBs can respond "but MS is doing that next year. We can avoid all the disruption of changing office suites just by waiting a bit and upgrading to the next version of MS Office. Besides, we're already paying for it." Then when MS actually releases Office 11, they will have used all sorts of devious and subtle devices to keep their lock-in of the file format, and MS and PHBs will be happy.

    1. Re:Stalling tactics?... by tshak · · Score: 2

      Perhaps these announcements of XML compatible office file formats are just stalling tactics? MS has done it before.
      Did you actually READ the article? Of course not! I'm sick of moderators giving +5's to A) redundant posts in this thread and B) clueless posters who haven't even read the article. No, this is not a stalling tactic, Tim Bray and other 3rd parties have SEEN this and are very excited about it. This is not vapor-ware or market-ware, it's real, it'll be out, and you'll be able to parse it.

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
  20. Well Excel in Perl is pretty easy now by twoshortplanks · · Score: 5, Informative
    I've used the Excel reading and writing modules for Perl with great success. They're easy to use and do the job. (There are also simpler interfaces if you want them too.)

    Or you could go the whole hog and use a SAX writer like XML::SAXDriver::Excel to create the documents from XML yourself.

    (This is not to say I think XML-native formats aren't cool or won't have many uses; I'm just pointing out what you can do now.)

    --
    -- Sorry, I can't think of anything funny to say here.
    1. Re:Well Excel in Perl is pretty easy now by twoshortplanks · · Score: 3, Informative
      Why not just have Excel export the file as CSV?
      Oh, you can do that...but I've come across numerous problems while doing this. For a start, you lose the metadata about cells (i.e. if it's a formula or a string or a number with $foo number of decimal places.) You have problems associated with using multiple workbook spreadsheets (annoying if you've ever had to use them.) CSV is okay (and I've used it quite a bit) but it simply doesn't hold as much info as the original file.
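
      A small sketch of that metadata loss, using Python's standard `csv` module rather than Excel's own export. The cell values are invented for the example, and the formula is kept as literal text purely to show that CSV has no way to mark it as a formula.

```python
import csv
import io

# Hypothetical sheet: a header row, then a string, a number and a formula.
cells = [["Item", "Price", "Total"], ["Widget", 2.50, "=B2*3"]]

# Write the cells out as CSV, then read them straight back in.
buf = io.StringIO()
csv.writer(buf).writerows(cells)
rows = list(csv.reader(io.StringIO(buf.getvalue())))

for row in rows:
    print(row)

# Every value comes back as a plain string: the number's type and
# formatting are gone, and nothing marks the last cell as a formula.
print(all(isinstance(value, str) for row in rows for value in row))
```

      Multiple worksheets have the same problem: a CSV file is a single flat grid, so each sheet would need its own file.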
      --
      -- Sorry, I can't think of anything funny to say here.
    2. Re:Well Excel in Perl is pretty easy now by twoshortplanks · · Score: 2

      Yes, I've seen people do this. We're using ParseExcel and WriteExcel as our servers aren't running a Microsoft OS (and aren't near ones that are) and both these modules work fine under FreeBSD.

      --
      -- Sorry, I can't think of anything funny to say here.
  21. Too good to be true? by varslot · · Score: 2, Interesting

    The article states that:

    "The important thing," he explains, "is that Word and Excel (and of course the new XDocs thing) can export their data as XML without information loss..."

    Does this mean that MSO will have the same support for XML as currently for RTF? In that case I'm not that excited. If the default will be to save as MS-word format, and not XML (or MS-XML as the case may be), then we are no better off. Only Microsoft is, as they are now able to import OpenOffice/StarOffice documents.

    It's sort of like when Word could read WordPerfect documents in the old days.

    --
    There arises from a bad and unapt formation of words a wonderful obstruction to the mind. (Francis Bacon)
  22. What I heard.... by LarsBT · · Score: 3, Interesting
    I can't remember the reference, but I heard that they will embed binary code for different word-objects within XML tags e.g.

    <equation> 0100100100111101010011010101101010010 </equation>
    which is allowed in XML (if I understand XML correctly). So not much gain if everything is still in a proprietary closed binary format.
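
    To make the worry concrete, here is a hedged sketch (the element names and the byte string are made up, not taken from any real Office format) showing that a standard XML parser accepts an encoded binary blob without complaint and tells you nothing about what is inside it.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for some proprietary embedded object.
blob = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x00, 0x01, 0x02, 0x03])

# Wrap the base64-encoded blob in perfectly well-formed XML.
encoded = base64.b64encode(blob).decode("ascii")
xml_text = f"<document><equation>{encoded}</equation></document>"

# The parser is satisfied: the document is well-formed.
root = ET.fromstring(xml_text)
payload = base64.b64decode(root.find("equation").text)

print(root.tag, len(payload), "opaque bytes")
```

    Well-formedness is checked, but interpreting `payload` still needs knowledge of the binary format.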

    I think maybe it was the CEO of Microsoft Denmark. I'm NOT sure though

    1. Re:What I heard.... by AnEmbodiedMind · · Score: 2, Funny
      More like
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE msword PUBLIC "-//W3C//DTD WORD 1.0//EN"
      "http://www.microsoft.com/word11.dtd">
      <worddoc>
      <![CDATA[ ??????????You'd be lucky????????? ]]>
      </worddoc>
      ;)
  23. Skeptical by PizzaFace · · Score: 2, Funny
    Three questions about Word's XML format:
    • How's it encrypted?
    • Do I need a Passport account to open it?
    • Thank you, sir, may I please have another?
  24. The new Word XML document format: by Bazman · · Score: 5, Funny


    <uueWord2kDocument>
    M"@D)("!'3E4 @3$E"4D%262!'14Y%4D%,(%!50DQ)0R!,24-%3 E-%"@D)("`@
    M("`@(%9E7)I9VAT("A#*2`Q.3DQ
    M($9R96 4@4V]F='=A6]N92!I2!A;F0@9&ES=')I8G5T
    M92!V97)B871 I;2!C;W!I97,*(&]F('1H:7,@;&EC96YS92!D; V-U;65N="P@
    </uueWord2kDocument>

    1. Re:The new Word XML document format: by jsse · · Score: 2, Insightful

      Don't laugh yet. That's exactly what'd be happening.

      The new document just needs to have its meta-tags comply with XML; the rest could still be as obscure and junky as shown above.

    2. Re:The new Word XML document format: by Bazman · · Score: 2

      Plus I now realise my uuencoding is broke. There were some '>' signs in the output that didn't get through! Never mind. MS will probably use the same format as tnef attachments :)

      Oh by the way, it was an uuencoding of part of the GPL...

      Baz

    3. Re:The new Word XML document format: by Alsee · · Score: 2
      UUE?!? Christ man! Get with the times!
      <?xml version="1.0"?>
      <yEncWord2kDocument>
      ))=J*:tpsp *++**+*+**)(*Ts????R|SJtzoqJv?????£VJ ??????J[V_V^V]`*)*m*0./0/.00/011024:44334>896:A>B BA>@@DGOIDEMF@@JVJMPQSTSCIWZW
      RZORSR)*m+111424=} 44=}RD@DRRR
      </yEncWord2kDocument>
      -
      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  25. Syntax vs. Semantics by mindriot · · Score: 5, Insightful

    Yes, the point of XML files is that their _syntax_ is simple and easily parseable by computers. But that doesn't tell you anything about the _semantics_ of a document. And as long as there is no proper documentation on what the mess of tags in your XML file means, there's hardly any way for you to hack together a Perl script to, say, extract plain text, or convert the Word XML file to an OpenOffice.org XML file, or whatever else comes to mind.
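
    A minimal sketch of that gap, in Python instead of Perl. The document layout below is invented for the example; the point is that pulling out all text is trivial, while pulling out the right text requires knowing which elements mean what.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: nothing in the syntax says that <meta> is
# bookkeeping and <body> is the text a reader actually sees.
SAMPLE = (
    "<doc>"
    "<meta><rev>Draft 3</rev></meta>"
    "<body><p>Hello <b>world</b></p></body>"
    "</doc>"
)

root = ET.fromstring(SAMPLE)

# Syntax only: concatenate every text node in document order.
all_text = "".join(root.itertext())

# Semantics: only possible because we were told that <body> holds the content.
body_text = "".join(root.find("body").itertext())

print(repr(all_text))
print(repr(body_text))
```

    The second extraction is the useful one, and it depends entirely on documentation (or guesswork) about the tag vocabulary.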

  26. Other MS products using XML by Anonymous Coward · · Score: 2, Insightful

    Other MS products that use XML (Visual Studio.net, for example) actually do it quite well. The VS.net generated XML, including project files, is clean and very readable.

  27. C'mon People by BurritoWarrior · · Score: 3, Insightful

    Office's MS-XML will be even less compatible with the spec than MS-Kerberos or MS-Java/J++. Office is their cash cow. It brings in 30-40% of their revenues all by itself.

    If you think there is even a remote chance in he-double L that MS will loosen their grip on this revenue stream, I have a bridge to sell you.

    You can call this flamebait if you want, but what in MS's history would lead me to believe they are suddenly going to change their historic behavior pattern AND risk a huge amount of revenue at the same time?

  28. I can see it now... by dimator · · Score: 3, Funny
    <?xml version="1.0"?>
    <document type="word">
    <![CDATA[
    @%MYD<V@Q4VEA8^`!AX0>DN6UIJE=^1J;1F\ @! (P@@<$Y(@OL%AS`0B=$<S*
    4&A399HT2*S-@*+U&1)+KCS>J4 HJTZ=^F534G%_S8\6=YS7?#59_.U!YI[_^
    AU$`HOG^5/N3A9 9'<\V/YP`(T*'MZ6)3UVSCDYF&+B;0H?7I3'O7'(2/H(Z>
    U= ;N1:`!4*"4U/ATNK5GOO+^B\O?/\QK^3KE>KVYL"PN-3O2'/^9 3/U)I.PP
    FXG3%*.RR)0.R'/&N!?>U'*;4FK6B,U:B<4@-6O% 1D!!%Z/31&E(R*MCU,HH
    15RT`H`P2H$,O5FB!R,`"*`!J5FJ -4@TNEB5)E:'"D;AO4.?>-Z1FGVQN"3O
    VN6RANM76P&((F=# 3GYM05%C;E%C1F[MQ>P:*".O*3VW,<-9`T:D.^O2BE@*
    4N25 U@$0X#X!(B8*+H-1(3'!Y9'%ZF1B%7P9E#"^90&U72560M1E`R F$1;4$
    :%/(I$JY3"67*"&E5,4&X.2>R]!F@"#7VLH>;5`>@( "`!IX4A`FK)LG*7O%D
    P^$G)10Y"^L:FO_^\,GTP-"V:_R/GL %-,**[?^UIWRK2YT.;70-KW8.LG;)[
    ]]>
    </document>
    --
    python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
    1. Re:I can see it now... by dmiller · · Score: 4, Redundant

      This is probably dead-on, except it will be:

      <document type="word">
      <ole><![CDATA[ (linenoise) ]]></ole>
      </document>

      I.e OLE blobs embedded in an XML container

    2. Re:I can see it now... by Jester99 · · Score: 2

      Hm. If that's what the Office docs will look like, then I think it should be pretty easy for Perl hackers to use them... it seems to me that they would just need to strip off the tags, and then they've got a valid Perl script that formats the document automatically. ;)

  29. They already did this for two other products by pvera · · Score: 5, Interesting

    SQL Server has had an XML web gateway since version 2000. You can run any query and output it as xml or have an xml template pull the query and transform the results with XSL, all without one line of server side script.

    ASP.NET uses XML for all the human-readable files, and the IIS in Windows .NET Server finally uses Apache-style configuration files, which are also XML.

    --
    Pedro
    ----
    The Insomniac Coder
  30. Yeah, right by Alex+Belits · · Score: 3, Insightful

    XML is a format with nearly infinite possibilities for obfuscation, convolutedness and poorly defined standards. The most we can expect is the possibility to validate a file to absolutely certainly determine if it is compliant with the new Word format or not.

    --
    Contrary to the popular belief, there indeed is no God.
  31. M$ only cooks with water too. by Qbertino · · Score: 3, Insightful

    I'm working with that weedy Word 2k at the office. And we use Outlook as a standard communication platform. Believe me, the fact that their software is so often such a pain isn't part of some greater plan to rule the world; it's flat-out ineptitude at delivering products with any conceptual consistency.
    Looking at Frontpain and Word HTML and extrapolating XML from that tells me they're gonna do just as crappy a job as usual and really think they've done a great thing.
    Just like the people sending me source code additions and DB content as Word files. Nothing but simple ineptitude, I say.
    Not that my system of choice, Linux, is that much more consistent, mind you. With a bazillion font methods, every single one of them looking crappier than the next, and QT, GTK+, Motif, Lesstif, Inbetweentif, Swing, TK and whatnot, none of them following the same clipboard behaviour, it's just as weedy. Only it is under *my* control to change it.
    That way, the bottom line is: with OSS, if it doesn't work, there's another way. With M$ it's 'Game Over' with the first "Error in module [fill in random hexcode here]".
    That's the simple difference.

    --
    We suffer more in our imagination than in reality. - Seneca
  32. what a cool codename by Anonymous Coward · · Score: 4, Funny

    code-named 'Office 11'

    awesome. Apparently the next version of the linux kernel is code named 2.6! Wow!

    1. Re:what a cool codename by joshuac · · Score: 2

      ---snip
      It's called "Office 11" so they can release it right before Apple releases OS 11, and thus will look "behind the times" compared to MS.

      ---snip

      Nice theory, but it actually is their 11th release of Office. Trying to make Apple look "behind the times" probably did not even occur to them. Click Help/About in an Office application to see its version. Word 2000 is part of Office 9.

  33. How to convert Word to XML by Korth · · Score: 5, Informative

    I've recently been reviewing a dozen different programs for converting from Word to XML.

    So far the best tool I found is upCast (free for personal use) from http://www.infinity-loop.de/ .

    To convert a Word file:
    * Use Word's AutoFormat feature to convert visual formatting to Word styles
    * Redefine all the text as Word styles
    * Run upCast to convert to XML using the "XML (content, no DTD)" filter
    * Run HTML Tidy from http://tidy.sourceforge.net/ with the parameters -xml -utf8 -clean -bare .

    Other tools that might be worth a second look:
    * Majix (Open Source) - http://www.tetrasix.com/
    * WorX SE - http://www.xyvision.com/
    * XML MarkupKit (in German) - http://www.eds.schema.de/download/MarkupKit/
    * DocSoft LLC Word-to-XML - http://www.docsoft.com/w2xml.htm

    1. Re:How to convert Word to XML by frank249 · · Score: 2

      To convert MS Word docs to XML, I just open the doc in WordPerfect 10, then save to XML. What's the big deal? WordPerfect has been able to do it for the past couple of years. BTW, it also publishes to PDF.

      --

      Today's vices may be tomorrow's virtues.

  34. Hype! Hype! Hype! by RobotWisdom · · Score: 5, Interesting
    This article is pure PR, with no new content. The XML-cult will keep waving their hands and promising great payoffs 'RSN' (real soon now) until people actually start trying to implement uniform semantic tags in their data and documents... at which point universal disillusionment will set in because the problem is way too hard even for trained AI-PhDs. [more]

    The thread a couple of weeks ago about the death of META headers will apply 1000 times worse for semantic tags-- if the semantic web is going to work at all it needs to start from headers describing the webpage as a whole.

    (Also, what's with XML-Journal's claim the article has three pages when it only has two?)

    1. Re:Hype! Hype! Hype! by greenrd · · Score: 2
      From your linked page:

      The central problem of AI is to find a finite vocabulary that can be used to express any idea.

      MS's promises have nothing whatsoever to do with "understanding" the semantics of a letter to your girlfriend or whatever and expressing your sentiments as an XML tree. If you think it is, you have failed to understand the article! It is not an attempt to mark up semantics, it is an attempt to convert things like bold, italic, font size into XML representations.

    2. Re:Hype! Hype! Hype! by RobotWisdom · · Score: 2
      "MS's promises have nothing whatsoever to do with 'understanding' the semantics of a letter to your girlfriend or whatever and expressing your sentiments as an XML tree. If you think it is, you have failed to understand the article!"

      This is just the old XML bait-and-switch again. Bray writes of "all sorts of wonderful new things [that] can be invented". TimBL touts the Semantic Web as the immediate justification of XML.

      "It is not an attempt to mark up semantics, it is an attempt to convert things like bold, italic, font size into XML representations."

      No, you are simply wrong.

  35. Why do people get so excited about XML? by Viol8 · · Score: 2, Insightful

    Yes, so it's portable. Yes, so it's (mostly) human readable. So what? So is GWBASIC. XML is just a data description format (I won't grace it by calling it a language; it's not) and there have been plenty of portable DDFs in the past: PDF, PostScript (though the latter is actually a language). So why all the hoo-ha about XML? Seems to me that various marketing types have jumped on the bandwagon with this one and are going to ride it till the wheels fall off and take all the suckers along with them.

  36. Bigger picture by Cheese+Cracker · · Score: 3, Insightful

    Look at the bigger picture of where Microsoft is heading. They're diversifying their line of business.
    In the past, MS Office was the cash cow at Microsoft, but the market for office packages is rather
    saturated... companies and governments are looking for cheaper alternatives etc. Not much room to
    grow. Now they can afford playing the good guys by opening up their file formats, since they got
    new markets to capture... mobile phones, handheld computers, home entertainment etc.

    1. Re:Bigger picture by Melantha_Bacchae · · Score: 2

      Cheese Cracker wrote:

      > In the past, MS Office was the cash cow at
      > Microsoft, but the market for office packages is
      > rather saturated... companies and governments are
      > looking for cheaper alternatives etc. Not much
      > room to grow.

      How quickly you all forget. Office 11 is to be on the subscription plan. Microsoft said so long ago, and Licensing 6 makes it reality.

      > Now they can afford playing the good guys by
      > opening up their file formats, since they got
      > new markets to capture... mobile phones,
      > handheld computers, home entertainment etc.

      Now they have new markets to subsidize. They need their cash cows more than ever. This Christmas season could be the demise of the X-Box, long before it is ever paid off.

      Of course the customers mostly saw Licensing 6 for what it was and two thirds of them refused to be extorted of "unearned profits" on a regular basis.

      That's the ironic part about thousand year kingdoms: when they barely last a day. ;)

      Shinoda: "The age of Millennium."
      Io: "What does that mean?"
      Shinoda: "A thousand year kingdom. It wants to create a home for itself. There is one flaw in its plan: Godzilla."
      "Godzilla 2000 Millennium" (Japanese version)

  37. What we need is a ISO standard by javilon · · Score: 5, Interesting

    The open office group should get together with the rest of the guys (AbiWord, KOffice and maybe WordPerfect) and work out a format that can be submitted to the ISO. Possibly based on the open office format.
    Then governments and corporations will adopt it for official documents so they can read their own documents in ten years.

    --


    When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
    1. Re:What we need is a ISO standard by pubjames · · Score: 2

      The open office group should get together with the rest of the guys (AbiWord, KOffice and maybe WordPerfect) and work out a format that can be submitted to the ISO. Possibly based on the open office format.
      Then governments and corporations will adopt it for official documents so they can read their own documents in ten years.


      You are absolutely correct sir. This is something I've been ranting about for ages.

    2. Re:What we need is a ISO standard by AnEmbodiedMind · · Score: 2, Informative
      From OpenOffice:

      The OpenOffice.org XML project contains support for and implementation of the XML based file format.

      Mission
      Our mission is to create an open and ubiquitous XML-based file format for office documents and to provide an open reference implementation for this format.

      Core Requirements (these items are absolutely required)

      • The file format must be capable of being used as an office program's native file format. The format must be "non-lossy" and must support (at least) the full capability of a StarOffice/OpenOffice document. The format is likely to be used for document interchange but that use alone is not enough.
      • Structured content should make use of XML's structuring capabilities and be represented in terms of XML elements and attributes.
      • The file format must be fully documented and have no "secret" features.
      • OpenOffice must be the reference implementation for this file format.
      Core Goals (these items are highly desired)
      • The file format should be developed in such a way that it will be accepted by the community and can be placed under community control for future development and format evolution.
      • The file formats should be suitable for all office types: text processing, spreadsheet, presentation, drawing, charting, and math.
      • The file formats should reuse portions of each other as much as possible (so for example a spreadsheet table definition can work also as a text processing table definition).
      Standardization and Inter-Office Cooperation
      There is an office_standards mailing list hosted on this site, intended to foster cooperation between the various office suites. At this early stage no results have been achieved, but we are certainly excited about the prospects. For details, look at http://xml.openoffice.org/standardisation/ .
      It's on its way... maybe
    3. Re:What we need is a ISO standard by Anonymous Coward · · Score: 2, Informative

      This is already in hand. Sun are taking the OpenOffice XML file format to OASIS for standardisation. Something should be announced about the formation of a working group on this real soon now.

    4. Re:What we need is a ISO standard by pubjames · · Score: 5, Insightful

      The ISO tried it. It was called ODA
      and was a complete failure.


      So? Formats come and go all the time. Just because the ISO failed in the early nineties doesn't mean someone else would fail today.

    5. Re:What we need is a ISO standard by pubjames · · Score: 3, Interesting


      This may interest you:

      http://www.1dok.org/eng/index.html

    6. Re:What we need is a ISO standard by Jason+O'Neil · · Score: 2, Interesting
      That's actually a really good idea. If all the OSS Word Processors created a file format that worked seamlessly from program to program, it would be a major plus for all the smaller word processors.

      It would allow for competition in Linux word processors, without having to worry about file format compatibility problems.

      Then if someone just creates a script which converts MS Office docs (en masse, like every one inside the directory structure) to this wonderful new format (should be possible thanks to Open Office), it would be much easier to then switch to OSS.

      I personally have no problems with the current open office format, but if they made it human readable, so it can be created from plain text editors if necessary...

      Quick somebody suggest it to them

  38. Nice, but redundant statement by cwernli · · Score: 4, Funny

    any programmer with a Perl script and a bit of intelligence

    and I thought intelligence was a prerequisite to be able to handle perl ? :)

    1. Re:Nice, but redundant statement by tanveer1979 · · Score: 2
      any programmer with a Perl script and a bit of intelligence
      and I thought intelligence was a prerequisite to be able to handle perl ? :)

      But you can have a perl script even if you don't know how to handle perl. This is the beauty of perl: intuition. A person who doesn't know how to handle perl, but has used a bit of regex (read: intelligence), can actually make a perl script work.

      --
      My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
      FB : https://www.facebook.com/TanveersPhotography
    2. Re:Nice, but redundant statement by poot_rootbeer · · Score: 2


      You obviously haven't had to work with the kind of legacy code I have...

  39. Incompatible XML? by Dexter77 · · Score: 2, Redundant

    It's very easy to make an XML document that can't be processed with any common parser library. It will make programmers work extremely hard if they have to write a different XML parser for M$-XML.

    Now if the M$-XML isn't compatible with standard XML, what's the use? You still have to save it in M$-XML format to be able to use it with Word. If most coders want to use M$-XML it might even break the XML standard, since there are more Word documents in the world than all XML documents put together!

  40. Re:MS moderators? by pubjames · · Score: 2


    Why has this post been moderated as a troll? There is nothing trollish about it at all.

  41. Re:Too good to be true by bokmann · · Score: 5, Insightful

    Except I will look to xml.openoffice.org to write some xslt transformations to take Microsoft office documents and liberate them once and for all.

    Once I can move my team of 20 people to open office with no real worries or complaints about 'interchanging' files with lusers still using Microsoft, I will.

    BUT, have you ever looked at an HTML file generated by Microsoft word? It is a GREAT example of how they can pollute a standard into something unreadable.

    I suspect that they will copyright or otherwise lock up their DTD/Schema, and try to lash out at anyone that uses them in other than 'approved' ways.

  42. Is he crazy? by Lumpy · · Score: 2, Redundant

    when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"

    So a new software release will "magically" convert every document ever made to XML? I don't think so. The fact that they will finally have compatibility with the rest of the planet is nice, but I'll bet $100.00 that they will bastardize XML to their liking, just like they did with IE and HTML.

    --
    Do not look at laser with remaining good eye.
  43. Re:Whats wrong with html/css2 ? by Jugalator · · Score: 3, Informative

    Whats wrong with HTML and CSS2 for all your word processing?

    I don't think the new XML format is meant for documents you wish to publish on the web. Office has supported the HTML format pretty well (with some extensions.. ahem) since Office 2000. HTML support works even better in Office XP, since it allows you to save the document as "filtered HTML", where Office filters out most of the Office-specific tags and attributes at the cost of losing some information in the document.

    I think the XML format is being added since XML represents the document with a much more meaningful structure that's easier for third-party software to parse for use in electronic commerce and other automated systems. HTML is inappropriate for that, as it was designed to make pretty layouts, not to describe content for easy parsing.

    I think it's pretty obvious why MS would want to add XML support - to spread their Office document format and make Office useful in places such as web services where it wouldn't be as useful before.

    --
    Beware: In C++, your friends can see your privates!
  44. Political XML by Wheelie_boy · · Score: 2, Insightful

    Looks like M$ has found a way to placate those various governments that are beginning to insist on open file formats for data storage.

  45. There is some documentation of Office XML already. by frleong · · Score: 5, Informative
    Here at MSDN

    It is simply not what others are claiming: <?xml version="1.0"?><data>blahblah</data>

    --
    ¦ ©® ±
  46. MS-standards (oxymoron?) by billybob2001 · · Score: 2, Flamebait
    There appears to be some confusion surrounding M$ using the acronym XML.

    Rather than assuming this relates to eXtensible Markup Language, consider the following insider information:

    M$ have been basing their business model on XML for years.


    It stands for Kiss My License!


    X

  47. What are you all complaining about? by nmg196 · · Score: 5, Insightful

    Microsoft is switching from a proprietary file format, to XML, and the first 100 comments are all flaming MS. WTF does it take to make you people happy?

    They've already shown with .NET that they can make an entire programming framework (and at least 3 associated languages) into an open standard and even have them ratified by the ECMA and maybe even ISO. Because of this people have already managed to port Perl, Python and many other languages to this framework before it even came out of beta! The guys at Ximian have even managed to port quite a bit of the framework itself as part of the Mono Project.

    So perhaps instead of perpetually slating Microsoft, you could get off your arse and do something useful instead.

    Nick...

    1. Re:What are you all complaining about? by nmg196 · · Score: 2, Interesting

      "somewhere" - that really good reliable source of information.

      "about MONO, we'll see" - go and see then - you only had to click the fscking link that I put there for you. Even a Windows user should be able to manage that.

      "all kinds of IP rights" - and you reckon Sun doesn't have those for Java?

    2. Re:What are you all complaining about? by TummyX · · Score: 3, Informative


      They've already shown with .NET that they can make an entire programming framework (and at least 3 associated languages) into an open standard and even have them ratified by the ECMA and maybe even ISO.


      That's not true. Only C# has been submitted to ECMA. VB and JScript.NET have NOT.

      The CLI submissions are only a small subset of the .NET framework. This is for a good reason, most of the .NET framework relies on Windows services (System.DirectoryServices, System.Windows.Forms, System.EnterpriseServices, ...).

      C# and the CLI do NOT make up a platform like Java. It's more like C. Both C# and C provide a basic set of classes. Anything more 'advanced' is provided through extension libraries that may or may not be cross platform (just like C). You could write a sound library for C# that uses DirectX and it would only work on Windows. On the other hand, you could write a sound library for C# that uses OpenAL. It would work on all platforms where OpenAL is supported.

      Many features that Java has such as GUIs, Telephony, Speech, Sound, 3D etc aren't supported by .NET and certainly won't be standardised. Sound support will be added by Microsoft in the future but it will use DirectX (obviously NOT cross platform).

      The cross platform hopes for C# pretty much lie in OSS hands. It is up to the OSS community to write 'standard' cross platform libraries for C# (just like we have for C). C# interfaces nicely with C so it is likely that many cross platform libraries for C# will use the corresponding C libraries.

      As you can see, the CLI is much more like C+GLIB than the "Java Platform".

      Java is a meta-operating system. It provides a huge set of APIs consistently on all platforms.

      C#/CLI does not always provide a consistent API on all platforms, but it allows and encourages you to rely on and exploit the native APIs available on the underlying operating system.

      Which is better? It really depends on what you want. Java is obviously the only choice for cross platform development (atm). C# however appears to be a good replacement for C -- especially on the client side. It complements the underlying operating system whereas Java tends to hide it. That's why you will see a lot of C#/GTK# applications for Gnome in the future but not many Java/GTK applications.

    3. Re:What are you all complaining about? by King+of+the+World · · Score: 2

      The only parts submitted to ECMA are C# and the CLI which excludes Windows Forms, Web Forms, and many base libraries which are proprietary and owned by Microsoft. As Nelson would say, HA! HA!

  48. Well, that's "Embrace" taken care of... by Queuetue · · Score: 2

    I presume we can expect "Extend" at Office 11's release, and then we can pencil in "Extinguish" sometime late next year?

    Is that good for everybody?

  49. Re:umm by Arimus · · Score: 2, Insightful

    As he is one of the people responsible for XML and Office 11 is going to be using XML as its native file format have you spotted the link (hint think of three letters...)

    That aside, if MS do adopt XML as their file format AND they don't screw it up the way they did the HTML-formatted output, then it is about time, and I would imagine that the people who came up with XML are going to be happy to see their work being put to good use.

    --
    --- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
  50. Re:Information wants to be free by djupedal · · Score: 2

    ...like I said.

  51. Export to HTML... by Anonymous Coward · · Score: 2, Insightful

    ...to see where they're going with this. Word has been exporting to HTML, which is really some funky XML/XHTML with stylesheets that IE can read and display, for a while.

  52. the most wonderful thing... but it's not happening by g4dget · · Score: 3, Interesting
    The most wonderful thing that would happen would be that people can finally dump that messy piece of software and move to a better toolset.

    Unfortunately, Microsoft won't let it happen. The data may be "in XML", but that doesn't mean you can read it or generate it well. Instead, Microsoft will give you just enough to serve their business interests and nobody else's.

    How? Office will probably stick undocumented base64 encoded binary stuff into the output, containing formatting information. You can use the document content, for example, with a database, but you can't load it into another word processor and preserve all the formatting. And in the other direction, sure, you can generate simple documents that Office will import, but you can't generate arbitrary Word documents--they will, again, have weird, undocumented tags and binary stuff.

    In short: don't hold your breath. Microsoft isn't stupid.

  53. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  54. Lose the fight, win the war by PackMan97 · · Score: 2, Insightful

    Sure, IBM lost control of the PC market...but is that better than what's happened to Apple?

    Let's go back in time to 1985 and you can choose which company to invest in...IBM or Apple. Hmmm...tough choice isn't it? Their stocks have both appreciated almost the same amount since then! Shocking isn't it.

  55. Re:Whats wrong with html/css2 ? by whovian · · Score: 2

    I don't think the new XML format is meant for documents you wish to publish on the web.

    Just being curious here and not a troll, I thought mozilla supported XML. Take a look at this page, where it appears that XML style sheets can be used to impart some BibTeXian features to information perhaps meant for the web. It looks potentially very useful.

    --
    To-do List: Receive telemarketing call during a tornado warning. Check.
  56. Re:There is some documentation of Office XML alrea by eetu · · Score: 2, Interesting

    The document at MSDN doesn't seem to have anything to do with MS Office 11 or the new "built around XML" Office file formats. It simply explains how files can be imported to/exported from Access and Excel of MS Office XP.

    --
    "If I can't have a revolution, what is there to dance about?" - Albert Meltzer
  57. One word, DCMA by N8F8 · · Score: 2

    They haven't opened the Office document standards; they might just make them more parsable. You would still be breaking the law if you built a product with the ability to parse an Office document without paying MS a royalty.

    --
    "God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
  58. Re:Too good to be true by gpinzone · · Score: 2

    I agree with you, but... Considering that HTML really doesn't do a lot of the things that can be done in Word like headers, footers, etc., they couldn't have just used strict HTML 4.01 without losing a boatload of features. Yeah, you could come up with a way to get an HTML file to look like a Word document, but then you'd have to break away from the whole Word Processor paradigm and embrace a system more like PDFs. Too many incompatibilities.

  59. You omitted one by Black+Perl · · Score: 2
    Check this out:

    YAWC Pro (http://www.yawcpro.com/)

    This can output XML according to any DTD (by default it uses the Simplified DocBook DTD).

    --
    bp
  60. Competition from below forces Microsoft to open up by Jeppe+Salvesen · · Score: 2

    OpenOffice is XML-based, and extended into suite compatibility by StarOffice. To the best of my knowledge, its format is easily parseable and well documented.

    That alone is a unique feature that adds a lot of value to OpenOffice in the medium to long term. Microsoft would certainly not risk one of their big cash cows by clinging too tightly to their paradigms. They are many things, but they are not complete idiots.

    So, opening up the format would remove some of the reasons why customers might want to migrate to other systems.

    It's a defensive move, really. A rather good one for all parties, too, especially if they refrain from their anti-open-source licensing. If they allow open source projects to process their documents, we will add value to their product. I certainly hope they will see it this way, though I'm not convinced.

    --

    Stop the brainwash

  61. Bye bye excel by plopez · · Score: 2

    Who needs XML?

    use Win32::OLE;

    my $handle = new Win32::OLE('Excel.Application.9') || die "died: $!\n";
    # Version 9 is Office 2000; version 10 is Office XP.
    if ($source_file =~ /\.xls$/i)
    {
        $handle->Workbooks->Open($source_file);
        my $worksheets_count = $handle->Sheets->Count;
        #print "Count: $worksheets_count\n";
        # Note that a) Excel sheet tabs are numbered from '1'
        # (YAR VBA should not be considered a real programming language)
        # and b) for my purposes the first 3 were garbage. Season to taste.
        for (my $i = 4; $i <= $worksheets_count; $i++)
        {
            my $sheets = $handle->ActiveWorkbook->Worksheets($i);
            $sheets->Activate;
            my $temp = $source_file;
            $temp =~ s/\.xls$//i;
            my $target_file = $temp . "S$i" . '.' . "txt";
            # -4158 is the MS magic number for tab delimited.
            $handle->ActiveWorkbook->SaveAs($target_file, -4158);
            # Not quite sure what the line below does any more.
            $handle->{XLSaveAction} = 2;
            push @target_names, $target_file;
        }
        $handle->ActiveWorkbook->Close(0);
    }

    This is one of the things I put under the rubric of 'Stupid Perl Tricks'. Saved as text, data locked in a spreadsheet can then be easily imported into any database. After assorted data munging to normalize it, of course...

    --
    putting the 'B' in LGBTQ+
    1. Re:Bye bye excel by plopez · · Score: 2

      Damn the tiny /. interface hosed my script. the for loop line should read:

      for(my $i=4;$i<=$worksheets_count;$i++)

      --
      putting the 'B' in LGBTQ+
  62. OpenOffice is XML, now! by magi · · Score: 5, Informative

    Doing XML stuff with OpenOffice is supergreat. It took me half an hour to study the format enough to write an XSLT stylesheet that extracts all strings from an OO document.

    Now I wrote, just for demonstration, the following XSLT example in just a few minutes, usable directly with xsltproc in Linux.

    The example prints all the Heading paragraphs in an OO Writer document, indented according to the header level.

    <?xml version='1.0'?>
    <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table"
    xmlns:draw="http://openoffice.org/2000/drawing"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:number="http://openoffice.org/2000/datastyle"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:chart="http://openoffice.org/2000/chart"
    xmlns:dr3d="http://openoffice.org/2000/dr3d"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    xmlns:form="http://openoffice.org/2000/form"
    xmlns:script="http://openoffice.org/2000/script"
    version='1.0'>

    <xsl:output method="text" encoding="ISO-8859-1"/>

    <!-- Print all headings, indented. -->
    <xsl:template match="text:h">
    <xsl:value-of select="substring('                    ', 1, (@text:level - 1) * 2)"/>
    <xsl:text>* </xsl:text>
    <xsl:value-of select="text()"/>
    <xsl:text>&#xa;</xsl:text>
    </xsl:template>

    <!-- Don't output any other text. -->
    <xsl:template match="text()">
    </xsl:template>
    </xsl:stylesheet>

    The result would be something like:

    * Top-level heading such as a chapter
      * Second-level heading (section)
      * Another section
        * Subsection
          * Subsubsection
      * Yet another section
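
    For anyone without xsltproc at hand, roughly the same heading outline can be sketched in Python with only the standard library. This is an illustrative sketch, not part of the original post: the function name list_headings and the tiny SAMPLE document are made up for the example, and it assumes the text namespace URI used in the stylesheet above. Unlike the XSLT, it also collects text nested inside child elements of a heading.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the stylesheet above (OpenOffice.org 1.x format).
TEXT_NS = "http://openoffice.org/2000/text"


def list_headings(content_xml: str) -> list[str]:
    """Return one '* heading' line per text:h element, indented by level."""
    root = ET.fromstring(content_xml)
    lines = []
    for heading in root.iter(f"{{{TEXT_NS}}}h"):
        # text:level is a namespaced attribute; assume level 1 if it is absent.
        level = int(heading.get(f"{{{TEXT_NS}}}level", "1"))
        title = "".join(heading.itertext()).strip()
        lines.append("  " * (level - 1) + "* " + title)
    return lines


# A tiny hand-written stand-in for content.xml (hypothetical, not a real Writer file).
SAMPLE = """<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text">
  <office:body>
    <text:h text:level="1">Chapter</text:h>
    <text:p>Some body text.</text:p>
    <text:h text:level="2">Section</text:h>
  </office:body>
</office:document-content>"""

if __name__ == "__main__":
    print("\n".join(list_headings(SAMPLE)))
```

    To try it on a real document, the content.xml member would first have to be extracted from the Writer file, which is a zip archive.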

  63. Re:Man, I hope by arkanes · · Score: 2

    You mean I just have to logon using "sadmin" and I'll have total access to the file system?

  64. No, it doesn't by alispguru · · Score: 3, Interesting

    Look up at this. Putting information in XML makes the first baby step of reverse engineering easier, nothing else.

    XML helps only if the creator of the document wants the information to be easily accessible by programs other than their own.

    --

    To a Lisp hacker, XML is S-expressions in drag.
    1. Re:No, it doesn't by JordanH · · Score: 2
      • Look up at this [slashdot.org]. Putting information in XML makes the first baby step of reverse engineering easier, nothing else.

      I think calling it a baby step is an exaggeration.

      It's far far easier parsing XML documents with tags vs. some binary format. Without XML, you have no idea, whatsoever, of the size of the fields of data you are dealing with.

      For the purposes of reverse engineering, it's roughly the difference between having source code and a binary executable.

      Some people have experimented with Source Code obfuscators, sometimes called Shrouds, but have found that these are always reverse engineered rather quickly due to the availability of Source Code parsing tools. While binary executables are sometimes reverse engineered, I would hardly characterize the difference as being just baby steps away.

      In any case, it will be very difficult for MS to justify purposeful obfuscation of the XML. If they do this, it will give competitors more ammunition that MS talks open but really is lock-in.

      There are already really good tools that will diff XML docs for you. A fairly junior programmer, working alone, with those tools, could discover the meaning and data types of all the tags with a little exploration.

      If MS does come out with XML documents, they will be reverse engineered really quickly, I'd bet. The binary Word formats, by comparison, often take quite a while to reverse engineer and there're often problems with the conversions.

  65. office HTML by avandesande · · Score: 2, Insightful

    Anyone looked at the HTML output from an office program? It's terrible. Do you think their xml will look any better?

    --
    love is just extroverted narcissism
  66. Hmmm.... by BuffJoe · · Score: 2, Insightful

    I have a feeling that Microsoft "XML" will use Microsoft "Unicode." That is, the range 0x82 to 0x95, which Unicode reserves for extra control characters, will be littered with "smart" quotes, em dashes, and other proprietary extensions to Unicode that ensure that nothing works with it. I ran into this problem when I tried converting FrontPage-generated HTML into XHTML so I could do conversions with XSLT. Needless to say, it took a lot of effort, even with HTML Tidy, to get Microsoft's generated HTML converted into XHTML! HTML Tidy constantly complained about the HTML, and looking at what FrontPage generates, it's not hard to see why it complained.

    I ran across the demoroniser, which fixes Microsoft Unicode problems, but it still doesn't fix the invalid HTML that FrontPage generates.
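
    What the demoroniser does for these characters can be approximated in a few lines of Python. This is only a sketch, under the assumption that the text was decoded as ISO-8859-1 although it was really written as Windows-1252, so the "smart" punctuation shows up as C1 control code points. The function name fix_c1_characters is made up for the example, and it does nothing about invalid markup.

```python
def fix_c1_characters(text: str) -> str:
    """Replace C1 control code points (U+0080..U+009F) with the characters
    that Windows-1252 assigns to the same byte values, where one is defined."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                # A few byte values (such as 0x81) have no cp1252 mapping
                # in Python's codec; leave those characters untouched.
                pass
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Bytes 0x93/0x94 are curly double quotes in Windows-1252.
    mislabelled = b"\x93smart\x94 quotes".decode("latin-1")
    print(fix_c1_characters(mislabelled))
```
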

    Microsoft XML? Hah! I'll believe it when I see it.

  67. Microsoft's intentions by affenmann · · Score: 2

    Well, one consequence is that many people will be forced to upgrade to the new office, since all the Word-attachments will require the new word to be readable (and editable)... Now, this is a good motivation for M$.

  68. Tim here with a bit more background by tbray · · Score: 5, Informative

    I've seen the native Word XML format (alpha mind you, so it might get changed). It isn't exactly pretty, and if I had to write code to extract all the paragraphs that contained the word "foo" in bold it would give me a bit of a headache, but I could do it.

    The word "foo" in bold single-underline looks something like

    <r>
    <rf>
    <rp class="bold" />
    <rp class="underline" lines="1" />
    </rf>
    foo</r>

    Yeah, it's pretty verbose.
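
    As a rough illustration of the "find paragraphs containing bold foo" task, here is a minimal Python sketch. It assumes the element names from the approximate snippet above (r, rf, rp), and additionally assumes that runs sit inside a p element under some root. That wrapper, the SAMPLE document and the function names are hypothetical, and the real alpha format may well differ.

```python
import xml.etree.ElementTree as ET


def run_is_bold(run: ET.Element) -> bool:
    """True if the run's <rf> block carries an <rp class="bold"/> property."""
    return any(rp.get("class") == "bold" for rp in run.findall("./rf/rp"))


def run_text(run: ET.Element) -> str:
    """Text of a run; in the snippet above it follows the closing </rf> tag."""
    parts = [run.text or ""]
    for child in run:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def paragraphs_with_bold_word(doc_xml: str, word: str) -> list[str]:
    """Return the text of each <p> that has a bold run containing `word`."""
    root = ET.fromstring(doc_xml)
    hits = []
    for para in root.iter("p"):
        runs = para.findall("r")
        if any(run_is_bold(r) and word in run_text(r).split() for r in runs):
            hits.append(" ".join(run_text(r) for r in runs))
    return hits


# Hypothetical document: <doc> and <p> are assumed wrappers, not from the post.
SAMPLE = """<doc>
  <p><r>plain foo here</r></p>
  <p><r>say </r><r><rf><rp class="bold" /><rp class="underline" lines="1" /></rf>
foo</r></p>
</doc>"""

if __name__ == "__main__":
    print(paragraphs_with_bold_word(SAMPLE, "foo"))
```

    Only the second paragraph matches, since the first one contains "foo" in a run with no bold property.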

    Near as I can tell, it is 100% round-trip-able, i.e. you save as that file format, you read it in again, you hit ctl-S and it saves again; about as good as a native format. Now someone needs to write some script-ware to run Word in batch mode to xml-ify server directories with zillions of office docs.

    I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.

    I expect that with any luck we'll get a secondary industry built around doing cool unexpected stuff to Office docs. Don't want to sound over-excited here, but a huge amount of all the intellectual capital in the world is sitting around in Office docs, and this makes it noticeably more re-usable. Has to be a good thing.

    Cheers, Tim

    1. Re:Tim here with a bit more background by Anonymous Coward · · Score: 2, Funny

      Hey, no fair injecting actual information, from a primary source, no less, into a /. discussion! That's totally cheating! #@$! karma whore...

    2. Re:Tim here with a bit more background by Compuser · · Score: 2

      Round-trip-able is fine and all but is _any_
      formatting lost between XML version and binary
      format? In so many words, from what you have
      seen, is there a point of writing a script to
      run Word in batch-convert mode? Is the XML
      version more faithful to original formatting
      than, say, OO import filter?

    3. Re:Tim here with a bit more background by donutello · · Score: 3, Informative

      I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.


      Uhh.. from this article.

      Information Worker turned in healthy revenue growth of 26 percent, reflecting customer adoption of Microsoft Office XP through multi-year licensing programs. Customers acquiring Office this quarter included ChevronTexaco, Lockheed Martin, MetLife, Newell Company (Rubbermaid) and the US Department of the Army, Program Executive Office, Aviation.

      and

      Microsoft Corp. today announced revenue of $7.75 billion for the quarter ended Sept. 30, 2002, a 26 percent increase over revenue of $6.13 billion for the same quarter last year. Operating income for the first quarter was $4.05 billion, compared to $2.90 billion in the same period last year. Net income and diluted earnings per share for the first quarter of fiscal year 2003 were $2.73 billion and $0.50, which included an after-tax charge for investment impairments of $291 million or $0.05. For the same period of the previous year, net income and diluted earnings per share were $1.28 billion and $0.23, which included an after-tax charge for investment impairments of $1.22 billion.

      "Results for the first quarter were exceptionally strong, exceeding our expectations. During the quarter, we saw broader customer adoption of our licensing programs than we anticipated, as customers recognized the value of entering into long-term licensing agreements for our products. This strength in licensing led to solid growth for Windows® XP, Office XP and .NET Enterprise Servers," said John Connors, chief financial officer at Microsoft. "Consistent with our view at the outset of this year, the global economic outlook continues to be uncertain, however we remain committed to making the investments necessary to drive long-term product innovation and customer value across our businesses."

      --
      Mmmm.. Donuts
  69. Structure vs Presentation? by 4of12 · · Score: 3, Insightful

    MS Office saving its data in XML format is a great start.

    But will this really be enough?

    Previous complaints about how versions of Office didn't disclose the format were often referred to a specification that Microsoft made available to describe what was in a Word document.

    The key problem, IIRC, was that the description was not sufficient for one to predict how the Word document was actually formatted and rendered on the page.

    Because XML is very much like SGML or TeX, it has the potential for much more exhaustively describing document structure. But whether the new Word XML format (or OpenOffice format, for that matter) contains sufficient information for developers to reproduce the "right" format is a different issue.

    I hope I'm wrong and that the format is specified comparably to the level you'd find in say PostScript or PDF.

    Maybe MS is willing to let rendered Office documents change, just as HTML rendered documents change whenever one resizes the browser window.

    But I doubt it.

    --
    "Provided by the management for your protection."
  70. XML had an inventor? by blair1q · · Score: 2

    It's a stupid nested name-value pair text databasing system.

    That'd be like inventing the sentence.

  71. Wow! by grub · · Score: 2



    ..code-named 'Office 11'

    It must have taken Microsoft months to come up with that ultra-secret code name.

    --
    Trolling is a art,
  72. Scripts, not GUI by captaineo · · Score: 2

    My guess is, the XML format will make it much easier to manipulate Office documents from scripts, but it will still be very difficult to construct an actual WYSIWYG editor for them.

    E.g., say that there is a <table> tag with extremely complex, undocumented formatting and display rules. It might be easy to add or remove things from tags, but only Office would actually know how to *display* a table correctly.
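
    As a rough sketch of the "easy" half of that claim (editing tags from a script, with no attempt at rendering), here is a few lines of Python using only the standard library. The element names (document, para, note) are invented for illustration and are an assumption; they are not the real Office 11 vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these element names are made up for the example
# and are NOT taken from any published Office schema.
SAMPLE = (
    "<document>"
    "<para>Keep this.<note>internal</note></para>"
    "<para>And this.</para>"
    "</document>"
)

def strip_elements(xml_text, tag):
    """Remove every element named `tag` and return the document as text."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removals don't disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

cleaned = strip_elements(SAMPLE, "note")
```

    Note that nothing here knows how a paragraph should look on the page; that is the part a script like this cannot answer.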

    This would allow MS to say "we have an open file format" without really endangering their core business, GUI document editing tools.

  73. Totally meaningless by ChaosDiscord · · Score: 2
    The important thing...is that Word and Excel...can export their data as XML without information loss.

    Oooooh, yay, Microsoft added another export filter to Word and Excel. The world is a better place.

    The reality is that unless the XML format is the default format, this change is useless to most users. The cry against new word processors is always, "If it doesn't import every single Word file ever created with every single feature supported, it's worthless." Unfortunately the insane complexity of the Word file format, the lack of documentation, and the constant churn as new versions of Word come out mean that you'll never see perfect conversions, yet too many people whine that it has to be perfect. (Completely ignoring the fact that most users would never notice the (in most cases) ever so slightly inaccurate translation, the minority that push the documents to their limits refuse to admit any value in an imperfect translation.) An XML format would make the translation easier, but it's useless if it's not the default. Microsoft's monopoly on office productivity software is based on the massive numbers of existing Word and Excel files.

  74. Re:Whats wrong with html/css2 ? by Jugalator · · Score: 2

    Yeah, Mozilla supports XML and I even think it supports XSLT for any layout needs. However, I still think XML in Office 11 is meant for parsers and automated systems, since I don't think MS would bother with it when they already have implemented HTML for web purposes. But I can't be completely sure of what Steve Ballmer & co are planning. :-)

    --
    Beware: In C++, your friends can see your privates!
  75. Government Contracts Might be The Reason by bobaferret · · Score: 4, Interesting

    I think the reason that they are switching over is probably the trend in emerging foreign markets, Peru being a prime example. Countries are starting to enact legislation that requires government procurements of software to be limited to software that uses an open file format, because of long-term storage problems.
    Add to this the fact that US sales are slowing down, or are going to, due to the complete saturation of PCs. They need new markets, and unless they use an open format they won't be able to get them. I'd be panicked: Linux and Java are eroding their server market, and governments are eroding their Office market. The only way they can grow is to add value.

  76. Genuine XML? by J.+Random+Software · · Score: 4, Interesting
    Good in theory, but HTML support in Office 2000 was such a debacle that there are third-party tools designed just to unmangle the markup. They completely ignored Processing Instruction syntax, which is intended to do just what they wanted, and
    <![if !supportEmptyParas]>
    wasn't even well-formed SGML.
  77. Re:MS moderators? by Mad+Bad+Rabbit · · Score: 2

    Yesterday, when I attempted to moderate something as "Interesting", the confirmation page showed a
    moderation of "Overrated" instead. I'm pretty sure
    I selected the right value from the pulldown list,
    and suspect there may be a bug in the moderation system.

    --
    >;k
  78. He answers this in the article by drew_kime · · Score: 2
    "The important thing," he explains, "is that Word and Excel (and of course the new XDocs thing) can export their data as XML without information loss."
    Emphasis added. So the answer to "What will be the default save format?" is a resounding "not XML."
    --
    Nope, no sig
  79. Advisable concern. by Perianwyr+Stormcrow · · Score: 2

    Perhaps, when confronted with carrying a snake across the river on our backs, we are properly wary?

    --

    What we call folk wisdom is often no more than a kind of expedient stupidity.-Edward Abbey

  80. but for semantics by rodentia · · Score: 2

    A non-XML grammar/syntax, if accompanied by a decent and documented EBNF description of its grammar, is much better to base your program on than an undocumented XML.

    Except that an undocumented XML file is in an exhaustively EBNF-documented syntax already. Not to mention that the constraints upon QNames mean that the semantics of the schema will be available for disclosure via existing tools even if obfuscated. The same cannot be said for an arbitrary syntax, ANTLR notwithstanding.
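
    A small sketch of what "available via existing tools" can mean in practice: a generic parser will walk any well-formed XML file and report its qualified element names, with no schema or documentation at all. The sample below, including its deliberately meaningless names and the urn:example:obfuscated namespace, is invented for illustration. This recovers the vocabulary and structure only; what the elements mean is still up to the reader.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented, deliberately "obfuscated" sample; not from any real product.
SAMPLE = """<a1 xmlns:x="urn:example:obfuscated">
  <x:b7 k="1"><x:c9>text</x:c9></x:b7>
  <x:b7 k="2"/>
</a1>"""

def element_census(xml_text):
    """Count how often each qualified element name occurs in the document."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixes, so tags look like "{namespace-uri}local".
    return Counter(element.tag for element in root.iter())

census = element_census(SAMPLE)
for name, count in sorted(census.items()):
    print(count, name)
```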

    --
    illegitimii non ingravare
  81. You may be onto something... by megaduck · · Score: 2

    Does it somehow become encrypted on its way out of the database, remain scrambled on its way over the internet, and reassemble itself into nice XML once it arrives on the recipient's computer?....

    I think you just described Palladium.

    --
    This .sig for rent.
  82. MSWord - OO - Save File by fferreres · · Score: 2

    That's the easiest way, really. And the benefit of having nicely documented DTDs. OO is the true compatibility XML file format for office files.

    That's why MS need to have their own. Because if they don't do it, many companies will use OO as a gateway (many not just yet, but soon).

    So they have to do XML, plus MS wants to integrate Office + Windows Programming + WEB Frontends + EVERYTHING in an interoperable way. They can dictate the standard of what the WORLD will have to use in the future.

    They will always be in the middle, and their revenue models will adapt to it just fine. The MS layer, if you want to call it that.

    --
    unfinished: (adj.)
  83. Re:Ummmm... diminishing sales? by aaarrrgggh · · Score: 2

    Elsewhere, you will find these sales largely attributed to the new license terms. Sales of Office were supposedly down. Most analysts expect the revenue growth to slow.

  84. Re:"Codename Office 11" by hey · · Score: 2
    Your comment is +5 Funny - I'd mod you up if I had the points. Maybe the real release will be "Office XI"

    I like the theory that as soon as a product passes version 10 it has lived too long. There is probably something else that can do the job better. Time for some lateral thinking. Probably the case with MS Office.

  85. Why MS can't win that one by Anonymous+Brave+Guy · · Score: 2
    A company that has for 18 years been trying to lock people in to their technology, will cause some people to be a bit paranoid.

    But, as the saying goes, it's only paranoia if they're not out to get you. ;-)

    The business world is a harsh and, ultimately, fickle one. Microsoft got to the top by doing good things, but you can't abuse your position for long or people will start to notice. As the world comes to depend ever more (rightly or wrongly) on IT to get its business done, following standards and maintaining open sources of information will become ever more important. Even a company as big as Microsoft won't survive by locking people in forever. They rose to the top in a remarkable period of time, and they now have near 100% market share in certain fields, but convincing companies to continue upgrading is becoming harder and harder.

    One of the major drives to get new versions of many products today is the promise of greater power to get data from A, where it is, to B, where you want it. If everyone else is playing ball (because, being minor players, they have to just to stay on the scale) and Microsoft doesn't, then sooner or later, Microsoft will lose market share to everyone else. No company survives by not giving its customers what they want, and right about now, Microsoft's customers most want the two things they can't, or won't, give: security and interoperability. All the UI reworks in the world aren't going to change that, and they know it.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:Why MS can't win that one by Anonymous+Brave+Guy · · Score: 2
      How can you possibly say they got to the top by doing the right things??

      Because, for quite a long period of time, they really did produce some of the best mainstream software around.

      Windows -- in its 3.1 days -- was a catalyst for many of the good things we now take for granted on a PC. Whether or not MS ripped off the ideas, it was MS that produced the product so many people used, and they used it because it let them do things they couldn't do before. To hell with the instability, it wasn't really that bad, and today it isn't really that much better. It got jobs done that otherwise couldn't be, as far as Joe User was concerned.

      Much the same could be said of early versions of Winword and Excel. Early versions of both made genuine, useful advances over the other software of the day (DOS- or Windows-based). Wordperfect, Lotus and co. dropped the ball, while MS ran with it, for a period of several years.

      Sling rocks all you like, but VB isn't one of the most popular development environments on the planet today by accident, either. For its time, it was revolutionary, and it's hardly fair to accuse MS of ripping off ideas without noting the number of other "visual" development environments that sprang from that one. Too bad they couldn't do it themselves with VC++, and Borland had to do it for them, but hey, you can't have everything. ;-)

      I honestly believe that MS have, in the past, produced some very good software. If it weren't for certain attitudes they currently exhibit, I would still say they produce software that is among the best in the world on many counts. The problem is that now, they have serious competition for the first time in a while, and the advances in useful features and UI tweaks aren't enough to make up for the security flaws, the lack of interoperability, the poor performance, the licensing concerns and that damned paperclip. Why would people upgrade perfectly usable Office 97 apps to Office XP, with all the downsides Microsoft have caused that to entail? If it ain't broke...

      The only way MS will continue to be successful in the face of good opposition from the open source community, as even Mr Ballmer himself has noted, is to provide genuinely more useful/useable applications that are worth paying for. They'll try legal moves to buy themselves time, but in the long term, both screwing your customer base, and buying politicians and asking them to screw their electorate, are losing strategies. They aren't stupid, and they know this. The interesting question is whether they'll actually act upon it effectively before they lose the faith of their customer base with the delaying actions.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  86. Re:read through "EULA" in the XML? by Bake · · Score: 2

    Did you read the article?

    There was no mention of DTD anywhere in it.

    XML Schema on the other hand .....

  87. Microsoft reasons for doing this by kune · · Score: 2

    (1) The strategic value of the proprietary Word format is decreasing. Most texts written today are e-mails, not Word documents. Word is becoming more and more an editing format. Documents are published as ASCII text, HTML and PDF. Word documents can't be combined with Web services; I've never seen a Web application creating Word documents.

    (2) Microsoft can't create a new proprietary format that can't be read by Word 97. Everybody will accept that Word 97 doesn't read XML. If you want XML, you have to buy the new Office.

    (3) Outlook and Internet Explorer are examples of how Microsoft can dominate a market starting with standard formats and protocols.

  88. How come slashdot comments are duplicated in link? by crush · · Score: 2

    If you follow the XML Journal link and look at the "feedback" at the bottom it appears to be the comments that are appearing here on slashdot. Is there some sort of reciprocal exchange of comments going on between the two sites? Is this kosher?

  89. Why Microsoft will do the right thing with XML by marhar · · Score: 2
    If they do the "right thing", then they will be able to position Office 11 as a generic frontend for any XML you happen to generate, regardless of the source. Imagine how convenient it would be to generate a nice spreadsheet from your backend perl script or nicely formatted form letters from your database application.
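
    To make the "generate a spreadsheet from your backend script" idea concrete, here is a minimal Python sketch (Python standing in for Perl). The sheet/row/cell element names are hypothetical placeholders, not the actual Office 11 spreadsheet format, which a real script would have to follow.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Serialize a list of rows as a hypothetical <sheet> document."""
    # Element names here are invented; a real script would use whatever
    # vocabulary the published schema defines.
    sheet = ET.Element("sheet")
    for values in rows:
        row = ET.SubElement(sheet, "row")
        for value in values:
            ET.SubElement(row, "cell").text = str(value)
    return ET.tostring(sheet, encoding="unicode")

# Rows as they might come back from a database query.
xml_text = rows_to_xml([["item", "qty"], ["widgets", 3]])
```

    Whether the front end then opens that output as "a nice spreadsheet" depends entirely on how well the real format is documented.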


    The logic is: Everybody goes to XML, and Office becomes the universal front-end for everything XML.


    If on the other hand they screw it up, then that leaves a potential "underserved market" for somebody to step in and get some leverage in the newly created "xml frontend" segment of the business.

  90. Re:funny url, althought invalid by King+of+the+World · · Score: 2

    The "bug" is by design to stop page wideners. If they didn't apply it when it was a URL, the page wideners would just write long, fake URLs. Right?

  91. Why is MS doing this...featuresets by InnovATIONS · · Score: 3, Insightful
    By taking the initiative in this MS can create an XML schema that neatly includes ALL of the featureset and terminology of MS Word/Excel/etc.

    Which then by virtue of market share becomes standard. It is actually in their best interest to publish it clearly. Then the other potential competitors will feel strong pressure to fit their software to match MS and have no real excuse why they can't. If MS waited there would be some other standard emerging and MS would be pressured by customers to adopt it. Then it would be MS having to shoehorn its document logic into some other form and not the other way around.

    While other potential competitors are playing catch-up with making their documents fit into the MS schema MS can be busy thinking about the next thing to do.

    So frankly I expect the word document xml (and excel and the rest) to actually be quite clear and documented but very aligned to how MS Word sees a document, which will likely impress others as obtuse.

  92. Yeah. Right. by rice_burners_suck · · Score: 2

    Yeah. And when Microsoft embraces and extends XML so it only works with Windows by obfuscating the format to the extent that nobody wants to parse it except the 20,000 monkeys beating away at Microsoft's very own 20,000 keyboards, nothing good will come of it. Oh well.

  93. Re:Whoa (Re:They already did this ) by pvera · · Score: 2

    It could save ASP programmers a lot of time. Instead of taking an HTML template and adding ASP code to pull the recordset and loop through it, you can do this:

    1. Put your query in an XML file and drop it into the XML gateway folder on the IIS server. This XML file is tiny, since it only holds the query and a link to the XSL.

    2. Use XSL to make your template.

    3. Done!

    --
    Pedro
    ----
    The Insomniac Coder