Slashdot Mirror


Office 2003 and XML

zachlipton writes "Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format. Gary Edwards, the OpenOffice.org representative for the OASIS XML file-format group, is quoted as saying 'although it's still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers.' Apparently, all formatting and presentation information is removed from the XML. Furthermore, Office's new collaboration features will only work with users who are also running Office 2003 (requiring Windows 2000 or 2003) that are connecting over XP servers." So Microsoft will continue its efforts to lock in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them.

29 of 502 comments

  1. What did you expect? by inburito · · Score: 1, Interesting

    Did somebody honestly think that Microsoft would suddenly give away their most valuable asset, the proprietary Office file format, and that people would be free to use whatever they want?

    Pigs will fly if that day ever comes!

  2. Do Better? by 4of12 · · Score: 2, Interesting

    This is hardly surprising news.

    My question, though, is whether it is possible for other vendors and OpenOffice to create a better, more pleasing formatting and presentation of the content in the XML than Office 2003 does.

    --
    "Provided by the management for your protection."
  3. How does it work? by pardasaniman · · Score: 0, Interesting

    How will it read the XML if it only contains text?

    Will there be a binary portion which contains the formatting?

    How will this work for spreadsheets?

    How will images be embedded?

    So many questions!

  4. Re:Style Sheets by danlyke · · Score: 3, Interesting

    Yeah, but...

    It's unclear from the article whether that leaves the style information intact, and obviously Gary Edwards has an ax to grind. But in the systems I implement, sometimes I can't get users to adopt style sheets, yet I can still extract the semantic information from stylistic patterns. It's not all that difficult to look at the formatting for a screenplay, for example, and pull out the meta information about which actors appear in which scenes based on the bold, outdented bits.

    If I can get at the presentation markup as well, and the style sheets are in an easy-to-use format, then this is no problem. If the XML is a simple export format rather than the full document, then I may as well be printing to PostScript and trying to reverse-engineer the semantics from that.
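    A minimal Python sketch of that kind of pattern-based extraction, assuming a hypothetical XML export in which each paragraph carries its formatting as `bold` and `indent` attributes. These element and attribute names are invented for illustration and are not taken from any Office schema; the heuristic (bold at indent 0 is a scene heading, bold and outdented is a character cue) is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: scene headings and character cues are only
# distinguishable by their formatting attributes, not by semantic tags.
SCREENPLAY = """
<doc>
  <para bold="true" indent="0">INT. KITCHEN - NIGHT</para>
  <para bold="true" indent="-1">ALICE</para>
  <para indent="2">Where were you?</para>
  <para bold="true" indent="-1">BOB</para>
  <para indent="2">Out.</para>
  <para bold="true" indent="0">EXT. STREET - DAY</para>
  <para bold="true" indent="-1">ALICE</para>
  <para indent="2">Wait!</para>
</doc>
"""

def actors_by_scene(xml_text):
    """Map each scene heading to the characters who speak in it.

    Assumed heuristic: bold with indent 0 is a scene heading,
    bold with a negative (outdented) indent is a character cue.
    """
    scenes = {}
    current = None
    for para in ET.fromstring(xml_text).iter("para"):
        bold = para.get("bold") == "true"
        indent = int(para.get("indent", "0"))
        text = (para.text or "").strip()
        if bold and indent == 0:
            current = text
            scenes[current] = []
        elif bold and indent < 0 and current is not None:
            if text not in scenes[current]:
                scenes[current].append(text)
    return scenes

print(actors_by_scene(SCREENPLAY))
```

    If the export dropped those attributes entirely, there would be nothing left to key on, which is exactly the worry above.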

  5. MS .doc / Adobe PostScript & PDF by PerlPunk · · Score: 5, Interesting

    All Microsoft needs to do is make their standard an open one (one that can be used by others), as Adobe has done with its PostScript and PDF formats. Adobe has done quite well with its products based on these formats, too: products like Adobe Illustrator and Photoshop (which works very well with bitmaps saved in PostScript) are the industry standard in digital art. If Microsoft followed a similar model, I'm sure Microsoft Word would continue to be the industry standard in word-processing software, and Microsoft as a business wouldn't be any poorer for it.

  6. sometimes.. by siphoncolder · · Score: 5, Interesting
    I wonder if michael is testing us for stupidity, literacy, and actual technical knowledge of the issues.

    1) Take MS, make a report that says they did something bad, watch how many people flock to bash them DESPITE THE FACTS PRESENTED IN THE ARTICLE, which leads me to:

    2) How many people read the article? And of those people who DID:

    3) How many of them know that XML is supposed to be a divorce of data from presentation? Why this comes as a shock to people is obvious - they didn't know that.

    The poster above who said "style sheets" - bravo. You couldn't have made a better point with two words.

    --
    i'm amazed that i survived - an airbag saved my life.
  7. Re:is it possible... by 4/3PI*R^3 · · Score: 2, Interesting

    If Micro$oft incorporates DRM into the proprietary file format, then under the antitrust settlement they are under no legal obligation to document the format.

    If there is no documentation, then any reverse engineering of the file format would, at the very least, violate the EULA.

    In the worst case, since reverse engineering the format would give a person access to a copy-protected data set, it would be a violation of the DMCA.

    Did any of us really think that B.G. hadn't thought this whole thing out years ago? He may be the scourge of the industry, but he's not an idiot. B.G. doesn't do anything on the spur of the moment; he plans everything.

  8. Two ways to look at it... by DigitalSorceress · · Score: 2, Interesting

    I'm somewhat of two minds on this.

    On one hand, there are a lot of folks who have very strong opinions about the fact that the data should be separated from presentation... If Office 2003 were to strip the MS-apps-specific formatting (which is probably NOT very standards-friendly), but leave the style markings (heading, paragraph, footnote, etc...) then really, they would be providing a semi-structured document that conformed to XML standards.

    As a web application developer/web author, I have often been given MS Word docs and Excel spreadsheets as content for our web site... In the past, I have resorted to copying the whole page directly into a text editor (thereby scrubbing all formatting information) and then using HTML markup to make the document look much like the Word original, without having to deal with the rather poor HTML that Word's and Excel's Save as HTML features produced. If I could have had a semi-structured document, it would have been easier to write some macros to parse the XML structure and automate some of the rough formatting (hooks for stylesheets or somesuch).

    On the other hand, it seems to me that it might be in Microsoft's best (selfish) business interest to make darn sure that it's not possible for OpenOffice to fully interoperate with MS Office documents. I don't think it would be very smart of them (under their current business model, anyway) if their new products (which will rapidly become de facto business standards) helped OpenOffice and its standards take away their market share.

    In the final analysis, I probably wouldn't worry too much until there's a critical mass of people using it. By then, a bunch of folks will have figured out what CAN be done with whatever format MS ended up with. At that point, Office 2003's XML format will probably make it possible for people to do something they couldn't do before, or at least to do something easily that was once more trouble than it was worth.

    That's worth something...
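    A rough sketch of the kind of macro described above, assuming a hypothetical semi-structured export in which each paragraph keeps only a `style` name (Heading1, Normal, Footnote). The element names and the style-to-tag table are illustrative assumptions, not Office 2003's actual output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical semi-structured export: each paragraph keeps only its
# style name, not the fonts or sizes behind it.
DOC = """<document>
  <p style="Heading1">Quarterly Report</p>
  <p style="Normal">Sales rose in every region.</p>
  <p style="Footnote">Figures are unaudited.</p>
</document>"""

# Style name -> (HTML tag, optional CSS class hook for a stylesheet).
STYLE_MAP = {
    "Heading1": ("h1", None),
    "Normal": ("p", None),
    "Footnote": ("p", "footnote"),
}

def to_html(xml_text):
    """Turn style-tagged paragraphs into plain HTML, one element per line."""
    lines = []
    for para in ET.fromstring(xml_text).iter("p"):
        # Unknown styles fall back to a plain paragraph.
        tag, css_class = STYLE_MAP.get(para.get("style"), ("p", None))
        attr = f' class="{css_class}"' if css_class else ""
        lines.append(f"<{tag}{attr}>{escape(para.text or '')}</{tag}>")
    return "\n".join(lines)

print(to_html(DOC))
```

    Falling back to `<p>` for unrecognized styles means the output degrades to readable text instead of failing, and the CSS class leaves the actual look to a stylesheet.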

    --

    The Digital Sorceress
  9. Re:Style Sheets by Captain+Large+Face · · Score: 4, Interesting

    The problem is that they don't include it anywhere else. So in order to share documents in the style the user intended, you must save them in the proprietary format.

    IMHO, this ensures the user will opt-out of the XML format, and stay with the proprietary format. As I posted above, if Microsoft are going to do this, then they should bundle an XSL document with each XML document.
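    An XSL file does not have to be physically inside the XML document to travel with it: the W3C recommendation "Associating Style Sheets with XML documents" defines an `xml-stylesheet` processing instruction that points at one. A minimal sketch, with a hypothetical `memo` document and `memo.xsl` file name:

```python
import xml.etree.ElementTree as ET

def with_stylesheet(root, xsl_href):
    """Serialize `root` preceded by an xml-stylesheet processing instruction.

    Assumes `xsl_href` contains no quote characters (it is inserted verbatim).
    """
    body = ET.tostring(root, encoding="unicode")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
        + body
    )

# Hypothetical document and stylesheet name, for illustration only.
root = ET.Element("memo")
ET.SubElement(root, "title").text = "Budget"
print(with_stylesheet(root, "memo.xsl"))
```

    A browser with XSLT support that opens such a file fetches the referenced stylesheet and renders the transformed result; a plain XML parser simply skips the instruction and sees the data.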

  10. Re:Style Sheets by AndyS · · Score: 2, Interesting

    Why? This is a file format. The word processor will handle it all for you.

    I don't think this is going to be like HTML, where you expect to write it yourself. I imagine this will look more like the OpenOffice file format, where you have multiple XML files inside a ZIP file (along with graphics and other multimedia stored inside the ZIP).
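    The ZIP-of-XML-parts packaging described above can be sketched with Python's standard `zipfile` module. The part names below (`content.xml`, `styles.xml`, a `Pictures/` folder) are modelled loosely on the OpenOffice.org layout, but the contents are placeholders, so this illustrates the packaging idea rather than producing a file OpenOffice would open.

```python
import io
import zipfile

def build_package(parts):
    """Pack several named parts (str or bytes) into one in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)  # str parts are stored as UTF-8
    return buf.getvalue()

# Placeholder content: text, styling and an image live in separate parts.
parts = {
    "content.xml": "<content><p>Hello</p></content>",
    "styles.xml": "<styles><style name='Normal' font-size='12pt'/></styles>",
    "Pictures/logo.png": b"\x89PNG placeholder bytes",
}
package = build_package(parts)

with zipfile.ZipFile(io.BytesIO(package)) as zf:
    print(zf.namelist())
    print(zf.read("content.xml").decode("utf-8"))
```

    The word processor reads and writes all of this for you; a script that only cares about the text can open the archive and parse `content.xml` alone.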

  11. XSL and FO by StevenYelton · · Score: 2, Interesting
    I suppose what really needs to happen is they supply the XML document for the content and a style sheet for the presentation.

    It would also be nice to be able to transform the XML, via a provided XSLT, into FO (XSL Formatting Objects, specified by the W3C and rendered by tools such as Apache FOP). Then you could present the document as a PDF, RTF, Doc, Java applet, or whatever.

  12. Re:At some point..... by bfree · · Score: 4, Interesting

    Why? The attitude sounds harsh when expressed so simply, but suppose you tell your "client" that you can't read the file, that your company has decided not to purchase the software required to do so (otherwise it would have to pass the associated costs on to its clients), and ask them to please send the file in a format you can read instead (even Word XP or earlier, thanks to oo.o) or to fax it. Should the client really have a problem with that? And if so, is it worth keeping the client? (Yes, I really said that: lots of the time, troublesome clients aren't worth keeping without changes, if you can actually cost them completely.) Similarly, with a coworker, you can ask whether you can buy the software from their budget (in a company setting there should be company standards, so this should be easy)!

    --

    Never underestimate the dark side of the Source

  13. Re:At some point..... by NDPTAL85 · · Score: 2, Interesting

    Why? Because it's a bullshit attitude to have when dealing with clients.

    You're supposed to bend over backwards to help and assist your clients, not make them do that for you. Of course, if you do business with a holier-than-thou Free Software ethos, then yeah, I guess you wouldn't see a problem with acting like that. And I'm not saying it would put you out of business, either. You'll simply be regarded as a jerk.

    --
    Mac OS X and Windows XP working side by side to fight back the night.
  14. This is already going on by Botchka · · Score: 1, Interesting

    Our clients use Lotus Notes. When an Outlook user sends someone an email with an attachment and chooses to send it in Exchange Rich Text Format, Lotus Notes doesn't play nice with that and converts the attachments to a c.dat file. Our clients can't do anything with this c.dat file, and the only option is for them to contact the sender and have them deselect Exchange Rich Text Format, after which everything works fine. My point is that M$ has always had a proprietary way of going about things. Give them enough rope and eventually they will hang themselves on their proprietary ways. Don't think that companies are looking at open source as a viable alternative to M$ just because it's fun and different. It's because people are tired of being locked into M$ products and the constant outlay of cash for upgrades and *features* that nobody wants or needs.

    --
    Money not found! A)bort, R)etry, D)eclare Bankruptcy
  15. Re:Separating Content from Presentation a Good Thi by Delirium+Tremens · · Score: 2, Interesting

    Check your favourite HTML tutorial.
    Yes, good HTML is valid XML.
    Unlike your example, which is not even valid XML. But that's beside the point.
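    Strictly speaking, a parser can only tell you that markup is well-formed XML; "valid" additionally means it conforms to a DTD or schema. A minimal well-formedness check with Python's standard library, contrasting an XHTML-style line break with an HTML-style one:

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if `markup` parses as XML.

    This checks well-formedness only; no DTD or schema validation is done.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>Line one<br/>line two</p>"))  # XHTML-style empty tag
print(is_well_formed("<p>Line one<br>line two</p>"))   # HTML-style unclosed <br>
```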

  16. REPEAT AFTER ME: XML IS NOT A FILE FORMAT by Trailer+Trash · · Score: 5, Interesting

    Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format...

    That's because XML is not a file format, it is instead a format for file formats. To quote the O'Reilly "Learning XML" book, page 2:

    Note that despite its name, XML is not itself a markup language: it's a set of rules for building markup languages.

    I've said this many times on /. (look at my history), but the fact that a particular format is XML-based says nothing about your ability to read it. And that's before even considering that Microsoft could simply stick their traditional file formats into a CDATA section and claim XML compliance.

    The statement "If Microsoft used a standard XML format for their documents then anyone could read them" makes as much sense as an equally stupid statement like "If Microsoft just used 8-bit bytes in their file formats then anyone could read them".

    Sorry to rant, but the level of cluelessness around XML is astounding. Please read up, there's a ton of useful information on XML around the internet.

    MDC
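    To illustrate the CDATA point: raw binary is not legal in XML, so this sketch assumes the payload is base64-encoded first. The result is perfectly well-formed XML that any parser accepts, yet it tells the reader nothing about the format inside. The `document` element, the `format` attribute and the payload bytes are all made up for the example.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for an undocumented binary format (arbitrary bytes).
opaque = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x00, 0x01, 0x02])

# Well-formed XML envelope around an opaque payload: any XML parser can
# read the envelope, but nothing in it explains what the payload means.
doc = (
    "<document format='proprietary'><![CDATA["
    + base64.b64encode(opaque).decode("ascii")
    + "]]></document>"
)

root = ET.fromstring(doc)              # parses fine: the envelope is legal XML
payload = base64.b64decode(root.text)  # back to the same meaningless bytes
print(root.tag, payload == opaque)
```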

  17. Re:At some point..... by ccp · · Score: 2, Interesting

    It must be a cultural thing, but...

    Do you have to reduce EVERYTHING to dollars and cents?

    Have you heard of principles, dignity, pride?

    Who said that you have to bend over backwards to satisfy your clients? I'd hate to have these clients. Mine are satisfied with a good deal, and I mean good to both parties.

    Cheers,

  18. Re:At some point..... by Photon+Ghoul · · Score: 2, Interesting

    I hope that your company keeps you in the back room so that you don't have to deal with any clients.

    Understand that I use OpenOffice exclusively for my documents when I'm not using vi. I prefer vi over everything else for writing documents; it's fast and easy for me. Anyway, the point is that your personal preference doesn't matter. What matters is communicating quickly and efficiently.

    Yes, Microsoft sucks, closed formats suck, etc. The truth of the matter is, you either adapt to what (in this case) your clients use or they don't do business with you. Most businesses that I've worked for tend to opt for the 'adapt' method rather than the 'go to hell' method.

    It has nothing to do with 'bending over backward' for everything the customer asks for. It has everything to do with communication, and with making stupid things like file formats as transparent and unimportant to the client as possible.

    If you lose business over something as stupid as a file format, I can only assume that business must be booming with the rest of your customers, the ones who only deal in your preselected formats.

  19. Once again, MS gets slapped by FUD by Planesdragon · · Score: 2, Interesting

    From the article:

    "XML and Web services use, especially for content-driven applications, is still very much limited to basic use of XML as a data-exchange mechanism between systems -- primarily for internal integration approaches," he said. "When dealing with exchanging information internally, what is most important is not to bundle all collaborative features into making for a huge, cumbersome XML file that only certain applications can process, but rather to strip out all the presentation layer features and focus on just the data to be exchanged. In this case, I don't see how Microsoft is violating that. You can choose to save a document with all the rich presentation data left in, if you choose (and that data will only be processable by Office applications), or you can choose to save the XML with just the data in it. I don't see how that cripples anything."

    If MS is doing XML right, an XML export from word will only mark the text file with the necessary handles to bind to the formatting file. If you open the text file without the formatting file, you get rather plain text.

    The same thing happened with MSHTML. Yes, it's got a lot of proprietary comments in it (the "" tags), but the CSS and formatting designations are as standard as the crude hacks and random idiosyncrasies that a human web designer might produce.

    Plus, it's only an "early beta." I hope that the authors of the article send their comments to MS, so MS can expand on what their XML exports can do.

  20. Re:At some point..... by MeanMF · · Score: 4, Interesting

    You could also just download the free MS Word viewer that Microsoft provides here.

  21. Re:At some point..... by urmensch · · Score: 2, Interesting

    So what if I'm not running Windows?

    System Requirements for Using Word Viewer

    * Microsoft Windows® 95 operating system or Microsoft Windows NT® Workstation operating system 3.51 or later

  22. Re:At some point..... by Uwe+Barschell · · Score: 2, Interesting

    I read the article. A representative from OpenOffice said that according to reports he has heard, the MS-XML format is crippled. An Office 2003 beta tester quoted in the article had a different view:

    Gary Edwards (OpenOffice Representative): "Although it's still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers. Reports are that when saving to XML, [Office 2003] strips out the presentation and formatting information, leaving near raw content."

    Mark McWilliams (MS-Office 2003 beta tester): "The opened XML document looks exactly like the original .doc file. And if I open up the XML file in a text editor, I can see that all of the formatting is properly maintained in the XML file."

    This beta tester also said the document he used was heavily formatted, and that there is an alternative, data-only XML format in MS Office 2003 that does remove the formatting.

    Who is right? I don't know and I don't care, because I don't use MS-Office 2003. However, I am usually suspicious of criticisms of a product that are levelled by its competitors. The users of that product usually have a more objective and accurate view.

  23. Re:Did anyone RTFA?? by aluminum+boy · · Score: 2, Interesting
    I totally agree. In fact, a single XSLT would likely be all that's required to convert the XP-XML into the OASIS model, with or without formatting! Sounds much more interoperable than it was previously.

    It would be nice if Microsoft used an open (OASIS) format, but it sure doesn't sound like people are locked into the format.
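    Python's standard library has no XSLT processor, so this is not XSLT itself, but a sketch of the same kind of mechanical element-for-element mapping such a stylesheet would perform. Both namespace URIs and the rename table are hypothetical placeholders, not the real Office 2003 or OASIS vocabularies.

```python
import xml.etree.ElementTree as ET

SRC = "urn:example:office-xml"  # hypothetical source namespace
DST = "urn:example:oasis-text"  # hypothetical target namespace

# Source local name -> target local name; unlisted names keep their local name.
RENAME = {"body": "text", "p": "para", "b": "emphasis"}

def convert(xml_text):
    """Move every element in SRC into DST, renaming it via RENAME."""
    root = ET.fromstring(xml_text)
    src_prefix = "{" + SRC + "}"
    for el in root.iter():
        if el.tag.startswith(src_prefix):
            local = el.tag[len(src_prefix):]
            el.tag = "{" + DST + "}" + RENAME.get(local, local)
    ET.register_namespace("t", DST)  # serialize DST with a readable prefix
    return ET.tostring(root, encoding="unicode")

source = f'<w:body xmlns:w="{SRC}"><w:p>Hello <w:b>world</w:b></w:p></w:body>'
print(convert(source))
```

    Text and nesting pass through untouched; only the element names change, which is why a one-to-one vocabulary mapping is cheap once both formats are documented.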

  24. Not surprising by failedlogic · · Score: 2, Interesting

    I'm not surprised in the least that MS would choose a proprietary standard. It seems MS has made importing/exporting .DOC documents much harder over the years, locking people into using their apps. From a business/revenue perspective, that's understandable for MS. WordPerfect, OTOH, seems to have been much easier to convert/import, and most documents I've imported have been nearly flawless.

    On a separate note, perhaps: what format would be best for composing essays and large documents in a non-corporate environment? I compose a lot of documents as a student, and I need something that I can easily format and safely keep electronically for many years. Other than POT (no, not that, but Plain Old Text), would some form of XML be better, or TeX/teTeX...? It would be nice to standardize everything on one format and not have to worry, many years later, about not being able to retrieve it.

  25. Re:At some point..... by frozenray · · Score: 4, Interesting

    > You could also just download the free MS Word viewer that Microsoft provides here [microsoft.com]

    For those not running Windows, the Word viewer comes "free" with a $199.- (list price) version of Windows, a good sized chunk of your system disk (not that it really matters much given today's HD prices and capacities) and the usual installation hassles, like drivers for equipment which isn't included on the CD etc. Even if you got Windows "free" with your PC from the manufacturer, you just paid the Microsoft tax up front, and will continue to pay if you want to keep your system up to date.

    That's like saying the Grappa I got offered after shelling out $150.- for dinner with a date last Saturday was "free". Sure, I didn't pay for it, but you can't get it without buying dinner first.

    Yes, I know there are solutions for reading MS Office documents on Linux. But I always cringe when people tell me to use the "free" readers - they're not free in any sense of the word in my book.

    --
    "There are already a million monkeys on a million typewriters, and Usenet is NOTHING like Shakespeare." - Blair Houghton
  26. Uh ... what? by Osty · · Score: 3, Interesting

    Furthermore, Office's new collaboration features will only work with users who are also running Office 2003 (requiring Windows 2000 or 2003) that are connecting over XP servers.

    Excuse me if I don't take this article seriously, but the author apparently knows nothing about Windows. Office 2003 will only work on Windows 2000 or 2003? Not Windows XP? Maybe he meant that the collaboration servers require Windows 2000 servers or Windows Server 2003 servers, since there is no XP Server. And speaking of XP, what exactly does he mean by "connecting over XP servers"? That's simply impossible -- there is no server version of XP, only Home and Pro.


    As for Microsoft not supporting Office on the obsolete Win9x platforms, good for them. It's past time for Win9x to be killed off once and for all. Not supporting it in Office is a good step forward.

  27. This may be a stupid question by nicotinix · · Score: 2, Interesting

    but can we not write a little add-on program for Word/Excel/PowerPoint to allow File->Save As->OpenOffice .sxw or File->Export->OpenOffice .sxw?

    I am not a programmer, so I don't know how feasible that is. I know I would download and install something like that.

  28. interoperability should not depend on font size by cyril3 · · Score: 2, Interesting
    I thought XML was supposed to describe data so that an application could do what it wanted without having to follow the rules of the sending app. E.g., in finance there will be a set of industry-standard tags that an accounting program will apply to data in an XML file, and if I get that file, my program will understand the tags and use the info in the ways I want.

    It's not limited to letting me display a Word document of a report in OpenOffice and have it look exactly the same. I want to be able to import the XML file and have my analysis software know that a particular record is a non-current asset, etc.

    Who cares what font was used? Interoperability should not depend on font size or colour.

    It's precisely the Microsoft-specific bits of a file that should be stripped out. If a display property is only available on an MS platform, then the XML file should not contain it.

    The big if is whether MS takes out more and leaves the XML file unusable because there is insufficient description.

    What you will find is that industries and user groups will begin to define XML schemas for their data. Word processing will be different, but XML will still have a place.
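    A minimal sketch of the idea above, assuming a hypothetical balance-sheet vocabulary (`balanceSheet`, `asset`, a `class` attribute). Real industry schemas define their own names; the point is that the analysis code keys on meaning, not on fonts or layout.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical industry vocabulary; real schemas define their own tag names.
BALANCE_SHEET = """<balanceSheet currency="USD">
  <asset class="current" name="Cash">12000.00</asset>
  <asset class="non-current" name="Equipment">45000.00</asset>
  <asset class="non-current" name="Buildings">150000.00</asset>
  <liability class="current" name="Payables">8000.00</liability>
</balanceSheet>"""

def total_assets(xml_text, asset_class):
    """Sum the assets of one class; no presentation data is consulted."""
    root = ET.fromstring(xml_text)
    return sum(
        (Decimal(a.text) for a in root.iter("asset") if a.get("class") == asset_class),
        Decimal("0"),
    )

print(total_assets(BALANCE_SHEET, "non-current"))
```

    `Decimal` is used instead of `float` so that money amounts add up exactly.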

  29. is it XML or not? by Anonymous Coward · · Score: 1, Interesting
    Either it is XML and uses a funky DTD/schema, or it is merely a tagged text file that resembles XML in many ways, but I am at a loss as to how people are restricted from getting data out of it. Furthermore, I have to question the statement:
    Apparently, all formatting and presentation information is removed from the XML.
    And this is a bad thing? One of the fundamental elements (badda bing!) of XML is its separation of data from presentation. If you mean "formatting" in the sense that certain parts of the document are tagged specially so that the corresponding stylesheet (or whatever passes for it) interprets them as bold, italic, color, or even font and size changes, then I suppose removing that might be a little odd. How, in fact, do you then track which parts (one sentence inside a paragraph, for example) are different and need other rules applied? If, however, you are suggesting that an XML document should actually specify italics when it should merely give a more abstract specification of emphasis and uniqueness (within the context of the surrounding text), I would strongly recommend you rethink that. XML is XML, not HTML, and should not be treated the same.

    Organizations and individuals should be able to mark up the parts of their documents that get special formatting, yet the actual formatting rules should be kept separate from the document itself (the actual content). Once there is a standardized way of noting different "categories" of marked-up content, and a standardized method of specifying what the author intends by them (e.g. italics and bold in red), end users can more easily set up their own rules for presenting these parts of the document according to their own organizational ruleset. (E.g. I might need to ignore certain italicized parts in a certain body segment, or, if they are also marked as definitions, hyperlink them to a dictionary: something that simply writing <i>RTFM Definition</i> will not allow, except through a complex, restrictive, hard-coded method that defeats the purpose of using XML in the first place.)

    Ironic as it is, I applaud MS for going this way, simply because it may offer the CHANCE of standards compliance... if people will get over "my team vs. your team" groupthink.
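    A small sketch of the separation argued for above: the content marks only what each span is (a `term` element with a `kind` attribute), and two readers apply their own rule sets to the same file. The tag names, both rule sets, and the `dictionary.example` URL are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote

# Content marks *what* a span is, never *how* it looks.
CONTENT = ('<para>Read the <term kind="definition">manual</term> '
           'before asking, <term kind="emphasis">please</term>.</para>')

# Two readers' hypothetical rule sets for the same categories.
PRINT_RULES = {
    "definition": lambda t: f"<i>{escape(t)}</i>",
    "emphasis": lambda t: f"<b>{escape(t)}</b>",
}
GLOSSARY_RULES = {
    "definition": lambda t: f'<a href="https://dictionary.example/{quote(t)}">{escape(t)}</a>',
    "emphasis": lambda t: escape(t),  # this reader ignores emphasis
}

def render(xml_text, rules):
    """Render one paragraph, applying the reader's rule for each term kind."""
    root = ET.fromstring(xml_text)
    out = [escape(root.text or "")]
    for child in root:
        out.append(rules[child.get("kind")](child.text or ""))
        out.append(escape(child.tail or ""))  # text following the child element
    return "".join(out)

print(render(CONTENT, PRINT_RULES))
print(render(CONTENT, GLOSSARY_RULES))
```

    The document itself never changes; only the rule table does, which is the kind of per-organization ruleset described above.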