Office 2003 and XML
zachlipton writes "Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format. Gary Edwards, the OpenOffice.org representative for the OASIS XML file-format group is quoted as saying "although it's still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers." Apparently, all formatting and presentation information is removed from the XML. Furthermore, Office's new collaboration features will only work with users who are also running Office 2003 (requiring Windows 2000 or 2003) and who are connecting over XP servers." So Microsoft will continue its efforts to lock in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them.
Microsoft will have to learn IBM's lesson about transforming from a company that makes standards, to one that contributes to them.
They still don't get that their attempts to "embrace and extend" the whole damn internet aren't going to work.
The rest of the world WILL produce an XML standard document format without them, thank heavens.
Prevent email address forgery. Publish SPF records for y
All Microsoft needs to do is make their standard an open one (that can be used by others), like Adobe has done with their PostScript and PDF formats. Adobe has done quite well with their products based on these formats, too. Products like Adobe Illustrator and Photoshop (which works very well w/ bitmaps saved in PostScript) are the industry standard in digital art. If Microsoft followed a similar model, I'm sure that Microsoft Word would continue to be the industry standard in word processing software, and Microsoft as a business wouldn't be any less rich for it.
There is a big difference between separating presentation from content and removing the presentation totally.
1) Take MS, make a report that says they did something bad, watch how many people flock to bash them DESPITE THE FACTS PRESENTED IN THE ARTICLE, which leads me to:
2) How many people read the article? And of those people who DID, :
3) How many of them know that XML is supposed to be a divorce of data from presentation? Why this comes as a shock to people is obvious - they didn't know that.
The poster above who said "style sheets" - bravo. You couldn't have made a better point with two words.
i'm amazed that i survived - an airbag saved my life.
I have Office 2003 Beta 2 freshly downloaded from MSDN. This article is completely wrong. I did the following:
1. Opened a heavily formatted .DOC Word document with tables, multiple fonts, etc.
2. Saved the document as XML.
3. Opened up the XML document in Word and it looks EXACTLY like the original
I also opened the XML file in a text editor and sure enough it contains complete formatting information.
The point of the Office 2003 "Save as XML" with the "Data Only" checkbox is _NOT_ a poor man's Save As XHTML. It's designed to allow the data of the document to be placed into an XML document based on a schema. You literally can make your own schema file/XSD, and use a tool inside Word to map the elements of a Word document to elements of the schema. If you simply map a paragraph to a string you will lose formatting. Unless of course you define in your schema how you'd like to store formatting information. But that is generally overkill.
Think of a resume. You could define an XSD for a resume, and be able to save resumes against this XSD, as validated pure XML.
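To make the resume idea concrete, here is a rough Python sketch of what a consumer of such a "data only" file might do. Every element name below (resume, name, email, job, title, employer) is made up for illustration and is not what Word emits, and the required-field check is only a stand-in for real XSD validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-only resume; all element names are illustrative,
# not taken from Word or from any real schema.
RESUME_XML = """<resume>
  <name>Jane Example</name>
  <email>jane@example.org</email>
  <job>
    <title>Technical Writer</title>
    <employer>Example Corp</employer>
  </job>
  <job>
    <title>Editor</title>
    <employer>Sample Press</employer>
  </job>
</resume>"""

# Fields a resume schema would plausibly require (an assumption).
REQUIRED = ("name", "email", "job")


def load_resume(xml_text):
    """Parse the resume and check for the required top-level elements."""
    root = ET.fromstring(xml_text)
    missing = [tag for tag in REQUIRED if root.find(tag) is None]
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))
    return {
        "name": root.findtext("name"),
        "email": root.findtext("email"),
        "jobs": [
            (job.findtext("title"), job.findtext("employer"))
            for job in root.findall("job")
        ],
    }


resume = load_resume(RESUME_XML)
print(resume["name"], resume["jobs"])
```

Note there is no formatting anywhere in that file, and for this use case that is the whole point: the consumer wants the fields, not the fonts.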
Now, if you want to produce a document, using an XML syntax but want to combine both data and presentation, then you want WordML.
WordML uses Word's own tags to mark up the Word document. I was going to show you an example of WordML but I don't feel like escaping all the greater-than/less-than signs. Anyhow, WordML contains all the formatting and everything necessary to display a Word document as it is supposed to look.
I think this Open Office guy is looking for a devil in Office 11 that isn't there. That or he didn't read the friggin manual.
-Malakai
A Dragon Lives in my Garage
Have you ever played a game like Civilization or Alpha Centauri? You would be amazed at how much those games make you understand politics. Once you are in the lead, you do anything you can to protect that lead. And why would you expect the real world to be any different?
But this isn't a game, this is business. And since businesses are SUPPOSED to make money, they need to make sure people continue to buy MS Office. And making an office suite that shares documents with all the various third-tier office suites just doesn't do that. Why should my company buy MS Office if the documents it produces are exactly the same as those of FreeBeerOffice? Now, if FBO cannot do things MSO can do, then there is an incentive...
Manipulate the moderator system! Mod someone as "overrated" today.
This may be bad (keeping in mind the jury is still out on exactly how Microsoft is making this work) because in the case of office documents, the style is actually *part* of the content, from the perspective of Joe Office User.
If Microsoft just puts the raw text data into a .xml file, then that .xml file is practically useless to anyone who wants to collaborate with the original author since all of the styling information is lost.
As an example of a good way to do this (IMHO), take a look at how OpenOffice.org builds their files. When you make a .sxw (the default writer format) you're actually taking the raw data of the document, the styling rules for the document and a few other important bits and pieces and zipping them up into a single file.
After unzipping this file, the following directory structure was exposed:
content.xml
META-INF/manifest.xml
meta.xml
mimetype
settings.xml
styles.xml
With this type of design, you can get the best of both worlds. Technically, there is a separation between your presentation and content which allows simple programmatic access to the data when necessary. At the same time, this design allows for full collaboration between people who also consider the styling of the data to be part of the content because the style rules for the content are included with the document.
With xml-saved Office documents containing only data and no style, collaboration between non-office users (and apparently Win9x users as well) will be no better off than before. Perhaps worse, assuming the binary .doc, .xls etc formats have changed and will need to be reverse-engineered again.
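For anyone who wants to poke at the zipped layout listed above, here is a small Python sketch. It does not open a real .sxw; it builds a stand-in archive in memory with the same member names, and the XML inside each member is invented placeholder content, far simpler than what OpenOffice.org actually writes.

```python
import io
import zipfile

# Stand-in member contents (assumptions for illustration only);
# a real .sxw holds much richer XML in each of these files.
MEMBERS = {
    "mimetype": "application/vnd.sun.xml.writer",
    "content.xml": "<content><p style='P1'>Hello</p></content>",
    "styles.xml": "<styles><style name='P1' bold='true'/></styles>",
    "meta.xml": "<meta/>",
    "settings.xml": "<settings/>",
    "META-INF/manifest.xml": "<manifest/>",
}


def build_package():
    """Zip the stand-in members into one in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in MEMBERS.items():
            zf.writestr(name, text)
    return buf.getvalue()


def read_part(package_bytes, name):
    """Return one member of the archive as text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return zf.read(name).decode("utf-8")


package = build_package()
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    names = sorted(zf.namelist())

print(names)
print(read_part(package, "content.xml"))  # the data
print(read_part(package, "styles.xml"))   # the presentation rules
```

The data and the style rules live in separate members, so a program can read content.xml on its own, while a collaborator still gets styles.xml in the same file.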
If this article is true and Microsoft has decided to remove the styling of their xml-saved office documents, I see two possible reasons for this:
The first is obvious. You're not using Office? Ok, second class citizen, here's the data but in a format that is next to useless for you to use.
The second possibility involves Microsoft just not being where they want to be with the Office XML sharing. Keep in mind that it took OpenOffice.org something like a year and a half or so to define their XML interchange format. Microsoft may be going there, but due to overwhelming inertia, it just might not be going there very quickly.
Personally, I think the first option is the most likely. However, with OpenOffice.org working with OASIS and others on a common XML interchange format, I'm hoping Microsoft will be forced by the marketplace into option 2.
Best regards,
David
Read some other articles, or better yet get ahold of a beta and try it out. The authors of this article will feel like schmucks when they realize what they missed.
First off, by default, if you save the Word document as XML, it gets saved as WordML, which preserves Word's styles and formatting in an XML namespace that's separate from the one bound to the schema-controlled data.
If you check off the checkbox "Data Only" then you will lose all formatting and your own XSD will be used to map this document into XML data.
WordML looks like an XML'ified RTF language. It would be trivial to create an XSL stylesheet that transforms WordML into HTML/CSS with all formatting (that HTML is capable of) which directly mimics MS Word. OpenOffice could also eat WordML quite easily and have all the formatting/style of Word.
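As a rough illustration of that kind of transformation, here is a Python sketch that walks a simplified, WordML-like document and emits HTML. It is plain Python rather than XSL, and it is not real WordML: the namespace URI and the element names (doc, p, r, rPr, b, i, t) are assumptions loosely modeled on the paragraph/run idea, chosen only to show formatting carried in the markup being mapped to HTML tags.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical namespace; not the namespace Word actually uses.
NS = {"w": "urn:example:wordml-like"}

DOC = """<w:doc xmlns:w="urn:example:wordml-like">
  <w:p>
    <w:r><w:rPr><w:b/></w:rPr><w:t>Bold text</w:t></w:r>
    <w:r><w:t> and plain text</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:rPr><w:i/></w:rPr><w:t>Italic &amp; done</w:t></w:r>
  </w:p>
</w:doc>"""


def run_to_html(run):
    """Convert one run of text, wrapping it according to its properties."""
    text = escape(run.findtext("w:t", default="", namespaces=NS))
    props = run.find("w:rPr", NS)
    if props is not None:
        if props.find("w:i", NS) is not None:
            text = "<i>" + text + "</i>"
        if props.find("w:b", NS) is not None:
            text = "<b>" + text + "</b>"
    return text


def to_html(xml_text):
    """Turn each paragraph element into an HTML <p> element."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for para in root.findall("w:p", NS):
        runs = "".join(run_to_html(r) for r in para.findall("w:r", NS))
        paragraphs.append("<p>" + runs + "</p>")
    return "\n".join(paragraphs)


html_out = to_html(DOC)
print(html_out)
```

A real converter would have to handle tables, fonts, lists and so on, but the shape of the job is the same: the formatting is in the XML, so it can be mapped.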
What the authors of this article are REALLY bitching about is the fact that MS didn't buy into the OpenOffice Document Specification from OASIS. MS prolly sees OASIS as the US sees the UN: defunct, not needed.
If you describe your data using XML semantics, and all it takes to convert from semantic style A to B is some XSL, then who cares about forcing everyone to use one specific format.
-malakai
-Malakai
A Dragon Lives in my Garage
Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format...
That's because XML is not a file format, it is instead a format for file formats. To quote the O'Reilly "Learning XML" book, page 2:
Note that despite its name, XML is not itself a markup language: it's a set of rules for building markup languages.
I've said this many times on /. (look at my history), but the fact that a particular format is XML-based says nothing of your ability to read it. I'm even going beyond the fact that Microsoft could simply stick their traditional file formats into a CDATA section and claim XML compliance.
The statement "If Microsoft used a standard XML format for their documents then anyone could read them" makes as much sense as an equally stupid statement like "If Microsoft just used 8-bit bytes in their file formats then anyone could read them".
Sorry to rant, but the level of cluelessness around XML is astounding. Please read up, there's a ton of useful information on XML around the internet.
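A quick Python sketch of the CDATA point, for the skeptical. The document below is well-formed XML and any parser will accept it, yet the payload is just an opaque blob. The element names and the eight bytes are arbitrary choices for the example, and base64 is only one plausible way a vendor might wrap binary data.

```python
import base64
import xml.etree.ElementTree as ET

# Eight arbitrary bytes standing in for a proprietary binary format.
blob = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x00, 0x01, 0x02, 0x03])
encoded = base64.b64encode(blob).decode("ascii")

# Well-formed XML whose only real content is an opaque CDATA section.
doc = (
    "<document><title>Report</title>"
    "<payload><![CDATA[" + encoded + "]]></payload></document>"
)

root = ET.fromstring(doc)          # parses without complaint
title = root.findtext("title")     # readable metadata
payload = root.findtext("payload")
raw = base64.b64decode(payload)    # back to bytes nobody else can interpret

print(title, len(raw))
```

The parser did its job perfectly and you are no closer to reading the document, which is exactly why "it's XML" tells you nothing by itself.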
MDC
Do you have ESP?
InternetNews is authored by morons.
-malakai
-Malakai
A Dragon Lives in my Garage
Or spy on other people from a God perspective. Damn you! Now I'll have to spend the rest of my day realizing how pathetically small my scope is...
It take more faith to believe in evolution than it takes to believe in God
Your use of the tired "Bzzzzt" exclamation at the beginning of your post completely overwhelmed any potential interest in whatever it was that you were trying to say.
Please, next time try to avoid the condescending tone, people might respond more constructively.
Oops, I forgot to set the reply to "Code". Please note, your SAX parser probably won't be able to parse this, heh. It is, however, theoretically proper XML.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (document_properties, document_section)>
<!ELEMENT document_properties (title, author, organization, department, job, generalsummary)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT organization (#PCDATA)>
<!ELEMENT department (#PCDATA)>
<!ELEMENT job (#PCDATA)>
<!ELEMENT generalsummary (#PCDATA)>
<!ELEMENT document_section (sectionsummary, proprietarybinary, unenhancedcrappytext)>
<!ELEMENT sectionsummary (#PCDATA)>
<!ELEMENT proprietarybinary (#PCDATA)>
<!ELEMENT unenhancedcrappytext (#PCDATA)>
]>
<document>
<document_properties>
<title>Crappydoc</title>
<author>William H. Gates III</author>
<organization>BORG</organization>
<department>Unimatrix 0</department>
<job>Secondary information processing adjunct</job>
<generalsummary>Doc about crappy M$ things.</generalsummary>
</document_properties>
<document_section>
<sectionsummary>Haha, you cant parse this and make it look perty, it's BINARY! You're still screwed!</sectionsummary>
<proprietarybinary>firoiorfioeiojvonvonviniooiwnc
o ncooisoi39f940f9439 0f904390f94390fj904j90j3f09j4fj3490jf30jf040fj03j0 9fj9340fj043j90fj4903fj9043jfj0vjoirejvoojvoerjgoe jgojerogjoejoenmvotnhnoignoengotnhinringuinfi</proprietarybinary>
<unenhancedcrappytext>Hehe, doesnt this text just look ugly? I bet it does, if you arent using M$ WORD!</unenhancedcrappytext>
</document_section>
</document>