MS Office XML Format Now In TextEdit
computerdude33 writes "Apparently, Apple heard of Microsoft Office changing to XML formats. If you have OS X 10.4.2, you can save documents in TextEdit in Word XML Format. They are saved with a *.xml extension, and are riddled with references to Word. Here is an example of one of these documents."
I don't really see the problem with "bloated" XML, when the files are zipped by default. Instead of smushing your efficiency requirements in with your readability and standardization requirements (and screwing all three), you first handle readability and standardization and then wrap it in a standard efficiency layer. The upshot is, not only are the files often *smaller* than the old Word equivalent, but I can also hack through them using a couple of standard Perl packages that have come with Linux, OS X and Cygwin for years.
Where's the downside?
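For anyone who wants to see the "readable format first, efficiency layer second" idea in action, here is a rough sketch in Python rather than Perl. The element names (`document`, `paragraph`, `run`, `text`) and the file name `document.xml` are made up for illustration and are not any real Office schema; the point is only that verbose, repetitive XML deflates well and can still be read back with stock library calls.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_verbose_xml(paragraphs):
    """Build a deliberately verbose XML document (hypothetical element names)."""
    root = ET.Element("document")
    body = ET.SubElement(root, "body")
    for text in paragraphs:
        para = ET.SubElement(body, "paragraph")
        run = ET.SubElement(para, "run")
        ET.SubElement(run, "text").text = text
    return ET.tostring(root, encoding="utf-8")


def zip_bytes(name, data):
    """Wrap the XML in a standard zip container using deflate compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def read_paragraphs(zipped, name):
    """Unzip and parse the XML again, returning the text of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
        root = ET.fromstring(zf.read(name))
    return [node.text for node in root.iter("text")]


paragraphs = ["Hello world"] * 200
raw = build_verbose_xml(paragraphs)
packed = zip_bytes("document.xml", raw)
print(len(raw), len(packed))
```

The repeated tag names are exactly what a general-purpose compressor eats for breakfast, so the packed size should come out far below the raw size, while the contents stay fully recoverable.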
So a simple two word text file has the following 33 XML tags pasted here with the greater and less than signs removed...
What is your point? Oh lord, this file is 1200 bytes long, for "just two words of text."
I created the same two-word document and saved it in several text-based formats that preserve the formatting. HTML (2700 bytes), RTF (3600 bytes), PDF (16,600 bytes), and of course, Word .doc format (20,000 bytes).
The XML version is smaller than all four, and I daresay, easier to parse and manipulate with a third-party program.
Yeah, if you don't want any formatting information stored with your text, use plain text. But otherwise, XML seems to be as good a format as any of the other markup doc formats commonly used in Office.
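To give a feel for the "easy to parse with a third-party program" claim, here is a minimal sketch using only the Python standard library. The sample document is hand-written for illustration and is not the actual TextEdit output; the namespace URI and the `w:p` / `w:r` / `w:t` element names are what I understand the Word 2003 XML schema to use, so treat them as assumptions and check a real file before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for Word 2003 XML (WordprocessingML); verify against a real file.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written sample, NOT real TextEdit or Word output.
SAMPLE = f"""<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>
      <w:r><w:t> world</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def extract_paragraphs(xml_text):
    """Return the plain text of each paragraph, joining its text runs."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"w": W_NS}
    paragraphs = []
    for para in root.iterfind(".//w:p", ns):
        runs = (node.text or "" for node in para.iterfind(".//w:t", ns))
        paragraphs.append("".join(runs))
    return paragraphs


def count_elements(xml_text):
    """Count every element in the document, formatting tags included."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return sum(1 for _ in root.iter())


print(extract_paragraphs(SAMPLE))
print(count_elements(SAMPLE))
```

A dozen lines gets you the text back out, formatting tags and all, which is a lot more than can be said for picking apart a binary .doc by hand.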
Ironically, the word ironically is often used incorrectly.
It's only "ugly" if you are not used to XML. It's certainly not "bloated" at all.
"Verbose" perhaps... but verbosity is kind of the whole point of XML in the first place.
I hate MS as much as the next guy, but I'm thrilled with the fact that they are finally creeping towards some open document standards.
When you consider that their main profit strategy for the last 5-10 years has been "force pointless upgrade sales by screwing with the document format and breaking compatibility with everybody, including our old customers," I think the fact that they are suddenly playing nice like this (even though it may open opportunities for other people to chop them off at the knees) shows that Ballmer & Co. seriously believe that their future does not lie in merely maintaining the MS-Office monopoly. Maybe Cringely's right, and the boys from Redmond are betting the company on the Xbox evolving into a ubiquitous media console.
Then again, maybe they are so cocksure that they have the best & brightest programmers in the world that they think they will be able to open the format and still maintain their lead on quality alone. I find it hard to believe that they are that delusional, but you never know.
Information wants to be anthropomorphized.
Nor was it true that "nobody cared". Lots of people bitched about it.
I thought he was demonstrating different exports from Word. Word 2004 (Mac) makes it 2,167 bytes. Granted, that's horrible HTML...
It's also a fair example, because Word-HTML can "round-trip" back to Word with no loss in fidelity. A barebones HTML file cannot.
Whenever I hear the word 'Innovation', I reach for my pistol.
Which is why even the dedicated MS-haters blanched at having to use NN4. It was bloated, buggy, crappy.
MS didn't achieve browser dominance just through (mis)use of their monopoly. Netscape helped them by releasing NN4.
When they came for the communists, I said "He's next door. Take him away. Goddam commies."
Ah, Pages. The program has some neat features, but has all of the hallmarks of being rushed out of the door for the 1.0 release. It's a nifty program for making flyers, and maybe short newsletters, but it's pretty much a loss to do any serious word processing in the thing, as it currently stands. In a way, it doesn't surprise me to hear that TextEdit is leading the way on the XML front, despite the fact that Pages has an XML native format...
Babar