Microsoft Claims OpenDocument is Too Slow
SirClicksalot writes "Microsoft claims that the OpenDocument Format (ODF) is too slow for easy use. They cite a study carried out by ZDNet.com that compared OpenOffice.org 2.0 with the XML formats in Microsoft Office 2003. This comes after the international standards body ISO approved ODF earlier this month." From the ZDNet article: "'The use of OpenDocument documents is slower to the point of not really being satisfactory,' Alan Yates, the general manager of Microsoft's information worker strategy, told ZDNet UK on Wednesday. 'The Open XML format is designed for performance. XML is fundamentally slower than binary formats so we have made sure that customers won't notice a big difference in performance.'"
It's actually likely they're slightly faster for spreadsheets. For example:
* they use single-letter tag names, for the most part, to reduce parsing time
* they remove all strings and put them in a look-up table
I'm not sure how much difference these things actually make in practice, but there's probably a little speed there.
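For anyone wondering what the look-up table trick amounts to, here is a minimal Python sketch. The function names (`build_string_table`, `restore_cells`) are made up for illustration and say nothing about how Office actually lays out its files; the sketch only shows the general idea of storing each distinct string once and referring to it by index.

```python
def build_string_table(cells):
    """Replace each cell string with an index into a table of unique strings."""
    table = []      # unique strings, in first-seen order
    index_of = {}   # string -> its position in the table
    refs = []       # one small integer per cell
    for s in cells:
        if s not in index_of:
            index_of[s] = len(table)
            table.append(s)
        refs.append(index_of[s])
    return table, refs


def restore_cells(table, refs):
    """Rebuild the original cell strings from the table and the indexes."""
    return [table[i] for i in refs]


cells = ["North", "South", "North", "North", "South"]
table, refs = build_string_table(cells)
print(table, refs)
```

Repeated strings are stored once and the cells shrink to integers. Whether that saves a noticeable amount of load time is a separate question that this sketch does not answer.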
What's not fair is to compare OOo to Microsoft Office, and determine the speed of OpenDocument versus OXML based on that...
"Elmo knows where you live!" - The Simpsons
Actually, the problem is not binary versus non-binary; it's fixed-length versus variable-length fields and records.
With old-style formats, you knew that the header was 512 bytes, followed by 600 bytes of metadata, followed by the document sections, which all indicate their size (or have some way of calculating it based on the block type).
With XML, you get an opening tag and have to parse until the closing tag, which adds a lot to the complexity of reading.
Writing is slightly different, and should in fact be simpler with XML even though it may be more verbose: you don't need to buffer the entire block or rewrite the section header to indicate the length, you just happily do a sequential write.
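A rough Python sketch of the difference on the reading side. The 8-byte header layout (a 4-byte magic value plus a 4-byte length) and the names `read_fixed` and `read_tagged` are invented for the example and are not taken from any real document format.

```python
import struct

# Hypothetical fixed layout: 4-byte magic, 4-byte little-endian body length, then the body.
HEADER = struct.Struct("<4sI")


def read_fixed(buf):
    """The header says exactly how long the body is, so no scanning is needed."""
    magic, length = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    return magic, buf[start:start + length]


def read_tagged(text, tag):
    """The reader has to search forward for the closing tag to learn where the body ends."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = text.index(open_tag) + len(open_tag)
    end = text.index(close_tag, start)
    return text[start:end]


body = b"hello world"
fixed = HEADER.pack(b"DOC0", len(body)) + body
tagged = "<p>hello world</p>"

print(read_fixed(fixed))
print(read_tagged(tagged, "p"))
```

Note that the fixed-layout writer needs `len(body)` before it can emit the header, while the tagged writer can emit the opening tag, stream the body, and close the tag afterwards.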
liqbase
+5 Insightful? Oh PLEASE!
ODT XML files are binary files. So are old Word 2003 .doc files. So are Microsoft's new XML files. So it's pointless to claim that a "binary" file format is faster than an XML file format.
When people say "binary files" they mean this as opposed to "text files", a separation that stems from the ability to open a file in "binary" or "text" mode in several APIs. It has to do with, among other things, the interpretation of control codes such as ^Z.
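A small Python sketch of the two modes, for illustration only. It shows newline translation, which is what Python's text mode does on read; the ^Z handling mentioned above belongs to other runtimes and is not demonstrated here. The file name `sample.txt` is arbitrary.

```python
import os
import tempfile

raw = b"line one\r\nline two\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode: the bytes come back exactly as they were written.
    with open(path, "rb") as f:
        as_binary = f.read()

    # Text mode: bytes are decoded and "\r\n" is translated to "\n" on read.
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()

print(as_binary)
print(repr(as_text))
```

Same bytes on disk, two different views of them, which is all the "binary" versus "text" distinction means at the API level.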
The other big mistake: file formats aren't fast or slow. The algorithms for reading and writing them are (or aren't) slow.
*slaps cheek* NO WAI!
You fail to see the point of what they're saying. They're saying a binary file, with a header and fixed data structures, is a lot easier to read and parse than an XML file, which consists of structures of variable length, needs to be interpreted, etc. This is a problem with XML.
I'm Rocco. I'm the +5 Funny man.
I've noticed that Word will stream open a large DOC file, so that you can start to work on it before it's been entirely loaded -- similar to a web page.
DOC files don't so much stream as open for random access. They're structured in such a way that the information is stored as an object hierarchy scattered across the file. This makes saving faster because only the changes are saved to the file. It also makes opening faster, because Office only needs to pull up the information that's on the screen at the moment. (Even if it's at the end of the document.) PDFs work in a similar, but more structured, fashion.
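To illustrate the random-access idea in general terms, here is a minimal Python sketch of a file with an offset index at the front. This is not the real DOC layout; the index format and the names `write_indexed` and `read_object` are assumptions made purely for the example.

```python
import io
import struct

ENTRY = struct.Struct("<II")  # hypothetical index entry: (offset, length) of one object


def write_indexed(objects):
    """Lay out: 4-byte object count, one index entry per object, then the object data."""
    count = struct.pack("<I", len(objects))
    offset = len(count) + ENTRY.size * len(objects)
    index = b""
    for obj in objects:
        index += ENTRY.pack(offset, len(obj))
        offset += len(obj)
    return count + index + b"".join(objects)


def read_object(f, n):
    """Fetch object n by seeking to its index entry, then to its data; nothing else is read."""
    f.seek(4 + ENTRY.size * n)
    offset, length = ENTRY.unpack(f.read(ENTRY.size))
    f.seek(offset)
    return f.read(length)


f = io.BytesIO(write_indexed([b"page one", b"page two", b"last page"]))
print(read_object(f, 2))
```

Reading the last object touches only its index entry and its own bytes, which is the property that lets a reader show the end of a document without decoding everything before it.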
The unfortunate fact about ODF is that it requires a complete decoding of the file when loading, and a complete reencoding of the file when saving. However, I don't see any reason why Microsoft can't just add ODF support and make it an optional format. Computers are fast these days, and it should be up to the user to decide whether he needs the performance provided by the MS DOC *cough* "standard".
Or in other words, Microsoft is grasping at straws, trying to find a reason why they shouldn't support opening and saving of ODF files. I feel so sorry for them. (Not.)
Javascript + Nintendo DSi = DSiCade
This reminded me of this paper, "The Psychology of Learning". In it the writer describes the act of people who don't want to learn new things: "As long as everybody around them use tools, techniques, and methods that they themselves know, they can count on outperforming these other people. But when the people around them start learning different, perhaps better, ways, they must defend themselves. Other people having other knowledge might require learning to keep up with performance, and learning, as we pointed out, increases the risk of failure. One possibility for these people is to discredit other people's knowledge. If done well, it would eliminate the need for the extra effort to learn, which would fit very well with their objectives."
This issue is about Microsoft defending their turf rather than not wanting to learn something new. But it's basically the same motive at work: find ways to undermine the new to benefit the old.
It goes on, "This model of learning also explains other surprising behavior that I frequently observe. I have seen novices in software development with knowledge of a single programming language explain to experienced expert developers why their choice of programming language was a particularly bad one. In one case, I talked to a student of computer science who told me why a particular programming language was bad. In fact he told me it was so bad that he had moved to a different university in order to avoid courses that used that particular language. When asked, he admitted he had never written a single program in that language. He simply did not know what he was talking about. And he was willing to fight for it. With respect to programming languages, negative opinions about a language that a person does not know, are usually based on very superficial aspects of it. To people obsessed with performance lack of such in a programming language is a favorite reason to advocate its eradication (even though performance is not a quality of a language, but of a particular implementation)."
The positive lesson to take away from this is that MS is undoing itself. It's turning to cheap, nasty, suit-driven mentalities to defend its turf, unlike the old days when it would just go out and write something new and nasty. It's become an unwieldy beast. I read about the Vista delays yesterday and briefly thought "Will anyone notice? Who uses Windows these days?" To an extent it shows what a bubble I live in. But it's true - *all* of my regular contacts use Linux, FreeBSD or Mac OS X. As they should. After all - friends don't let friends use Windows.
Believe with me, my saplings.
TeX consists of long streams of ASCII bytes and offers no random-access abilities whatsoever except those implemented by a text editor and the underlying filesystem. And yet, LyX, which can easily handle thousand-page documents, loads and saves nearly instantaneously.
Your complaint is really over the relative brokenness of two major office suites, not the inherent advantages of their document formats.
Dewey, what part of this looks like authorities should be involved?
This stuff doesn't even make sense.
OpenOffice uses ODF. Office uses binary formats. The performance analysis quoted doesn't compare ODF and OpenXML. It states right in the article:
Here is a comparison with the standard 16-sheet SXC and XML sample file I've been using. The sample is in compressed XML format because it is smaller and easier for you to download. You'll have to convert the XML file to XLS and the SXC file to ODS to run the following test yourself.
XLS is a binary format. This study is irrelevant to the statements made. And it's the only data given to substantiate the claims made. So there is no data given at all.
All you can conclude from this is that OpenOffice 2.0, recently retrofitted for ODF, is much slower in a Windows environment than Office 2003 using binary file formats. That is a far cry from any of the statements made by either Yates or the summary.
What a pile of crap journalism.
-1 Uncomfortable Truth