Microsoft Ends Era Of Closed File Formats
RzUpAnmsCwrds writes "According to an MSDN Channel 9 interview with an Office file-format developer, the next version of Microsoft Office (Office 12) will default to newly-developed XML file formats in Word, Excel, and PowerPoint. The new formats will apparently include XML files along with other files (images, etc) inside of a Zip file. Microsoft will also be providing extensive documentation of the new format to the public through MSDN. The developer likewise announced that Microsoft would be releasing updates for Office 2000, XP, and 2003 to read and write the new formats when the new version of Office is released. If this interview is correct, it could mean the beginning of the end of Microsoft's proprietary file formats." Coverage at Beta News, Information Week, and the Washington Post.
Because, let's face it, the only reason this is happening is because MS have lost the battle to outlaw reverse engineering. Now they'll have widely available specs for their file format -- and all you'll have to do is license the 20 or so patents that protect these formats, and you'll be able to make a competing product that can read Excel files.
Remember, GIF was a completely open format -- but that didn't mean Open Source software got to use it freely.
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
You fucking troll, since when are Microsoft's binary document formats documented, or fast?
Implementing a file format as binary data or even a simple SGML structure such as RTF means less overhead. Using XML you have to run an XML parser, and the file is more freeform. There are no set data structures, it is just a stream of text. With a binary format you can structure it in such a way that you can read a header in and know exactly where to seek in the file to get the information you need. With XML you are pretty much stuck reading sequentially and figuring things out as you go along. Sure, an XML parser library may make it easier, but behind the scenes it is still parsing that stream and processing each tag one at a time.
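To make the "read a header and seek" argument concrete, here is a minimal sketch of a toy binary container. The layout (a DEMO magic value, a record count, then an offset/length table) is invented for illustration and has nothing to do with any real Office format.

```python
import io
import struct

# Hypothetical layout: magic (4 bytes), record count (uint32), then one
# (offset, length) pair of uint32s per record, followed by the record bodies.
HEADER = struct.Struct("<4sI")
ENTRY = struct.Struct("<II")
MAGIC = b"DEMO"

def write_container(records):
    """Serialize a list of byte strings into the toy format."""
    table_size = HEADER.size + ENTRY.size * len(records)
    out = io.BytesIO()
    out.write(HEADER.pack(MAGIC, len(records)))
    offset = table_size
    for body in records:
        out.write(ENTRY.pack(offset, len(body)))
        offset += len(body)
    for body in records:
        out.write(body)
    return out.getvalue()

def read_record(stream, index):
    """Read one record by seeking straight to it, without scanning the others."""
    stream.seek(0)
    magic, count = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a DEMO container")
    if not 0 <= index < count:
        raise IndexError(index)
    # Jump to this record's table entry, then to the record body itself.
    stream.seek(HEADER.size + ENTRY.size * index)
    offset, length = ENTRY.unpack(stream.read(ENTRY.size))
    stream.seek(offset)
    return stream.read(length)

blob = write_container([b"title", b"body text", b"styles"])
third = read_record(io.BytesIO(blob), 2)
```

Fetching the third record costs three seeks and three small reads no matter how large the earlier records are, which is the property the comment is describing.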
24 beers in a case, 24 hours in a day. Coincidence? I think not!
No they won't.
Watch the video - the entire file format is completely open.
He admitted that inside the ZIP they are currently storing the binary copy to make it easier to test and profile against the formats, but when Office 12 is released it'll just be the one XML, completely open format. He also made a point that they are going to have 'thousands' of examples on MSDN, along with very detailed documentation and whitepapers.
Now whether it's patented or not, I don't know. But this is a _VERY_ big step for Microsoft. It's going to make translating between this and OASIS (which OpenOffice2 and a lot of others are considering/implementing as their default) as simple as an XSLT transformation.
IntechHosting - Free domain, 2GB, PHP, £4.95/$8.95
For those who don't want to watch the video, the new format will supposedly offer a 75% improvement in file size. The old, binary format did not use any compression at all. Some of the other features include having the formatting information at the end of the file so that a half-transmitted file still contains all the content.
gzip and zip are completely different things. gzip compresses a stream (and does a much better job than compress, which it has replaced entirely. However, gzip is slowly being replaced by bzip2 nowadays), whereas zip is an archive format that can store individual (usually compressed) files. The huge advantage of zip over compressed tar archives comes from the fact that you have random access, i.e. you can extract a single file from a potentially HUGE archive.
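The difference is easy to see with Python's standard zipfile and gzip modules. This is only a sketch: the member names are made up and are not the actual layout of an Office 12 file.

```python
import gzip
import io
import zipfile

# Build a small ZIP archive in memory; the member names are made up for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("document.xml", "<doc><p>Hello</p></doc>")
    archive.writestr("styles.xml", "<styles/>")
    archive.writestr("images/logo.bin", b"\x00" * 10000)

# ZIP keeps a central directory, so one member can be read
# without decompressing the others.
with zipfile.ZipFile(buf) as archive:
    names = archive.namelist()
    styles = archive.read("styles.xml")

# gzip wraps a single stream: there is no directory, only one payload.
packed = gzip.compress(b"<doc><p>Hello</p></doc>")
unpacked = gzip.decompress(packed)
```

Reading styles.xml never touches the 10,000-byte image member, whereas a .tar.gz would have to be decompressed from the start to reach a file near the end.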
GIF had patent issues with the LZW-Algorithm it used. The patent has expired recently, but the GIF issue is completely unrelated to ZIP (ZIP uses LZ77).
About the patent issue: There are a dozen or so zip-related patents, but they're all highly specific and shouldn't stop anyone from using zip, or even writing a zip utility. See also Patents on data compression algorithms.
Last time I checked the XML format stored a serialized form of COM objects that were not documented. Better than nothing but not really all that open.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
I agree with you in that binary formats can be faster, and I don't love XML-as-storage-format too much, but the case in point is *microsoft's* binary formats, which are little more than straight memory dumps, and UNDOCUMENTED, and SLOW.
A well-designed binary format makes much more sense than XML, and on that I concur with you, but XML is better than Microsoft's current doc formats in that it would make the inner workings of the format easier to figure out, and the struggle for compatibility a much less gory task.
Stupidity is an equal opportunity striker.
Fellow slashdotter Bill Dog
You've got your history a little confused there. ARC was evil because of copyright issues - the LZW patent may have come into play as well, but the main problem was the SEA v. Katz lawsuit over copyright on the file format - and so Phil Katz created ZIP as a free replacement. There was no need to replace the already-free ZIP with a "freer" format. In recent years there's been some proprietary embrace-and-extend applied to ZIP, especially in the realm of encryption (the original ZIP's encryption algorithm was inadequate) but those extensions are arguably violations of the informal copyleft Katz put on ZIP.
The gzip format was created to replace Unix "compress", not ZIP or ARC (which would have been, from the point of view of an early gzip user, toy formats for PC weenies - "get a real computer!") and there may have been patent issues there too, but I think the main reason for the switch from compress to gzip was simply an improvement in compression ratios. Archive formats like ARC and ZIP were never directly competing with compression-only formats like .Z and .gz.
First of all, the entire MSDN library can easily be accessed online (http://msdn.microsoft.com/library/). Second, an MSDN subscription doesn't involve any kind of NDA. The only times I've personally come across one were with pre-release stuff and with their limited beta programs, and in those cases it's nothing that any other company doesn't do either.
Well, considering that on Tuesday, they were granted a patent on marshaling XML to and from objects, I'd guess they still have their bases covered. Yeah, the XML is "open", but you can't write an application to convert that XML into an object map without violating their new "intellectual property".
Xenon, where's my money? -Borno
Microsoft announces that they are going to do something that Slashdot has wanted them to do for ages, and Slashdot proceeds to find faults with it?
Leaving aside the fact that, like you, Slashdot is not an intelligent entity with needs and desires, this is not what we wanted.
We want an open, documented and non-patent encumbered format that allows interoperability between Office and other software.
That is not what this is.
Another poster has already provided a link that will tell you everything you need to know about the conditions attached to these formats.
They have patented everything. You will need to licence their patents in order to use these formats in your software.
That makes them useless. And yet you want us to get down on our knees and praise Microsoft for their generosity? With all due respect, you are either a Microsoft shill or a fucking retard. Not that I am suggesting that those two options are mutually exclusive.
Office 2003 XML Reference Schema Patent License
Just TRY to use the Office XML Specification in an Open Source application. Go ahead, I dare you.
"Live Free or Die." Don't like it? Then keep out of the USA
I'm with you on this. Much as I'm not keen on MS, the SharePoint server is absolutely phenomenal in terms of actually getting things done in a group. Tie it with a properly configured Exchange Server and a 2003 domain, and you have a rock solid (Yes, solid) platform for group work, communication and management that OSS can't even touch.
How many people can read hex if only you and dead people can read hex?
Uhm ... try reading the license.
Looks kinda like a BSD license, don't it?
Yeah, especially the part that says "You are not licensed to sublicense or transfer your rights."
Or not.
http://www.microsoft.com/Office/xml/faq.mspx
If you don't know where you are going, you will wind up somewhere else.
Here's two examples of prior art.
The Internet is full. Go away.
Open Format: These formats use XML and ZIP, and they will be fully documented. Anyone will be able to get the full specs on the formats and there will be a royalty free license for anyone that wants to work with the files.
From the blog of Brian Jones, Program Manager Microsoft Word here. So they intend to enforce control over who uses them, but not by charging royalties.
This sig is empty.
However, Microsoft may be walking a tight line here (at least in theory) as they are a convicted Monopolist. I refer you to this quote from Nolo Press' "Patent It Yourself" (p 1/8), on how Patents can be lost:
"The patent owner engages in certain defined types of illegal conduct, that is, commits antitrust or other violations connected with the patent".
I'm not suggesting that Microsoft's patents aren't a threat to the Open Source community; nor am I suggesting that the patents be taken lightly in any way. However, I do have to wonder how much of Steve Ballmer's chest-thumping about Patents is just FUD, and perhaps Microsoft isn't as strong on this point as they'd like everyone to believe.
I have no doubt whatsoever that Microsoft would try to pull whatever they can get away with. The question I would like to raise is whether they would actually be successful (or how successful they might actually be); especially given that there are now deep pockets behind Open Source?
I am not a lawyer (nor ever wish to be one ;) ). But I mention this for two reasons. First, for everyone's general awareness. And second, to solicit some input from those who are more knowledgeable in this area of the law.
The best way to predict the future is to create it. - Peter Drucker.
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison
From the FAQ:
If you don't know where you are going, you will wind up somewhere else.
Why do people like you keep reiterating the tired old "without a network" line?
It hasn't been true since at least NT4 SP6a, when NT4 achieved a C2 rating *WITH* network. Windows 2000 achieved CC both with and without networking.
The NT4 link is no longer around on MS's site, but there are still some pages out there that reference it:
Such as this one
And here is Win2k
If you need web hosting, you could do worse than here
XML is based upon standard text strings, not binary codes. The part between the angle-brackets is what defines how to read the string(s) between the tags. This is why it's called freeform. As long as the tags are defined somewhere (like a DTD), then the strings will display properly.
As for having to read the document fully into memory - not necessary. Just follow the tags.
Actually, Word docs are not read fully into memory anyway - only the basic information is read into memory along with what information is needed to display in the current buffer. Ever notice how your hard disk is hit during certain times when just typing a paragraph in? It's due to the memory buffers being swapped onto the disk for the areas that are not being displayed.
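"Just follow the tags" can be sketched with the standard library's incremental parser. The sample document and the p element name are made up; note that the parse is still sequential, it just avoids holding the whole tree in memory at once.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document; element names are illustrative only.
xml_bytes = b"<doc><p>first</p><p>second</p><p>third</p></doc>"

def paragraph_texts(stream):
    """Yield the text of each <p> as soon as its closing tag is seen."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "p":
            yield elem.text
            # Free the element's text and children once it has been handled.
            elem.clear()

texts = list(paragraph_texts(io.BytesIO(xml_bytes)))
```

A consumer can stop iterating early, so a reader that only needs the first few paragraphs never parses the rest of the stream.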
From the website:
But, note that the patent and copyright provisions in the license for the Office 2003 XML Reference Schemas require you to include a notice of attribution in your program.
I'm just guessing that the GPL's noted incompatibility with an advertising clause is what breaks compatibility here. MS being MS, they could well have done it intentionally; that said, an advertising clause might also have simply been seen as appropriate. Who knows.
---
Mod me down, you fucking twits. Go ahead. I dare you.
(I read with sigs off.)
Clean-room reverse engineering doesn't help if the format is patented, and this format is patented.
Take a look at a high school senior's Publisher & PowerPoint files some time. They are HUGE, especially with lots of pictures (as instructed). It's not unusual to see file sizes of 100-250 MB on an ordinary PP or Pub file. And boy do they take a long time to open across a network on P4 computers with lots of RAM. We just had a graduation PowerPoint file that was in the neighborhood of 700 MB and it took a couple of minutes to open from the network (using a 100BaseT connection).
I've told the teachers to make sure the kids optimize their photos, etc. but it falls on deaf ears...
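As a rough sanity check on those numbers, here is a back-of-the-envelope calculation. It assumes 1 MB is 10^6 bytes and that 100BaseT delivers its nominal 100 Mbit/s with no protocol overhead, neither of which holds exactly in practice.

```python
def min_transfer_seconds(size_megabytes, link_megabits_per_second):
    """Lower bound on transfer time, ignoring protocol overhead and disk speed."""
    size_megabits = size_megabytes * 8
    return size_megabits / link_megabits_per_second

# The 700 MB graduation file over a nominal 100 Mbit/s link.
best_case = min_transfer_seconds(700, 100)
```

That comes to just under a minute at line rate, so "a couple of minutes" on a shared school network is about what you would expect before the application even starts rendering slides.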
Have you hugged your penguin today?