Microsoft Ends Era Of Closed File Formats
RzUpAnmsCwrds writes "According to an MSDN Channel 9 interview with an Office file-format developer, the next version of Microsoft Office (Office 12) will default to newly-developed XML file formats in Word, Excel, and PowerPoint. The new formats will apparently include XML files along with other files (images, etc.) inside of a Zip file. Microsoft will also be providing extensive documentation of the new format to the public through MSDN. The developer likewise announced that Microsoft would be releasing updates for Office 2000, XP, and 2003 to read and write the new formats when the new version of Office is released. If this interview is correct, it could mean the beginning of the end of Microsoft's proprietary file formats." Coverage at Beta News, Information Week, and the Washington Post.
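For anyone who hasn't poked at this kind of container, the sketch below packs a couple of XML parts plus an image into a Zip and parses one part back out using nothing but the Python standard library. The part names (content.xml, meta.xml, images/logo.png) are made up for illustration; the interview doesn't describe the real package layout, so don't read this as the Office 12 format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical part names; not the actual Office 12 package layout.
PARTS = {
    "content.xml": "<doc><p>Hello, world</p></doc>",
    "meta.xml": "<meta><author>Alice</author></meta>",
}

def write_package(parts, images=None):
    """Pack XML parts (and optional binary files such as images) into an in-memory Zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
        for name, data in (images or {}).items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_part(package_bytes, name):
    """Open the Zip and parse a single XML part with the standard-library parser."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return ET.fromstring(zf.read(name))

pkg = write_package(PARTS, images={"images/logo.png": b"\x89PNG..."})
root = read_part(pkg, "content.xml")
print(root.find("p").text)  # Hello, world
```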
Hopefully this file format change will bring about the end of ever-changing file formats from one version of an app to the next. Who among us doesn't have files saved in an old version of, say, Word, which can no longer be read correctly in a newer version of Word?
Also, "Microsoft Ends Era Of Closed File Formats" is a little overreaching, don't you think?
That's exactly what I was thinking. If Microsoft were really opening up Office, why didn't they go for the OASIS spec? Methinks this is an attempt by Microsoft to lead the industry around by the nose, thus solidifying their place as "Industry Leader". And with a proprietary document format, they can make minor but frustrating changes every version just to keep the competition on its toes.
The one thing that these others have in common, that MS Office lacks, is support for the OpenDocument DTD. OpenOffice.org v2 will use OpenDocument as its main format.
Note that many of the articles linked to by the original post express skepticism about how open MS' XML will actually be. Recall that in the last year, and even in the last weeks, MS has sought patents from the USPTO for XML and XML-related functions, and is even now pushing for legislation in Europe that would make those same patents valid in the EU. That smacks more of a PR stunt than an actual opening up.
Furthermore, since the articles don't mention the current leaders in productivity tools with XML-based formats (i.e. OpenOffice.org or StarOffice), this looks all the more like a warmed-over press release being passed off to the public as news. What's next? A press release about MS suddenly supporting PDF export like in OOo or StarOffice?
Access to the MSDN documentation will require an MSDN developer's subscription and a signed NDA. The NDA will of course forbid the use of file format specification in unsecured software. Appropriate copyright, patent and other licensing fees will be required of developers writing commercial software to access the new file format.
All kidding aside, I think any hope about this is misplaced. There will no doubt be numerous restrictions on the use of the format information.
There's also the fact that MS has done quite a bit to document their Office Formats in the past. The major issue is that the documentation differs significantly from the implementations.
In other words, this is a load of marketing, designed to grab a few buzzwords so the sales staff can toss around the phrase "Open Format" when necessary.
In particular, consider "Microsoft may have patents and/or patent applications that are necessary for you to license in order to make, sell, or distribute software programs that read or write files that comply with the Microsoft specifications for the Office Schemas." taken from the same page...
What changed? How is that an "improvement" exactly?
The future profitability of MS Office is as a component of network groupware systems. Because if you are primarily using Office in standalone mode, you are just fine with any version of Office released in the last 8 years. So, the "value" has to be in improved collaboration or document management.
In this respect, Microsoft needs open formats just as much as anyone. Ever try to write a server-based system that reads information from DOC files? Using winword.exe with automation just doesn't really work. XML lets MS use a relatively lightweight parser in a server-based system.
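A rough sketch of what that lightweight server-side path could look like: open the Zip, stream one XML part through the standard library's iterparse, and pull out the paragraph text, with no winword.exe anywhere. The part name content.xml and the <p> paragraph tag are hypothetical placeholders, not the actual Office schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def extract_paragraphs(package_bytes, part="content.xml", tag="p"):
    """Stream paragraph text out of one XML part inside a Zip package.

    The part name and the <p> tag are hypothetical, not the real Office schema.
    """
    texts = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        with zf.open(part) as stream:
            # iterparse yields each element as it closes, so we never need
            # the whole document tree in memory at once.
            for _event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag == tag:
                    texts.append("".join(elem.itertext()))
                    elem.clear()  # free the finished paragraph
    return texts

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<doc><p>First paragraph</p><p>Second <b>bold</b> paragraph</p></doc>")

print(extract_paragraphs(buf.getvalue()))  # ['First paragraph', 'Second bold paragraph']
```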
Oh, and changing the default file format will surely spur some upgrades, but from what I've seen the corporate market is generally not in a big hurry to get onto the latest version of Office. I don't foresee a repeat of Office 97.
The interesting thing is that all this server-based control and logging of DRM'd functions gives an enormous boost to the type of information available for international and corporate espionage. Before, backdoors, security holes or escrow keys mostly yielded only the documents themselves. Now it's possible to monitor who's collaborating with whom, and see everyone in the distribution chain.
That much can be guessed even now during the vaporware stages. However, as more technical information becomes available it will be possible to guess whether these same functions can be used for more than monitoring and can actually be used to stifle or suppress dissent or specific individuals or groups.
Well, I gotta hand it to you for what amounts to an absolutely brilliant troll. You had me nodding my head the whole way through, but actually your response is just as hyperbolic as the story title. I really don't care to get into all the details... but one thing you said,
I can't see this as anything more than a much belated, empty gesture on Microsoft's part.
is true from the MS perspective, but that doesn't mean nothing good can come of it. Having a documented XML format could do wonders for OpenOffice compatibility, which wouldn't necessarily put a dent in Microsoft's monopoly, but it would make life a lot easier for those of us who don't want to participate in it. I'm not saying it'll pan out, just that there are possible real benefits.
Watch the video - the entire file format is completely open.
Honestly, I am not going to believe it until I see it.
Microsoft has lied before.
It's quite possible they don't intend to open their file formats at all, they just intend to make the Washington Post and its readers think they've opened their file formats. In the meantime, if Microsoft actually wanted to "end the era of closed file formats", all they'd have to do is, you know, actually comply with the letter of the antitrust decision currently handed down against them in the E.U. and the spirit of the toothless antitrust "settlement" currently in effect against them in the U.S. Mysteriously, they haven't.
Ummmm, hello people. This is an XML format.
If MS needs extensions, that's what namespaces are for.
As long as MS extensions don't change formatting functionality (this is really not rocket science, Word is not an innovator here), they can tack whatever metadata they need into the file format and still have it be portable.
If you don't believe me, look at what Inkscape has done with SVG. Sodipodi built on it, adding a namespace to provide its needed data. Inkscape did the same on top of that. It produces one file containing three XML namespaces with interoperable metadata for two editors, and it's viewable in stock SVG viewers.
Obviously it's up to Microsoft to "do the right thing" with their metadata, but this certainly levels the playing field so that others can do what they need to with the documents.
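To make the namespace point concrete, here's a small Python sketch using a trimmed-down Inkscape-style SVG. A "stock" consumer keeps only elements in the SVG namespace and drops foreign attributes, while Inkscape-aware code reads its own inkscape:label from the same file. The namespace URIs are the ones Inkscape and Sodipodi files declare; the stock_view helper is purely illustrative.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
INKSCAPE = "http://www.inkscape.org/namespaces/inkscape"
SODIPODI = "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"

DOC = f"""<svg xmlns="{SVG}" xmlns:inkscape="{INKSCAPE}" xmlns:sodipodi="{SODIPODI}">
  <sodipodi:namedview inkscape:zoom="2"/>
  <g inkscape:label="Layer 1">
    <rect width="10" height="5"/>
  </g>
</svg>"""

def stock_view(root):
    """What a plain SVG consumer sees: SVG elements only, foreign attributes dropped."""
    shapes = []
    for elem in root.iter():
        if elem.tag.startswith("{" + SVG + "}"):
            local = elem.tag.split("}", 1)[1]
            # Namespaced attributes appear as "{uri}name"; plain ones have no brace.
            plain = {k: v for k, v in elem.attrib.items() if not k.startswith("{")}
            shapes.append((local, plain))
    return shapes

root = ET.fromstring(DOC)
print(stock_view(root))  # [('svg', {}), ('g', {}), ('rect', {'width': '10', 'height': '5'})]

# Inkscape-aware code reads its own metadata out of the very same file.
layer = root.find(f"{{{SVG}}}g")
print(layer.get(f"{{{INKSCAPE}}}label"))  # Layer 1
```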
Now the patent on XML-to-object-mappings, that's another story...
I use Open Office exclusively and have for the past couple of years. Reading the files in certainly isn't a problem for me. The only files that are slow to load are the master document files, and that's because they link to dozens of other files.
The XML specification is being expanded (it might already be done) to allow binary formats. There are good reasons, though, why it's best to keep data files in straight XML text format. It eliminates the need to worry about machine architecture. Little endian or big endian, it makes no difference to you. The files are perfectly portable across platforms, which is increasingly important these days. XML files zip very nicely, making them almost as small as a corresponding binary file.
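Two quick illustrations of the points above, using only the Python standard library: the same 32-bit integer has different bytes depending on byte order (and reads back as garbage under the wrong assumption), while its XML text form is identical everywhere; and repetitive XML markup compresses heavily under DEFLATE, the algorithm Zip normally uses. The spreadsheet-ish <sheet>/<row>/<cell> markup is invented for the example.

```python
import struct
import zlib

value = 1000

# A binary format has to pick a byte order; the two encodings differ.
little = struct.pack("<i", value)
big = struct.pack(">i", value)
print(little.hex(), big.hex())  # e8030000 000003e8

# Reading little-endian bytes as big-endian silently yields a wrong number.
wrong = struct.unpack(">i", little)[0]
print(wrong)  # -402456576

# The XML text form is the same byte sequence on every architecture.
xml_cell = f"<cell>{value}</cell>".encode("utf-8")

# Repetitive markup compresses well with DEFLATE (used by Zip and zlib).
rows = "".join(f"<row><cell>{i}</cell><cell>item {i}</cell></row>" for i in range(1000))
doc = f"<sheet>{rows}</sheet>".encode("utf-8")
packed = zlib.compress(doc, 9)
print(len(doc), len(packed))
```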
It is far easier to provide backwards compatibility to earlier file formats when you are using XML than if you are using binary file formats. With XML, if the loader sees a tag it doesn't understand, it can simply skip it. If a binary file format loader sees stuff it doesn't understand, it bails out with an illegal file format error.
When you move to a new expanded file format with XML, you don't have to write a conversion utility. Since you are merely adding new tags, your program can read any of your old data just fine, then add the appropriate tags and new data. This saves a great deal of trouble for programmers.
Machines are fast and cheap. People are slow and expensive. It is far better to have our computers do a little extra work on loading a text file and eliminate conversion utilities and complicated loading routines that are prone to bugs.
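Strictly speaking, the XML parser still parses the unknown element; it's the application's loader that chooses to skip it. A sketch of both behaviours follows, where the <title>/<para>/<comment> tags and the 4-byte version header of the binary loader are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

KNOWN = {"title", "para"}

def load_xml(text):
    """Old reader: keep the tags it knows, skip anything a newer writer added."""
    items, skipped = [], []
    for child in ET.fromstring(text):
        if child.tag in KNOWN:
            items.append((child.tag, child.text))
        else:
            skipped.append(child.tag)
    return items, skipped

def load_binary(data):
    """Hypothetical binary loader: a 4-byte little-endian version header it must recognise."""
    (version,) = struct.unpack("<I", data[:4])
    if version != 1:
        raise ValueError(f"illegal file format (version {version})")
    return data[4:]

newer = "<doc><title>Plan</title><comment author='bob'>check</comment><para>Text</para></doc>"
print(load_xml(newer))  # ([('title', 'Plan'), ('para', 'Text')], ['comment'])

try:
    load_binary(struct.pack("<I", 2) + b"payload")
except ValueError as err:
    print(err)  # illegal file format (version 2)
```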
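The reverse direction, sketched under the assumption that version 2 of some format adds a hypothetical <lang> element: the new loader reads a version-1 file as-is and fills in a default, so no separate conversion utility is needed.

```python
import xml.etree.ElementTree as ET

def upgrade(text, default_lang="en"):
    """Version-2 loader: old files simply lack <lang>, so add it with a default.

    <lang> is a hypothetical tag introduced in "version 2" of an imaginary format.
    """
    root = ET.fromstring(text)
    if root.find("lang") is None:
        lang = ET.SubElement(root, "lang")
        lang.text = default_lang
    return ET.tostring(root, encoding="unicode")

old = "<doc><title>Plan</title></doc>"
print(upgrade(old))  # <doc><title>Plan</title><lang>en</lang></doc>
```

Files that already carry <lang> pass through unchanged, so the same loader handles both versions.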
Not to be a troll, but Microsoft Word's HTML output gives a good idea of how much they can bloat XML.
you have a rock solid (Yes, solid) platform for group work, communication and management that OSS can't even touch.
Which, IMHO, is one of the two problems with the OSS business desktop. GNOME and KDE are great desktops, and Linux has a long pedigree of network interoperability, but it is really nothing more than a chain of islands, each doing its own thing. There is no large network collaboration suite for Linux that takes care of all of the needs of business users. You can cobble the parts together, but if developers don't control all aspects of the equation, things get difficult.
Microsoft is in a unique position where it can tie parts of its operating system and application software together for a 'just works' solution. People can cobble together a 'works' solution, and even a 'works better than MS' solution, but there are a lot of issues with setting these solutions up. To date (and I have been looking) there is no single definitive solution for something as simple as network logon, and the preferred solution (LDAP, PAM and Kerberos) is not the easiest thing to deploy.
Even if you were to create an LDAP-PAM-Kerberos network, with a document management system that used Kerberos authentication, e-mail that used Kerberos authentication, and a plugin for OpenOffice that let you check documents in and out of the DMS without third-party middleware adding twenty extra steps, you would need a huge company or dedicated group to do it, do it right, and do it seamlessly.
Novell's working on it, but because it's new, it isn't mature enough for business to see it as a viable solution. Novell still has its fanboys, and their stuff does warrant it, but they are not seen as a competitive threat to MS.
So you think that a 1 GHz CPU is going to be slowed down because someone's resume or board of directors presentation isn't binary anymore?
Oh puuuulllleeze.
Such concerns might be relevant for a C64 or perhaps even a MacPlus. However, for small consumer documents such notions are absurd.
This isn't exactly someone's corporate data warehouse we're talking about here.
The term you are looking for here is 'self describing structure'.
If you have data in the form of LISP S-Expressions you know where structures start and stop. If you have just one document in that format you can pretty much work out the entire file format - or at least the features being used.
If you have a binary document you have to do a lot more digging and it can take you days to just work out the basic structure.
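As a sketch of how much one self-describing document gives away, the function below walks a single XML file and lists every distinct element path it contains. The sample document and tag names are invented; the point is that no schema or spec is needed to recover the nesting, because every element's start and end are explicit.

```python
import xml.etree.ElementTree as ET

def tag_paths(text):
    """List every distinct element path in one document, in first-seen order."""
    seen = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        if path not in seen:
            seen.append(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(text), "")
    return seen

sample = "<doc><section><title>A</title><para>x</para><para>y</para></section></doc>"
for p in tag_paths(sample):
    print(p)
# /doc
# /doc/section
# /doc/section/title
# /doc/section/para
```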
This will make Word much more useful: it will be much easier to create documents with other applications and emit them in Word format. So, for example, if I have a report writer component in my server I could spit out a Word document rather than the HTML I would use today.
I can also write filters to automatically convert between Word format and other formats, so I can take HTML source and spit out Word, or take an XML data structure and emit Word.
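A minimal sketch of the report-writer idea: turn a plain data structure into word-processing XML with the standard library, which also takes care of escaping characters like < and &. The <document>/<heading>/<para> vocabulary is a hypothetical stand-in, since the actual Office 12 schema isn't described here.

```python
import xml.etree.ElementTree as ET

def report_to_xml(title, sections):
    """Turn (heading, [paragraphs]) pairs into word-processing XML.

    The <document>/<heading>/<para> tags are hypothetical, not the Office 12 schema.
    """
    root = ET.Element("document")
    ET.SubElement(root, "heading", level="1").text = title
    for heading, paragraphs in sections:
        ET.SubElement(root, "heading", level="2").text = heading
        for text in paragraphs:
            # Special characters in text are escaped when serialized.
            ET.SubElement(root, "para").text = text
    return ET.tostring(root, encoding="unicode")

xml_out = report_to_xml("Server report", [("Errors", ["Disk < 10% free & rising"])])
print(xml_out)
```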
So why would I prefer Word format over HTML when I was one of the people who helped write HTML? Well the answer is that virtually all HTML editors are optimized for editing Web pages. I write books and other reports that really don't fit that structure. There is nothing in HTML land that I have seen that provides the power of the Word outline mode and has built in spell checking as you go.
XML data structures are all going to make life a whole lot easier now that memory and processing time are such commodities.
Commodity != free, especially when you are trying to deploy something on a battery-powered device.
Considering that even MS sometimes has problems interpreting their own binary formats from way back when, I could see it just being a move on their part to eliminate that problem in the future.
That said, I don't believe the GPL incompatibility was intentional; I believe it to be a side-effect of what MS thought was an appropriate way to make sure that their patents and patent issues were appropriately labeled in software using their schema.
Actually, I don't find much of an advantage. In my experience, even if you are trying to extract a tiny file from a large archive, it still seeks through the majority of the zip file, and is only slightly faster than uncompressing the entire thing.
Despite that, tar and gzip could be even better. A little programming and you could modify tar to transparently compress individual files with gzip/bzip2 before adding them to the tar archive. In other words, instead of a "tar.gz" file, you'd have a "gz.tar" file...
Why this hasn't been done yet, I don't know.
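Here's a rough Python sketch of that "gz.tar" idea using the standard gzip and tarfile modules: each file is gzipped on its own and the results are stored in a plain, uncompressed tar. Reading one member back only decompresses that member, though tar has no central index, so locating it still means walking the member headers. The build_gz_tar and read_member helpers are illustrative names, not an existing tool.

```python
import gzip
import io
import tarfile

def build_gz_tar(files):
    """Gzip each file individually, then store the results in an uncompressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            packed = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(packed)
            tar.addfile(info, io.BytesIO(packed))
    return buf.getvalue()

def read_member(archive, name):
    """Find one member by name and decompress only that member."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile(name + ".gz")
        return gzip.decompress(member.read())

archive = build_gz_tar({"a.txt": b"alpha" * 100, "b.txt": b"beta" * 100})
print(read_member(archive, "b.txt")[:8])  # b'betabeta'
```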