Microsoft Releases Office Binary Formats
Microsoft has released documentation on their Office binary formats. Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see if it seems to cover those working on GPL software; some believe it doesn't. stm2 points us to some good advice from Joel Spolsky to programmers tempted to dig into the spec and create an Excel competitor over a weekend that reads and writes these formats: find an easier way. Joel provides some workarounds that render it possible to make use of these binary files. "[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts."
I would like to point out another good option Joel doesn't have on his list. It's a product called OfficeWriter, from a company named SoftArtisans in Boston. When I last checked/worked there, it was capable of generating Excel and Word docs on the server, and I believe PowerPoint was probably coming relatively soon. Creating a product that can write Office documents isn't quite as impossible in terms of labor as Joel is saying.... but it's still way beyond any hobby project. Plus, he is suggesting that you use Excel automation or the like through scripts to create documents on the server, which is a decent suggestion, if you want Excel or Word to constantly crash and lock up your server, and you enjoy rebooting them every day. If you want to do large scale document generation on a server you are going to need something like OfficeWriter. -Vosotros/Matt
RTFA. That's in the FAQ. Yes they are.
In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?!
I'm not saying that there aren't any flaws, but this kind of ill informed, badly thought out comment (a.k.a. "+5 Insightful", of course) has little value.
I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats?
As far as I remember, they only insisted on protocols (it was on the basis of a complaint from server OS vendors that MS was tying their market-leading desktop OSs to their server OSs and gaining an unfair advantage).
I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.
The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.
Oh god I've started another Hungarian argument.
Did you read the article? Nah, why would you do so for some MS bashing.
If you read the article you would notice that the binary solution of WinWord 97 (which is in fact compatible with its predecessors) was a good solution in 1992, when Word for Windows 2.0 was created. Machines had less memory and processing power than your phone, and still had to be able to open a document fast.
My conclusion is that the OpenOffice devs are crazy that they ever supported the Word .doc format, and did a surprisingly good job.
Actually, I think they're releasing it now because they were ordered to in a (European?) court settlement, not because they want to.
http://en.wikipedia.org/wiki/Hungarian_notation
http://www.mhall119.com
Among other issues, borderlayoutmanager did not behave properly in MS's implementation. It was buggy in incompatible ways, but you're right that in and of itself that wasn't the big problem. The big problem was their insistence on both not fixing the bugs, and not going along with major initiatives (such as JFC/Swing).
If by "2 or 3 years" you mean about 5 years, then I'd agree. Java development tools didn't really reach maturity until things like Eclipse came onto the scene about 5 years ago.
Throw the bums out!
"Programmers didn't understand why Hungarian originally used his famous notation"
....
Uhh.. There was never a "Mr. Hungarian"
It was invented by Charles Simonyi and the name was both a play on "Polish Notation" and a reference to Simonyi's fatherland (Hungary), where the family name precedes the given name.
>As PJ pointed out over on Groklaw, MS are giving a "Promise"
>not to sue but this is very very far from a license.
Some (hypothetical?) questions:
What would happen if those patents were in some way transferred to someone else?
Despite the promise, are you still actually infringing the patent? Just with an assurance of the current patent holder that he won't do anything?
If so, what would happen if it becomes criminal to infringe a patent (it was quite close to being part of an EU directive not so long ago)? Together with such suggestions one has also seen suggestions that police should be allowed (and required?) to act on those crimes even without a filing from someone suffering infringement. How would that apply to a situation with such a promise?
It's not the language that makes it obsolete, it's today's IDEs.
First, understand that nearly every bit of "Hungarian Notation" you've ever seen is misused. The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.
This is still valuable today.
However, in days of lesser IDEs, the more common use of Hungarian Notation was still helpful, as it was a lot more work to trace a variable back to its declaration to identify the type.
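To make the "purpose, not type" point concrete, here is a minimal sketch. The prefixes (rw for a row index, col for a column index, c for a count, i for a flat index) and both helper names are just assumptions picked for illustration, not anything taken from Simonyi's original list or from the Office code.

```cpp
#include <cassert>
#include <vector>

// Prefixes assumed for this sketch: rw = row index, col = column index,
// c = count, i = index into a flat array. To the compiler they are all int;
// the prefix records what the number is FOR.

// Hypothetical helper: flat index of cell (rw, col) in a row-major grid
// that is cCol columns wide.
int IcellFromRwCol(int rw, int col, int cCol)
{
    assert(rw >= 0 && col >= 0 && col < cCol);
    return rw * cCol + col;
}

// Hypothetical helper: store a value at (rw, col) in a flat grid.
void SetCell(std::vector<int>& grid, int cCol, int rw, int col, int value)
{
    int icell = IcellFromRwCol(rw, col, cCol);
    assert(icell < static_cast<int>(grid.size()));
    grid[icell] = value;
}
```

A call like SetCell(grid, cCol, colCur, rwCur, 42) still compiles, since everything is an int, but the swapped prefixes are visible right at the call site. A prefix like nCur would tell you nothing there.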
At my company, our users do that every day. Excel spreadsheets embedded in Word or PowerPoint, Microsoft Office Chart objects embedded in everything. It's what made the Word/Excel/PowerPoint "Office Suite" a killer app for businesses. MS Office integration beat the pants off the once best-of-breed and dominant Lotus 1-2-3 and WordPerfect. When you embed documents in Office, instead of a static image, the embedded doc is editable in the same UI, and can be linked to another document maintained by somebody else and updated automatically. It saves tremendous amounts of staff time.
Ah, marketing. Where would we be without it?
Microsoft developed J/Direct specifically to make Java non-portable to other OSs. The MS JVM wasn't better than Sun's, it was just tied heavily into the OS, and code developed for it broke if run on any other VM.
J++ was another lock-in tool to ensure any "Java" developed in Microsoft's IDE would only run on Microsoft OSs. JBuilder was always a better package anyway.
"I've got more toys than Teruhisa Kitahara."
Beware: In C++, your friends can see your privates!
Joel worked on the Excel team.
Coder's Stone: The programming language quick ref for iPad
Ok, I was going to respond to this but I will not get dragged into another one of these discussions. It's worse than tabs vs. spaces, I tells ya.
I have to disagree, tabs and spaces are easily handled with an "indent" program.
On VERY LARGE projects where there are hundreds of include files and hundreds of source files, it is not convenient or even possible in all cases to find the definition of an object that may be in use.
Context and type information in the name makes it easier to quickly read a section of code:
for(int ndx=0; ndx < nLimit; ndx++)
{
pnUsrData[ndx] = pnReceived[ndx];
}
To anyone versed in your prefixing, it is easy to see pnUsrData is an array of integers, and we are assigning values from another array of integers.
However:
for(int ndx=0; ndx < nLimit; ndx++)
{
pnUsrData[ndx] = foobar[ndx];
}
In the above, it is clear we are assigning data to elements in an integer array from a subscript on an object, but what kind of object? Where do we find its definition?
Now, renamed it looks like this:
for(int ndx=0; ndx < nLimit; ndx++)
{
pnUsrData[ndx] = mytypeFoobar[ndx];
}
Now we can see it is a "mytype" object and we can easily find its reference and declaration.
That's what Hungarian notation provides, and it is not useless, IMHO; it was its overzealous use that made code less readable. Rather than give hints, zealous proponents attempted to create a whole new language for specifying variable and function names that was virtually impenetrable.
You better believe it costs Microsoft quite a bit to keep it around. At the lowest level, having the codebase that big means the tools and practices needed to manage it have to be equal to the task. Here's a hint: MS does not use SourceSafe for the Office codebase. (They use the Team tools in Visual Studio, so they do eat their own dogfood, but not the lite food).
Far more insidious is the technical debt incurred by carrying around that backwards compatibility with Version-1-which-supported-123-bugs-and-all. Interdependencies that mean a bug either can't be fixed without introducing regressions, or can only be fixed by dint of a complex scheme involving things like the 1900 vs. 1904 epoch split that Joel discusses.
Oh yes, it costs a small fortune to carry around that baggage, and only a company as big as Microsoft with Microsoft's revenues can afford it. The price might seem like 'nothing' in the billions of dollars that flow in and out of Microsoft, but ignoring the elephant in the room doesn't make the elephant go away.
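For anyone who hasn't run into the 1900 vs. 1904 epoch split mentioned above, here is a rough sketch of what reading code has to deal with. The constants are the commonly documented ones as far as I know (the same calendar date differs by 1462 between the two systems, and serial 60 in the 1900 system is the 1900-02-29 that never existed, inherited from 1-2-3). The function names are made up for this example and are not from any Microsoft API.

```cpp
#include <cassert>

// Difference between the 1900-system and 1904-system serial numbers
// for the same calendar date (dates on or after 1904-01-01).
const int kSerialOffset1904 = 1462;

// Serial number of 1970-01-01 in the 1900 date system.
const int kUnixEpochSerial1900 = 25569;

// Serial 60 in the 1900 system stands for 1900-02-29, which never existed.
bool IsPhantomLeapDay1900(int serial1900)
{
    return serial1900 == 60;
}

// Hypothetical helper: convert a 1900-system serial to the 1904 system.
int Serial1904From1900(int serial1900)
{
    assert(serial1900 >= kSerialOffset1904); // 1904 system starts at 1904-01-01
    return serial1900 - kSerialOffset1904;
}

// Hypothetical helper: convert a 1904-system serial to the 1900 system.
int Serial1900From1904(int serial1904)
{
    assert(serial1904 >= 0);
    return serial1904 + kSerialOffset1904;
}

// Hypothetical helper: days since 1970-01-01 for a 1900-system serial.
// Serials below 60 sit before the phantom day, so they are shifted by one
// to land on the real calendar.
long DaysSinceUnixEpochFromSerial1900(int serial1900)
{
    assert(serial1900 >= 1 && !IsPhantomLeapDay1900(serial1900));
    int adjusted = (serial1900 < 60) ? serial1900 + 1 : serial1900;
    return static_cast<long>(adjusted) - kUnixEpochSerial1900;
}
```

Even this toy version needs a special case for one day in 1900, which is the kind of baggage being talked about: the bug can't be removed without shifting every stored date, so everything that touches the format has to know about it.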
No argument there.
The summary also points out, with links, why this release might not actually indicate that MS is really releasing their formats to break with that past after all.
No. The article doesn't make that claim. That's your own interpretation. The overall intent of the article is simply to convey a few simple points:
1) Why the MS office document format is so crufty (minus conspiracy theories).
2) How to work *with* the Windows OS to use those documents.
3) How to use better, more open, alternatives to creating office documents.
Nothing in the article contradicts anything I said earlier.