Slashdot Mirror


Microsoft Releases Office Binary Formats

Microsoft has released documentation on their Office binary formats. Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see if it seems to cover those working on GPL software; some believe it doesn't. stm2 points us to some good advice from Joel Spolsky to programmers tempted to dig into the spec and create an Excel competitor over a weekend that reads and writes these formats: find an easier way. Joel provides some workarounds that render it possible to make use of these binary files. "[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts."

259 comments

  1. 14$7 P0$7 by Anonymous Coward · · Score: 0

    The original post is brought to you by the Microsoft corporation

  2. Joel by Mario21 · · Score: 2, Insightful

    Joel's articles are a joy to read. No matter what time I receive the email about a new article by Joel, it will be read on the spot.

    1. Re:Joel by zootm · · Score: 1, Insightful

      I agree to some degree, but as a slight contrary point I find his silly insistence that Hungarian is a "good thing" and his constant pimping of FogBugz (especially the "this is usually a bad idea, but it's alright when we do it!" attitude of some of the posts to be a little annoying. He's definitely smart and makes a lot of sense though.

    2. Re:Joel by AKAImBatman · · Score: 4, Insightful

      If you actually read the article, he's right. His point is that the use of Hungarian notation has been bastardized beyond believe. Programmers didn't understand why Hungarian originally used his famous notation, and thus tend to make an error every time they attempt to replicate his work. That's why we have tons of Java programs that look like crap due to some foolish programmer mindlessly following Hungarian Notation.

      On the subject of the Office Document format, I believe that everything he says is also true; but with a few caveats. The first is the subject of Microsoft intentionally making Office Documents complicated. I fully accept (and have accepted for a long time) that Office docs were not intentionally obfuscated. However, I also accept that Microsoft was 100% willing to use the formats' inherent complexity to their advantage to maintain lock-in. The unnecessary complexity of OOXML proves this.

      The other caveat is that I disagree with his workarounds. He suggests that you should use Office to generate Office files, or simply avoid the issue by generating a simpler file. There's no need to do this as it's perfectly possible to use a subset of Office features when producing a file programatically. Libraries like POI can produce semantically correct files, even if they aren't the most feature rich.

    3. Re:Joel by zootm · · Score: 4, Informative

      I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.

      The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.

      Oh god I've started another hungarian argument.

    4. Re:Joel by richlv · · Score: 1

      the constant bragging and plugging of his own product makes me want to stay away from it as much as possible.

      --
      Rich
    5. Re:Joel by mhall119 · · Score: 4, Informative

      Programmers didn't understand why Hungarian originally used his famous notation It wasn't created by some guy named "Hungarian", it was created by Charles Simonyi.

      http://en.wikipedia.org/wiki/Hungarian_notation
      --
      http://www.mhall119.com
    6. Re:Joel by Jamu · · Score: 0, Redundant

      Taking bad examples of code and using them as proof that the other method is good is hardly a convincing argument for Hungarian. I can see how it's useful for some langauges, but C++? Someone enlighten me: What is a convincing argument for using Hungarian in a strongly typed language?

      --
      Who ordered that?
    7. Re:Joel by mike_sucks · · Score: 1
      Ack, no. They always appear to be really nifty on the surface, but they always go wrong on the details.

      Take this article, for instance - sure, he's right that trying to implement support for these specs is futile. It's the same reason why Office's OOXML "standard" is a joke. But he didn't really need to spend 6 pages saying so. And sure, the workarounds are fine if you're a Windows shop, but workarounds #2 and #3 are not simple "half day of work" if you have no experience with Microsoft technologies - it's weeks at least. You're much better off using an existing free or proprietary library to do the work in whatever environment you are familiar with.

      And 2 days work to allow for an adjustable epoch? To add a constant to a parsed number? I lol'ed! For someone who believes in metrics based software development, he sure seems to be pulling a lot of numbers out of his arse.

      I like his stuff when I first came across it, but there's too much cool-aid, not enough reality.

      /Mike

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    8. Re:Joel by Anonymous Coward · · Score: 3, Interesting

      Hungarian should not be used in any language which has a reasonable typing system;

      That's "Systems Hungarian" in the original article, and you are correct.

      "Apps Hungarian", which adds semantic meaning (dx = width, rwAcross = across coord relative to window, usFoo = unsafe foo, etc) to the variable, not typing, is what is good and what he is advocating. It is exactly "good variable naming". You can see that you shouldn't be assigning rwAcross = bcText, because why would you turn assign a byte count to a coordinate even though they're both ints. The article is quite good really. How relevant it is in a .NET/Java world is another discussion entirely.

    9. Re:Joel by mOdQuArK! · · Score: 1

      You're not parsing his (Joel's) article correctly. The Hungarian notation that everyone learned to hate is not the type of notation which was originally proposed.

      He describes the original form of Hungarian notation as a way to add a concise description of how the data that a particular variable is holding is intended to be used, NOT just a way to restate the type info already maintained by the compiler.

      Way, way, down in the article he has a short blurb which has some short examples of how the original notation was meant to be used:

      Apps Hungarian had very useful, meaningful prefixes like "ix" to mean an index into an array, "c" to mean a count, "d" to mean the difference between two numbers (for example "dx" meant "width"), and so forth.

      He contrasts that with the way that people ended up understanding Hungarian notation:

      Systems Hungarian had far less useful prefixes like "l" for long and "ul" for "unsigned long" and "dw" for double word, which is, actually, uh, an unsigned long. In Systems Hungarian, the only thing that the prefix told you was the actual data type of the variable.
    10. Re:Joel by encoderer · · Score: 3, Informative

      "Programmers didn't understand why Hungarian originally used his famous notation"

      Uhh.. There was never a "Mr. Hungarian" ....

      It was invented by Charles Simonyi and the name was both a play on "Polish Notation" and a resemblance to Simonyi's father land (Hungary) where the family name precedes the given name.

    11. Re:Joel by mike_sucks · · Score: 1, Interesting
      All design patterns are workarounds for missing language features. See GTK's use of an object oriented pattern in C, for example. Hungarian is a design pattern (well, naming convention, but same thing) for the same weakly typed language: C.

      Modern languages are strongly typed and hence will tell the programmer when they've screwed up, at compile time or later. So there's no need for Hungarian in these languages, much like C# or Java and maybe even C++ now has built in support for object oriented programming.

      So again, Joel spins something that was useful historically as being something that is still essential, even though it is now completely redundant. This man is a living, breathing excuse for poor practices based on historical, obsolete artifice.

      Now, to get to your point - it is moot that some programmers are using some bastardised version of Hungarian, because even if done correctly it is now a waste of time when using a modern programming language. It only contributes by making a program harder to read, hence increasing complexity and reducing maintainability rather than providing any actual benefit.

      /Mike

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    12. Re:Joel by cp.tar · · Score: 1

      I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.

      The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.

      Oh god I've started another hungarian argument.

      Hungarian notation has nothing to do with typing systems.
      Hell, I'm barely a novice programmer, but even I can see that.

      Hungarian notation is a good variable naming practice — as long as you use it to mirror internal program semantics, not create redundant typing information.

      So far, I have tried to implement something similar to Hungarian notation in most of my programs; this article taught me a thing or two more, though some aspects touch on things way beyond my level.

      Anyway, his article on Hungarian notation and — more importantly — visual code review in general reminds me of feature checking in Chomskyan syntax... easy, mechanical, and rather foolproof if implemented properly.

      --
      Ignore this signature. By order.
    13. Re:Joel by encoderer · · Score: 3, Informative

      It's not the language that makes it obsolete, it's today's IDEs.

      First, understand that nearly every bit of "Hungarian Notation" you've ever seen is misused. The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.

      This is still valuable today.

      However, in days of lesser IDEs, the more common use of Hungarian Notation is still helpful, as it was a lot more work to trace a variable back to it's declaration to identify the type.

    14. Re:Joel by encoderer · · Score: 1

      When done properly, it has nothing to do with being strongly or weakly typed. It has nothing to do with knowing when you've "screwed up."

      The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.

      Outside of HN, the only way to include this semantic information in all the super excellent languages you mentioned is by adding a comment after the variable declaration.

      That's do-able, though. Not because of the LANGUAGE, but because of the IDE, where it's trivial now for the IDE to take you back to the declaration of a given variable and then right back to the last position in the codebase.

    15. Re:Joel by mike_sucks · · Score: 1
      If you've got to use an IDE to find the definition of a variable in your codebase, you have a much bigger problem.

      Still, how is "rwPosition" any better than "rowPosition"? (from the Wikipedia article) Sure, "i" is kinda ambiguous, but use a modern for-loop instead and get rid of it altogether. Again citing Wikipedia, some of Simonyi's suggested prefixes added semantic information, but not all.

      I'll say it again: Hungarian is pointless in a modern language.

      /Mike

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    16. Re:Joel by Anonymous Coward · · Score: 0

      Uhh.. There was never a "Mr. Hungarian" ....
        Oh, but there is!

      Try the goulash, eh?
    17. Re:Joel by AKAImBatman · · Score: 1

      how is "rwPosition" any better than "rowPosition"?

      They're not any different. The only reason to use the former format was to save keystrokes in the days before auto-completion. If Simonyi* invented the concept today, he would have used rowPosition rather than rwPosition**.

      The thing is, he was working back in the days when programmers regularly used single character variables to save keystrokes as well as to keep their code within 80 columns. (i.e. DOS console resolution.) So he tried to push a semantic standard that communicated information about the code, and it got all screwed up in the translation.

      * Thanks for the correction, guys
      ** In fact, Simonyi would have used rwPos
    18. Re:Joel by zootm · · Score: 0

      I am not getting dragged into another argument about Hungarian, but I should say that Hungarian, as Joel uses it in his article and as it is typically used, is replicating what should be type information. Using some form of additional notation can be useful but obscure minimal character sequences are rarely justifiable as a means to do it.

    19. Re:Joel by mike_sucks · · Score: 1
      I'll say it one more time for the peanut gallery: It was historically useful, maybe, but today is pointless.

      /mike

      PS: 80 col display terminals were around long before DOS - VT100's ran in either 80 or 132 col mode in the 70's.

      PPS: If your code today needs more than 80 cols (or arguably 132 cols), you have bigger problems.

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    20. Re:Joel by zootm · · Score: 0, Troll

      I read it fine, I think. The fact is that there's nothing more complex or less useful about using i, j, k (this is pretty much convention now) or "blahIndex" for indices, using "blahCount" for count, and the "d" prefix is pretty much a maths thing (delta) so it's common enough anyway.

      I really don't want to get dragged into another one of these arguments, though.

    21. Re:Joel by AKAImBatman · · Score: 1

      It was historically useful, maybe, but today is pointless.

      So you're telling me that you never use variable names like "xdiff", "rowStart", "tabName", "currentRow", or some other combination of semantic meaning combined with a noun?

      Any programmer worth his salt uses names that are descriptive. And many of those names happen to align with Simonyi's original idea. In fact, he didn't originate the concept so much as bring it over from his work with Smalltalk.

      80 col display terminals were around long before DOS - VT100's ran in either 80 or 132 col mode in the 70's.
      Which is irrelevant to why they were coding that way. Microsoft Office was created for DOS (and later Windows) PCs. Thus the 80 column limitation was imposed by DOS, regardless of the history behind the chosen width. (In fact, IBM chose 40/80 columns to match their mainframe terminals. The PC was designed to be usable as a VT100 terminal in a pinch.)

      If your code today needs more than 80 cols (or arguably 132 cols), you have bigger problems.

      Says you. Indenting is four spaces per indent. 3 levels deep (Class -> Method -> Control structure) uses up 12 columns right there. Your code has to thus fit in 68 characters, which is a difficult thing to do when class names often exceed a dozen characters. Regardless of the spacing, here is a common line in Java that chews up 87 characters:

      BufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
      I could separate it into individual variable assignments for the FileInputStream and the InputStreamReader, but why? There's no value in hanging on to those references, and the line is easy enough for another coder to read.

      Feel free to throw around a few more insults like "peanut gallery" around, though. They're really helping you look intelligent and thoughtful. :-/
    22. Re:Joel by F452 · · Score: 2, Funny

      Close, but not quite.

      It was actually all started by cHarles Hungar, and thus the "Hungarian" label.

    23. Re:Joel by AKAImBatman · · Score: 1

      I read it fine, I think.

      Your posts suggest otherwise. In fact, I think you read the first part of Joel's article and never got to the portion where he turned it all around. Joel himself argues against common Hungarian notation. (i.e. Systems Hungarian) That doesn't seem to be percolating through your noggin'. :-)
    24. Re:Joel by zootm · · Score: 1, Insightful

      No, I think that the uses he proposes for Apps Hungarian are better handled by a typing system, in languages which support such things. Obviously all sorts of hilarious cludges can be used in languages where you're dealing with insufficiently-typed data.

    25. Re:Joel by mike_sucks · · Score: 1
      You have no idea how much something like:

          Row rowCurrent = getCurrentRow();

      irks me, but you might see how redundant it is. _Of course_ choosing variable names is important, but Hungarian is a very specific notation, the examples above (including yours) are not Hungarian.

      You need to find a balance between too terse and too verbose. Using Hungarian today can fall down in both ways. Done correctly it is too terse ("rwCur" anyone?) and if not done correctly, is redundant (see examples above) and hence too verbose.

      Sure, if rowCurrent was a pointer to an unsigned int, then it would be a useful variable name, but if not there is little point. Welcome to the 21st century.

      WRT my PPS, there a lot of good reasons to have 80 cols max. It means your code is emailable (or do you use HTML mail? /me lols). It keeps function/method complexity down by limiting nested blocks. It means you can fix stuff reasonably well on a serial console. It keeps code readable (c.f. newspaper column widths). I bet you also have your web browser windows maximised, all the time. Tisk, tisk.

      If you're a Java developer, you may be interested to know that Sun's Java code conventions specify 80 cols max - so it's not just 'sez me'.

      /Mike

      PS: You java streams are a good example of poorly encapsulated code. Separate your concerns and the 80 col "problem" goes away.

      PPS: Even excluding the encapsulation problem, why wouldn't you just do "new BufferedReader(new FileReader(file));"? Or not bother with the BufferedReader at all given that 99% of the time is is needlessly used?

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    26. Re:Joel by encoderer · · Score: 0, Redundant

      Yeah, that's wrong.

      I'm sorry, bro.

      It just is.

      http://en.wikipedia.org/wiki/Hungarian_notation

    27. Re:Joel by Anonymous Coward · · Score: 0

      I really don't want to get dragged into another one of these arguments, though.

      Yet this is the third post where you've said exactly that, although this one didn't say it in italics like a previous one did. :-)

    28. Re:Joel by AKAImBatman · · Score: 1

      Obviously all sorts of hilarious cludges can be used in languages where you're dealing with insufficiently-typed data.

      Types, types, types, you keep talking about TYPES! That's where you're running into trouble. App Hungarian is about semantics, not typing. An object system can avoid App Hungarian style through contextual usage in many circumstances, but it doesn't always make sense to create a complete object hierarchy for every problem.

      Granted, row systems can be contextualized like this:

      public class Rows
      {
        private Row current;
       
      ...
      }
      Having a collection of rows with an index into a "current" item allows me to avoid having a variable like "currentRow" or "rwCur". (The latter being closer to original App Hungarian.) But there are situations where the information is transient enough to be in a procedural loop. This is where notation similar to App Hungarian-style is still warranted:

      int pos = 0;
      int startPos = myString.indexOf(delimLeft)+1;
      int endPos = myString.indexOf(delimRight, startPos);
       
      for(int i=startPos; i<endPos; i++)
      {
      //do logic
      }
      Granted, startPos and endPos would have been something like psStart and psEnd to keep with the App Hungarian notation, but the concept itself was not flawed. What was flawed was the translation to System Hungarian which was based on type rather than semantic meaning. In fact, System Hungarian becomes redundant as you'd end up with variable like "iStartPos" and "iEndPos". Which is stupid, because the context of the variables already communicates the typing information.
    29. Re:Joel by zootm · · Score: 1

      Using "pos" or "position" is what I meant earlier (I hope in this particular thread!) by "sensible naming conventions". Hungarian in this case wins you nothing other than a naming scheme that is more opaque and requires training, whereas suffixing with "pos" your intention is more clear. The "benefits" of Hungarian which stem from this sort of use-case are benefits of sane naming, which gain nothing from the over-formalisation provided by Hungarian. The rest of the benefits of Hungarian are better realised by types, which was my real point. Maybe I wasn't clear enough with that earlier.

    30. Re:Joel by encoderer · · Score: 1

      Yes, your example shows some redundancy:

              Row rowCurrent = getCurrentRow();

      But I have 2 thoughts on this.

      1. You picked out one of the easiest straw men to knock down. Compare that to genuinely useful prefixes like, say, "us" to indicated an unsafe string.

      2. The code snippet is certainly redundant, but what's better?

      Row current = getCurrentRow();

      ???

      It's fine, until you use the 'current' variable 100 lines later in the procedure. One day the debugger throws an error on the line and you have to locate the declaration and figure out wth "current" is.

      I believe in writing self-documenting code. And "current" just does not do it.

      Now...

      I'm sure you'll give me some nice CS201 line like "if you have a 100 line procedure you've got bigger problems" (akin to the CS201 platitude you threw-out there a few posts upthread).

      In theory it's nice to have a procedure no longer than a single screen. In practice, some algorithms just don't allow for expression in 100 lines in any way that approaches being readable and self-documenting.

      However, this shows it's head more clearly when dealing with properties.

      If rowCurrent is a property, it's a lot more acceptable to even the pedantic-CS201 types that the declaration will not be in the same vicinity as the calling code.

      When you're working with a file with dozens or hundreds of includes, finding the class definition and then the property declaration within it is certainly possible but it's nowhere near convienant.

      Again, modern IDE's alleviate this for you, NOT modern languages.

      In summary, I stand by my original point.

    31. Re:Joel by Anonymous Coward · · Score: 0

      What the heck... you got 1 mod of "Overrated" for that? Overrated must be the new "I disagree" button.

    32. Re:Joel by syousef · · Score: 1

      It wasn't created by some guy named "Hungarian", it was created by Charles Simonyi.

      Thank you! I was beginning to think I was in the twilight zone, or taking a history less in Bill and Ted's school of Excellent History. (Party on dudes!) In today's episode 2 rednecks argue the value of special relativity and Hungarian notation.

      --
      These posts express my own personal views, not those of my employer
    33. Re:Joel by maxume · · Score: 1

      Please forward Wedding date and time.

      I have a plan.

      --
      Nerd rage is the funniest rage.
    34. Re:Joel by Anonymous Coward · · Score: 0

      If you search the article for the word Hungarian it's not there.

      I have no idea what you're talking about.

      The bottom line is that these idiots sacrificed correctness for efficiency.

      Even my students know better than that.

    35. Re:Joel by AKAImBatman · · Score: 1

      You have no idea how much something like:

      Row rowCurrent = getCurrentRow();

      irks me, but you might see how redundant it is.

      As another poster already pointed out, your example is a bit of a strawman. It is indeed redundant, because it replicates information that the class is already contextually aware of. Seeing that code would probably irk me as well, but a linked list like this wouldn't:

      Row currentRow = Rows.getHead();

      ...

      currentRow = currentRow.next();

      You need to find a balance between too terse and too verbose. Done correctly it is too terse ("rwCur" anyone?) and if not done correctly, is redundant (see examples above) and hence too verbose.

      The original Hungarian was just the right amount of terseness for its time. I wouldn't be caught dead using a variable like rwCur in Java, but it was quite a bit better than many of the variable schemes of the day. The core concept lives on in the more verbose code that modern coders write, though in a rather modified form. One of the biggest changes was to put the noun and adjective back into proper order. (As mentioned in the Wikipedia article, the name "Hungarian Notation" was a bit of a play on "Reverse Polish Notation".)

      do you use HTML mail? /me lols

      /me says, "Welcome to the 21st century." My emails all contain rich text and auto-wrapping. Modern email systems make this simple and straightforward. (e.g. GMail) I try not to embed code in my emails, but HTML email makes it a lot simpler when I do. I can make the code stand out from the text, containing the right font and formatting. I know that it won't get snipped off or weirdly wrapped when someone forwards it or replies, unlike traditional 80-column text email.

      FWIW, I agree with your sentiment that code should be kept horizontally constrained. However, it is a loose constraint with some variance. Hard constraints are artificial and can damage code legibility as much or more than they help.

      It means you can fix stuff reasonably well on a serial console.

      I don't understand. Is your terminal program not VT100 compatible? You can't scroll left and right in your instance of VI? Or resize your terminal window to provide more than 80 lines of height? A serial console is just a hardwired terminal connection. You don't have to put up with artificial constraints like 80 columns here in the amazing 21st century.

      If you're a Java developer, you may be interested to know that Sun's Java code conventions specify 80 cols max

      If you're a Java developer, you know that there are issues with Sun's Java coding standards. e.g. The original set of standards in Java 1.0 called for package names like "COM.Sun.stuff.ClassName". The current code conventions also suggest the K&R style of braces, which I absolutely despise. (The Allman style is (IMHO) far superior.)

      The standards are also rife with examples of Hungarian Notation (even though they never call it out) and suggest using a space between a cast and the variable name (which divorces the variable from its cast).

      You java streams are a good example of poorly encapsulated code. Separate your concerns and the 80 col "problem" goes away.

      This statement makes absolutely no sense. Wrapping a primitive stream with a logical stream has nothing to do with SOC. Worse yet, you make no real statement about how it should be written. Should I declare separate variables for each level of stream? Why?

      why wouldn't you just do "new BufferedReader(new FileReader(file));"?

      There's no reason not to. I used FileInputStream as a placeholder for a primitive stream. I often run into situations like this:

      BufferedReader in = new Buffere

    36. Re:Joel by zootm · · Score: 1

      In my experience, "Troll" seems to fulfil that function too. Ah well.

    37. Re:Joel by zootm · · Score: 1

      Yeah, I know. I felt I should follow up regardless. It was more of a warning that if it got out of hand (as these things typically do) I would just cut it off :)

    38. Re:Joel by Your.Master · · Score: 1

      Apps Hungarian is one of many possible realization of sensible naming conventions. After all, Hungarian *is* just a naming convention, so you're not making an argument when you say that Hungarian is bad because it can be replaced by sensible naming conventions. You have to define why Hungarian is insufficiently sensible.

      The reason it's formalized is so that it can be consistent within very large projects (and with something as well-known as Hungarian, even between projects), which reduces a lot of mental overhead. You can come up with another consistent system and train onto that, or you can just live with the fact that you get StartPosition, StartPos, StartPoint, Start, FirstIndex, LowerBound, etc.. Hungarian says:

      1. Abbreviate the hell out of data semantics. Nobody wants to reread it all the time. StartPosition --> StartP.
      2. Do not re-use data semantic abbreviations, and do not use multiple semantic abbreviations for one semantic concept. StartP --> StartPs (because p usually means pointer -- some might go to StartPos).
      3. Always put the general, common semantics first, and further descriptions last. In spoken English some concepts make sense in one order and some make sense in the other. For clarity's sake, we'll pick a standard within the codebase. StartPs --> PsStart.

      The only difference between suffixing pos and apps Hungarian is that you have chosen to use suffixing as a standard instead of prefixing. Which is a valid choice, but certainly not something that makes Hungarian bad. Generally you put the part you care more about first. If your application, and your group's development style, cares more about semantics than specifics, then put the semantics first. You can use other UI guidelines to choose between names like start, begin, lower, or first, so that the code is consistent and immediately readable by team members unfamiliar with your part of the code. Systems Hungarian in strongly typed languages is an outdated (given IDEs) perversion of a portion of the UI coding style guidelines of a particular Microsoft project. There are others. But coding style guidelines for large teams aren't inherently evil.

    39. Re:Joel by harlows_monkeys · · Score: 2, Insightful
      There are two kinds of Hungarian notation. One is the type that adds type info. For example, prefixing longs with l. As you note, that is pointless. In fact, it is worse than pointless--because if the type of the variable is changed, it might be too much of a hassle to change the name everywhere, and you end up with the notation actually misleading.

      The second type doesn't add type information. It adds meaning information. For example, an index to a table row might be rowIndex. An index to a column might be colIndex.

      This form of Hungarian is not pointless. When you see selectedRow = colIndex in code, it makes the error (some doofus used a column index where a row index was needed!) easy to see. In a sense, it is still adding type information, but it is type information at a level above what the compiler provides (for many languages, at least). This kind of Hungarian helps document the code, and is generally a good thing.

    40. Re:Joel by Anonymous Coward · · Score: 0

      Does this "Charles Simonyi" have any connections with Microsoft?

    41. Re:Joel by joto · · Score: 1

      Hungarian notation has nothing to do with typing systems. Hell, I'm barely a novice programmer, but even I can see that.

      There are plenty of things that are obvious to the novice programmer, but not to experienced programmers. Then the novice becomes enlightened. Hungarian notation is type-information. The problem is that your view of types is too limited.

      You are used to view types as what the programming langauge you are currently using, provides for you, or lets you make through e.g. classes. This is a very restricting view. There's no reason types can't be used for other things, such as horizontal versus vertical, litres versus gallons, meters versus kilograms, coordinates versus tuples, or just about any form of contextual information. Just because your programming language doesn't offer you any practical way of doing this, doesn't mean that you can't view contextual information as "types", it simply means that you can't implement it in code without resorting to hacks like hungarian notation.

      Hungarian notation is a good variable naming practice -- as long as you use it to mirror internal program semantics, not create redundant typing information.

      Or in more general terms: Strict rules are good, as long as you use the right rules. Whether you choose Hungarian notation or not, the important thing is to use common sense, not Hungarian notation.

    42. Re:Joel by obstalesgone · · Score: 2, Insightful

      If we use the prefix to convey the purpose of the variable, what are we supposed to use the rest of the variable name for?

    43. Re:Joel by cp.tar · · Score: 1

      Or in more general terms: Strict rules are good, as long as you use the right rules. Whether you choose Hungarian notation or not, the important thing is to use common sense, not Hungarian notation.

      While we obviously use different meanings of the term "type", I most heartily agree with this part; this is precisely what I was aiming for.

      --
      Ignore this signature. By order.
    44. Re:Joel by Anonymous Coward · · Score: 0

      If you search the article for the word Hungarian it's not there.
      By your reply, I assume you are a professor of some sort? Here's a hint for you. Why don't you try checking the comment he was replying to, genius? This entire thread is only marginally related to the Office Binary Format article.
    45. Re:Joel by mike_sucks · · Score: 1
      So we're agreed that Hungarian has no place in modern programming (languages)? Good! But I have to take offence at your strawman remark. Your assumption that there will be similarly named variables scattered around the code is as bogus as my cherry picking.

      Yes, I would have trotted out the "100 lines max" rule of thumb because, surprise, it actually does make the code more readable. If you were taught that sort of thing in second year CS, then hurray for CS departments teaching people something useful. _Of course_ there are times when you have to go over it but you know what, but this should be the exception, not the rule.

      But I don't know about the IDE argument. The point of Hungarian was that you don't need to hunt for the declaration, you could tell from a few lines of code in isolation what the value of a variable should be. A modern language won't tell you that either, but at least it will throw a fit at compile time or run time rather than segfaulting five source files and hundreds of lines away from the original bug.

      /Mike

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    46. Re:Joel by poopdeville · · Score: 1

      Types, types, types, you keep talking about TYPES! That's where you're running into trouble. App Hungarian is about semantics, not typing.

      Types are a semantic concept, not a syntactic one. Indeed, for every predicate P, there is a type associated with it. The set of all objects for which P is true is called the extension of P, and this set determines a type; and by construction, members of the the extension are of type P.

      If you have read Programming Ruby, this is why they make such a big deal of saying that "Classes Aren't Types" -- classes, by construction in a message passing object system, don't define extensions. On the other hand, an object's "singleton class" -- the 'class' consisting of the canonical list to the methods to which an object will respond -- is a type.

      So pick a predicate "IsABoolean" and another "IsADifferentiableCurveInTheComplexPlane". Both define types. Neither is a particularly good prefix.

      http://plato.stanford.edu/entries/type-theory/

      --
      After all, I am strangely colored.
    47. Re:Joel by mike_sucks · · Score: 1
      Look, I can't be bothered arguing about 80 cols or HTML in email. It's been done to death and you are arguing it is bad because for reasons of personal taste - which, when it comes to, you can never convince someone that their personal taste is wrong. I should point out however that you are wrong about most of it, ask Google if you can be bothered learning something.

      Now, about those char and byte streams - you never want to directly instantiate a file stream then start using it - you're making the assumption that the bytes live on disk, which is not the case in unit tests or when, say running a web app from a WAR, i.e. breaking encapsulation. So you pointed out your own strawman yourself. Good!

      There are only two times you need to use a BufferedReader (only one for BufferedInputStreams): When you need to call BufferedReader.readLine() or really, really need the performance. 99% of the time you need neither.

      /Mike

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    48. Re:Joel by Anonymous Coward · · Score: 0

      Barely a novice programmer, and already so wrong. Typing is exactly what Hungarian notation is for. It encodes typing information in your variable names, often redundantly.

      Before you go off and tell me that you don't waste your time with "intX" style Hungarian notation, I'll point out that "complexNumberX" is about as informative as intX. That is to say, not at all. If the complexNumber part was informative, you could just drop the X, which ends up being very similar to what sane naming conventions do. Abbreviated notations are even worse -- at least the unabbreviated version is readable.

      I'm rather surprised at how much spaghetti code is evidently written by slashdot readers. Surely, using modern OO systems, there's no reason to have more types than you can remember in a single context/block/scope.

    49. Re:Joel by poopdeville · · Score: 1

      Please read http://plato.stanford.edu/entries/type-theory/.

      Every predicate defines a type. That's what a type is! The extension of a predicate.

      --
      After all, I am strangely colored.
    50. Re:Joel by cp.tar · · Score: 1

      If it is redundant, you're doing it wrong.

      As I said, I'm barely a programmer. But I'm a linguist, so I know a thing or two about morphology, syntax and semantics. And guess what: some natural languages have similar features. And natural languages abhorr semantic redundancy.

      Admittedly, I was wrong when I said "nothing to do with typing"; I meant the typing your language of choice already enorces.
      Hungarian notation is not needed in all contexts, but can be a useful tool for semantic typing beyond the scope of what your language of choice allows you. Besides, intX and complexNumberX examples you gave are almost exactly the same: they just tell you what kind of number you are dealing with, not what it means within your program.

      --
      Ignore this signature. By order.
    51. Re:Joel by zootm · · Score: 1

      There's an alternate thread to this one which is discussing this. The bottom line is I feel that for the set of things where apps Hungarian isn't replicating what could or should be done with types are just as well done by sensible naming that isn't overly-formalised.

    52. Re:Joel by zootm · · Score: 1

      Basically I just don't see what this "consistency" buys you. Abbreviating makes little sense to me – "p" or "ps" or "pos" or "position" are as easy to read as one another; humans tend to read input in terms of words so all you save is horizontal space, which shouldn't really be something which is causing you issues in most languages.

      The only difference between suffixing pos and apps Hungarian is that you have chosen to use suffixing as a standard instead of prefixing. Which is a valid choice, but certainly not something that makes Hungarian bad.

      I don't mind prefixing instead of suffixing, it's the over-abbreviation that bothers me. You need to learn a whole system for something that should have just been stated clearly in the first place. I don't understand the obsession with dropping 3 characters in a variable name.

      But coding style guidelines for large teams aren't inherently evil.

      I agree completely, I just don't see this style guideline as a particularly useful one.

    53. Re:Joel by poopdeville · · Score: 1

      Besides, intX and complexNumberX examples you gave are almost exactly the same: they just tell you what kind of number you are dealing with, not what it means within your program.

      You get my nitpicky point about Hungarian notation with respect to semantic typing. But the consequences don't seem to have sunk in. As we know from our university days, a type is (roughly -- it depends on the axiomatization of your type theory) the extension of a predicate. Equivalently, the type is the predicate. We are free to choose our predicates when defining our types for a program. Perhaps complex numbers weren't a good choice for the example (I happen to think they are, because of the ambiguity). The question is, who is to say that a program doesn't use complex numbers as "primary" objects? Who is to say that a program doesn't use integers as primary objects? You can't tell that just by looking at the Hungarian notation prefix.

      Suppose you have a program that requires both (x,y)-coordinates (say, for screen drawing) and complex numbers (for summing over). Then, presumably, complexNumbers and (x,y)-pairs deserve different sorts of variable names (since they are internally the same data structure but are not functionally interchangeable). So Hungarian notation would prescribe some abbreviated form of ComplexNumberX and OrderedPairX, where X is a context and variable dependent "root" for the variable name. Still, 'ComplexNumberX' is about as informative as 'OrderedPairX' or 'intX'.

      But what's the point? If you don't have to refer to many instances of either by name in a given scope/block/context, you might as well drop the X. And in most cases, you shouldn't have to refer to many variables by name in a given scope/block/context.

      The best solution I've found so far is to just use typing information in your variable names. The "abstract" we-define-the-predicate kind of type is the important part of the variable name. You don't add temperatures to weights unless you know what you're doing, so you really shouldn't expect to see a line like temperature += weight; in any program.
      By default, in this naming scheme, things of the Record class are called 'record' unless there's a name collision in a block/scope/context. Collections of records (with a few benign restrictions) are called 'records'. Adjectives can be used. And so on. Obviously, Hungarian notation wouldn't add to this.

      My opinion about Apps Hungarian notation is that it is a good, but incomplete idea. Yes, it gets at the notion of using "semantic typing". But it misses the key insight: usually, semantic typing information is enough.

      --
      After all, I am strangely colored.
    54. Re:Joel by cp.tar · · Score: 1

      Besides, intX and complexNumberX examples you gave are almost exactly the same: they just tell you what kind of number you are dealing with, not what it means within your program.

      You get my nitpicky point about Hungarian notation with respect to semantic typing. But the consequences don't seem to have sunk in. As we know from our university days, a type is (roughly -- it depends on the axiomatization of your type theory) the extension of a predicate. Equivalently, the type is the predicate. We are free to choose our predicates when defining our types for a program. Perhaps complex numbers weren't a good choice for the example (I happen to think they are, because of the ambiguity). The question is, who is to say that a program doesn't use complex numbers as "primary" objects? Who is to say that a program doesn't use integers as primary objects? You can't tell that just by looking at the Hungarian notation prefix.

      In that case, it is redundant.
      Check what I've said about redundancy.

      But what's the point? If you don't have to refer to many instances of either by name in a given scope/block/context, you might as well drop the X. And in most cases, you shouldn't have to refer to many variables by name in a given scope/block/context.

      The case introduced in the text still seems valid to me.

      The best solution I've found so far is to just use typing information in your variable names. The "abstract" we-define-the-predicate kind of type is the important part of the variable name. You don't add temperatures to weights unless you know what you're doing, so you really shouldn't expect to see a line like temperature += weight; in any program.
      By default, in this naming scheme, things of the Record class are called 'record' unless there's a name collision in a block/scope/context. Collections of records (with a few benign restrictions) are called 'records'. Adjectives can be used. And so on. Obviously, Hungarian notation wouldn't add to this.

      My opinion about Apps Hungarian notation is that it is a good, but incomplete idea. Yes, it gets at the notion of using "semantic typing". But it misses the key insight: usually, semantic typing information is enough.

      Usually != always.

      I see your points. Do you even try to see mine?

      --
      Ignore this signature. By order.
    55. Re:Joel by mazarin5 · · Score: 1

      Thanks for that. I read the GP and thought "Boy, don't I feel silly for assuming that the name had something to do with Hungary."

      I guess I'm double silly now.

      --
      Fnord.
  3. patent promise doesn't sound very good by Timothy+Brownawell · · Score: 4, Insightful

    Microsoft irrevocably promises not to assert any Microsoft Necessary Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification ("Covered Implementation"), subject to[...]
    If your implementation is buggy, does that mean you're not covered?

    To clarify, "Microsoft Necessary Claims" are those claims of Microsoft-owned or Microsoft-controlled patents that are necessary to implement only the required portions of the Covered Specification that are described in detail and not merely referenced in such Specification.
    This sounds like:
    • If there are any optional parts of the spec, those parts aren't covered.
    • If the spec refers to another spec to define some part of the format, that part isn't covered.
    1. Re:patent promise doesn't sound very good by zebslash · · Score: 2, Insightful

      Yes, you know, they are afraid that buggy implementations show their format in a bad light. For instance, that would be like writing your own buggy implementation of Java and then to distribute it in order to contaminate the market with a flawed version, just to show it under a bad light. Oh wait...

    2. Re:patent promise doesn't sound very good by Ed+Avis · · Score: 5, Interesting

      Basically, Microsoft reserves the right to sue you for software patent infringements. So do thousands of other big software companies and patent troll outfits. The new thing now is that Microsoft likes to generate FUD by producing partial waivers and promises that apply to some people in limited circumstances (Novell customers, people 'implementing a Covered Specification', and so on). The inadequacy of this promise draws attention to the implicit threat to tie you up in swpat lawsuits, which was always there - but until this masterstroke of PR the threat wasn't commented on much.

      Ignore the vague language and develop software as you always have.

      --
      -- Ed Avis ed@membled.com
    3. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 5, Informative

      If there are any optional parts of the spec, those parts aren't covered.

      RTFA. That's in the FAQ. Yes they are.

      If the spec refers to another spec to define some part of the format, that part isn't covered.

      In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?!

      I'm not saying that there aren't any flaws, but this kind of ill informed, badly thought out comment (a.k.a. "+5 Insightful", of course) has little value.

    4. Re:patent promise doesn't sound very good by julesh · · Score: 4, Interesting

      If your implementation is buggy, does that mean you're not covered?

      That is my primary concern with the entire promise. None of this bullshit not-tested-in-court crap that came up the other day: it doesn't cover implementations with slight variations in functionality.

      This, it seems, is intentional. MS don't want to allow others to embrace & extend their standards.

    5. Re:patent promise doesn't sound very good by Anonymous Coward · · Score: 0, Troll

      Yes, you know, they are afraid that buggy implementations show their format in a bad light. For instance, that would be like writing your own buggy implementation of Java and then to distribute it in order to contaminate the market with a flawed version, just to show it under a bad light. Oh wait... Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

      It was that last part that made Sun kill it. They couldn't stand Java running better on Windows than it did on Solaris. (Never mind the fact that Sun's Java to this day runs better on Windows than any other platform.)

      So they sued Microsoft and forced Microsoft to stop updating their Java implementation. Because they were no longer updating Java their implementation fell behind Sun's. Because Sun has this compulsive urge to bloat the Java library, the Microsoft implementation became incomplete over time.

      But back in the day, the Microsoft J++ development environment was far superior to anything Sun had to offer. We're talking a good 10 years ago. Sun has finally managed to catch up in the past two or three years, but still, Sun's problem wasn't that the Microsoft implementation was worse: their problem was that it was better.
    6. Re:patent promise doesn't sound very good by mhall119 · · Score: 1

      In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?! I think the concern is that the "something related to the spec" is actually something vitally important to the spec.
      --
      http://www.mhall119.com
    7. Re:patent promise doesn't sound very good by jsight · · Score: 5, Informative

      Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.


      Among other issues, borderlayoutmanager did not behave properly in MS's implementation. It was buggy in incompatible ways, but your right, that in and of itself wasn't the big problem. The big problem was their insistence on both not fixing the bugs, and not going along with major initiatives (such as JFC/Swing).

      But back in the day, the Microsoft J++ development environment was far superior to anything Sun had to offer. We're talking a good 10 years ago. Sun has finally managed to catch up in the past two or three years, but still, Sun's problem wasn't that the Microsoft implementation was worse: their problem was that it was better.


      If by "2 or 3 years" you mean about 5 years, then I'd agree. Java development tools didn't really reach maturity until things like Eclipse came onto the scene about 5 years ago.
    8. Re:patent promise doesn't sound very good by Anonymous Coward · · Score: 2, Insightful

      actually the problem was that microsoft 'extended' it after the previous step of 'embrace' - and continued to call it java.
      These extensions were of course, windows only - which missed the entire point of a cross platform language.

      the old 'embrace,extend,extinguish' strategy has been in the microsoft playbook for quite a while.

    9. Re:patent promise doesn't sound very good by msuarezalvarez · · Score: 5, Insightful

      Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

      If their `implementation' different from the specs, then it was not a correct implementation. If it was supposed to be a Java implementation, then by definition it was buggy. If wasn't suppose to be one, then it had no business being called Java. That is why Sun sued them.

    10. Re:patent promise doesn't sound very good by Azuma+Hazuki · · Score: 1

      Where is the "itsatrap!" tag?

      No, seriously. This looks like a patent trap. I didn't read the article, but the sheer number of weasel words in the introduction alone and some posts (like the first, which I'm replying to) make it obvious that this is another attempt to trap and destroy FOSS and its developers. Steer clear!

      --
      ~Eien no Inori wo Sasagete~ Searching for my Hatsumi...
    11. Re:patent promise doesn't sound very good by ozmanjusri · · Score: 4, Informative
      Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

      Ah, marketing. Where would we be without it?

      Microsoft developed J/Direct specifically to make Java non-portable to other OSs. The MS JVM wasn't better than Suns, it was just tied heavily into the OS, and code developed for it broke if run on any other VM.

      J++ was another lockin tool to ensure any "Java" developed in Microsoft's IDE would only run on Microsoft OSs. JBuilder was always a better package anyway.

      --
      "I've got more toys than Teruhisa Kitahara."
    12. Re:patent promise doesn't sound very good by zebslash · · Score: 1

      Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.
      So what? This is exactly the same with their "Open Specification Promise": if you implement an extended or derived version of their specifications you would be in breach with the agreement. Try to release a "superior" implementation of their specs, just to see what will happen.
    13. Re:patent promise doesn't sound very good by Anonymous Coward · · Score: 0

      "Worship Me or I will torture you forever. Have a nice day." God.

      Wow, that is the most profoundly statement. So true in it simplicity. Yet, I've never noticed it! Good one.

    14. Re:patent promise doesn't sound very good by Christopher+Rogers · · Score: 1

      Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

      Back at my old job they had a system running in Microsoft's JVM, and it was very well known amongst the company that J++ was a piece of crap. There were problems with the garbage collector, such as if you did something like..

      Object myobj = new Object();
      myobj = new Object();

      ..the first instance would never be garbage collected. You'd have to say myobj = null first before setting myobj again in order to get around this. Regardless, it had to be restarted periodically because memory usage would continue to climb to no end. (The code, too, could be to blame, however.) Unfortunately, the code could not be just simply run on Sun's JVM because it used some COM stuff and some other Microsoft classes specific to J++.

    15. Re:patent promise doesn't sound very good by edwdig · · Score: 1

      It was that last part that made Sun kill it. They couldn't stand Java running better on Windows than it did on Solaris. (Never mind the fact that Sun's Java to this day runs better on Windows than any other platform.)

      I'm going to disagree on Java running better on Windows than Solaris. Did you ever try the Java port of WordPerfect? Using then modern Macs and PCs, calling it unusably slow would be a compliment. You were watching the GUI components redraw individually.

      If you ran it on a several year old low end Sun workstation it was a pleasure to use, indistinguishable performance wise from a native app.

    16. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 1

      Whether it is a genuine concern or a knee jerk reaction does not change the post from being ill informed and badly thought out.

      Besides, how do you envisage that a file format which is essentially a detailed description of the actual binary data structure is going to be missing something?

    17. Re:patent promise doesn't sound very good by mhall119 · · Score: 2, Interesting

      Besides, how do you envisage that a file format which is essentially a detailed description of the actual binary data structure is going to be missing something? Because I've read the MSOOXML spec, and that's exactly what they did in there. Since MSOOXML seems like a simple translation of the binary format into XML, it would assume that the same important parts of the spec will be missing here.
      --
      http://www.mhall119.com
    18. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 1

      I'm going on the working assumption that if something doesn't form part of the data that represents the document, then it can hardly be an important part of the spec. Could you give an example of what you'd be worried about?

      I'm also curious about your "simple translation" comment. If it were simple for M$, I can't see why they would have required the intermediary XML format they used. Is there some other proprietary binary -> XML conversion work you've done to explain why it's such a simple process? What are the differences between their two XML formats, anyway?

    19. Re:patent promise doesn't sound very good by infinitelink · · Score: 0, Troll

      ? what the hell are you complaining about...we don't want people extending ODT but COMPLYING with it! The same with Microsoft: they want...THEIR SOFTWARE TO KEEP WORKING WITH ANYTHING LABELLED UNDER THEIR SPEC! DUH. If you change it up, you'd have to change your file and make sure it doesn't say ".doc" or any bull like that. Sheesh. The only reason I sound pissy is this seems like one of those ill-thought comments aiming to attack the big, bad, ogre. When Microsoft tries to pull unethical crap...then complain. But don't complain about something reasonable. Wouldn't you guys be pissed to get an important "opendocument" file and discover someone decided to "extend" it so that you couldn't open it, or open it right?

      --
      Intelligent idiots are we. | Evil men do not understand justice.
    20. Re:patent promise doesn't sound very good by rastoboy29 · · Score: 1

      The FAQ means NOTHING.  Only the license itself matters.

      Don't be such a dupe.

    21. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 0, Flamebait

      Quite correct, an FAQ is indicative but hardly binding. OK, let's look at the promise itself then:


      Microsoft irrevocably promises not to assert any Microsoft Necessary Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification ("Covered Implementation")


      The "to the extent" clause covers partial implementations and optional sections, which was the first concern. The second concern was logically null - clearly external references that aren't covered, aren't covered.


      That's pretty simple, then. Was there a real point you were trying to make, or is pointing out the obvious the extent of your insight?

    22. Re:patent promise doesn't sound very good by Anonymous Coward · · Score: 0

      I agree that Microsoft was completely in the wrong for having not implemented in their JVM. Sun was right to spank them.

      However, Sun was absolutely retarded to not learn from MS. J/Direct is significantly superior to JNI. It allows you to interact with native code without having to write any native code. It supports stdcall which is a standardized exported function call convention that works on any platform, not just those on Windows. This instantly enables an entire world of interop.

      The interop scenario in .NET is significantly superior to that of Java. You have three inherent mechanisms: P/Invoke, which is effectively the same as J/Direct and permits calls to exported stdcall or cdecl functions in native libraries, COM interop which permits .NET classes and COM classes to be exposed to either technology seamlessly, and IJW which allows native code and managed code to be mixed directly in the same assembly.

      It's funny watching Java trying to play catch up back to .NET. Annotations? Enums? Iteration? Generics? Closures? Sure, they weren't invented by .NET, but .NET suspiciously had them first. It's so bad that there is an IBM Java data project called JLINQ, after the Language INtegrated Query technologies pervasive throughout the .NET 3.5 runtime.

    23. Re:patent promise doesn't sound very good by mhall119 · · Score: 1

      I'm going on the working assumption that if something doesn't form part of the data that represents the document, then it can hardly be an important part of the spec. Could you give an example of what you'd be worried about? There are parts of the MSOOXML spec that reference other specs and/or applications to define how something should be handled. If MSOOXML says "display this like it was displayed in Works97", but they haven't promised not to sue you for implementing something based on the Works97 spec, then what do you do?

      If it were simple for M$, I can't see why they would have required the intermediary XML format they used. I never looked at their intermediary XML, but I was told by others that it was mostly XML tags containing binary content, with additional XML providing some meta-data. I can only assume that the binary data was mostly identical to that of the old binary formats.

      Is there some other proprietary binary -> XML conversion work you've done to explain why it's such a simple process? It's not a simple process to do it right, but it is simple to do it wrong. Reading the XML spec, it seems like they mostly took "bytes 8-11 define attribute x", and converted it into "tag y defines attribute x", and so the document is structured around the process or loading/saving existing C/C++ data structures, and not structured around the data it contains. Also the fact that every bug, hack and work-around that existed in the binary formats exists in MSOOXML without reason. Someone at MS came up with a requirement to convert .doc and .xls into XML, and the developers came up with the easiest solution that met the letter of the requirement.
      --
      http://www.mhall119.com
    24. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 1

      "If MSOOXML says "display this like it was displayed in Works97"" ?! There is no Works97, for starters. If you mean Word97 then I presume it's escaped your attention that releasing the Word97 format is exactly what The Fucking Article is about?!

      The ability to add custom schemas suggests your simply bullshitting for the rest of the post as well. Not being a programmer, I can't comment as an expert but new functionality like mechanisms for data representation doesn't sound like the kind of thing you'd do in a cheap'n'easy conversion.

      Overall - the bullshit factor seems high on your post.

    25. Re:patent promise doesn't sound very good by mhall119 · · Score: 1

      If you mean Word97 then I presume it's escaped your attention that releasing the Word97 format is exactly what The Fucking Article is about?! Actually I think the specific example I'm thinking of was a reference to WordPerfect.

      Not being a programmer, I can't comment as an expert but new functionality like mechanisms for data representation doesn't sound like the kind of thing you'd do in a cheap'n'easy conversion. Custom schemas is a function of XML itself, not MSOOXML. And Microsoft's binary formats already allowed for integration with external sources via their OLE/Compound Document format. The problem is that not all of those sources are covered by the promise not to sue. It would be like promising not to sue you for rendering HTML, but allowing the possibility to sue you for rendering GIF images, or processing Javascript, you can't build a compatible browser without them.

      Overall - the bullshit factor seems high on your post. You've already admitted not being a programmer, is it safe to assume that you have not read any of the the MSOOXML spec as well? If that is the case, what exactly qualifications do you have to detect bullshit on this topic?
      --
      http://www.mhall119.com
    26. Re:patent promise doesn't sound very good by Anonymous Coward · · Score: 0

      Excuse me for being blunt, but are you a moron, or what?

      You can obviously type and even spell correctly, but you appear to thoroughly miss the parent's point. By misunderstanding or deliberately, is the question in my mind. I have no answer as of yet. Do you?

      To sum up, using your own style: Overall - the clueless, rude moron factor seems high on your post.

      Nay, correction, it *is* high on your post.

      To repeat the advice of another poster:
      - You admit to being a non-programmer; how about you respect the opinions of those who are, just a tad more?
      - You obviously haven't read, and definitely haven't understood, the Microsoft OOXML specification(s).
      - Since you have failed on both accounts above, you'd do well in reconsidering your arrogant style of reasoning about problems associated with implementing (through programming) document formats (related to, but not fully described by, said specifications) before dismissing others.

      Ok?

      Or you can just continue in your current style and remain a clueless, arrogant twit. And moron.

    27. Re:patent promise doesn't sound very good by the_arrow · · Score: 1

      Knowing how it is stored on disk, wether it is the new XML format or the old binary format, doesn't matter. The old binary format does not tell how to actually "display" it.

      --
      / The Arrow
      "How lovely you are. So lovely in my straightjacket..." - Nny
    28. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 1

      Actually I think the specific example I'm thinking of was a reference to WordPerfect.

      In that case, the question is: why are you criticising M$ for not promising that Corel won't sue?

      Custom schemas is a function of XML itself, not MSOOXML. And Microsoft's binary formats already allowed for integration with external sources via their OLE/Compound Document format. The problem is that not all of those sources are covered by the promise not to sue. It would be like promising not to sue you for rendering HTML, but allowing the possibility to sue you for rendering GIF images, or processing Javascript, you can't build a compatible browser without them.

      You forget the question. Do you actually have a basis for saying OOXML is a simple translation of the binary format? I would say it is not, because the functionality is different. If the custom schema functionality was simply a side effect of XML, for example, then it would be in the ODF spec as well. Clearly the M$ functional spec is not a simple translation. Are you just sticking as many facts down on the page in the hope that one of them might prove your point?

      You've already admitted not being a programmer, is it safe to assume that you have not read any of the the MSOOXML spec as well?

      Of course not. Why on earth would you think that only programmers have an interest in document formats? Are you being ignorant or egotistical?

      what exactly qualifications do you have to detect bullshit on this topic?

      Responding to accusations of bullshit with an ad hominen attack? Not exactly a powerful argument. But to answer your question - I've been a consultant in the software business for eight years, coming up through support and implementation rather than programming. I've had a lot of practice at spotting bullshit. As an example - Microsoft is one of my company's competitors. Because they're a big nasty competitor, it's important to have a reality-based understanding of what they're up to. To develop such an understanding, you have to spot the bullshit from people like you, otherwise you find yourself full of shit when attacking M$ and losing all respect amongst professionals. It's also important to understand the product lifecycle to work with R&D, and it is idiocy to think that such an important change as the Office file format would be treated the way you describe.

      To work effectively as a consultant, it's also useful to spot traits such as the ones your posts demonstrate, such as dodging the question or talking about irrelevancies. It warns you when some people's opinions are unreliable, whether it's due to incompetence or a hidden agenda. My impression in your case is that it's both.

    29. Re:patent promise doesn't sound very good by mhall119 · · Score: 1

      If the custom schema functionality was simply a side effect of XML, for example, then it would be in the ODF spec as well. It is.

      It's also important to understand the product lifecycle to work with R&D, and it is idiocy to think that such an important change as the Office file format would be treated the way you describe It would be idiocy for such an important change to be treated that way, yes, I completely agree. However, anybody with experience in a large development environment with multiple layers of management will know such idiocy is the rule, not the exception. Nothing that Microsoft has delivered in the past would make me thing they are any different.

      It warns you when some people's opinions are unreliable, whether it's due to incompetence or a hidden agenda. My impression in your case is that it's both. Again, you've already stated that you are not qualified to make informed statements on the technical aspects of our topic, so by what authority to you feel you can make informed statements about my competency?
      --
      http://www.mhall119.com
    30. Re:patent promise doesn't sound very good by ContractualObligatio · · Score: 1

      It is.
      No, it isn't. ODF uses a different approach to achieve similar goals.

      such idiocy is the rule, not the exception
      I do have experience of the development process having worked with a number of product teams, and it is not my experience that such major changes do not involve lengthy discussion and politics, particularly when it involves legal and marketing aspects. The fact that Microsoft's priorities are rarely quality, security and openness is beside the point. The normal mistake is to add unnecessary things, not to leave things alone or trust the development team's judgment. Of course, anecdotal evidence and personal experience is unreliable. What examples do you have of a major change, highly visible to legal and marketing, in an environment with multiple layers of management where it was carried out with essentially no interference or scope change?

      you've already stated that you are not qualified to make informed statements on the technical aspects of our topic, so by what authority to you feel you can make informed statements about my competency?

      I made no such statement. On what basis are only programmers capable of making informed comments on technical subjects? Presumably though you are a programmer if you feel that is the essential qualification, it's a common egotistical fallacy. Why does it require any authority whatsoever to suspect (not make an "informed statement") that a programmer with incorrect facts and sloppy logic is being incompetent with his argument? You didn't even answer a single question put to you:

      • Why are you criticising Microsoft for not promising that Corel won't sue?
      • For reasons already given, it is clear the OOXML spec is not a simple translation. Why are you putting down irrelevant facts rather than justifying or adjusting your position, or directly challenging those reasons?
      • It's not safe to assume that I haven't read any of the spec. Why do you think that only programmers would be interested?
  4. OOo developers once said, by Enleth · · Score: 1, Flamebait

    that the hell would rather freeze over - well, looks like Satan is now skating on frozen magma lakes...

    --
    This is Slashdot. Common sense is futile. You will be modded down.
  5. Obfuscation by Anonymous Coward · · Score: 2, Insightful

    Except... we all don't have this, OLE, thing on our computers nor do we all walk it easier than the languages we deal with now.

    But let's say you do. Now you have to find an API to do it for you. As an every day guy, I can write my own HTTP parser, IP connection manager and so forth, w/o requiring special API to do it. As a smarter guy, I'd look for the libraries that can do some of the heavy lifting for me. It's flexibility. The document structure is going to affect how I write code to work with ti.

    W/ office docs, Joel is arguing, I have to know the one way to interact with them. There's no TIMTOWTDI about it. There's no intuitive way to do it either. Were the format to be simple, be it "sanely" constructed CSV, XML, RTF, etc, I have more choices. I'd rather use the most well known, bestest of the best, but sometimes it's not intuitive and just hamper's work. It shuts out programmers who would think, open(file); readSomeData(); construct_a_structure();. Now it's, structure = oneOfAHandfulOfParsersThatWillEverWork().

    The worst part of that is, since I have no way *I* can choose how to mess with documents. I have to either a) spend more time figuring out the native format unless I'm a genius or have an MS crone behind me, or b) parse it incorrectly, and then have to go back and fix any number of things, including my methodology. Remember how the various encodings affected document format? I.e. UTF-8, 16, Latin-1, Unicode, etc etc etc..

    Joel, you're not right.

    1. Re:Obfuscation by wlandman · · Score: 2, Insightful

      What Joel is trying to say is that at the time that Excel and other Office products were made, it was not possible to store it in XML. Joel also reminds us that as Microsoft had new versions of the software come out, they had to keep the compatability with the older versions.

      I think Joel makes a lot of good points and gives great insight into thinking at Microsoft.

    2. Re:Obfuscation by SuiteSisterMary · · Score: 1

      Were the format to be simple, be it "sanely" constructed CSV, XML, RTF, etc, I have more choices.

      Word and Excel have supported CSV and RTF back into the DOS days, back into the 5.25" floppy days.

      And are you honestly saying that an 8088 with 640K ram could handle XML? Assuming that the concept of interchangable markup langauges even EXISTED back then?

      Jesus, it's like complaining that a 30 year old television doesn't support HDMI, therefore it's poorly designed.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
  6. One possible reason for releasing the specs now by Stan+Vassilev · · Score: 5, Insightful

    One may wonder, why release the documentation now?

    If you read Joel's blog you'll see the formats are very old, and consist primarily of C-structs dumped to OLE objects, dumped directly to what we see as an XLS, DOC and so on files.

    There's almost no parsing/validation at load time.

    Having this in a well laid documentation may reveal quite a lot of security issues with the old binary formats, which could lead to a wave of exploits. Exploits that won't work on Microsoft's new XML Office formats.

    So while I'm not a conspiracy nut, I do believe one of Microsoft's goals here are to assist the process of those binary formats becoming obsolete, to drive Office 2007/2008 adoption.

    1. Re:One possible reason for releasing the specs now by Chief+Camel+Breeder · · Score: 5, Informative

      Actually, I think they're releasing it now because they were ordered to in a (European?) court settlement, not because they want to.

    2. Re:One possible reason for releasing the specs now by friedman101 · · Score: 2, Insightful

      Come on. You really think Microsoft wants to increase the vulnerability of old versions of Office (which are still the vast majority in corporate America). This not only makes their software looks bad, it increases the amount of work they have to do to support the older versions (yes, they still support Office 2003). You don't sell new cars by convincing people the last model was rubbish. I think your tin-foil hat fits a little to tight.

    3. Re:One possible reason for releasing the specs now by Stan+Vassilev · · Score: 5, Insightful

      Come on. You really think Microsoft wants to increase the vulnerability of old versions of Office (which are still the vast majority in corporate America). This not only makes their software looks bad, it increases the amount of work they have to do to support the older versions (yes, they still support Office 2003). You don't sell new cars by convincing people the last model was rubbish. I think your tin-foil hat fits a little to tight.

      Let me break your statement in pieces:

      - that would increase the vulnerability of old Office
      - the majority of corporate America is stuck on old Office
      - you don't sell old cars by convincing old ones are rubbish

      You know, have you seen those white-papers by Microsoft comparing XP and Vista and trying to put XP-s reliability and security in bad light?

      Or have you seen those ads where Microsoft rendered people using old versions of office as... dinosaur-mask wearing suits?

      If the majority of corporate America uses the old Office, then the only way for Microsoft to turn in profit would be to somehow convince them this is not good for them anymore, and upgrade. You're just going against yourself there.

    4. Re:One possible reason for releasing the specs now by Anonymous Coward · · Score: 0

      the main benefit is publisher etc, they still use the binary formats and are not popular enough to be reverse engineered

    5. Re:One possible reason for releasing the specs now by Jugalator · · Score: 3, Informative

      So while I'm not a conspiracy nut, I do believe one of Microsoft's goals here are to assist the process of those binary formats becoming obsolete, to drive Office 2007/2008 adoption. Not a chance. Microsoft is bound to release Office 2003 security updates until January 14, 2014.
      --
      Beware: In C++, your friends can see your privates!
    6. Re:One possible reason for releasing the specs now by orra · · Score: 2, Interesting

      One may wonder, why release the documentation now?

      I would say it's because they get good PR for for pretending to be transparent/friendly, whilst not actually giving away any new information.

      Look at page 129 of the PDF specifying the .doc format.. (The page is actually labelled 128 in the corner, but it's page 129 of the PDF). You will see there's a bit field. One of the many flags that can be set in this bit field: "fUseAutospaceForFullWidthAlpha".

      The description?:

      Compatibility option: when set to 1, use auto space like Word 95

      Gee, thanks. That's helpful. You know, an earlier Slashdot article said Microsoft were going to release a BSD licensed converter to convert from .doc to .docx. But this will never help anyone further understand either of the two formats: binary .doc files which are auto spaced like Word 95 will be converted to "XML" files which are auto spaced like Word 95.

    7. Re:One possible reason for releasing the specs now by Anonymous Coward · · Score: 0

      Almost No parsing ? That is strange, because with little parsing you would expect such a document to be more secure .
      For it is parsing errors that are the very essence of many exploits

    8. Re:One possible reason for releasing the specs now by kripkenstein · · Score: 1

      Actually, I think they're releasing it now because they were ordered to in a (European?) court settlement, not because they want to. I think you're wrong, or at least I didn't hear of such a court ordering (apologies if I am in error here).

      I think the reason for Microsoft publishing the specs is fairly clear: Microsoft is trying to get OOXML passed as an ISO standard. One of the complaints that prevented such standardization thus far is that OOXML relies on older formats like .DOC, and those formats aren't documented, so really OOXML isn't documented. Similarly there may be patent concerns with the old formats. It appears that Microsoft is going to do every thing in its power to get OOXML passed on the next vote, and being able to say, "the old binary formats are now completely open and no patent issues exist" makes that more likely. This is the (completely selfish) reason for releasing the specs at this time, AFAIK.

      Whether this will work, time will tell. But what it certainly shows is Microsoft's desperation and how important it sees getting OOXML stamped as a standard.
    9. Re:One possible reason for releasing the specs now by guardian-ct · · Score: 1

      That policy page is controlled by Microsoft, and I suppose that they're only bound by it to the extent it doesn't hurt their profits. Microsoft can change the policy at any time. "Microsoft makes no warranties". "Microsoft may occasionally change any of its online policies...at any time." "...but will not provide any other notice to you".

      They're only bound by that policy until they change it. Even then, there's no guarantee.

    10. Re:One possible reason for releasing the specs now by Hymer · · Score: 1

      They are releasing it because ECMA has promised ISO that either those binary formats referenced in the OOXML spec. will be released or the references will be removed from the spec. If they remove those references then they will be without a compliant product.
      ...and Microsoft needs that ISO certification really badly now, there are too many governments requiring a ISO certification.

    11. Re:One possible reason for releasing the specs now by Flammon · · Score: 1

      Oh and don't forget about Document Freedom Day on March 26th. Microsoft is getting desperate. They know they're in trouble more than anyone else. Who can compete with the Linux kernel pace of development. The first patch that took 2.6.24 to 2.6.25RC1 has 1.4M lines of diffs. I don't think their army of programmers could pull that off. Their development model doesn't scale as well as FOSS. And the Yahoo! offer was quite funny actually. Microsoft was practically begging Yahoo! to reconsider after the $40,000,000,000.00 rejection.

    12. Re:One possible reason for releasing the specs now by SEMW · · Score: 1

      I do believe one of Microsoft's goals here are to assist the process of those binary formats becoming obsolete, to drive Office 2007/2008 adoption. Whilst I agree with your reasoning, your conclusion (that they did it to drive Office 2007/8 adoption) is flawed, since you don't need to upgrade to Office 2007/8 to use the new formats; you just need to install the compatibility pack.
      --
      What's purple and commutes? An Abelian grape.
  7. Re:first post? by Timothy+Brownawell · · Score: 3, Insightful

    I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats?

  8. Office Doc Generation on the Server by VosotrosForm · · Score: 5, Informative

    I would like to point out another good option Joel doesn't have on his list. It's a software called OfficeWriter, from a company named SoftArtisans in Boston. When I last checked/worked there, it was capable of generating Excel and Word docs on the server, and I believe Powerpoint was probably coming relatively soon. Creating a product that can write office documents isn't quite as impossible in terms of labor as Joel is saying.... but it's still way beyond any hobby project. Plus, he is suggesting that you use Excel automation or the like through scripts to create documents on the server, which is a decent suggestion, if you want Excel or Word to constantly crash and lock up your server, and you enjoy rebooting them every day. If you want to do large scale document generation on a server you are going to need something like Officewriter. -Vosotros/Matt

    1. Re:Office Doc Generation on the Server by grokcode · · Score: 1

      The Apache Foundation also has the POI Java API for reading and writing MS Office Documents. Some of the subprojects are HSSF (Horrible Spreadsheet Format) for Excel 97 formats, HWPF (Horrible Word Processing Format) for Word 97 Documents, and HSLF for Powerpoint. I use HSSF for writing documents, and while it's a bit clunky it works pretty well. There are a few annoyances like trying to prevent an embedded image from skewing, but this is more a limitation of the excel format than the POI API. Write support is pretty mature, although reading could use some work. Much better solution than excel automation, and the APIs have sweet names. 'Nuff said.

  9. Re:first post? by somersault · · Score: 1

    Only took something like 5 years*, eh? :P

    * I can't actually remember how long ago it was

    --
    which is totally what she said
  10. Promise not a license by G0rAk · · Score: 5, Insightful

    As PJ pointed out over on Groklaw, MS are giving a "Promise" not to sue but this is very very far from a license. Careful analysis suggests that any GPL'd software using these binaries could easily fall foul of the fury of MS lawyers.

    --

    Nothing to see here. Move along.
    1. Re:Promise not a license by morgan_greywolf · · Score: 5, Interesting

      As PJ pointed out over on Groklaw, MS are giving a "Promise" not to sue but this is very very far from a license. Careful analysis suggests that any GPL'd software using these binaries could easily fall foul of the fury of MS lawyers. Correct.

      Here's my suggestion: someone should use these specs to create a BSD-licensed implementation as a library. Then, of course, (L)GPL programs would be free to use the implementation. Nobody gets sued, everybody is happy.
    2. Re:Promise not a license by Vexorian · · Score: 1

      And it is just a promise, so even if you are not GPLed you'll live under the "will Microsoft break the promise tomorrow when I wake up"?

      --

      Copyright infringement is "piracy" in the same way DRM is "consumer rape"
    3. Re:Promise not a license by Pofy · · Score: 2, Informative

      >As PJ pointed out over on Groklaw, MS are giving a "Promise"
      >not to sue but this is very very far from a license.

      Some (hypothetical?) questions:

      What would happen if those patents in some way was transfered to someone else?

      Despite the promise, are you still actually infringing the patent? Just with an assurance of the current patent holder that he won't do anything?

      If so, what would happen if it becomes criminal to break a patent (it was quite close to be part of an EU directive not so long ago)? Together with such suggestions one have also seen sugestions that police should be allowed (and required?) to act on those crimes even without a filing from someone suffering infringment. How would that apply to a situation with such a promise?

    4. Re:Promise not a license by Anonymous Coward · · Score: 0

      IANAL, but the common interpretation of US law seems to be that the program/library combination will be a derivative work of both the program and the library, and thus will be covered by the GPL entirely. (And yes, I believe that means if the library license was incompatible with the GPL, no distribution would be allowed at all.)

      There is also the question whether distributing the code under a BSD license is legal in the first place. BSD looks very minimalistic on the surface, but the truth is that a license does not restrict rights, it grants them, and BSD is one of the licenses granting the most rights to recipients. If you are not legally permitted to grant all of these rights to others (because of patents), I believe you are not allowed to release the software under a BSD license in the first place.

    5. Re:Promise not a license by Abcd1234 · · Score: 2, Insightful

      Except anyone being sued by MS can use promissory estoppel as a defense. 'course, you have to be able to afford to defend yourself, but I guess that's where the EFF comes in.

    6. Re:Promise not a license by morgan_greywolf · · Score: 1

      If you are not legally permitted to grant all of these rights to others (because of patents) Now, note that Microsoft promised not to sue you and they did not promise to grant you patent rights. However, they also say that they aren't saying that any patents are necessarily involved, either, and they're also not saying whether they believe such patents to be valid.

      Here's the rub for Microsoft: patent law doesn't actually prevent you from doing anything. You don't know what Microsoft has a patent on and what it doesn't. And you need to keep it that way, because that means the difference between willful patent infringement and plain ol' "oops, we infringed your patent." In any case, Microsoft's only recourse is to sue you. Which, BTW, they have to do two things to win:

      1) They have to prove actual damages. (No, this isn't automatic) and
      2) They have to prove that they took steps to mitigate having those damages occur.

      It could be argued that by putting that spec out there with a promise not to sue ... that they have failed to mitigate their own damages.

      And, you could always just put the onus on the community by posting the code under the public domain -- anonymously. Then there's nobody to sue but everybody who uses the code.
    7. Re:Promise not a license by harlows_monkeys · · Score: 1

      As PJ pointed out over on Groklaw, MS are giving a "Promise" not to sue but this is very very far from a license

      It's a legally enforceable promise. How is that different from a license? Both Sun and IBM, neither of which trust Microsoft, have released software critical to major products of theirs, that implements specifications that are covered by OSP, so evidently their lawyers don't have a problem with it.

      Careful analysis suggests that any GPL'd software using these binaries could easily fall foul of the fury of MS lawyers

      How? There's not a single thing in it that conflicts with any GPL provision or requirement. The only thing connected with OSP that has anything to do with GPL is the item in the OSP FAQ, where MS refuses to say affirmatively that there is no GPL problem. All that means is that Microsoft has competent lawyers who won't let them give general legal advice on their web site.

    8. Re:Promise not a license by harlows_monkeys · · Score: 1

      Promissory estoppel.

  11. Why not ODF or OOo? by jfbilodeau · · Score: 2, Interesting

    Why does the author avoid any mention of ODF or OpenOffice as alternatives to work with MS Office docs? He seems stuck on 'old' formats like WKS or RTF.

    I know OOo is not a perfect Word/Excel converter, but it has served me marvelously since the StarOffice days. I wish that there was a simple command-line driven tool that could convert .docs or .xls to ODS or PDF using the OOo code. Any one knows about such a tool?

    --
    Goodbye Slashdot. You've changed.
    1. Re:Why not ODF or OOo? by Anonymous Coward · · Score: 0

      It would appear the point of the article was the fact that Microsoft released the specifications. Whether or not alternatives exist or should be used are totally off topic. If you made some discussion about the merits of MS's design versus the ODF design, that would be one thing, but that's not what matters. What matters is the *fact* that most people use MS Office binary formats and this specification provides the resources (if they so desire) to see exactly what is in those files.

    2. Re:Why not ODF or OOo? by jfbilodeau · · Score: 1

      So why is he talking about converting to PDF or RTF? ;)

      --
      Goodbye Slashdot. You've changed.
  12. Their way out of long-term support by Anonymous Coward · · Score: 1, Insightful

    How to look nice and offload some work in one shot.

    With this M$ can shut off critics that say proprietary formats are evil, especially those using the long-term viability argument.

    Now that the formats are documented, hordes of open source hobbyist can develop (for free) code and tools to read / convert the old Office formats. Then M$ will tell "See, we do not lockout anybody, there are myriads of ways to read our old crap".

    Smart indeed. And anyway these format do not hold any competitive advantage anymore since most users are coping with the new ones now.

    1. Re:Their way out of long-term support by MrNaz · · Score: 1

      "There are myriad ways to read our old crap."

      Is there some secret conspiracy I am not aware of to butcher the use of this word? Why does every attempt to use it end up in miserable failure.

      --
      I hate printers.
    2. Re:Their way out of long-term support by eldepeche · · Score: 1, Offtopic

      myriad [mir-ee-uhd]
      -noun
      1. a very great or indefinitely great number of persons or things.
      2. ten thousand.
      -adjective
      3. of an indefinitely great number; innumerable: the myriad stars of a summer night.
      4. having innumerable phases, aspects, variations, etc.: the myriad mind of Shakespeare.
      5. ten thousand.

  13. Retaliation? by ilovegeorgebush · · Score: 2, Interesting

    Is this retaliation to the impending doom of the OOXML format requesting ISO standard status? Is MS's thinking: "Right, ISO has failed us, so we'll release the binaries so everyone keeps using the office formats anyway"?

  14. Hmm. by Uzuri · · Score: 1

    "[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts..." ...It's something far more sinister.

    (Sorry, sometimes ya just gotta get it out)

    --
    I'm a she-slashdotter... but I make up for it by living with my folks.
  15. I thought it was pretty well known by erroneus · · Score: 1, Insightful

    Just as OOXML files and WMF make references to Windows or Office programming APIs, I think it would come as no surprise to anyone that Office binary formats would also make similar references. The strategy behind it would be obvious -- to tie the data to the OS and to the software as closely as possible.

    1. Re:I thought it was pretty well known by leuk_he · · Score: 2, Informative

      Did you read the article? Nah, why would you do so for some MS bashing.

      If you read the article you would notice that the binary solution of winword 97 (and in fact it is compatible with it predecessors) was a good solution in 1992 when word for windows 2.0 was created. Machines did have have less memory and processing power that your phone, and still had to be able to open a document fast.

      my conclusion is that the open office devs are crazy that they ever supported the word .doct format, and did a surprisenly good job.

    2. Re:I thought it was pretty well known by BluenoseJake · · Score: 1

      Maybe they just want to reuse code? Using operating facilities to do useful work instead of reinventing the wheel makes sense, and it's just good programming practice. Maybe that tinfoils on a bit tight, not everything is a conspiracy.

    3. Re:I thought it was pretty well known by erroneus · · Score: 4, Interesting

      It's a DOCUMENT format. You know, you put words and pictures in there? Things you type in with your own keyboard with your fingers? There should be no need to have API calls in a document format. The same is true for WMF. WMF was very exploitable as a result, so not only is it bad style, it's dangerous.

    4. Re:I thought it was pretty well known by erroneus · · Score: 1

      Yes, I read the article and I don't buy into it.

      The fact is, Word in its early versions was NOT significantly faster than its competitors and neither was Excel. Word Perfect and Lotus 1-2-3 did everything people needed and they did it within the resource constraints of the day.

      The article is leading in attempting to address the "limited resources" of the day because for most of us, we find it amazingly difficult to imagine operating in a 1MB operating environment. The article also fails to identify the actual time-line of development and what platforms were like with each release of Word, Excel or Windows. They tried to make it sound like Word 2.0 was linking with Excel from day-one. It was not. And it certainly didn't do the things we expect to see (but rarely use) today back in the earlier days.

      The article was nothing more than a list of whiny excuses for what Microsoft did when others were able to accomplish the same functionality without all the nonsense.

      The reality is that when you tie your documents to the OS and the Office software, it's simply a lot harder to write competing apps that can work with the same data. If the document formats can stand alone, then writing apps that can use the data becomes a lot more simple. And since others were able to accomplish the same ends without the nonsense described in the article, I'd say there must have been some OTHER motivation behind their departure from standard coding practices of the day... and even standard coding practices of TODAY!

      I didn't think I'd have to remind anyone that the main reason why OOXML will never be an ISO standard is because the format does not stand on its own. It requires reference to Windows and Office programs to work.

      I especially loved the part about how an Excel file is a file system within a file. Sounds like an archive to me. Not like it hasn't been done before.

    5. Re:I thought it was pretty well known by Koohoolinn · · Score: 1

      Maybe that tinfoils on a bit tight, not everything is a conspiracy. M$ has a particular bad record concerning ulterior motives. People have been bitten so many times that erring on the safe side makes perfect sense.
      --
      Deze sig is in 't Nederlands geschreven.
    6. Re:I thought it was pretty well known by James+McGuigan · · Score: 2, Interesting

      From MS perspective its not a document format, its just another component in the "user experience" that is MS Office. They trade clean data formats for tightly integrated software designed for a MS only environment. Part of the trade off may be week security, which may be unacceptable to you, but may be acceptable to the MS marketing department, which considers the lack of certain frivolous features to unacceptable.

    7. Re:I thought it was pretty well known by prshaw · · Score: 2, Insightful

      >> The article was nothing more than a list of whiny excuses for what Microsoft did when others were able to accomplish the same functionality without all the nonsense.

      And what software from 1990 was writing wordprocessing files and spreadsheet files out in an standardized interchangable format? What format where they using? What programs were not writing their data out tied to the software that created it?

      What word processing documents was 1-2-3 able to link to? Or was it WordPerfect that was able to embed any spreadsheet? I think Word 2.0 was able to talk to Excel with DDE, I know I was writing code for it in 1991. I know the year is correct, not sure about the versions of Word or Excel though.

    8. Re:I thought it was pretty well known by SEMW · · Score: 1

      The fact is, Word in its early versions was NOT significantly faster than its competitors and neither was Excel. Word Perfect and Lotus 1-2-3 did everything people needed and they did it within the resource constraints of the day. Yes, because unlike Word; Wordperfect and Lotus 1-2-3 had nice, human-readable XML formats, didn't they?

      Newsflash: no, they didn't. The reason Word's .doc design goals didn't produce significantly faster results than its competitors was because its competitors' file formats had exactly the same design goals!
      --
      What's purple and commutes? An Abelian grape.
  16. "compound documents." oh no, run away! by radarsat1 · · Score: 4, Interesting

    You see, Excel 97-2003 files are OLE compound documents, which are, essentially, file systems inside a single file.

    I don't see why just because something is organized filesystem-like (not such an awful idea) means it has to be hard to understand. Filesystems, while they can certain get complicated, are fairly simple in concept. "My file is here. It is *this* long. Another part of it is over here..."

    They were not designed with interoperability in mind.

    Wait, I thought you were trying to convince us that this doesn't reflect bad programming...

    That checkbox in Word's paragraph menu called "Keep With Next" that causes a paragraph to be moved to the next page if necessary so that it's on the same page as the paragraph after it? That has to be in the file format.

    Ah, I see, you're trying to imply that it's the very design of the Word-style of word processor that is inherently flawed. Finally we're in agreement.

    Anyways, it's no surprise that it's all the OLE, spreadsheet-object-inside-a-document, stuff that would make it difficult to design a Word killer. (How often to people actually use that anyway?) It would basically mean reimplementing OLE, and a good chunk of Windows itself (libraries for all the references to parts of the operating system, metafiles, etc), for your application. However, it certainly can be done. I'm not sure it's worth it, and it can't be done overnight, but it's possible. However you'll have a hard time convincing me that Microsoft's mid-90's idea of tying everything in an application to inextricable parts of the OS doesn't reflect bad programming. Like, what if we need to *change* the operating system? At the very least, it reflects bad foresight, seeing as they tied themselves to continually porting forward all sorts of crud from previous versions of their OS just to support these application monstrosities. This is a direct consequence of not designing the file format properly in the first place, and just using a binary structure dump.

    It reminds me of a recovery effort I tried last year, trying to recover some interesting data from some files generated on a NeXT cube from years ago. I realized the documents were just dumps of the Objective C objects themselves. In some ways this made the file parseable, which is good, but it other ways it meant that, even though I had the source code of the application, many of the objects that were dumped into the file were related to the operating system itself instead of the application code, which I did _not_ have the source code to, making the effort far more difficult. (I didn't quite succeed in the end, or at least I ran out of time and had to take another approach on that project.)

    In their (MS's) defense, I used to do that kind of thing back then too, (dumping memory structures straight to files instead of using extensible, documented formats), but then again I was 15 years old (in 1995) and still learning C.
  17. access by oliverthered · · Score: 1

    Still missing the binary format for access, still never mind it's not that hard to work out

    --
    thank God the internet isn't a human right.
    1. Re:access by Hulver · · Score: 1

      Ah, a typical sourceforge project. "We're almost ready for a beta release!" (Dated 2002), and software release (version 0.0.4 also dated 2002).

      Oh right, it was so easy they got it right first time and never had to update it since?

    2. Re:access by oliverthered · · Score: 1

      I stoped working on it because mdbtools has more support, i contributed my work to the projecst and let them continue.

      --
      thank God the internet isn't a human right.
    3. Re:access by Hulver · · Score: 1

      Perhaps you should modify the sourceforge project to reflect that?

  18. Re:first post? by julesh · · Score: 3, Informative

    I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats?

    As far as I remember, they only insisted on protocols (it was on the basis of a complaint from server OS vendors that MS was tying their market-leading desktop OSs to their server OSs and gaining an unfair advantage).

  19. Worst. Workaround. Ever. by organgtool · · Score: 4, Interesting
    FTA:

    There are two major alternatives you should seriously consider: letting Office do the work, or using file formats that are easier to write.
    His first workaround is to use Microsoft Office to open the document and then save that document in a non-binary format. Well that assumes that I already have Microsoft Windows, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, etc. Do you see the problem here?

    The second "workaround" is the same as the first, only a little more proactive. Instead of saving my documents as binary files and then converting them to another format, I should save them as a non-binary format from the start! Mission accomplished! Oh wait - how do I get the rest of the world to do the same? That could be a problem.

    I fail to see the problem with using the specification Microsoft released to write a program that can read and write this binary format. If Microsoft didn't want it to be used, they would not have released it. Even if Microsoft tried to take action against open source software for using the specs that they opened, how could Microsoft prove that the open source software used those specs as opposed to reverse engineering the binary format on their own? I think this is a non-issue.
    1. Re:Worst. Workaround. Ever. by malevolentjelly · · Score: 1, Troll

      I think this workaround is for companies and professionals with resources, not just zealots. Chances are if you are doing web applications that parse MS Office formats, you're intelligent enough to be running office on one of your servers, instead of pouring thousands of man hours into implementing something that you can give away to competitors through the GPL.

      If you are an open source zealot, I recommend the following work-arounds:

      * Complain that the code is somehow inferior

      * Make a conspiracy theory about how Microsoft foresaw open source and were trying to stifle it

      * Solve 40% of the problem and claim superiority

      * Hack something unreadable together in perl and pretend that it's more interoperable- once more, claim superiority

    2. Re:Worst. Workaround. Ever. by Toone_Town · · Score: 1

      And I love his suggestion to access this using ASP.net under IIS...as if I really want to be running *OFFICE* on my *WEB SERVER*...one more thing to exploit.

    3. Re:Worst. Workaround. Ever. by ContractualObligatio · · Score: 3, Insightful

      I fail to see the problem with using the specification Microsoft released to write a program that can read and write this binary format

      That is almost the the stupidest thing I've read today (RTFA with respect to development costs to figure out why), except for this:

      If Microsoft didn't want it to be used, they would not have released it.

      We can ignore the shockingly poor logic inherent to this statement and just take it at face value: doing something just because M$ wants you to would easily make the Top 10 Stupid Things To Do In IT list. It's particularly bizarre to hear it on Slashdot.

    4. Re:Worst. Workaround. Ever. by dedalus2000 · · Score: 1

      first open source "zealots" are professionals with resources. second who better to beta test a non core piece of software nicety then your competitors.

      --
      My keyboads not woking popely.
    5. Re:Worst. Workaround. Ever. by malevolentjelly · · Score: 1

      First, open source zealots could be anybody. They're either low level employees at some forgotten enterprise or they're academics who code according to theory- maybe they're CS undergrads. If something is very valuable to an enterprise and has no strategic reason to be open sourced, it won't be. If it's less cost efficient to reinvent the wheel, then they won't- unless they're an enterprise doomed to failure (or, they are google, and they have so much ad revenue that they can stay afloat while constantly blowing away engineering resources on web toys). Open sourcing code is often done because either A) you're starting with open sourced code, or B) you are trying to push interoperability with your product or C) you are looking to be hired or purchased.

      If you need to get a job done and you have the resources, why not simply license a single Microsoft Windows Server? It's not like it "attacks" your unix boxes or anything. It's really a huge time saver when compared with what a professional coder's time costs.

      My point is that you'd have to be a zealot to write some half-baked open source solution over a thousand man hours instead of simply using office (office is cheap. Windows Server is cheap).

        And the idea that your competitors are "beta testing" your software is simply ridiculous. You don't have a product until it passes your QA- and QA is relatively cheap compared with the cost of product development. It's irresponsible to spit beta software out into the cloud and expect the kindness of strangers to solve all your problems- especially in the web world where security is always an issue.

    6. Re:Worst. Workaround. Ever. by Anonymous Coward · · Score: 0

      The that problem you fail to see is not that Microsoft doesn't want you to use the spec. The problem is that it's a LOT of work. An enormous amount. Most people who would be considering such a thing would be better off, financially, taking Joel's advice and spending a couple hundred or even a several thousand dollars building a solution made out of off the shelf parts than they would spending hundreds or thousands (or more!) of man hours implementing the spec. Software and hardware is cheap, developer labor is not.

      Alternatively, convince an open source developer to do the work for you, come back months later and use his stuff. If this was a money making venture, convince yourself that all the lost income during that time was worth getting the implementation for free.

    7. Re:Worst. Workaround. Ever. by dedalus2000 · · Score: 1

      the competitors "beta testing" was a bit of rhetoric the point is still valid QA is a cost and if your talking about office document formats the number of possible scenarios that QA would have to test to have anywhere near full coverage would be huge. unless your company is a software company you get no benefit from closing the source and some benefit from opening it. if your expectations are for large numbers of unique documents office automation may not be practical or desirable there are current commercial alternatives and likely the cost is justified. however in the unlikely event that the time to develop this were justified (filling idle man hours between large projects for example) and assuming the companies core competency is not software then they could do worse than gpl license.

      as for your other point you'd have to be a bit loopy to take this on to begin with no matter what your license preference.

      --
      My keyboads not woking popely.
    8. Re:Worst. Workaround. Ever. by organgtool · · Score: 1

      If you have problems with my logic, feel free to articulate them. However, simply calling them stupid adds nothing to the discussion.

      Regarding your comment about doing something just because Microsoft wants you to, I think you misunderstood me. Microsoft does not necessarily want anyone to use this specification. I believe they have released it because they want OOXML to become an ISO standard and OOXML allows for chunks of legacy Office formats which happen to be binary. By releasing these specifications, they are hoping to take another step towards having OOXML declared an open standard. Whether or not people use the released specs for the Office binary formats is a secondary concern.

      Even if that was not the case, your stance seems to be that you should always do the opposite of what Microsoft wants. While that may work out for you most of the time, I think it would be better to focus on the needs of your user base more than on what MS wants. The open source user base would definitely benefit from being able to read Office binary formats with no compatibility issues which is what the release of this specification allows.

    9. Re:Worst. Workaround. Ever. by civilizedINTENSITY · · Score: 1

      There are lots of open source advocates in Physics. Peer review is part of the scientific method, after all.

    10. Re:Worst. Workaround. Ever. by malevolentjelly · · Score: 1

      You're right, open source is more applicable to academia than business. Now go solve for t.

    11. Re:Worst. Workaround. Ever. by ContractualObligatio · · Score: 1

      If you have problems with my logic, feel free to articulate them. However, simply calling them stupid adds nothing to the discussion.

      I almost wonder if it's worth articulating the problem if you can't be bothered to read my post. But just to repeat myself - RTFA. Look at Joel's comments on development effort and you'll see the problem.

      Whether or not people use the released specs for the Office binary formats is a secondary concern.

      And if you read your own post (do you read anything?), you might note the context is a paragraph that starts: "I fail to see the problem with using the specification Microsoft released to write a program that can read and write this binary format." Therefore whether or not someone uses the released specs is in fact the primary concern of that particular argument.

      your stance seems to be that you should always do the opposite of what Microsoft wants

      And what - apart from shockingly poor logic - would make you think that?

      The open source user base would definitely benefit from being able to read Office binary formats with no compatibility issues which is what the release of this specification allows.

      Couldn't agree more. However, that is a completely different thing to saying that the workarounds suggested are the worst there's ever been. If that is your real argument, you should have made it rather than posted a genuinely stupid critique of someone else's reasoned analysis. Bear in mind, the article states it would take thousands of years of development effort to do what you describe. The fact that it would be a benefit does not mean it is worth the cost.

    12. Re:Worst. Workaround. Ever. by jonaskoelker · · Score: 1

      take it at face value: doing something just because M$ wants you to [is stupid].
      Absolutely. Doing something just (that is, only) because $ENTITY wants you to is stupid (for all values of $ENTITY), because it means you haven't thought about whose interests it serves and whose it goes against. However, the argument is (at least implicitly) that the specification should be used because it serves the community's interest of having a good software for reading (and writing) Microsoft Office documents.

      See the difference? One is "because Microsoft says so", the other is "because it's useful".

      But let's assume that your argument works: when someone claims that "If Microsoft didn't want it to be used, they would not have released it", it follows that you think that "using it is stupid". Your argument is independent of what "it" is, so we could have "it" refer to Windows XP, or Visual Studio 2008, or Office, or anything Microsoft has ever released. If your argument is valid, it follows that you can be made to think that using anything Microsoft has released is stupid, by repeating the initial claim for the desired values of "it".

      Does this sound right or wrong to you?
    13. Re:Worst. Workaround. Ever. by ContractualObligatio · · Score: 1

      Wrong, condescending, verbose and redundant. Shouldn't you be having a go at the guy who thought "Worst. Workaround. Ever." was a useful comment?

      The article was not about the community, but workarounds an individual might take. Your implicit argument is explicitly invalid. It was only raised after I pointed out that the first post was poor.

      Further, I did not say "using it is stupid". I said the logic was inherently flawed. To draw any conclusions on what *I* think is therefore also inherently flawed. As it happens, my starting point would be similar to the argument you mention: the reason for doing something is because *you* see value in it, not because Microsoft does. That value could be personal, to the community, to someone you care for, whatever.

      And to be frank, the conclusion of your third paragraph is so ridiculous it should have set of warning bells to go back and check your workings. For instance, your "using it is stupid" hypothesis seems to assume I am virulently anti-Microsoft for some reason. Which would be curious, because initially I was defending workarounds that required the purchase of Microsoft products.

    14. Re:Worst. Workaround. Ever. by RAMMS+EIN · · Score: 1

      ``His first workaround is to use Microsoft Office to open the document and then save that document in a non-binary format. Well that assumes that I already have Microsoft Windows, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, etc. Do you see the problem here?''

      See the end of my post.

      ``The second "workaround" is the same as the first, only a little more proactive. Instead of saving my documents as binary files and then converting them to another format, I should save them as a non-binary format from the start! Mission accomplished! Oh wait - how do I get the rest of the world to do the same? That could be a problem. ''

      This, actually, would be the ideal solution. We need to somehow get it into people's heads that locking themselves and the rest of the world into proprietary solutions is never a good idea. If open alternatives are available, it is even downright wrong. This applies not only to file formats but really everything.

      Now, I realize that I can't force the whole world to use open standards. Actually, I don't even want to. I want to leave everybody free to use what they want as much as possible. But the most important part of that is leaving each other free to choose. That means not exchanging information using formats or protocols for which there is only a single application. I am fine with it if you use Microsoft Office for your own documents, and if you want to save them in some proprietary format, I'm fine with that (as long as you don't come whining to me when you can't access your data anymore). But as soon as you send files to someone else, _please_ use open standards. Just because _you_ want to use Microsoft Office doesn't mean everyone else does. And just like I don't force you to use my software of choice, you should not force me to use yours.

      Fortunately, more and more people and organizations realize this. Years ago, a request to send a document in a non-proprietary format was often met with surprise or (for some reason I can't fathom) hostility. Nowadays, you will usually be sent a PDF shortly, if you didn't get sent a PDF in the first place. The fact that Microsoft submitted OOXML (oh, the confusion that name has caused...) for adoption as an ISO standard, and this has been broadly covered on the net and even in print, is telltale.

      ``I fail to see the problem with using the specification Microsoft released to write a program that can read and write this binary format.''

      Have you tried it? I think it boils down to something like:

        - Cost of Windows and Office licenses, maybe about $ 700.
        - Cost of developing a script that converts Office files to less horrible formats, maybe another couple hundred dollars.

        - Cost of implementing the specs released by Microsoft: millions of dollars.

      Much as I detest paying Microsoft extra money for having locked the world into their proprietary formats, I think it may be the most realistic option here.

      --
      Please correct me if I got my facts wrong.
  20. Joel being apologetic by porkThreeWays · · Score: 1

    Joel is being awfully apologetic. I understand why they are bad formats, but it doesn't change the fact they are bad.

    --
    If an officer ever threatens to taze you, say you have a pacemaker.
    1. Re:Joel being apologetic by slapout · · Score: 2, Informative

      Joel worked on the Excel team.

      --
      Coder's Stone: The programming language quick ref for iPad
    2. Re:Joel being apologetic by Crispy+Critters · · Score: 1
      I don't suppose that I can fault you for stating the blindingly obvious.

      Joel's writing may be interesting and insightful, but his enormous blind spots make him anything but authoritative. Take this gem:

      A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They're still in the file format for backwards compatibility, and because it doesn't cost anything for Microsoft to leave the code around.
      So it "doesn't cost anything" for the company with a reputation for the buggiest, least secure code around to keep a bunch of legacy code in its aps? Especially code that was not written for maintainability? Joel only sees issues from a very particular perspective.
  21. Don't Adopt. Convert. by Doc+Ruby · · Score: 5, Insightful

    Spolsky's advice explains that the format code is extremely bad code from the POV of a programmer picking it up to use starting now. Because it grew like a coral reef, starting so long ago that interoperability with anything else but the app's codebase at the time was not in the designs. And every new feature was thrown in as a special case, rather than any general purpose facility for kinds of features or future expansion. The Microsoft legacy that leverages every year's market position into expansion the next year.

    But we're not Microsoft, and we don't have the requirements MS had when making these formats. So we should by no means perpetuate them. We should do now what MS never had reason to do: upgrade the code and drop the legacy stuff that makes most of the code such a burden, but doesn't do anything for the vast majority of users today (and tomorrow).

    That's OK, because Microsoft has done that, too, already. The MS idea of "legacy to preserve" is based on MS marketing goals, which are not the same as actual user requirements. So that legacy preservation doesn't mean that, say, Office 2008 can read and write Word for Windows for Workgroups for Pen Computing files 100%. MS has dropped plenty of backwards compatibility for its own reasons. New people opening the format for modern (and future) use can do the same, but based on user requirements, not emphasis on product lines if that's not a real requirement.

    So what's needed is just converters that use this code to convert to real open formats that can be maintained into the future. Not moving this code itself into apps for the rest of all time. Today we have a transition point before us which lets us finally turn our back on the old, closed formats with all their code complexity. We can write converters that can be used to get rid of those formats that benefited Microsoft more than anyone else. Convert them into XML. Then, after a while, instead of opening any Word or Excel formats, we'll be exchanging just XML, and occasionally reaching for the converter when an old file has to be used currently. MS will go with that flow, because that's what customers will pay for. Soon enough these old formats will be rare, and the converters will be rare, too.

    Just don't perpetuate them, and Microsoft's selfish interests, by just embedding them into apps as "native" formats. Make them import by calling a module that can also just batch convert old files. We don't need this creepy old man following us around anymore.

    --

    --
    make install -not war

  22. doing the right thing by carou · · Score: 5, Insightful
    From Joel's FA:

    There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904. Excel supports both because the first version of Excel, for the Mac, just used that operating system's epoch because that was easy, but Excel for Windows had to be able to import 1-2-3 files, which used 1/1/1900 for the epoch. It's enough to bring you to tears. At no point in history did a programmer ever not do the right thing, but there you have it. Nonsense.

    When Excel started importing 1-2-3 documents, the right way to do that would be to create an importer to your own native format. Not to munge a new slightly different format into your existing structures. Yes, you'd have had to convert some dates between 1900 and 1904 formats (and maybe, detect cases where the old 1-2-3 bug could have affected the result) but at least you wouldn't be trying to maintain two formats for the rest of time.

    If this is an example of programmers throughout history always doing exactly the right thing, I'd hate to see an example of code where the original author regretted some mistakes that had been made.

    1. Re:doing the right thing by Anonymous Coward · · Score: 1, Insightful

      Your assumption is that the people making the 123 'scripts' (and there were many of those) didnt depend on that bug. Remember the MS mantra 'embrace and extend'. Excel and 123 have full out programming languanges built in. It is not easy to build an inteperter that would say fit on 3 floppy discs and have memory left over. That memory can be used for OTHER things such as features people actualy use. Never mind the regression testing making sure it works. Convert in place is a perfectly logical assumption to do.

      Also remember 'small' features were most likely written by an intern, or 'the new guy'. Not some grizzeld vetren. Plus there probably have been hundreds of coders in there. I would be willing to bet some of that code has not been touched in years. I bet some of it they are afraid to change!

      Its easy to sit outside and take potshots at them. But I dont think we fully apreciate the nightmare they have to deal with every day...

    2. Re:doing the right thing by Schnapple · · Score: 3, Interesting
      When Excel started importing 1-2-3 documents, the right way to do that would be to create an importer to your own native format. Not to munge a new slightly different format into your existing structures.
      Well, ignoring the fact that the article elaborates on why they made some of the technical decisions early on, Joel, who was at one point a program manager for Microsoft Excel, actually has an article on this very thing. Basically, this is exactly what they did - Excel initially opened 1-2-3 documents, but it could not write to them. You could open up your Lotus 1-2-3 document but you'd have to save it in Excel format. Excel 4.0 introduced the ability to write to Lotus 1-2-3 documents, and Excel 4.0 was the version that served as the "tipping point" - it was the version that businesses started buying in mass numbers and it was the version that signaled the end for Lotus 1-2-3.

      Why? Because, as the article states, Excel 4.0 was the first version that would let you go back. You could just try out Excel and if it didn't work no big deal, just go back to Lotus 1-2-3. It seems completely counter-intuitive to do so, and it apparently wasn't the easiest thing to convince Microsoft management to do so, but it worked and now everyone uses Excel and Lotus 1-2-3 is ancient history.

      The programmers did both the right thing and the thing which would be successful. With all due respect to the OpenOffice folks, they're not in the business of selling software. If people don't move to OpenOffice in mass numbers it doesn't spell doom for the company, because there is no company. Doing what you suggest might be the right thing in a programmer's perspective (and I agree), it's not compatible with a company that is trying to make a product to take over the market with. This is why Microsoft is so successful - they're staffed by a large number of people (like Joel) who get this.
    3. Re:doing the right thing by NullProg · · Score: 1


      When Excel started importing 1-2-3 documents, the right way to do that would be to create an importer to your own native format. Not to munge a new slightly different format into your existing structures.


      Remember, these were the XT/AT/x386 days. It was easier to munge than waste CPU cycles and memory doing conversions.

      Enjoy,

      --
      It's just the normal noises in here.
    4. Re:doing the right thing by carou · · Score: 1

      Remember, these were the XT/AT/x386 days. It was easier to munge than waste CPU cycles and memory doing conversions. I don't buy that. You convert the file only once, alternatively you waste cycles munging your calculations at runtime for the life of the document.

  23. Joel's Advice by Anonymous Coward · · Score: 1, Insightful

    Joel is usually spot on, but the advice he gave in the article is actually pretty terrible if you are going to have to generate any volume of Excel reports. Automating Excel is slow and unwieldy, and should not be hooked up to a server. You will be limited to a few workbook generation requests per second, and if you need to handle more, buying another Windows/Office license and load balancing is pretty awful. The only way that this might be workable is to set up a process that sits in the background with a "pool" of automated excel instances launched and waiting for work, so that when there is a high volume of requests, they get forwarded to different instances. Still not very scalable.

    There are companies out there that have reverse engineered the file format (the one I have experience with is SoftArtisan ExcelWriter, which is buggy), but overall there will be no clean, scalable solution for this until Excel 2007/the Excel 2003 compatibility pack are more prevalent you can just generate the XML to represent the workbook.

  24. Hey! That was MY suggestion! by Anonymous Coward · · Score: 0

    Unfortunately, I think I BSD released it... :-(

    I know! I'll get Theo to rant at you!!! :-)

  25. Re: "compound documents." oh no, run away! by ContractualObligatio · · Score: 4, Insightful

    It's interesting you give a nicely egotistical critique of a well-regarded expert's article, but don't suggest a single alternative to how M$ could have met their design goals, nor explain why the no-interoperability assumption was unreasonable at the time. If you can't appreciate the design goals, nor suggest a way to meet them, what's the point of the rest of your post?

  26. I will gladly pay anyone by flanders123 · · Score: 5, Funny
    ...to take this spec and create an identical .doc format, circumventing Word's bullet AI.
    • it
      • never
        • ever
    • ever
        • works
    1. Re:I will gladly pay anyone by Anonymous Coward · · Score: 0


              1 I

              2 Completely

      3 Agree
              1 It doesn't

              1 Work well

              2 at all.

    2. Re:I will gladly pay anyone by naoursla · · Score: 1

      In Word 2007
      Step 1: Type "it never ever ever works" with each word on a single line.
      Step 2: Select all of the words and hit the "start a bulleted list" tollbar button
      Step 3: Place the cursor after each word before a blank lines and press enter.
      Step 4: Repeat for the second "ever" (it will have two blank lines both with bullets).
      Step 4: Select "never\n\ever" and press "Increase Indent"
      Step 5: Select the indented "ever" and press "Increase Indent"
      Step 6: Select "works" and press "Increase Indent" twice.
      Step 7: Put the cursor on the line above "works" and press backspace.
      Step 8: Your bullets are probably different based on indentation. For each unique level, put the cursor on the line and select the drop down menu next to the "start bullet" button. Select the round bullet to change that level to that bullet type.
      Step 9: Save the file.

      If Word 2007 doesn't save a file that fits that specification then I don't know what to say.

  27. Hmm by woolio · · Score: 1

    In their (MS's) defense, I used to do that kind of thing back then too, (dumping memory structures straight to files instead of using extensible, documented formats), but then again I was 15 years old (in 1995) and still learning C.

    Except for the "1995" part, wasn't that pretty much how Microsoft got started?

    They haven't advanced from that point by much....

  28. Seems that these aren't the full specs by amazeofdeath · · Score: 3, Interesting
    Stephane Rodrigues comments:

    "I first gave a cursory look at BIFF. 1) Missing records: examples are 0x00EF and 0x01BA, just off the top of my head. 2) No specification: example is the OBJ record for a Forms Combobox," Rodriguez wrote. "Then I gave a cursory look at the Office Drawing specs. And, again, just a cursory look at it showed unspecified records." http://www.zdnet.com.au/news/software/soa/Microsoft-publishes-incomplete-OOXML-specs/0,130061733,339286057,00.htm
    --
    U+F8FF
  29. No insanely bad programmers ? by bytesex · · Score: 1, Insightful

    Then what's with that 2Gb limit ? Or what's with the decision to use such formats for mail-storage and databases ?

    --
    Religion is what happens when nature strikes and groupthink goes wrong.
    1. Re:No insanely bad programmers ? by rijrunner · · Score: 1



      Were they using 32 bit machines? Seems to me that 32 bit machines can only address 4GB of memory total. Allowing for the OS and other apps running in memory, you can't use that last bit in addressing anyway. (ie, the OS's and machines of the day maxed out at 4GB of RAM. You could make the whole thing of memory addressable, but it was not needed).

      2GB is the limit on a lot of OS's. Right now, I can think of several filesystems that limit file sizes to 2GB. (FAT16, AIX's jfs). The first of those listed filesystems is the important one. They were externally limited to 2GB as that was the largest file size allowed on their platform.

  30. The file format is not really important by wrook · · Score: 5, Interesting

    I've worked on some of these file formats quite a bit (I was the text conversion guy when WP went to Corel -- don't blame me, it was legacy code! ;-) ) Anyway, while the formats are quite strange in places, they aren't really that difficult to parse. I would be willing to speculate that this was never really much of a problem in writing filters for apps (or at least shouldn't have been).

    No, the difficulty with writing a filter for these file formats is that you have no freaking clue what the *formatter* does with the data once it gets it. I'm pretty sure even Microsoft doesn't have an exact picture of that. Hell, I barely ever understood what the WP formatter was doing half the time (and I had source code). File formats are only a small part of the battle. You have all this text that's tagged up, but no idea what the application is *actually* doing with it. There are so many caveats and strange conditions that you just can't possibly write something to read the file and get it right every time.

    In all honesty I have at least a little bit of sympathy for MS WRT OOXML. Their formatter (well, every formatter for every word processor I've ever seen) is so weird and flakey that they probably *can't* simply convert over to ODF and have the files work in a backwards compatible way. And lets face it, they've done the non-compatible thing before and they got flamed to hell for it. I honestly believe that (at some point) OOXML was intended to be an honest accounting of what they wanted to have happen when you read in the file. That's why it's so crazy. You'd have to basically rewrite the Word formatter to read the file in properly. If I had to guess, I'd say that snowballs in hell have a better chance...

    I *never* had specs for the word file format (actually, I did, but I didn't look at them because they contained a clause saying that if I looked at them I had to agree not to write a file conversion tool). I had some notes that my predecessor wrote down and a bit of a guided tour of how it worked overall. The rest was just trial and error. Believe it or not, occasionally MS would send up bug reports if we broke our export filter (it was important to them for WP to export word because most of the legal world uses WP). But it really wasn't difficult to figure out the format. Trying to understand how to get the WP formatter (also flakey and weird) to do the same things that the Word formatter was doing.... Mostly impossible.

    And that's the thing. You really need a language that describes how to take semantic tags and translate them to visual representation. And you need to be able to interact with that visual representation and refer it back to the semantic tags. A file format isn't enough. I need the glue in between -- and in most (all?) word processors that's the formatter. And formatters are generally written in a completely adhoc way. Write a standard for the *formatter* (or better yet a formatting language) and I can translate your document for you.

    The trick is to do it in both directions too. Things like Postscript and PDF are great. They are *easy* to write formatters for. But it's impossible (in the general case) to take the document and put it back into the word processor (i.e. the semantic tags that generated the page layout need to be preserved in the layout description). That also has to be described.

    Ah... I'm rambling. But maybe someone will see this and finally write something that will work properly. At Corel, my friend was put on the project to do just that 5 times... got cancelled each time ;-) But that was a long time ago...

    1. Re:The file format is not really important by Anonymous Coward · · Score: 0

      You have just reinvented LaTeX.

  31. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  32. Re: "compound documents." oh no, run away! by Anonymous Coward · · Score: 2, Insightful

    I don't see why just because something is organized filesystem-like (not such an awful idea) means it has to be hard to understand. Filesystems, while they can certain get complicated, are fairly simple in concept. "My file is here. It is *this* long. Another part of it is over here..." He didn't say File systems were complex, he said Ole compound documents were complex. Look it up on MSDN. It's a tad painful to work with.

    "They were not designed with interoperability in mind."

    Wait, I thought you were trying to convince us that this doesn't reflect bad programming... Wholly out of context, Batman! They made a design decision to ignore interoperability and optimized towards small memory space. What part of that is hard to understand? You think everything should be designed up front for interoperability, regardless of context? In the mid to late 80s, there just wasn't a huge desire for this feature, as Joel states.

    but then again I was 15 years old (in 1995) and still learning C. Ah, now your post makes sense. You completely lack perspective. The Word/Excel doc formats were around 10 years before you. You lack the knowledge about why dumping C data structures directly to disk was necessary--even though Joel spells it out. You don't understand what OLE truly solved (not just embedding spreadsheets inside of word, by the way). And most importantly, you seem to lack the ability to understand design trade-offs.
  33. Re: "compound documents." oh no, run away! by radarsat1 · · Score: 1

    It's interesting you give a nicely egotistical critique of a well-regarded expert's article, but don't suggest a single alternative to how M$ could have met their design goals, nor explain why the no-interoperability assumption was unreasonable at the time. If you can't appreciate the design goals, nor suggest a way to meet them, what's the point of the rest of your post?


    I think the design goals were flawed. That's my point. Their design goals should have included, how can we ensure that our customer's data will be (usefully) readable in the future? Sure, back then maybe it was worth it to skimp on validation in order to squeeze out a few extra microseconds of processing time, because the competition would avoid doing this and beat you with claims of efficiency. I guess we've all learned a lot about how to deal with data since the 90's. A big part of that was learning the importance of metadata. (ie., tagged, extensible formats)

    Anyways, just because it was done years ago, under different conditions, doesn't mean it wasn't bad programming. Maybe everyone else would have done it the same way, maybe I would have too. Still doesn't mean it wasn't bad programming. (I shouldn't say "bad programming" of course, the code could be fine for all I know.. I should say "bad design", in hindsight. Like a lot of things.)

    By the way, "the no-interoperability assumption" is _always_ unreasonable. (IMHO of course.)
  34. Originally Hungarian did not encode type info by Anonymous Coward · · Score: 0

    ...or at least, not much, encoding type info is not what he intended. And you've just demonstrated AKAImBatman's point that "Programmers didn't understand why Hungarian originally used his famous notation". That's not to say Hungarian Notation is necessarily good or bad (I'm not arguing about it! heh), but you're not making your judgment on the facts.

  35. Re: "compound documents." oh no, run away! by Anonymous Coward · · Score: 0



    In their (MS's) defense, I used to do that kind of thing back then too, (dumping memory structures straight to files instead of using
    extensible, documented formats), but then again I was 15 years old (in 1995) and still learning C.



    Well, 1995 microsoft wasn't much older than you so it's kind of understandable.

  36. Some "solutions" from TFA by mariuszbi · · Score: 2, Insightful

    In many situations, you are better off reusing the code inside Office rather than trying to reimplement it. Here are a few examples.
    1. You have a web-based application that's needs to output existing Word files in PDF format. Here's how I would implement that: a few lines of Word VBA code loads a file and saves it as a PDF using the built in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It'll work. The first time you launch Word it'll take a few seconds. The second time, Word will be kept in memory by the COM subsystem for a few minutes in case you need it again. It's fast enough for a reasonable web-based application.
    2. Same as above, but your web hosting environment is Linux. Buy one Windows 2003 server, install a fully licensed copy of Word on it, and build a little web service that does the work. Half a day of work with C# and ASP.NET. So if you are on a Linux system, you are screwed . I think this article is written by some M$ fanboy. Nothing wrong here. But saying that Linux user should just dump their software, and go for Microsoft stuff , just because

    It's very helpful of Microsoft to release the file formats for Microsoft and Office, but it's not really going to make it any easier to import or save to the Office file formats. I think it's wrong wrong wrong.
    1. Re:Some "solutions" from TFA by SuiteSisterMary · · Score: 1

      No, he's just saying that it might be cheaper to buy a goodyear than to reinvent the wheel.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    2. Re:Some "solutions" from TFA by SEMW · · Score: 1

      I think this article is written by some M$ fanboy.

      +2, Insightful Oh, come on, Slashdot.

      To the parent: Spolsky was the program manager on the Excel team who developed VBA. Would you maybe prefer to read about the MS Office file formats from Erris/twitter, rather than someone who knows about them?
      --
      What's purple and commutes? An Abelian grape.
  37. Compatibility is important across systems and time by Grampaw+Willie · · Score: 0

    this is a good discussion. compatibility is important to us, not only from one system to another but also across time.

    it has often seemed to me that proprietary solutions should be avoided for this reason.

    i recently converted my Win 3.11 computer to XP. quite a move, but look how much i saved not doing all the interim updates!

    i did have some documents in the old WordPerfect 5.1 format but I managed to acquire a program that will read these and write them as .rtf

    I like .rtf and would like to see it become an ISO/ANSI standard

    but think how many libraries are loaded with .xls and .doc files that will need to be converted to OOXML or risk becoming un-usable

    hmmm

  38. So, then MS will release the XP source next? by patrixx · · Score: 0, Flamebait

    So that XP get exploited and thus puts Vista in better light...

  39. Chunky File Format by mlwmohawk · · Score: 5, Interesting

    While I was a contractor for a now defunct contracting company, we did a contract for Microsoft. This was pre windows 3.1. We did some innovations which I think became the bases for some of the OLE stuff, but I digress, Microsoft had a spec for its "Chunky File Format."

    The office format based on the chunky file format does not have a format, per se' It is more similar to the old TIFF format. You can put almost anything in it, and the "things" that you put in it pretty much define how they are stored. So, for each object type that is saved in the file, there is a call out that says what it is, and a DLL is used to actually read it.

    It is possible for multiple groups within Microsoft to store data elements in the format without knowledge of how it is stored ever crossing groups or being "documented" outside the comments and structures in the source code that reads it.

    This is not an "interchange" format like ODF, it is a binary application working format that happens to get saved and enough people use it that it has become a standard. (With all blame resting squarely on M$ shoulders.)

    It is a great file format for a lot of things and does the job intended. Unfortunately it isn't intended to be fully documented. It is like a file system format like EXT2 or JFS. Sure, you can define precisely how data is stored in the file system, but it is virtually impossible to document all the data types that can be stored in it.

  40. Microsoft marketing by Comboman · · Score: 3, Insightful
    You don't sell new cars by convincing people the last model was rubbish.

    You're kidding right? That's been exactly Microsoft's marketing strategy for the last ten years. Remember the Win9X BSOD ads for Windows XP? Microsoft is in the difficult position where their only real competition is their own previous products.

    --
    Support Right To Repair Legislation.
    1. Re:Microsoft marketing by Anonymous Coward · · Score: 0

      Microsoft is in the difficult position where their only real competition is their own previous products.


      And Firefox and Apple.
  41. Re: "compound documents." oh no, run away! by Thundersnatch · · Score: 4, Informative

    Anyways, it's no surprise that it's all the OLE, spreadsheet-object-inside-a-document, stuff that would make it difficult to design a Word killer. (How often to people actually use that anyway?)

    At my company, our users do that every day. Excel spreadsheets embedded in Word or PowerPoint, Microsoft office Chart objects embedded in everything. It's what made the Word/Excel/PowerPoint "Office Suite" a killer app for businesses. MS Office integration beat the pants of the once best-of-breed and dominant Lotus 1-2-3 and WordPerfect. When you embed documents in Office, instead of a static image, the embedded doc is editable in the same UI, and can be linked to another document maintained by somebody else and updated automatically. It saves tremendous amounts of staff time.

  42. Re: "compound documents." oh no, run away! by radarsat1 · · Score: 1

    He didn't say File systems were complex, he said Ole compound documents were complex. Look it up on MSDN. It's a tad painful to work with.


    I didn't say this. I said I don't see why the fact that OLE documents being like file systems (according to TFA), means that they must necessarily be complex. i.e., I'm saying file systems aren't necessarily complex concepts, and therefore it's not an excuse for a convoluted file format. Anyways, maybe it's straining his analogy further than he intended, so I'll give you that.

    Wholly out of context, Batman! They made a design decision to ignore interoperability and optimized towards small memory space. What part of that is hard to understand?


    What makes you think I don't understand it? It's still bad programming. Not that I have statistics, but there were plenty of examples of software that used the same or less memory than Word but managed to have better document formats.

    Ah, now your post makes sense. You completely lack perspective. The Word/Excel doc formats were around 10 years before you. You lack the knowledge about why dumping C data structures directly to disk was necessary--even though Joel spells it out. You don't understand what OLE truly solved (not just embedding spreadsheets inside of word, by the way). And most importantly, you seem to lack the ability to understand design trade-offs.


    No, I understand them. I just don't think they made the right trade-offs. It's not like they had no competition at the time, other companies that a lot of people other than me still claim had better software. Anyways it's sort of a moot argument, since what's done is done. We don't really need to write these formats any more, just read them.
  43. Re: "compound documents." oh no, run away! by petermgreen · · Score: 1

    It reminds me of a recovery effort I tried last year, trying to recover some interesting data from some files generated on a NeXT cube from years ago. I realized the documents were just dumps of the Objective C objects themselves.
    IMO the powerfull serialisation formats of modern langauges are even worse than just dumping out C structs. If an app just dumps out C structs then you can probablly figure out the binary format pretty quickly with just the source for the app and a pagefull or so of information on the C compiler used. The application designer still has to pay some attention to file format design because structures containing pointers can't be saved directly.

    For a modern serialisation format things are typically far worse, the app developer is less likely to pay attention to KISS when he can serialise any arbitary graph of objects and you need both the apps code and a load of information on how the language serialises stuff.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  44. Re:Joel - Hungarian Notation by mlwmohawk · · Score: 1

    The Hungarian thing - no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system;

    A "typing" system doesn't help you read and understand the code. It doesn't give you any clues to the types of data being acted upon in a section of code. While I never bought in to the whole hungarian notation thing, at the time it was an "ism" that people went nuts about, it did address a specific problem with code readability. The concepts addressed by hungarian notation are still valid and some of the naming techniques are still also valid.

    One can look at code and see "szKeyName" and know, without having to find the declaration, that it is a zero terminated character string used as a key. That's the crux of hungarian notation, but IMHO Microsoft went crazy with it and focused more on the notation and less on the naming, which actually made things harder to read. Like I said, I didn't go crazy, but even today I still try to incorporate some clue to the type of thing a variable represents in its name.

    Hungarian notation is an example of a good idea in moderation that completely destroys itself when overused.

  45. lol ... XML is the hindsight, not the foresight by Anonymous Coward · · Score: 0

    people ... of course it's impossible for anyone, including MS, to produce perfect code, structures or output, in anticipation of future developments. Clearly, coding is evolving in response to a weakness. Wasn't a new standard, XML, engineered for just this exact reason? If everyone would look beyond complaing and just implement engineering standards, we would all be ok. After all, it is what it is, just deal with it.

  46. L&O: sFoo by poot_rootbeer · · Score: 5, Insightful

    "Apps Hungarian", which adds semantic meaning (dx = width, rwAcross = across coord relative to window, usFoo = unsafe foo, etc) to the variable, not typing, is what is good and what he is advocating.

    What is the justification for putting that semantic meaning into a variable name, instead of incorporating it into class definitions?

    For example, if a string can be "safe" or "unsafe", why not have "SafeString" and "UnsafeString" classes that extend String, and use instances of those, instead of having instances of the base String class names 'sFoo' and 'usFoo'?

    1. Re:L&O: sFoo by man_of_mr_e · · Score: 1, Redundant

      What is the justification for putting that semantic meaning into a variable name, instead of incorporating it into class definitions?

      Hungarian is not so necessary in this day of extensive IDE support, but back when it was invented it was useful because simply looking at a variable name did not give you any idea of what type it was, requiring you to frequently jump around in code you were maintaining.

      Let's say you open up some code to fix a bug. You see a variable named "windows_coords". What is it? A RECT structure? A CRect class? An array of ints? An array of floats or doubles? Something the programmer wrote himself? You have to go look at it's definition (which usually involes greping the code) which completely throws off your train of thought.

      Nowadays, you just have to hover your mouse over the name it most IDE's will tell you, which makes hungarian a bit vestigular.

    2. Re:L&O: sFoo by edwdig · · Score: 2, Insightful
      For example, if a string can be "safe" or "unsafe", why not have "SafeString" and "UnsafeString" classes that extend String, and use instances of those, instead of having instances of the base String class names 'sFoo' and 'usFoo'?

      For strings its a little more straightforward, but it gets messy quick with numeric values. You have to overload every operator you might possibly use, including every variant where it might make sense to operate on another type. The amount of support code needed builds up fast.

      And you get weirdness like this:

      class MyInt {
      ...
          MyInt(int i);
          MyInt operator+(int i);
      ...
      }
       
      MyInt x = 1, y;
      y = x + 1; // works
      y = 1 + x; // error, needs a cast on the 1
    3. Re:L&O: sFoo by curious.corn · · Score: 1

      "vestigial" is what I think you meant to say, isn't it?

      --
      Mi domando chi à il mandante di tutte le cazzate che faccio - Altan
    4. Re:L&O: sFoo by psamty · · Score: 1

      I smell a Java programmer...

    5. Re:L&O: sFoo by david_thornley · · Score: 1

      MyInt operator+(int i, MyInt j) { return j + i; }

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    6. Re:L&O: sFoo by mdfst13 · · Score: 1

      What is the justification for putting that semantic meaning into a variable name, instead of incorporating it into class definitions? It's to take two variables that are otherwise identical and interchangeable in behavior of operations and mark out the difference in the data they hold. For example, if you have x and y coordinates, should you have separate types for x and y? Or should you just have two integer types with x and y in the name? Remember that it's perfectly reasonable to do arithmetic operations that combine x and y values (e.g. calculating distance from origin). Should you have to redefine all possible arithmetic operators? Twice? (Once for rows and once for columns -- same code in two places.) Or just use a type like Long for each and name appropriately?

      In the Safe/Unsafe string example, if there is something about the constructor that could be used to make the string "safe" (e.g. verifying that it is shorter than a certain length), then you may be right, a separate class is better. However, if you are doing complicated business logic outside the constructor to establish safety (e.g. doing multiple database lookups and correlating the results), then you're better off with a naming convention. In that case, the class is not enforcing its own meaning. However, the purpose of classes is to enforce their own meaning. By contrast, a variable name does not enforce its own meaning, but one wouldn't expect it to do so. It does indicate meaning, which is what Apps Hungarian was designed to do.

      There are also cases (WSDLs come to mind) where the language simply doesn't support strong typing for some of the cases (e.g. a types enum doesn't indicate that it is a type). In those cases, Hungarian notation helps compensate for a weakness in the language.

      Database table names are another example. One should put Type or Status in the table name if that's what it holds. Database tables don't have typing, so the name is where you express that. Class names are another example. You are defining the class, so you name it appropriately. E.g. SafeString rather than Foo or MyClassName.
    7. Re:L&O: sFoo by shutdown+-p+now · · Score: 1

      If you use C++, then it has all been done already and wrapped in convenient templates.

  47. Outlook by c00rdb · · Score: 2, Interesting

    Why is Outlook missing from the released formats? I've spent some time reverse engineering meeting requests myself and I'd love to see the complete .msg file specification. You could find some useful on MSDN already but it was nowhere near as complete as these releases appear to be.

  48. Re:Joel - Hungarian Notation by zootm · · Score: 1

    Ok, I was going to respond to this but I will not get dragged into another one of these discussions. It's worse than tabs vs. spaces, I tells ya.

    Since you're talking about C/C++ code though, I'm going to assert that that doesn't fall into the class of language I was talking about anyway. You're playing with essentially-untyped data there a lot more.

  49. Re:Don't Adopt. Convert. by mxs · · Score: 1

    Just don't perpetuate them, and Microsoft's selfish interests, by just embedding them into apps as "native" formats. Make them import by calling a module that can also just batch convert old files. We don't need this creepy old man following us around anymore. Be very careful down that road. Particularly, don't confuse "I can import it and save it in MY format" with "this document is now accessible". The application doing the import might die off just the same in 10 or 15 years; and XML is not a wonderpill that makes a document format interchangeable. If you want to do the user a favour, don't just support full import of Office documents, but full export into a standardized format as well (and not just lip-service export).

    Interoperability goes both ways; this is often (and often deliberately) forgotten. There are a lot of programs that offer you the ability to import all manner of files or settings from other competing programs (just look at your favourite mail clients), but have no decent support for exporting the full data, as well. Same with web services and whatnot. You might just be trading in something bad for something worse if there is no avenue provided to export all the data into a standardized format, or at least a well-known one.
  50. Re:Joel - Hungarian Notation by mlwmohawk · · Score: 2, Informative

    Ok, I was going to respond to this but I will not get dragged into another one of these discussions. It's worse than tabs vs. spaces, I tells ya.

    I have to disagree, tabs and spaces are easily handled with an "indent" program.

    On VERY LARGE projects where there are hundreds of include files and hundreds of source files, it is not convenient or even possible in all cases to find the definition of an object that may be in use.

    Context and type information in the name makes it easier to quickly read a section of code:

    for(int ndx=0; ndx nLimit; ndx++)
    {
            pnUsrData[ndx] = pnReceived[ndx];
    }

    To anyone versed in your prefixing, it is easy to see pnUsrData is an array of integers, and we are assigning values from another array of integers.

    However:
    for(int ndx=0; ndx nLimit; ndx++)
    {
            pnUsrData[ndx] = foobar[ndx];
    }

    In the above, it is clear we are assigning data to elements in an integer array from a subscript on an object, but what kind of object? Where do we find its definition?

    Now, renamed it looks like this:
    for(int ndx=0; ndx nLimit; ndx++)
    {
            pnUsrData[ndx] = mytypeFoobar[ndx];
    }

    No we can see it is a "mytype" object and we can easily find its reference and declaration.

    That's what Hungarian notation provides and it is not useless, IMHO, it's over zealous use made code less readable. Rather than give hints, zealous proponents attempted to create a whole new language for specifying variable and function names that was virtually impenetrable.

  51. Re:Joel - Hungarian Notation by zootm · · Score: 1

    It's funny, I've argued in the past that Java's very verbose typing has advantages in exactly the way you list in your post. In the case of Java, in fact, you wouldn't need the type warts since the types would be readily available.

  52. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1

    No, XML is indeed that wonderpill. Not because it's some magic format, but because it's open and human readable, not some obfuscated binary format like .DOC . The apps doing the import will be open as well. And if they "die off" later, it's because no one is using them, so who cares? The rare need in the distant future for reading whatever does get left behind in those formats will be served for whichever archivist needs it by the more recent open converter apps that should still be archived somewhere, too.

    But really we agree. That "standardized format" you prefer is XML. ODF is XML. That's what I'm talking about, and it seems what you're talking about, too.

    --

    --
    make install -not war

  53. Re:first post? by Anonymous Coward · · Score: 0

    I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats? They always were open - you just had to email them for the files rather than just download them. Now you can just download them.
  54. Eclipse vs. IDEA by pauljlucas · · Score: 1

    Java development tools didn't really reach maturity until things like Eclipse came onto the scene about 5 years ago.
    While I agree that Eclipse did a lot to improve Java development, I have to say that, having used both it and Intellij IDEA, IDEA just seems better. Yes, this could be just another instance of vi vs. emacs, but, to me, IDEA just seems better thought out and works more smoothly. Yes, I know IDEA costs money, but I get things done faster using IDEA, and that's worth a lot.
    --
    If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
  55. Re: "compound documents." oh no, run away! by sohp · · Score: 1
    This really is the key bit in Joel's article:

    Every checkbox, every formatting option, and every feature in Microsoft Office has to be represented in file formats somewhere. That checkbox in Word's paragraph menu called "Keep With Next" that causes a paragraph to be moved to the next page if necessary so that it's on the same page as the paragraph after it? That has to be in the file format. And that means if you want to implement a perfect Word clone than can correctly read Word documents, you have to implement that feature.


    Hard to believe the programmers who did it that way were doing exactly the right thing. Separating data and representation is a basic programming skill.

  56. old code costs nothing.. by sohp · · Score: 2, Informative
    ...is total BS.

    A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They're still in the file format for backwards compatibility, and because it doesn't cost anything for Microsoft to leave the code around.


    You better believe it costs Microsoft quite a bit to keep it around. At the lowest level, having the codebase that big means the tools and practices needed to manage it have to be equal to the task. Here's a hint: MS does not use SourceSafe for the Office codebase. (They use the Team tools in visual studio, so they do eat their own dogfood, but not the lite food).

    Far more insidious is the technical debt incurred by carrying around that backwards compatibility with Version-1-which-supported-123-bugs-and-all. Interdependencies that mean a bug either can't be fixed without introducing regressions, or can only be fixed dint of a complex scheme involving things like the 1900 vs. 1904 epoch split that Joel discusses.

    Oh yes, it costs a small fortune to carry around that baggage, and only a company as big as Microsoft with Microsoft's revenues can afford it. The price might seem like 'nothing' in the billions of dollars that flow in and out of Microsoft, but ignoring the elephant in the room doesn't make the elephant go away.
  57. Joel doesn't know Jack by bobjones1234 · · Score: 1

    I've seen multiple links to Joel's advice. It's bad advice. He is talking out of his ass.

    Do not under any circumstances run a server that automates Microsoft Office unless you can afford to pay an intern or maybe a homeless person to babysit the server 24/7. They will have to close dialog boxes when it gets stuck waiting for user input and reboot the server ocassionally because of memory leaks. Anyone that has tried to do anything with the file formats has gone this route and given up.

    There is a very robust and competitive market for third-party developer components that read and write Office file formats for most popular development platforms. This is the way to go. or use Apache POI if you can put up with the missing features.

  58. Hungarian Notation is a Visual Grammar by HopeOS · · Score: 2, Insightful

    Actually, when possible, you should do both. Hungarian notation is a grammar. In the same way that English has rules for writing which include capitalizing the first letter of a sentence, proper names, and so on, Hungarian notation provides visual cues to programmers that make certain types of semantic errors "sTanD oUt." There's nothing particularly unusual about the text "sTanD oUt," and it's meaning does not change by writing it that way, but it violates the English grammar and your brain's pattern recognition identifies it as an outlier. So too with Hungarian notation. Code that does not use at least some form of Hungarian notation looks devoid of the meta content I expect my follow programmers to provide, namely what decision they've made, and whether the code conforms to those decisions. To someone accustomed to Hungarian notation, finding "double fValue;" or "if (uCount < 0)" in the code prompts the eye to linger, the brain to reparse. Ultimately, many conceptual errors are identified and resolved this way, even if the compiler fails to catch them.

    Also, like any grammar, the rules depend on the circumstance and should be followed in order to resolve an existing problem or ambiguity. Fully qualifying a variable name "caiIndex" to imply "constant array index" is silly. That is cargo cult mentality. Any of the following would be fine according to the guidelines at my company and each reflects a different decision by the coder: "int nIndex;" "unsigned int uIndex;" "index_t index;". The first works best if the index will be used backwards and the loop constraint is that the index is positive. The second works best if the index is random access, so that functions that use it can check the range with one comparison rather than two. The last case indicates that the semantics and nature of the index could be dependent on a variety of factors including processor architecture, and care should be taken. Therefore, the code "--nIndex," "++uIndex," and "next_index(&index)" look correct while "for (uIndex = 4; uIndex >=0; --uIndex)" looks very bad, and "++index" should make one immediately recognize that any of the following are possible: 1) the ++operator has been overridden, 2) index_t is typecast to an integer type, or 3) this won't compile as would be case if index_t was a struct.

    And so, after 28 years of programming, dealing with all different styles of C and C++, I've come to recognize that understanding and using Hungarian notation correctly is a skill. Your productivity increases as you use it, eventually you don't even notice it, and the benefits come later, particularly when refactoring, or making changes to older code, especially if written by someone else. Like syntax highlighting for your brain, if you use it long enough, you'll know when there's an error in the code without having to compile it because it will look wrong. Supposedly for lisp programmers, the same epiphany comes when you no longer see the parentheses.

    Happy Programming,
    -Hope

  59. Re: "compound documents." oh no, run away! by Anonymous Coward · · Score: 0

    OK, *you* design a system that allows incremental saves to a floppy disk that are fast enough for people to do it often.

    dom

  60. Re:Don't Adopt. Convert. by bar-agent · · Score: 1

    No, XML is indeed that wonderpill...because it's open and human readable, not some obfuscated binary format like .DOC

    ASCII does not mean human-readable. Instead of an obfuscated binary format, XML documents end up in an obfuscated text format.

    --
    i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
  61. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1

    And some humans are illiterate.

    XML is human readable because it doesn't require a machine (or superhuman skills) to read its meaning. Its field names and structure are embedded in the data, not only in a decoder context. Of course it's up to the person specifying the XML dialect to make those tags and structure comprehensible by a normal person, but that's not the format's defect. Any format can be obfuscated by design or carelessness. XML is harder to do that in. And even the most basic tools that render XML data according to their embedded schema make most XML self-evident. And that's not cheating: even this post in English requires a reader app, as does all stored data.

    --

    --
    make install -not war

  62. MS definately don't know how their formatter works by TERdON · · Score: 1

    I'm pretty sure MS doesn't know what its Word formatter does, and I even have proof for it:

    If I switch printers between the Adobe PDF and the HP printers at the office, the layout of my documents I edit in MS Word 2003 changes slightly (line lengths, row breaks, distance between rows, etc). This has been a major issue when I've had to submit papers etc. and switched to the PDF printer from the HP printers (I like to read drafts on paper as it is easier to correct), just to see the paper that I had just crafted to barely fit under the 8 page limit now is 11 pages long (with 4 of them left half blank due to formatting issues). :-(

    --
    I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
  63. What about OOXML? by ArtDent · · Score: 1

    Weren't these specs released in response to criticisms about unspecified aspects of OOXML? It makes reference to legacy behaviors implemented as in various Microsoft (and, in a few cases, non-Microsoft) products, and I suppose these specs were supposed to help since they more or less specify some of that stuff.

    But Joel basically tells us not even to bother trying to implement them. They were designed to be fast and to rely on Windows libraries, they're burdened by decades of legacy, and they were never intended to provide interoperability, he says. We should just use Office.

    What does that say about OOXML? When you take these lock-in document formats and just translate them to XML, how does that help anyone? As OOXML's opponents have said time and again, it is a "standard" that will be meaningful implemented by exactly one party, Microsoft, and it will do nothing to promote interoperability.

    It's a pity Joel didn't address this, but it's not hard to connect the dots.

  64. Re: "compound documents." oh no, run away! by ContractualObligatio · · Score: 3, Insightful

    I think the design goals were flawed. That's my point.

    And I think your ability to assess another's work is flawed courtesy of an over sized ego. That was my point.

    You have yet to provide an alternative solution to the problem. Given that one constraint is memory, your inability to be concise suggests you're not capable of coming up with one either. Certainly your "squeeze out a few extra microseconds" comment suggests you have absolutely no clue what you are talking about. Yet you persist in calling it bad design. You are strangely smug about what was quite possibly an implicit assumption forced by tough constraints, with no actual interoperability requirements, at a time when they were rarely offered let alone expected. I would stop using "IMHO" - clearly there is nothing humble about your opinion.

    Why the bit about metadata, out of interest? It's as if you think the more irrelevant things you can fit into the post, the more we're supposed to be impressed.

  65. Re:MS definately don't know how their formatter wo by lskovlund · · Score: 1

    That is not a bug, it's a feature. I once helped someone with a problem where Word would crash (segfault) a few seconds after displaying the empty starting page. I tracked the actual faulting to a buggy HP driver that couldn't deal with printers that were connected to a powered off machine on a wireless network. Switching to a different printer fixes it.

    Yes, MS Word makes calls to the printer driver while you're working and has its pagination algorithm adapt to its characteristics.

  66. Re:MS definately don't know how their formatter wo by guardian-ct · · Score: 2, Interesting

    This is caused by the "WYSIWYG" feature. Your HP printer driver is probably set to choose fonts that are "close" to the ones Windows uses, but are instead native fonts for the HP printer. Your PDF uses the Windows, and/or Adobe, fonts directly. Word uses the printer driver settings while you're editing, and if you change printers, the document repages with any different native fonts.

    In Windows 2000, you can open the printers control panel, choose "printing preferences" on your HP, poke the "Advanced..." button, and tell it to "Download as SoftFont". This should make changing between PDF and printer less painful, at the expense of increased memory usage and time to print with the HP. For the real advanced version, you can try and find which Adobe fonts are exactly the same size as the HP native ones, and tell the PDF writer to use those.

  67. Re:MS definately don't know how their formatter wo by TERdON · · Score: 1

    Yeah, I missed the part that it's supposedly a feature in my comment. Still it's quite obvious that the behaviour of MS' formatter is heavily dependant on how HP's printing driver works. Of course MS doesn't know how all printer makers' driver works, so that makes the behavior of the formatter unspecified. If MS calls it a "bug" or a "feature" I don't really care - to me it's an obvious design error, nothing else.

    --
    I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
  68. Contrast with OOXML by seebs · · Score: 1

    Well, wait.

    This is just OOXML without the angle brackets, isn't it?

    --
    My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
  69. Re:MS definately don't know how their formatter wo by TERdON · · Score: 1

    They should rename it to WYSIWYGALAYDSP, methinks (What you see is what you get as long as you don't switch printers)... :-(

    --
    I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
  70. Re:Don't Adopt. Convert. by baboo_jackal · · Score: 1
    I completely agree with your idea, but disagree with your interpretation of MS's intent.

    The MS idea of "legacy to preserve" is based on MS marketing goals, which are not the same as actual user requirements.
    Now, why would you preserve crufty code and file formats if you didn't have to? The only reason *I* can think of is to preserve backwards compatibility, which is a *major* user requirement!

    How about this interpretation of MS's actions? They've offered an open, XML-based format for document storage. They've also just shared with us the old, crufty, proprietary formats they've historically used for document storage. I think they actually *want* to move away from their old proprietary methods and use an open format. They've clearly been reading the tea leaves and realize that a universal office document standard is in the works. All they're trying to do is make sure they're ready for it.

    As far as OOXML vs. ODF, my guess is that they're pushing their own open document format because the alternative doesn't offer the functionality that their apps require, and if a universal standard is going to be adopted, they simply want it to be expressive enough to work with their applications.
  71. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1

    Maybe MS does now want to move away from their old proprietary formats into new open ones. But the old formats were built over decades without that goal.

    If you read Spolsky's analysis linked from this Slashdot story summary, you'll see how the formats "evolved" ("devolved" more like) withing MS goals often dictated by their unique marketing position.

    The summary also points out with links to why this release might not actually indicate MS is really releasing their formats to break with that past after all.

    --

    --
    make install -not war

  72. Re:MS definately don't know how their formatter wo by Chaos+Incarnate · · Score: 1

    That's why I often print to PDF, then print the PDF...

    --
    Benford's Corollary to Clarke's Law: "Any technology distinguishable from magic is insufficiently advanced."
  73. Re:Don't Adopt. Convert. by baboo_jackal · · Score: 2, Informative

    Maybe MS does now want to move away from their old proprietary formats into new open ones. But the old formats were built over decades without that goal.
    No argument there.

    The summary also points out with links to why this release might not actually indicate MS is really releasing their formats to break with that past after all.
    No. The article doesn't make that claim. That's your own interpretation. The overall intent of the article is simply to convey a few simple points:

    1) Why the MS office document format is so crufty (minus conspiracy theories).
    2) How to work *with* the Windows OS to use those documents.
    3) How to use better, more open, alternatives to creating office documents.

    Nothing in the article contradicts anything I said earlier.
  74. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1
    Not the article, the summary.

    Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see if it seems to cover those working on GPL software; some believe it doesn't.
    --

    --
    make install -not war

  75. Re:MS definately don't know how their formatter wo by TERdON · · Score: 1

    Yeah, I've also made sure to use that silly workaround. Fortunately, my work computer is equipped with Adobe...

    --
    I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
  76. Re: "compound documents." oh no, run away! by prshaw · · Score: 1

    >> Separating data and representation is a basic programming skill

    Since when? I have to say that in over 25 years of this stuff I never heard that as a basic programming skill.

    There are applications (like html/web) where it is a good idea, but most of those are fairly recent (like the last 10 years? Even HTML was orginally designed to be all together).

    But for a word document, what do you think is stored in the file? Data or presentation?

    I'll give you a little hint, if you only want the data store it in a text file. If you want the document formatted then store both so it is available.

  77. Re:Don't Adopt. Convert. by baboo_jackal · · Score: 2, Insightful
    Unfortunately, the summary makes entirely unsupported assertions, which you claim as support for yours. Did the person who wrote the summary actually read Microsoft's Open Specification Promise?

    Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see if it seems to cover those working on GPL software; some believe it doesn't.

    From MS's own mouth - and mind you that these quotes probably had to be vetted by a billion lawyer-types to ensure that MS wouldn't incur any sort of bizarre liability fifty years down the road by saying them. Based on what is said here, the only other thing that MS reserved is the ability to sue anyone who sues them for violating the patents that they already own, and are releasing to the public. That would be kind of like placing a legal disclaimer on your Halloween candy bowl: "Attention: You can all take as much candy from this bowl as you want, and I legally give up my right to prosecute anyone taking candy from this bowl of Theft, forever. But if any of you accuses me of Theft for eating candy from *my own candy bowl,* then I reserve the right to accuse that person (and *only* that person) of Theft, too." Here's a few pertinent excerpts:

    Q: Is the Open Specification Promise intended to apply to open source developers and users of open source developed software?

    A: Yes. The OSP applies directly to all persons or entities that make, use, sell, offer for sale, imports and/or distributes an implementation of a Covered Specification. It is intended to enable open source implementations, and in fact several parties in the open source community have specifically stated that the OSP meets their needs. Moreover there are already a significant number of implementations of Covered Specifications that have been created and/or distributed under a variety of open source licenses as well as under proprietary software development models. Because open source software licenses can vary you may want to consult with your legal counsel to understand your particular legal environment.

    Q: Is this Promise consistent with open source licensing, namely the GPL? And can anyone implement the specification(s) without any concerns about Microsoft patents?

    A: The Open Specification Promise is a simple and clear way to assure that the broadest audience of developers and customers working with commercial or open source software can implement the covered specification(s). We leave it to those implementing these technologies to understand the legal environments in which they operate. This includes people operating in a GPL environment. Because the General Public License (GPL) is not universally interpreted the same way by everyone, we can't give anyone a legal opinion about how our language relates to the GPL or other OSS licenses, but based on feedback from the open source community we believe that a broad audience of developers can implement the specification(s).

    Q: I am a developer/distributor/user of software that is licensed under the GPL, does the Open Specification Promise apply to me?

    A: Absolutely, yes. The OSP applies to developers, distributors, and users of Covered Implementations without regard to the development model that created such implementations, or the type of copyright licenses under which they are distributed, or the business model of distributors/implementers. The OSP provides the assurance that Microsoft will not assert its Necessary Claims against anyone who make, use, sell, offer for sale, import, or distribute any Covered Implementation under any type of development or distribution model, including the GPL. As stated in the OSP, the only time Microsoft can withdraw its promise against a specific person or company for a specific Covered Specif

  78. Re:Don't Adopt. Convert. by Anonymous Coward · · Score: 0

    DocRuby = poop

    Love,

    your slashstalker

  79. Re:Don't Adopt. Convert. by mxs · · Score: 1

    Sorry, but no, XML is not "open" and "human readable". Without a proper format documentation, it's every bit as opaque as a binary format.

    XML is not a standardized document format. ODF is standardized document format with documentation and smenatics. XML is just the language it's expressed in.

    For instance, the following is a proper XML tree :

    aa

    Without any documentation on what those tags mean, it's every bit as opaque as

    $$!51%5g33F1 (admittedly a bad analogue to a truly "binary" format, but you get the idea).

    Sure, you can build a tree out of that. That is still useless, especially considering that you can put arbitrary linking formats into attribute or element values.

    Hell,

     

    qualifies.

    Without a proper XML Schema to go with your XML document, you have nothing. And even IF you have a XML Schema, without documentation, you can only use it to validate stuff against it. And even IF you have documentation, it will have to be accurate. XML alone is not a silver bullet. OOXML all but proved that already.

  80. Re:Don't Adopt. Convert. by mxs · · Score: 1

    of course, slashdot ate my markup.

    <a><b><c /><d e="f" g="h" /> </b></a>

    and

    <a> <![CDATA[ SOMETHING REALLY SCARYLOOKING HERE ]]></a

    would be the codesnippets.

  81. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1


    No, XML is indeed that wonderpill...because it's open and human readable, not some obfuscated binary format like .DOC

    ASCII does not mean human-readable. Instead of an obfuscated binary format, XML documents end up in an obfuscated text format.
    --

    "So wait, who protects the people from their government?"
    "Terrorists."
    "...oh."
    [ Reply to This | Parent ]

            *
                Re:Don't Adopt. Convert. (Score:2)
                by Doc Ruby (173196) on Wed Feb 20, '08 01:26 PM (#22491236) Homepage Journal
                And some humans are illiterate.

                XML is human readable because it doesn't require a machine (or superhuman skills) to read its meaning. Its field names and structure are embedded in the data, not only in a decoder context. Of course it's up to the person specifying the XML dialect to make those tags and structure comprehensible by a normal person, but that's not the format's defect. Any format can be obfuscated by design or carelessness. XML is harder to do that in. And even the most basic tools that render XML data according to their embedded schema make most XML self-evident. And that's not cheating: even this post in English requires a reader app, as does all stored data.
                --

                --
                make install -not war

    --

    --
    make install -not war

  82. Re:Joel on RTF. by Anonymous Coward · · Score: 0

    Clearly, you are retarded and can't spell worth shit. No wonder you drool. Most institutionalized head cases do.

  83. Re:first post? by dave87656 · · Score: 1

    I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats? They always were open - you just had to email them for the files rather than just download them. Now you can just download them. The SMB protocol, for example, was never open. And, in fact, it still isn't really open. You can get the specification now from MS but it will cost you a five-figure sum and you must sign an agreement not to disclose it to anyone else.

    DOC is still not open. You can't get it by asking for it with an email. Not sure where you got that piece of information.
  84. Re:Don't Adopt. Convert. by mxs · · Score: 1

    You are very much mistaken, sorry. XML is not that wonderpill. It's a tool. It can be abused.

    Structural properties are WORTHLESS if you do not know what they mean. It's cool to build a tree, but without semantic meaning, you really have nothing. Nothing is self-evident if the designer hasn't taken care to make their format transparent.

    Clearly you don't see that point so it does not bear discussing further. I wish you luck trying to figure out what a,b,c,d,e,f,g, k,m, i,l, x,y, and z mean as tags. Really, I do.

  85. Borland Java Builder by Anonymous Coward · · Score: 0

    Both of you forgot Borland Java Builder was the better Java Windows RAD

  86. Re:Don't Adopt. Convert. by joto · · Score: 1

    Look, I'm sitting at a fucking computer. I know how to use a hex editor. I'm a programmer, I know how to write programs to do what I want. If, as the article states, MS office formats are designed to be copied directly into C structs, then that makes parsing simpler, not harder. I'm not going to load that fucking office document into my brain, so human-readable means absolutely nothing to me. I'm going to load it into a computer. And unless the file-format is designed with interoperability in mind, making it XML won't help one single bit. All XML would mean is that in addition to all the other work I have to do, I also need an XML parser.

  87. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1

    Yo, XML is for fucking permanent storage, not your fucking tranitory C struct. Dumping your fucking C struct to your fucking computer will leave you fucking scratching your head in 5 years when you try to fucking decipher the fucking thing. Or you can use fucking XML and have a fucking clue later when you try to use that data on some other fucking platform where your other code won't fucking run and you don't want to fucking pore through the source to decipher the fucking one-off format you made to import Word data into that fucking program you never used again. Got it, fucker?

    --

    --
    make install -not war

  88. Re:Don't Adopt. Convert. by Doc+Ruby · · Score: 1

    Of course it's up to the person specifying the XML dialect to make those tags and structure comprehensible by a normal person, but that's not the format's defect. Any format can be obfuscated by design or carelessness. XML is harder to do that in. And even the most basic tools that render XML data according to their embedded schema make most XML self-evident. And that's not cheating: even this post in English requires a reader app, as does all stored data.

    Of course, my post was human readable, but since you can't even understand written English enough to stop making the spurious point that a human can make even a readable format unreadable, I don't expect you to accept that there is even such a thing as human readable.

    Sorry you couldn't benefit from that simple insight no matter how easy to read I made it. Your loss.

    --

    --
    make install -not war

  89. Use your posts wisely. by Anonymous Coward · · Score: 0

    When you're posting at -1, you can't afford to waste the limited number of posts you get per day. I've noticed that challenges to your assertion that non-Open software manufacturers upload and read the search indexes of users' PCs have gone unanswered.

    You might want to respond to that sort of thing if you want to get out of this deep hole you're in. Or maybe apologize for lying to your fellow Open Source advocates.

    You do realize that Slashdot karma is earned, right?

  90. Re:Don't Adopt. Convert. by joto · · Score: 1

    If the problem is the complexity of the format itself, embedding human-readable names for each field into the file-format, isn't going to reduce the complexity one bit. And if you already have a specification (even if it's reverse engineered), human-readability or embedded field-names is not of importance. Granted, XML doesn't make it any more complicated, so it doesn't hurt much, but a straightforward translation of word or excel file-formats into XML is not particulary helpful. XML has its uses. Relying on it like it was magic, is not one of them. First and foremost because it isn't magic.

    I've written tons of library code for reading and writing old proprietary binary file-formats on newer incompatible computers with different byte-order, different floating point formats, etc... It's not particulary hard. Anyone can do it. It's code-monkey stuff, or a computing 101 exercise, not something you even need real programmers to tackle. The problem with office formats isn't that it's binary, it's that it's complicated.

  91. Re:Joel - Hungarian Notation by shutdown+-p+now · · Score: 1

    On VERY LARGE projects where there are hundreds of include files and hundreds of source files, it is not convenient or even possible in all cases to find the definition of an object that may be in use.
    No, it's not. You hover the mouse over the variable name in your IDE,and it tells you the type. Yes, in a "very large" project, with hundreds of include files. A plain VS2005 or 2008 install with no plugins. And if you install Visual Assist on top of that, you get plenty more.

    All these problems have been solved a long time ago, really. Unless one is coding in Notepad (but why??), Hungarian notation serves no purpose whatsoever.

  92. Re:Joel - Hungarian Notation by mlwmohawk · · Score: 1

    No, it's not. You hover the mouse over the variable name in your IDE

    Noob, listen. Being able to read something without having to hover the mouse is far far easier. If every time I come across a variable I have to reach out and grab the mouse, hover over the variable and hope that the IDE can find it, which doesn't always work in abstract types. Its like actually having a vocabulary when you read a book instead of having to consult a dictionary every time you get a word with more than 6 letters.

    Secondly, Visual Studio is probably the LEAST productive environment I have ever worked in. It is nothing more than the GUIfication of Microsoft's PWB which was universally referenced as "Programmers Waste Basket."

    Lastly, most software development in the world is NOT ON Windows.

  93. Re:Don't Adopt. Convert. by mxs · · Score: 1

    Of course it's up to the person specifying the XML dialect to make those tags and structure comprehensible by a normal person, but that's not the format's defect. Correct. It's not. And by virtue of that argument, neither is XML a silver bullet.

    Any format can be obfuscated by design or carelessness. Correct. See OOXML, or better yet, something that has no documentation at all.

    XML is harder to do that in. In XML it's harder to hide the fact that you are doing it, but it is not harder to do that in.

    And even the most basic tools that render XML data according to their embedded schema make most XML self-evident. To a trained eye, most naive binary formats also contain self-evident data structures. Go ahead,
    look at some binary formats you do not know in decent viewers ... Chances are, you will discover structure.

    XML does not have embeddes schemas. It is semi-structured, true -- but that does not a schema make. You get a tree of nodes. That, in and of itself, does not make the contents self-evident. By that logic, I should be able to deduce what a btree stores if only I know that it's a btree.

    And that's not cheating: even this post in English requires a reader app, as does all stored data. You keep coming back to this analogy, and I fail to see how it applies. Sure, if you get a data format containing this post in one of its nodes (or even a nice tag soup with markup in it), you can probably deduce that it's a post and you can read what it says. Same is true of a pure binary format containing this text. The issue becomes less clear when you are talking about data that does not appear as utf8-text in its "natural" form.

    Of course, my post was human readable, but since you can't even understand written English enough to stop making the spurious point that a human can make even a readable format unreadable, I don't expect you to accept that there is even such a thing as human readable. If in doubt, go ad hominem, eh ?

    I simply don't agree with your premise.

    Sorry you couldn't benefit from that simple insight no matter how easy to read I made it. Your loss. Getting defensive, are we ? Ah well, it seems the art of discourse is lost on you. Your loss, as it were.
  94. Re:Joel - Hungarian Notation by shutdown+-p+now · · Score: 1

    Being able to read something without having to hover the mouse is far far easier.
    It may be faster, but you waste time prefixing the variables in the first place. Not all of which are going to be looked up later.

    If every time I come across a variable I have to reach out and grab the mouse
    You don't. It's the easiest way, but not the fastest. Every IDE I've seen has some shortcut for that. Some also have the "Definition" window at the bottom, which is always synced with whatever is under the cursor.

    ... which doesn't always work in abstract types.
    It does work just fine with all kinds of types short of specialized templates of templates and the like.

    Secondly, Visual Studio is probably the LEAST productive environment I have ever worked in.
    You're entitled to your own opinion, of course. But a recent discussion on the topic in comp.lang.c++.moderated had a rather different conclusion.

    Lastly, most software development in the world is NOT ON Windows.
    References?
  95. Re:Joel - Hungarian Notation by mlwmohawk · · Score: 1

    It may be faster, but you waste time prefixing the variables in the first place.

    There is no logic in this statement, typing 2 or 3 additional characters is hardly even measurable.

    You don't. It's the easiest way, but not the fastest. Every IDE I've seen has some shortcut for that. Some also have the "Definition" window at the bottom, which is always synced with whatever is under the cursor.

    As opposed to simply looking at it and knowing? There is no rational argument that can claim that an alternate contextual lookup of information is easier then just seeing it in context. No one with any intellectual integrity can pursue such an argument.

    It does work just fine with all kinds of types short of specialized templates of templates and the like.

    I have had many situations where it does not and can not correlate the variable with the definition. Surely you are not saying that it works 100.00% of the time, are you?

    "Lastly, most software development in the world is NOT ON Windows."

    References?


    Bureau of Labor Statistics, excluding unaccounted open source Linux people, the majority of software jobs are system, web, embedded, scientific, with applications programming far down on the list. Remember, every electronic device that has a blinking LED has a computer in it, and that takes software development. There are far more cell phones than there are P.C.s There are for more Microwave ovens, far more automobiles (which have computers in them), there are far more everything electronics than P.C.s and WinCE is a small percent of that.

    Like they said on T.V. "In Mayberry, he's world famous."