Slashdot Mirror


Why Can't We Reverse Engineer .DOC?

DanPeng asks: "It looks like Autodesk has been pulling the same kind of proprietary file-format monopoly tactics with AutoCAD that Microsoft has been pulling with Office. The difference between Office and AutoCAD, however, is that an organization, the OpenDWG Alliance has been formed by competing companies to reverse-engineer the AutoCAD DWG format. With the amount of funding that it gets, it is actually quite functional and successful, with millions of users. Even when Autodesk revised the format for AutoCAD 2000, the OpenDWG Alliance fully reverse-engineered it within eight weeks. Now, why can't Corel, Lotus, Sun, etc. band together and reverse-engineer Microsoft's file formats properly?"

Good question.

I wonder if it has something to do with the mentality of the players involved. I don't think Sun, Corel or Lotus ever thought that they might be able to get together so that they could compete on the Office market, I think they all looked to carve out pieces of the market with their own suites, making such collaboration impossible. Despite popular misperception, Applix does not convert DOC, it converts RTF (which may be close enough for some people). Star Office is striving toward this holy grail, but they aren't quite there yet. So maybe it's not too late for folks to pool resources and finally get the job done. In fact, with the eyes of the court on Microsoft, now might be the perfect time.

On the other hand, we have DWG, which is a fairly rich format that deals with the description of 3D objects. Could decoding a file format that deals with text and its presentation really be that much more difficult to reverse engineer? I'd guess this depends more on the design behind said file format. If one of the main goals of the .DOC format is obfuscation, this could be difficult indeed, but I wouldn't say that it's impossible ... not for three big corporations, nor for thousands of loosely organized coders. It's one thing to have control of a file format, but it's another to be put into the position of having to change the format constantly in order to stay in the game. If Microsoft is placed in this situation, the onus would be on them to either concede the format until the next major release is made, or shorten the upgrade cycle on Office. How many businesses would stick with an office suite which forced users to upgrade every eight weeks just to remain compatible? If something like this were to happen, we might finally be able to put a dent in the everpresent Office monopoly.

So why hasn't .DOC been reverse engineered? I would think that if this can happen to the DWG format then it can happen to any proprietary format. Have we tried, or has Microsoft's reputation, both professionally and legally, kept people from really thinking about it?

337 comments

  1. Aaargh... everything going wrong again by Coma+of+Souls · · Score: 1
    That's supposed to read:

    The first posts from the last 136 stories:

    1. 81 posts: Anonymous Coward
    2. 2 posts: Coma of Souls
    3. 2 posts: Sicknal 11
    4. 2 posts: Signal 12
    5. 1 post: /
    6. 1 post: addbo
    7. 1 post: Anonymous Cowart
    8. 1 post: bapya
    9. 1 post: BgJonson79
    10. 1 post: bitchslapboy
    11. 1 post: BlowChunx
    12. 1 post: CardiacArrest
    13. 1 post: chandler
    14. 1 post: crazy_speeder
    15. 1 post: DavidOgg
    16. 1 post: Decklin Foster
    17. 1 post: dJOEK
    18. 1 post: Doofus
    19. 1 post: Dr Caleb
    20. 1 post: DrEldarion
    21. 1 post: erik umenhofer
    22. 1 post: FascDot Killed My Pr
    23. 1 post: flipppy
    24. 1 post: fluxrad
    25. 1 post: gdulli
    26. 1 post: gkAndy
    27. 1 post: gt_croz
    28. 1 post: jims
    29. 1 post: JKR
    30. 1 post: LinuxFreak12
    31. 1 post: Machina
    32. 1 post: MalaclypseJr
    33. 1 post: MaximumBob
    34. 1 post: mr_biggs
    35. 1 post: nerdling
    36. 1 post: Old Wolf
    37. 1 post: Ophidian Jones
    38. 1 post: osm
    39. 1 post: Paradox`
    40. 1 post: philipm
    41. 1 post: QBasic_Dude
    42. 1 post: qbasicprogrammer
    43. 1 post: rjamestaylor
    44. 1 post: rms
    45. 1 post: session
    46. 1 post: sheriff_p
    47. 1 post: Signal l1
    48. 1 post: Spameroni
    49. 1 post: stokessd
    50. 1 post: Stskeeps
    51. 1 post: tealover
    52. 1 post: Tim_F
    53. 1 post: TRoLLaXoR
    I already took two firsts away from an Anonymous Coward.
  2. .DOC not exactly proprietary by Matts · · Score: 5
    I don't know why this myth continues to propogate:
    • .DOC is an OLE Document
    • OLE Document parsers are available for most platforms. Theres even one for Perl
    • The .DOC format is documented on the MSDN CD's - where else would you expect this documentation to appear?
    • So no reverse engineering is needed. Just follow the spec
    What truth remains is that the doc format changes from release to release of MS Word. So developers have to track these changes. The format is also a large and complex format, so its remained fairly niche in the open source world.
    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
    1. Re:.DOC not exactly proprietary by NP · · Score: 2

      > documented on the MSDN CD's

      Or you could register at developer.microsoft.com (yes, I know, that hurts ...) and read the spec online.

      But since not even m$ can get diffrent versions of word to read each others files, there is a only a slim chance that someone else will get it right. So far I haven't seen anyone that have got it right.

    2. Re:.DOC not exactly proprietary by CarrotLord · · Score: 2

      So why is it that MS Word competitors struggle so much in importing and exporting .DOC documents? MS doesn't actually release new versions of Word _that_ often, and if the format is well-documented, implementing readers and writers for it shouldn't be as hard as it appears... Of course, not having looked into it myself, I don't understand the issues fully here, and an ugly, ambiguous format would certainly make life difficult... rr

      --
      Quidquid latine dictum sit, altum videtur.
    3. Re:.DOC not exactly proprietary by sj12fn · · Score: 2

      >The .DOC format is documented on the MSDN CD's

      The problem there is that only part of .doc is on the MSDN CDs. Just enough is kept out to make it impossible to build an office clone from it.

    4. Re:.DOC not exactly proprietary by martin-k · · Score: 5
      Close but no cigar.

      1. Physically reading a storage file is not the problem. Making sense out of the streams in the file much more so ...

      2. The Word 97 *was* on the MSDN CDs. Microsoft has pulled it about two years ago. (So much for keeping hundreds of old MSDN CDs around ...)

      3. The Word 2000 additions have never been documented in public.

      4. The MSDN documentation is vague and sometimes plain wrong.

      You get about 85% of a Word converter from coding along the Microsoft docs. It's the remaining 15% that's the hard thing.

      -Martin

    5. Re:.DOC not exactly proprietary by Schnedt+McWapt · · Score: 1

      Microsoft does not even TRY to get 'different versions of word to read each others files.'

      Their intent is for the format to be 'forward compatible' (drives the adoption of newer versions) and hence their only goal is to make sure newer versions will read docs created in earlier versions.

      So it's a bald-faced lie (or at worst a horrible distortion by someone very clueless) to claim that Microsoft is hopelessly lost and can't read their own formatted data.

    6. Re:.DOC not exactly proprietary by Mawbid · · Score: 2
      In the past, when people here have pointed out that the .doc spec is available on msdn, others have pointed out that it comes with a license which prohibits its use for making converters or import plugins for competing products.

      If that's true (and I'm not saying it is--I don't believe everything I read on Slashdot) then that spec doesn't help much. Sure, you could use it to write the converter but it might land you in jail (and living in Norway apparently wouldn't protect you).
      --

      --
      Fuck the system? Nah, you might catch something.
    7. Re:.DOC not exactly proprietary by Matts · · Score: 3
      1. Physically reading a storage file is not the problem. Making sense out of the streams in the file much more so ...

      This is true, although many people and projects have done a fairly good job - I wasn't trying to say that the format is totally freely available, more of a "What is this question doing here except trying to flame Microsoft?".

      2. The Word 97 *was* on the MSDN CDs. Microsoft has pulled it about two years ago. (So much for keeping hundreds of old MSDN CDs around ...)

      I wasn't aware it had dropped off. But then a lot of information has dropped off the MSDN CD's in favour of a link to www.microsoft.com. I'm willing to bet the Word 97 format is still on there somewhere.

      3. The Word 2000 additions have never been documented in public.

      I'm of the understanding that there weren't any (or at least very few). From what I heard, the additions were a few minor features, certainly nothing that would cause interoperability issues. But then what I've heard could be wrong...

      4. The MSDN documentation is vague and sometimes plain wrong.

      As is the GNU documentation, and sometimes the Perl documentation (actually its a lot better lately) and... well I could go on. Developers hate documenting internals. I don't blame the microsofties for that. Documenting things is boring. I'd rather add an animated paperclip ;-)

      --

      Matt. Want XML + Apache + Stylesheets? Get AxKit.
    8. Re:.DOC not exactly proprietary by Anonymous Coward · · Score: 1

      Source is available for GNU and Perl, so *all* the information required to interoperate *is* available.

    9. Re:.DOC not exactly proprietary by unilynx · · Score: 2

      I wasn't aware it had dropped off. But then a lot of information has dropped off the MSDN CD's in favour of a link to www.microsoft.com. I'm willing to bet the Word 97 format is still on there somewhere Nope, it isn't - it was pulled off MSDN. Apparently you can still request it from Microsoft by email (officeff@microsoft.com), and copies are still floating on the web. Also, according to Microsoft, there is no official documentation yet available on the Word 2000 format. The changes don't seem that interesting though - a lot of SPRMs were added and most of them do nothing more than confuse Word2000.

    10. Re:.DOC not exactly proprietary by SuperDee · · Score: 1

      Couple of points here: 1) Somebody *HAS* written a Word converter for Linux--it's called WV, and it has also been incorporated into AbiWord for the AbiWord Word importer. It works pretty well, but as mentioned, that last 15% is the hardest part, and that part is constantly being worked on. 2) As for COM on Linux, what about Mozilla's XPCOM?

    11. Re:.DOC not exactly proprietary by drinkypoo · · Score: 1

      If you're trying to work in perl, and you don't know C so well, having the source to perl might not be a big help. I know that I (with my admittedly extremely limited coding experience) have a hard time figuring out where to start reading through large C sources to figure out just what's going on.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    12. Re:.DOC not exactly proprietary by SciBoy · · Score: 1

      How do you pronounce that abbrevation: SPRMs?

      --
      "War is God's way of teaching Americans geography." - Ambrose Bierce (1842-1914)
    13. Re:.DOC not exactly proprietary by whoop · · Score: 1

      I was wondering, what sort of "license" does MS put on those specs? Is it similar to those fine Kerberos specs, "You can look at these but don't you dare try doing anything with them, especially on non-Windows platforms?"

    14. Re:.DOC not exactly proprietary by IntlHarvester · · Score: 1

      That's not true, they do TRY, sorta.

      Every new release of the Word format (6.0, 97), there's been a semi-working converter released for the old versions of Word.

      Plus, once you apply the service pack to Word 97, the default format becomes 6.0/95, unless you add some 97-only feature, in which case Mr Clipass warns you. (Microsoft added this behavior after some huge heat from their own customers.)

      --

      --
      Business. Numbers. Money. People. Computer World.
    15. Re:.DOC not exactly proprietary by nafmo · · Score: 1

      It's not entirely trivial to convert something designed to fit the inner workings of another program (in this case, MSWord), to fit your own. Just see how hard it is to get other conversions correct.

    16. Re:.DOC not exactly proprietary by lunatik17 · · Score: 1
      But the point is not to make it readable by everyone, just accessible to that one critical person who can write Word interoperability into his editor.

      Here's my DeCSS mirror. Where's yours?

      --

      Here's my DeCSS mirror, where's yours?

    17. Re:.DOC not exactly proprietary by pen · · Score: 1
      Word 2000 also comes with options for saving in older formats. When forced to use Word documents (you'd be surprised how many people spit your HTML file back at you and demand a .doc, even if no editing is necessary) I always save them in the 6.0 format.

      --

    18. Re:.DOC not exactly proprietary by mpe · · Score: 1

      Their intent is for the format to be 'forward compatible' (drives the adoption of newer versions) and hence their only goal is to make sure newer versions will read docs created in earlier versions.

      Even the latter dosn't always happen, or the document ends up being mangled in the process.

    19. Re:.DOC not exactly proprietary by drix · · Score: 2

      3. The Word 2000 additions have never been documented in public.

      That's because there aren't any "additions." Word 2000 is 100% backward-compatible with the Word 97 format.

      --

      --

      I think there is a world market for maybe five personal web logs.
    20. Re:.DOC not exactly proprietary by mr3038 · · Score: 1
      That's because there aren't any "additions." Word 2000 is 100% backward-compatible with the Word 97 format.

      AFAIK word2k supports tables inside table cells and word97 doesn't - how that can be 100% compatible? There is some other (small?) changes also. I for one wouldn't edit word2k document with word97 and still hope to not broke "formatting". I admit that you can see information inside word2k file with word97 and that's much better than with word95/word97. However, opening .doc file format would be The Right Thing.
      _________________________

      --
      _________________________
      Spelling and grammar mistakes left as an exercise for the reader.
    21. Re:.DOC not exactly proprietary by zigzag · · Score: 1

      Yes it is. It's the ultimate form of documentation. That's the point.

    22. Re:.DOC not exactly proprietary by Robert+S+Gormley · · Score: 2
      And when your job is to come in cold, and maintain this mess, with no documentation, because it was supposedly able to document itself?

      documentation: the act or an instance of furnishing or authenticating with documents.

      The product is not its documentation.

      --

      Open Source. Closed Minds. We are Slashdot.

    23. Re:.DOC not exactly proprietary by Malcontent · · Score: 1

      First rule of debugging.
      Debug the code not the comments.

      Point being that if you wanted to clone any part of perl you could because you have source. Documentation is nice but wholly unrequired.

      --

      War is necrophilia.

    24. Re:.DOC not exactly proprietary by alarmo · · Score: 1

      If the format spec didn't change, the way Word uses it apparently did - I've seen this trying to use word processors which handle word-97 files properly, watching them make ugly messes out of word-2000 documents with such features as revision tracking, some list formatting options, etc.

      You don't have to change very much to have a compatibility problem, at least for the people - professional users, mostly - who use those advanced features.

      Microsoft Word has a file format only slightly less complex than the human brain. - Linux Magazine, from a month or two ago.

    25. Re:.DOC not exactly proprietary by FooRat · · Score: 1

      And in what way was that "flamebait"????

      Stupid moderators.

      Go to Microsoft's MSDN site. .DOC files do use what are called "docfiles" and they are proprietary, fact.

      And with regard to MS API's, I've programmed with DirectX, Win32, MFC, Windows sockets, Windows CE API's, and for every single one of those there have been blatant errors and inconsistencies in the documentation.

      So how the hell was that flamebait?

    26. Re:.DOC not exactly proprietary by bmajik · · Score: 1

      PCfileviewer for solaris has no problems reading .DOC, excel files, or whatever.

      It can also do .DWG iirc.

      Reading these files is easy.. writing may be more complex.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
  3. Uh duh by TummyX · · Score: 5

    DOC isn't a difficult file format. It's pretty well documented in various places around the web.

    The thing is DOC is a compound file format. Meaning it is made up of various serialized data streams from embedded components. Word itself won't even know what many parts of a DOC file means, it'll just pass it on to Visio, Excel, Photoshop etc to read and understand.

    DOC is a hugely extensible file format, and you can't support everything DOC can cause DOC can theorectically support just about anything...especially windows applications.

    And no that was not done through evil intent. Believe it or not, integration of applications is very much something that good software engineers strive for.

    If you have a problem with it, just wait a few years (or maybe a decade) for KOffice etc to mature, and watch people complain as documents created on the Linux version of KOffice won't work because someone decided to embed in their document some python code, or an xpaint image.

    1. Re:Uh duh by tzanger · · Score: 2

      I really hope I get a response, Slashdot blows for how it handles the user info screen... you need to remember how many replies you had to a given comment instead of having it know that you clicked on it when it was at 3 replies and give you some kind of visual cue that there are now 5 or 7 or 23 :-)

      The thing is DOC is a compound file format. Meaning it is made up of various serialized data streams from embedded components. Word itself won't even know what many parts of a DOC file means, it'll just pass it on to Visio, Excel, Photoshop etc to read and understand.

      If the spec is right then, I should be able to import my .doc files, creating tables, lists, text, all formatting and most graphics (.gif, .jpg, .png, etc.) without any trouble, as Word doesn't need any part of any other program to do this. Why can't I?

      I agree fully with you that Word doesn't handle most of the complex streams (excel data, powerpoint data, visio data, etc.) but in my documents I don't have any of these, it's all text and a lot of formatting, which Word would have to handle on its own.

    2. Re:Uh duh by TummyX · · Score: 1

      Um, do you even know what you're talking about?

      Sure python is open, but how many users on all platforms will have a python engine that'll be compatable with any future KOffice implementation?

      Or what about xpaint? How many people will be pissed when they open their KWord document in Word for Windows and can't edit the image cause they don't have xpaint or an xbm compatible editor?

      We are talking about comptiblility across platforms here right?

      I'm saying it's not that easy cause of the generalisation and extensibility of the DOC format.

    3. Re:Uh duh by TummyX · · Score: 1

      What program are you tring to use to import DOC files?

      Staroffice should be able to handle tables and images.

      Formatting might get you into trouble as different programs have different ideas. (Compare Word, StarOffice, Abiword and Word Perfect and notice the differences).

      I suppose you could say Word is the right one ;)

    4. Re:Uh duh by Arker · · Score: 1

      Um, do you even know what you're talking about?

      Yes, unfortunately for you, trollboy, I do.

      Sure python is open, but how many users on all platforms will have a python engine that'll be compatable with any future KOffice implementation?

      Any that don't can download one for free. If M$ truly gave a *&^% about open standards, they would include one by default, but even lacking that it is easy and free to correct their mistake.

      Or what about xpaint? How many people will be pissed when they open their KWord document in Word for Windows and can't edit the image cause they don't have xpaint or an xbm compatible editor?

      I am running win32 right now (I have linux on this machine as well, but at the moment I am in windows) and I have no difficulty viewing and editing X-window bitmaps (.xbms) with freely available tools without even rebooting. Try again, trollboy.

      We are talking about comptiblility across platforms here right?

      Yep. So what's your excuse for not following open protocols, microserf? I keep asking but you never reply.

      I'm saying it's not that easy cause of the generalisation and extensibility of the DOC format.

      Bzzzzt! Wrong! Try again, troll boy. If you and your employer cared one bit about generalisation and extensibility, M$ Word would be outputting TeX files. Come up with another excuse, before all the naive moderators that are trying to prove they aren't biased by moderating your posts up catch on to your game...

      --
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Friends don't let friends enable ecmascript.
    5. Re:Uh duh by TummyX · · Score: 1

      You don't know what you're talking about at all.

      We're talking about embedding these custom objects inside a compound document.

      And why the hell should microsoft output LaTeX files? Microsoft Word is based on COM/OLE, why would they go backwards to LaTeX?

      The future of verbose file formats and office formats is in XML NOT TeX.

    6. Re:Uh duh by Arker · · Score: 1

      You don't know what you're talking about at all.

      No matter how many times you say that, it still is not true.

      We're talking about embedding these custom objects inside a compound document.

      Umm... doh! First off, in well over 99% of cases, we are not. Secondly, in those few cases where we are, this can be done in a standard format anyhow.

      And why the hell should microsoft output LaTeX files? Microsoft Word is based on COM/OLE, why would they go backwards to LaTeX?

      Hrmm, so "backwards" and "forwards" are completely dependent on time of introduction? So if I introduce a ridiculously complicated way of doing X and a much simpler and better supported way of doing X is already published, my way of doing X is necessarily a step "forward" by virtue of being later, right?

      Or is this only true when I == Micro$haft?

      You may say I have an axe to grind, if you do, it will be the first unambiguously true and correct statement you have made today. As I mentioned in another post, I've used Micro$haft products since well before 1990, when they at least felt a need to pretend to care what the user wanted, and I am bloody well pissed off at you these days. Yes, you have a right to tie your "innovations" (which I almost never have any need for anyhow) to your own products, and make them incompatible with the rest of the world, but it makes your product less - not more, attractive to me. And don't give me this bull-pucky about "moving forward" when what you are doing is moving back. I'm not an idiot, and I am sick of being treated like one.

      --
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Friends don't let friends enable ecmascript.
    7. Re:Uh duh by tzanger · · Score: 1

      What program are you tring to use to import DOC files?

      My own creation, just to convert .doc to .html, creating tables and standard formatting (lists, bold, underline, etc., and doing some "intelligent" font manipulation and such.

      Formatting might get you into trouble as different programs have different ideas. (Compare Word, StarOffice, Abiword and Word Perfect and notice the differences). I suppose you could say Word is the right one ;)

      Indeed. I was just looking more for the standard BIU and font size/"class" stuff. Nothing too tricky.

    8. Re:Uh duh by Robert+S+Gormley · · Score: 1
      Any that don't can download one for free. If M$ truly gave a *&^% about open standards, they would include one by default, but even lacking that it is easy and free to correct their mistake.

      An observation, not a troll or a flame, but why is downloading components evil in browsers and plug-ins, but not for things like this?

      --

      Open Source. Closed Minds. We are Slashdot.

    9. Re:Uh duh by spectecjr · · Score: 2

      The thing is DOC is a compound file format. Meaning it is made up of various serialized data streams from embedded components. Word itself won't even know what many parts of a DOC file means, it'll just pass it on to Visio, Excel, Photoshop etc to read and understand.

      That's actually not necessarily the case; OLE has a mechanism known as "View Caching" which keeps a snapshot of the embedded data in a form which can be displayed or printed in the document without the generating application/control needing to be present on the machine that's viewing the document. You can't edit it, but you can view it or print it - and unless you need to mess with the document (most people just read it), that's enough.

      Simon

      --
      Coming soon - pyrogyra
    10. Re:Uh duh by Rhys+Dyfrgi · · Score: 1

      If you want to have various data types in a single document (say, text with various defined style classes and some exceptions, a few images of various types, and a movie with audio), then which is the better choice: XML or TeX? Even without the movie, XML is still better. XML can do just about anything with the right DTD; TeX can format text.
      ---

      --
      END OF LINE
    11. Re:Uh duh by Malcontent · · Score: 1

      "Come up with another excuse, before all the naive moderators that are trying to prove they aren't biased by moderating your posts up catch on to your game..."

      Pro MS posts are always moderated way up. Check the last few topics and see. Slashdot is no longer a Linux or open source forum it is now filled with people who love MS. If you don't believe sort by points.

      --

      War is necrophilia.

  4. Ok, here we go again... by TummyX · · Score: 2

    Here's another try to act professional, but bash microsoft at the same time type post. Pretty typical of Linux users...


    On the other hand, we have DWG, which is a fairly rich format that deals with the description of 3D objects. Could decoding a file format that deals with text and it's presentation really be that much more difficult to reverse engineer?


    Well considering DOC can store ANYTHING - including the description of 3D objects yes.


    I'd guess this depends more on the design behind said file format. If one of the main goals of the .DOC format is obfuscation, this could be difficult indeed


    I see, Microsoft == Evil, so DOC must be created to obfusticate. Very smart of you.
    Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read? I guess Microsoft will go out of it's way next to obfusticate their source code to make it more difficult for the OSS community to read their source?


    but I wouldn't say that it's impossible ... not for 3 big corporations, nor for thousands of loosely organized coders.


    Yes, those poor, poor companies like SUN with their open software like Java and Corel Office need to band together and blow up microsoft. resistance is not futile!

    Please.

    DOC isn't going to be very important in a few years anyway, Microsoft are moving to XML based everything. Serialization of com services will be XML based rather binary based as they are today as well. Just don't complain when your documents are 100MB.

    1. Re:Ok, here we go again... by nagora · · Score: 5
      Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?

      Well I can't imagine why but Microsoft, on the other hand, has a strong profit motive. Once the file format changes, as it does every year (or faster) people start getting emails with the new format in attachments. If they could just use a filter then they wouldn't have to upgrade from Word 6 or whatever was the last version that actually offered them new features they needed.

      An obfusticated format means that filters are hard to write so such people are forced to upgrade which == cash for Bill. In fact, according to M$ this is their single biggest source of revenue.

      I guess Microsoft will go out of it's way next to obfusticate their source code to make it more difficult for the OSS community to read their source?

      Undoubtedly, if they're ever forced to release it. In fact, since you mention it, the release of the source code would be useful almost exclusively for the .h files with the data structures in them. Frankly, who gives a damn about the rest of the code? I can write my own bugs, thanks.

      DOC isn't going to be very important in a few years anyway, Microsoft are moving to XML based everything.

      Which means that at some point they'll start changing the definition of XML to close out competitors. They've always taken this approach, why do you think they won't this time?

      When a twit like you starts defending M$ the question I always want to ask is "If they're not a pack of shits why do they bribe, threaten, steal and lie? Do you think it's some sort of hobby?"

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    2. Re:Ok, here we go again... by Balial · · Score: 1

      You don't need to pay $$$ to read MS word documents, you can download a free reader for most of the MS Office formats from microsoft.

    3. Re:Ok, here we go again... by igaborf · · Score: 3
      If they could just use a filter then they wouldn't have to upgrade from Word 6 or whatever was the last version that actually offered them new features they needed.

      You mean like this one?

      When a twit like you starts defending M$ the question I always want to ask is "If they're not a pack of shits why do they bribe, threaten, steal and lie? Do you think it's some sort of hobby?"

      When twits like you attack M$ for the wrong reasons it makes it harder to get the unobsessed to listen to the valid complaints against M$.

    4. Re:Ok, here we go again... by TummyX · · Score: 1


      Once the file format changes, as it does every year (or faster) people start getting emails with the new format in attachments.


      The last time the DOC format changed was 1997.

      2000 - 1997 is 3, which is not less than 1 last time i checked.

    5. Re:Ok, here we go again... by sj12fn · · Score: 1

      >>On the other hand, we have DWG, which is a fairly
      >>rich format that deals with the description of 3D
      >>objects. Could decoding a file format that deals >>with text and it's presentation really be that >>much more difficult to reverse engineer?

      >Well considering DOC can store ANYTHING -
      >including the description of 3D objects yes

      OK, about a minute ago you said in an earlier post that .doc wasn't that hard to decode and now you say it is. Well, which is it?

    6. Re:Ok, here we go again... by nagora · · Score: 1

      You don't need to pay $$$ to read MS word documents, you can download a free reader for most of the MS Office formats from microsoft.

      Only if you are a Windows user.

      The vast majority of Windows users don't know about this, and wouldn't try it if they did. Closed source software discourages attempts to understand the system you're using and makes simple solutions like this daunting to their users. Oops, off-topic.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    7. Re:Ok, here we go again... by nagora · · Score: 1

      The last time the DOC format changed was 1997.

      The copy of Word 97 in our office chokes on Word 2000 files. Perhaps there is another reason for this. I no longer use it so I didn't look too deeply into the subject. What other reason do you know off that this would happen?

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    8. Re:Ok, here we go again... by elbuddha · · Score: 1

      "Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?"

      They wouldn't. But we aren't talking about a company with the smartest people in the world, we are talking about Microsoft.

    9. Re:Ok, here we go again... by Reality+Master+101 · · Score: 1

      Wrong-o.

      Quote: "If you are using Word 6 and 95 for Windows 32-bit Operating Systems, this converter will allow you to open files created by people using Word 97/2000."

      I wish people on Slashdot would learn to read. Oh, I forgot, we're talking about Microsoft, therefore they probably put up this page just to fool the DOJ and converter doesn't really work.


      --

      --
      Sometimes it's best to just let stupid people be stupid.
    10. Re:Ok, here we go again... by LionMan · · Score: 1

      >>On the other hand, we have DWG, which is a
      >>fairly rich format that deals with the
      >>description of 3D objects. Could decoding a file
      >>format that deals with text and it's
      >>presentation really be that much more difficult
      >>to reverse engineer?

      >Well considering DOC can store ANYTHING -
      >including the description of 3D objects yes.

      This has /nothing/ to do with decoding the file. The fact that it might be able to store a description of 3D objects makes no difference, same thing as storing _another_ file linked/embedded into it. That 3D description is not part of the file format! (Do you think Microsoft could make a 3D modeler?)

      Secondly, Corel already has an excellent converter for DOC files - it not only imports them correctly, but exports them (IMHO) better than Word. Look at the HTML that Word makes out of DOC files (not that I would use that "feature") - tags are littered around randomly, there is no consistancy. But if you convert a WPD file into DOC and let Word convert it to HTML, it /is/ cleaner! Try it.

      --
      -Leo
    11. Re:Ok, here we go again... by Azure+Khan · · Score: 1

      "They wouldn't. But we aren't talking about a company with the smartest people in the world, we are talking about Microsoft."

      Wow, this is ALMOST flamebait. Regardless of what you or I think of the executive and marketing practices of Microsoft, in terms of engineering and programming talent, Microsoft hires and maintains what is the probably the largest pool of genius on planet earth. They recruit from MIT, Berkeley, Caltech, etc. on a regular basis. Don't belittle the people trying to earn a paycheck just because the checkstub says Microsoft on it. Not everyone shares a hatred of Microsoft.

      Azure Khan

      --

      --- I'm going sane in a crazy world.
    12. Re:Ok, here we go again... by TummyX · · Score: 1

      DOC isn't a difficult fileformat (it's documented in various places), it lays down a framework for multiple seemingly anonymous streams to be persisted.

      However in that post and this post, I did try to say that reverse engineering for the purpose of displaying and editing the document can be extremely difficult because it's a compound document format. You'd have to basically reverse engineer all the other data streams that 'objects' people embedd in their word documents generate.

    13. Re:Ok, here we go again... by TummyX · · Score: 1


      Are you one of the smartest people in the world


      No. But I believe that Microsoft has some of the smartest people in the world working for them.

    14. Re:Ok, here we go again... by nagora · · Score: 1
      strange...my copy of word 2000 happens to be able to save any format it understands in word 6/95 format

      What? We're all talking about being forced to upgrade to 2000 to read files sent by other people. The whole point is that we don't have 2000, so what formats it saves in is irrelevant.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    15. Re:Ok, here we go again... by BorgDrone · · Score: 1

      I hate M$ like the next guy... but you can't say there aren't some really smart people at microsoft.
      the problem is, they don't run the company, marketing does.

      ---

    16. Re:Ok, here we go again... by Darchmare · · Score: 4

      Yes, but doesn't that require that you own the Latest And Greatest (*cough*) version of Word?

      I think the point is that you have to pay Microsoft the full price of the office suite for the 'privelege' of using newer document formats. That effectively limits the life of your software purchase so that you have to buy a completely new copy whenever there is a document format change - at that point, why not just use it as your primary version?

      THAT is where the rub lies - at that point, you start sending out copies that can only be read in the newer version, and your colleagues begin upgrading as well. It's an endless cycle.

      People want to break that cycle so that they can either use a competing Office program based only on its merits or stick with a previous version which they feel was better than the next version (ie. Mac users who upgraded to Word 6, but wish they had stuck with the previous version) .

      I believe this is a worthy goal.

      - Jeff A. Campbell
      - VelociNews (http://www.velocinews.com)

      --

      - Jeff
    17. Re:Ok, here we go again... by Darchmare · · Score: 2

      Re: XFree - Perhaps you should ask for your money back, then? How much was it that you paid?


      - Jeff A. Campbell
      - VelociNews (http://www.velocinews.com)

      --

      - Jeff
    18. Re:Ok, here we go again... by nagora · · Score: 1

      Actually you didn't say that.

      I didn't need to. Or did you think the original post was about reverse-engineering under Windows? I suppose it could be read that way, but it seems clear to me.

      You also lied when you said they change the file format every year.

      Well, it changed this year and it seems like about a year since the last one. I might be wrong, but it certainly changes enough to be a problem.

      I've got news for you. THEY ALL DO.

      So what? Are you suggesting that that makes it alright?

      But don't talk about MS like they are so much worse than other companies.

      I didn't; pay attention.

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    19. Re:Ok, here we go again... by binarybits · · Score: 1

      I didn't need to. Or did you think the original post was about reverse-engineering under Windows? I suppose it could be read that way, but it seems clear to me.

      You claimed that Microsoft changes the file format so it can sell more copies of Office. Now, since Office is only sold on Windows and Macintosh, only users of those platforms are potential customers. So how does changing the file format increase their bottom line among other platforms? It doesn't.

      If you are claiming that they do it to prevent reverse-engineering, fine. But that's a different claim than saying they do it to sell more copies of Office.

    20. Re:Ok, here we go again... by elbuddha · · Score: 1


      My post was meant to be toungue-in-cheek.

      However, if you want to look at the statement I replied to logically, the phrase "company with the smartest people in the world" implies that all of the smartest people in the world work for Microsoft. Numerous counterexamples can be made of some of the smartest people in the world who do not work for Microsoft. Therefore the statement is obviously false. Thus, my rebuttal that Microsoft is not a company with the smartest people in the world.

      If the original statement had instead stated something like "a company with some very smart people", or even "a company with [arguably] some of the smartest people in the world" it would have been a much more defensible argument.

    21. Re:Ok, here we go again... by TummyX · · Score: 1

      Two of the most famous are..

      Tony Hoare (Came up with Quicksort).
      Dave Cutler (Original architect of NT).

      Most of the people developing and research at Microsoft probably deserve mention too.

      I'm continually suprised at how sophisticated the software they release are, and how well they work despite the sophistication (this is especially true when it comes to creating UIs for their apps).

      I've found creating application logic is simple, but trying to actually design and implement a UI that people can use is extrodinarily difficult. Even just the programming of the UI is difficult.

      Like, I'd like to see any Unix application that allows me to 'draw' tables. Even frontpage lets you grab a pen tool and draw an HTML table line by line.

      Takes a few lines of code to have a dialog request rows/col. Takes skill and dedication to do what Microsoft often do.

    22. Re:Ok, here we go again... by nagora · · Score: 1
      Goddamn that sounds like some BS you heard in a college class.

      No, it's called (you might want to write this down) observation of the real world. I see it every day. Tell people that there are "no user-servicable parts inside" and they assume that they have no control over it beyond what it says in the manual. They (non-hackers) will not attempt to make it do things they think of as messing about with the internal workings. They're scared.

      It's just human nature. Tell people to keep their hands off a technical thing and they often do.

      And it's not laziness or stupidity, it's ignorance. Ignorance is easy to cure, the other two aren't.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    23. Re:Ok, here we go again... by xamichee · · Score: 1

      Start button, Settings, Control Panel, Add/Remove Programs, Windows Setup tab, and Outlook Express is right there in the components lists

    24. Re:Ok, here we go again... by nagora · · Score: 1
      You claimed that Microsoft changes the file format so it can sell more copies of Office. Now, since Office is only sold on Windows and Macintosh, only users of those platforms are potential customers. So how does changing the file format increase their bottom line among other platforms? It doesn't.

      I'm not being clear.

      The original post is (I think) about non-windows, but the motive for M$ to upgrade applies to both their own users (to move to the latest version) and non-windows/mac users (to get them to convert to M$ from Linux etc.).

      The problem for both set of users is largely the same: they get emails etc in the new form and need new import filters.

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    25. Re:Ok, here we go again... by Patrix · · Score: 1

      But Softimage hasn't been made by Microsoft - it has been in Canada (I forget which company, but it was in Quebec though).

      Patrix.

    26. Re:Ok, here we go again... by Alien54 · · Score: 1
      and of course, who writes the programming specs?

      alot of things that microsoft does seems to be the easier way out, and just "happens" to benefit them from a marketing viewpoint.

      we enter into a new world here, the marketing perspective of software design specifications. but then again wasn't there the slogan that "it doesn't ship until Lotus chokes?" or some such thing?

      --
      "It is a greater offense to steal men's labor, than their clothes"
    27. Re:Ok, here we go again... by nagora · · Score: 1
      Okay, I'm confusing two issues here.

      1. M$ users themselves are forced to upgrade, which is the reference to Word 6 (the last version I personally used - I quite liked it).

      2. Linux etc users need new filters.

      Two differing sets of people but, in essence, the same problem for both: reading the new format.

      I'm glad 1997 seems like a year ago for you. you must be having more fun than I am.

      You must be miserable then!

      What's wrong with having a new product every 2-3 years that is backwards compatible?

      The issue is that the file format changes even if none of the new features is needed for the document, this forces people that don't want the new version to buy it (on Windows) or to change OS because their boss doen't understant the issues (other platforms).

      Just because you hate one company and like another you ignore the bad things they do?

      Look, if we're talking about Charles Manson do I have to stop to condem a long list of other serial killers before continuing? Can't we just assume that if I dislike M$'s activities that I disliked it when Netscape did it, or when Apple do it?

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    28. Re:Ok, here we go again... by igaborf · · Score: 1
      No, I mean one that works on other OS's. Pay attention.

      But what you said was:

      If they could just use a filter then they wouldn't have to upgrade from Word 6 or whatever was the last version that actually offered them new features they needed.
      So what "other OS's" were you running Word 6 on that you need to upgrade it?

      So, if lying, stealing, blackmail, and ignoring court orders is not the valid complaints against M$, what the hell are!?

      Lying and ignoring court orders (well, court-approved settlements, anyway) are certainly valid complaints. But those have nothing to do with your assertion about M$ forcing existing Word users to upgrade in order to read newer Word formats. And when you tell lies like that, you are behaving no better than M$. Worse, you're giving M$ cover: "See? Our critics are saying untrue things about us so everything they say is untrue."

    29. Re:Ok, here we go again... by nagora · · Score: 1
      So that means they wont go to www.microsoft.com and download a viewer for word?

      No, they won't. I know this is /. and we're all heavy computer users but TRY to think like a person who does nothing on a computer except type letters for his/her boss. They will no more download a viewer for word than they would have opened up their old typewriter and tried changing the motor. And its partly because they're used to being unable to do anything not in the manual and partly because they're not paid to do that and aren't interested enough to try it.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    30. Re:Ok, here we go again... by Andrew+Cady · · Score: 2
      I see, Microsoft == Evil, so DOC must be created to obfusticate. Very smart of you. Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?
      Because they only make it marginally more difficult for themselves, but at the same time make it vastly more difficult for their competitors. This is the tactic sited in the Halloween documents, and it is known that such tacticts were used in the SMB protocol.

      From Halloween: "OSS projects have been able to gain a foothold in many server applications because of the wide utility of highly commoditized, simple protocols. By extending these protocols and developing new protocols, we can deny OSS projects entry into the market."

      We *KNOW* that MS is specifically making protocols, not to enhance the experience of the user or add capabilities (although these may also be done sometimes), but to decrease the ability of free software to interoperate.

    31. Re:Ok, here we go again... by kjhambrick · · Score: 1

      Hmmm ...

      Is Tony Hoare the same person as C.A.R. Hoare,
      the inventor of quicksort in 1962 ?

      See:

      http://www.npac.syr.edu/copywrite/pcw/node302.ht ml

      -- kjh

    32. Re:Ok, here we go again... by TummyX · · Score: 1

      I'd say so.

      Search the net for "quicksort" and "tony hoare" and you'll get quite a few results from various universities etc.

      Dr. Hoare's website is here

    33. Re:Ok, here we go again... by nagora · · Score: 1
      You forgot rape, incest, murder and crimes against humanity.

      Well, I was only listing things they've been found guilty of in a court of law, but perhaps you've other information.

      [Yes, you are a fool, just so you know]

      You just want to be a mug, don't you?

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    34. Re:Ok, here we go again... by boloni · · Score: 1

      I guess the "smartest people in the world" stuff is a bit misleading. They indeed hired a number of "big names" for the research department. Unfortunately none of them had done anything serious AFTER they've moved to Microsoft. One of the reasons might be that the research department is completely isolated from the rest of the company, and regular employees speak about them as "those idiots and weenies, they don't have release schedules, they never do anything useful". In conclusion: the big names are there to serve as a "prestige facade". For the rest of the company, the hiring philosophy is "We don't care what you know, we will teach you everything." They are not hiring "software engineers" but "smart kids". The interview questions usually contain programming riddles, not software design questions. The Bill Gates style of enthusiastic amatorism, you see. On the other hand, right now, even those wunderkids are not going to MS: what in the hell would they do there? Participate in releasing one more version of the operating system in the next 8 years? Let's remember that the internal creativity of Microsoft was generally very limited even in their heydays: I can not remember one software which in its original form was started at MS: all of them very bought from external companies! (with the companies, usually). Ok, maybe except the original Basic interpreter written by Bill Gates, but that's long ago, when he was the wunderkid. Even high ranking officials at Microsoft first leave the company and then start a startup: because the corporate culture do not let them innovate inside. Very funny that the otherwise very rigid old AT&T, even without being in the computer business created a large number of contributions: because they had a sheltered group of researchers who could do basically what they wanted at Bell Labs, while keeping a strong discipline on the external, telephony side. In conclusion: I don't think MS has the smartest people. They are certainly not hiring for the cream of universities: look at the extraordinarily drive Transmeta had done to hire Linus and the best new PhD-s around. And they are stiffling even those which they have. I am wondering what would have Linus done if hired by MS. Fixed bugs in Win2k? He certainly would have not been allowed to start a port to a new architecture: that's a "political" decision. Etc, etc. greetings Lotzi

    35. Re:Ok, here we go again... by nagora · · Score: 1
      But these same people WOULD be interested in running Linux?

      They can and do. What really stops them is not desktops or window managers or all that cack, but the inability to do their work. Which is where the filters and Word come in.

      Most Windows users never use Windows itself, they just use the apps and go home. Many, many Windows users don't know how to find a file on their drives without using the Find File app. To these people Linux and Windows as tools in themselves are equally opaque. But under Linux they aren't constantly told "don't touch" and they slowly grow into REAL users.

      If I was being cruel, I'd say they had no choice.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    36. Re:Ok, here we go again... by drinkypoo · · Score: 2

      Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read? I guess Microsoft will go out of it's way next to obfusticate their source code to make it more difficult for the OSS community to read their source?

      It's not harder for them to read, because all they have to do when they make mods to the format is make changes to the DLL that handles parsing DOC files at the same time. Only the people who work on the file format itself have to work with the contents of a DOC file directly (at microsoft, that is), and everyone else just deals with the data after it's been parsed out.

      Incidentally, back in the days of the Amiga (Oh no, here we go down nostalgia lane again) lots of people used IFF format for their files, which is an extensible hunk-based file format where you can include any kind of data, so a palette file can be an IFF with only the palette hunk, whereas an image has both the palette and the image hunks. Even Amiga binaries seemed to be either IFF or something closely akin to it, which is an observation I made purely by watching powerpacker go off on my binaries back in the days before I had a hard disk, and may be purely BS.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    37. Re:Ok, here we go again... by dragonfly_blue · · Score: 1
      Yeah, I can't wait until we get to use MS-XML, it'll be the answer to everything. (/sarcasm)

      Ever look at the XML output of Office 2000? Well, I have, and Microsoft has once again pissed openly in the standards river.

      It is the ugliest, most proprietary excuse for standards compliance that I have ever fricking seen. Embrace and extend this!

      --
      Free music from Jack Merlot.
    38. Re:Ok, here we go again... by jetson123 · · Score: 2
      I see, Microsoft == Evil, so DOC must be created to obfusticate. Very smart of you. Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?

      I think it's an expedient combination: using object serialization for I/O makes it both easy for Microsoft to read/write data, it makes it difficult for competitors to do anything with the format on other platforms, and it forces users to upgrade their copies of Office with every new release.

      This is, in fact, at the heart of what people are complaining about Microsoft: Microsoft adopts strategies that give them a quick time-to-market, lock users into upgrade paths, and that are also effectively exclusionary. I wouldn't necessarily call that deliberately "evil". I'm sure many people at Microsoft view it as the natural way of doing software development, and they view everybody else in the industry who bothers with standardized or well-documented formats as people who foolishly waste time and money.

      DOC isn't going to be very important in a few years anyway, Microsoft are moving to XML based everything. Serialization of com services will be XML based rather binary based as they are today as well.

      While it may help a little, serializing objects in XML format will not necessarily result in formats that are significantly more readable, accessible, or backwards compatible. To make sense of a big and complex XML model, you still need a formal definition of what it is.

      This is really an issue for users and customers: users should insist that their data is in well-documented formats that remain constant and compatible across releases. That's why many government offices have insisted on using SGML in the past.

      Using serialization for document storage is simply poor engineering, whether it is done by Sun or by Microsoft or by anybody else. Skipping the step of formally defining a storage format is expedient to the company but harmful to users. In the long run, users have too much invested in their content to store it in such an ephemeral format.

    39. Re:Ok, here we go again... by IntlHarvester · · Score: 1

      First of all, I don't think that the next version of Word's format will be pure XML. Probably more like HTML4 + CSS2 + Some made up or unapproved parts of CSS3 + Some extra stuff MS invented + XML. I would expect it to be able to somewhat gracefully devolve and be somewhat viewable on IE4/5 and NS4.

      If I'm right, then Unix users are in better shape than they are today. Of course, I could be totally wrong - it could be pure XML voodoo, or worse, require some special IIS-only middleware software before being viewable on the web.

      serializing objects in XML format will not necessarily result in formats that are significantly more readable, accessible, or backwards compatible

      Well, this is pretty much certain, given their current architecture. Ugg, I can see it now:

      <object type='msword10voodoo' encoding='base64' id='{ACF0-415BC...}'>150873AB32456CDA1AC4...</obje ct>


      --

      --
      Business. Numbers. Money. People. Computer World.
    40. Re:Ok, here we go again... by TummyX · · Score: 1

      I'm sorry but many MS engineers are probably smarter than Linus.

      All Linus did was copy a 30yr old OS using well known algorithms and code.

      I need not remind you that Linux is NOT the best implementation of Unix out there - in fact it's one of the poorest. It pulled a Homer.

      To suggest things like he could have fixed bugs in Win2k (good software engineers wouldn't even suggest that anyway)...is just retarded.

      Windows 2000 is the best product Microsoft has released, sure it has some 'bugs', but we all know EVERYTHING has bugs.

      And Microsoft might not have come up with any product centred around a "new" concept on their own. But they certainly are damned good at implementing existing ideas with some damn good software engineering.

      They do things which other software engineers don't bother doing cause it's too hard.

      I mentioned elsewhere in this thread an example about the ability to 'draw tables' in frontpage/word/excel etc as opposed to just specifying rows/cols.
      Other things I can think of are edit-and-continue in VC++, intellisense in Visual Studio (including autoprompting of C++, Java, HTML, JavaScript, ASP etc).

      These are the things which make their products stand out from the competition, and that's what counts.

    41. Re:Ok, here we go again... by DGregory · · Score: 1

      Yeah but the reader doesn't help you if you want to do something extremely difficult such as, oh, PRINT the document.

    42. Re:Ok, here we go again... by FigWig · · Score: 2

      Well considering DOC can store ANYTHING - including the description of 3D objects yes.

      This comment is meaningless. Any file can store anything. BFD. Does DOC have predefined data structures to store a 3D database? No. It does have the ability to 'serialize' (is this a Java only term?) OLE/whatever objects. Not at all the same.

      Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?

      I don't know, maybe to make more money? I try to stay relatively sane when it comes to MS bashing, but doesn't it only make sense that if the file format is what is locking your product into the market, you will do everything you can to keep it a secret? Autodesk did it with DWG (I had to muck around with reading DWGs a couple years ago and there was incredibly little info out there).

      Microsoft are moving to XML based everything. Serialization of com services will be XML based rather binary based as they are today as well. Just don't complain when your documents are 100MB.

      I won't be complaining since all my XML docs will be gzip'ed and all my apps will automagically decompress them before reading them. XML is a standard data markup format, but just wait until MS goes crazy with its DTD. Just because you can parse a file doesn't mean you have a clue as to how it works.

      --
      Scuttlemonkey is a troll
    43. Re:Ok, here we go again... by kerner · · Score: 1
      Which means that at some point they'll start changing the definition of XML to close out competitors. They've always taken this approach, why do you think they won't this time?

      This wouldn't shock me at all.

      If you look at how Office 2000 saves RTF files they have extra information stored within them. Older versions of Office open this files just fine, but non MSFT software inserts a bunch of crap into the document.

      Isn't RTF supposed to be a standard document type?

    44. Re:Ok, here we go again... by seichert · · Score: 1
      When a twit like you starts defending M$ the question I always want to ask is "If they're not a pack of shits why do they bribe, threaten, steal and lie? Do you think it's some sort of hobby?"

      Not to get into this, but heck why not.

      1. Who did Microsoft bribe? I thought that was the reason they are getting screwed by the government, because they didn't bribe people like various Valley firms.
      2. Who did Microsoft threaten with loss of life and limb(a threat to take away market share is just business)?
      3. Who's building did Microsoft break into, who's computer did they hack and how exactly did Microsoft steal somebody's private property?
      4. Which customer, competitor, etc. did Microsoft defraud(in your words lie)?
      I'm not saying you don't have answers to these questions, but if you could give me some good ones then maybe I could see a reason for legal action to be brought against Microsoft or some of its employees. Otherwise the government is just doing this to flex its muscles and make good on the bribes it received from Scott McNealy and others.


      Stuart Eichert

      --

      Stuart Eichert

    45. Re:Ok, here we go again... by Phokus · · Score: 1

      Will you shut up already MS fanboy? The fact that someone like you exists, sickens me. BTW, your first post was owned by the guy who replied right under you. GG.

    46. Re:Ok, here we go again... by Chris+Johnson · · Score: 2

      Funny- I would never be seen calling them a 'pack of shits', but surely the fact that they bribe, threaten, steal and lie are valid complaints? If these are not valid complaints then what are?

    47. Re:Ok, here we go again... by nagora · · Score: 1
      I'm just going to point you at the paragraphs of the findings of fact, I haven't time to go into this in deep detail (I'm working today).

      1.Who did Microsoft bribe? I thought that was the reason they are getting screwed by the government, because they didn't bribe people like various Valley firms.

      Paras 66, 105.

      2. A threat to remove (not reduce) market share is a threat which is as real to the target as a threat to life and limb. paras 106-110, 115-132, 144-174.

      3. This is the easy one - Double Space. There are others but that one is the one we all remember.

      4. They lied to the judge (Jackson) when they said that they would abide by his previous rulings. From the memorandum and order of the DOJ case:

      "Third, Microsoft has proved untrustworthy in the past. In earlier proceedings in which a preliminary injunction was entered, Microsoft's purported compliance with that injunction while it was on appeal was illusory and its explanation disingenuous. If it responds in similar fashion to an injunctive remedy in this case, the earlier the need for enforcement measures becomes apparent the more effective they are likely to be."

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    48. Re:Ok, here we go again... by Oloryn · · Score: 1
      YOu CANNOT equate the laziness of a user with Closed-Source.

      You don't have to. Battening on the laziness of the consumer is a typical marketing tactic nowadays. That's why you get so many offers that go "We'll sign you up now, but there's no obligation - if you decide you don't want it within X days, just call 1-800-xxx-xxxx and cancel" or "We'll sign you up now, and send the full information. If, after looking at the information, you decide you don't want it, just call, etc, etc, etc". Marketers *have* to know that many people won't ever get around to making the call, even though they don't really want the product. This is why I now routinely turn down any such offers. I want my own laziness working *against* my making extraneous purchases, not for it.

      Same goes for mail-in rebates. Attract a sale via the mail-in rebate offer, knowing that many people won't get around to getting together the serial numbers, receipts, etc, required for getting the rebate.

      Microsoft, as a marketing company, has to know this. And strange as it may seem to us, who are likely used to finding things on the net, many consumers, faced with a choice between going down to the store and buying an upgrade vs. hunting the net for a free viewer, will opt for the store. They understand buying things at the store. They're less sure of their knowledge of finding things on the net. Faced with doing something they know, vs having to learn something new, an appreciable number of them will opt for what they know. Given the state of marketing nowadays, I can't imagine a marketing type passing this up.

    49. Re:Ok, here we go again... by treedragon · · Score: 1
      TummyX: Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?

      They aren't the smartest. I answered the question on difficulty back in April:
      http://www.treedragon.com/ged/mc/te/officeformats. htm

      David McCusker, former OpenDoc guy

      --
      Values have meaning only against the context of a set of relationships.
    50. Re:Ok, here we go again... by arodrig6 · · Score: 1

      Why would a company with the smartest people in the world...

      Oh here WE go again... do you really think that MS has the smartest people in the world? Or even a large number of them?

      Certainly MS has some brilliant people, but there are a lot of very smart people who work elsewhere - sometimes not even in computers (*gasp*!)

      I doubt that most microsoft employees, or even a signifigant chunck of them are as smart as say, nobel prize winning physicists, or neruosurgeons, or biologists unlocking the secrets of life, or even most CS professors at major univiersities.

      Computers are a big part of today's society, but lets not forget that to to vast vast bulk of the world they are just a tool used to solve problems.

      ask yourself, if you were a brilliant engineer or scientist, which would you rather work on - discoverign the nature of the universe, unlocking the cure for deadly diseases, or advancing a word processing program from version 8.0.1 to 8.1?

      --

      Who am I? Subscribe and find out
    51. Re:Ok, here we go again... by The_Messenger · · Score: 1
      All Linus did was copy a 30yr old OS using well known algorithms and code.
      I'm sorry to say that... you're absolutely correct. Not only Microserfs are trying to wake people up about GNU/Linux, but we UNIX zealots are too. Linus didn't invent anything; he just followed the POSIX spec. Possibly some of the implementations have been revolutionary, but I doubt it.

      The reason that GNU/Linux beats NT in many regards is that, IMNSHO, the UNIX system is superior. In fact, it is so much superior, that even a hack of it such as Linux looks great.

      But GNU/Linux doesn't look quite so awesome compared to my obsession, Solaris. And I don't want to hear all of that "Slow-aris" crap. Dismissing Solaris because of the perceived speed of its UI makes about as much sense as deciding that a 1GHz Pentium III is "better" than a 400MHz UltraSPARC because of the extra clock cycles. As I've said before, GNU/Linux has a better chance of taking the desktop away from Windows, than the server room away from SunOS, AIX, HP-UX, and Irix. I suppose that GNU/Linux is impressive for coming this far with relatively little commercial support, but it's been ten years, for God's sake, and you're still not close to state-of-the-art.

      I won't rehash everything I've said here over the last few months, but I don't think that Uncle Sam is being very smart about Microsoft, and I don't think that NT is Satan. It's not a bad product. No-where near the level of commervial SVR4, but still not a bad product. UNIX and Windows can coexist. And there's even a place for GNU/Linux, if its spokespeople could get it through their heads that they're not revolutionary heros fighting "the man". The only really interesting part of the whole mess is the GPL. <flamebait>Maybe the next GPLed OS will have more mature, reasonable users.</flamebait> (But that system will be developed by old Linuxers, and we'll run into the Second System Effect, and end up with the Windows 95 of the Unix-clone world. Right, TummyX? ;-)

      I'll shut up now. For more ranting, see my previous posts. :-)

      Whether you be troll or not, I hope that you stick around, TummyX, for I find your POV refreshing. You've managed to make it this long without being completely engulfed in Slashdotter flames, so I trust that I needn't worry.

      intellisense in Visual Studio (including autoprompting of C++, Java, HTML, JavaScript, ASP etc).

      Ummmm... heh heh heh... as a member of the Java Lobby, I feel an urge to take issue with a couple of those products mentioned. But you've been flamed enough for today, so I will restrain myself.

      ---------///----------

      --

      --
      I like to watch.

    52. Re:Ok, here we go again... by Oloryn · · Score: 1
      Dr. Hoare's website is here

      My word, what a horrible, almost unreadable page! I refuse to believe Dr. Hoare maintains that page himself. He'd have done a much better job. Actually, the typical luser would have done a much better job, let alone someone as intelligent as Dr. Hoare. The page is an insult to him, and makes you wonder if anyone at Microsoft really appreciates who they have working for them.

    53. Re:Ok, here we go again... by Cassandra · · Score: 1

      Which means that at some point they'll start changing the definition of XML to close out competitors. They've always taken this approach, why do you think they won't this time?

      Perhaps because XML is a meta language?

      Agreed they might try to mess with standards that use XML (like Math ML, and Music ML etc.)

    54. Re:Ok, here we go again... by Oloryn · · Score: 1
      I guess the "smartest people in the world" stuff is a bit misleading. They indeed hired a number of "big names" for the research department. Unfortunately none of them had done anything serious AFTER they've moved to Microsoft. One of the reasons might be that the research department is completely isolated from the rest of the company, and regular employees speak about them as "those idiots and weenies, they don't have release schedules, they never do anything useful". In conclusion: the big names are there to serve as a "prestige facade".

      This helps explain a bit. It's a little disconcerting to find that the author of "The Emperor's Old Clothes" is working for Microsoft. What does one of the prime, classic spokespersons for simplicity in software construction have to do with this bastion of bloatware?

      Although its interesting to note that some of his commentary regarding Ada seems to be singularly appropriate to Windows:

      At first I hoped that such a technically unsound project would collapse but I soon realized it was doomed to success. Almost anything in software can be implemented, sold, and even used given enough determination. There is nothing a mere scientist can say that will stand against the flood of a hundred million dollars. But there is one quality that cannot be purchased in this way -- and that is reliability. The price of reliability is the pursuit of utmost simplicity. It is a price which the very rich find most hard to pay.
    55. Re:Ok, here we go again... by Cassandra · · Score: 1

      Older versions of Office open this files just fine, but non MSFT software inserts a bunch of crap into the document.

      WordPad (the app that is bundled with Windows, and can do most useful things Word does) also fails to open newer RTF-files.

      Actually the RTF business upsets me much more than most evils in Word, because RTF is actually claimed to be a format for compatibility.

    56. Re:Ok, here we go again... by Azure+Khan · · Score: 1

      elbuddha, King of Semantics! :)

      --

      --- I'm going sane in a crazy world.
    57. Re:Ok, here we go again... by zigzag · · Score: 1
      From a marketing standpoint they are pretty evil yes. But from a community standpoint they give more money to charity then just about any other companies out there.

      Big-time drug lords in South America are also known to be very generous in their communities.
    58. Re:Ok, here we go again... by xod · · Score: 1
      I see it there, but does it work? The converter for Word 5.1 MacOS which allegedly converted Word docs from some higher version crashed consistently on my machine, and half the time on others.

      It is in Microsoft's economic interest to release a buggy converter, to encourage upgrades. And that is how Microsoft works.

    59. Re:Ok, here we go again... by raka · · Score: 1

      Hmm, this is probably a bit late, but it needs to be said. DOC isn't going to be very important in a few years anyway, Microsoft are moving to XML based everything. Serialization of com services will be XML based rather binary based as they are today as well.ust don't complain when your documents are 100MB. Mine won't be. You're the one who will be using word.

    60. Re:Ok, here we go again... by Malcontent · · Score: 1

      Maybe they are killers and rapists (I would not put it past them). What he really forgot was perjury, evidence tampering, witness tampering, racateering. These people are felons and belong in jail. In a jsut society they would be there. In our society they get to intall a puppet in the white house.

      --

      War is necrophilia.

    61. Re:Ok, here we go again... by Malcontent · · Score: 1

      So let me get this straight now.
      I own word97, I get a word2K doc. I open it up in word97 and save as a word97 doc right?

      Or were you just being an idiot.

      --

      War is necrophilia.

    62. Re:Ok, here we go again... by LocalH · · Score: 1

      Ah. The Interchange File Format. Off the top of my head, it supports graphics (image, palette, animation, color-cycling, etc - ILBM), audio (remember 8SVX?), and countless other things that I don't know about. Truly a versatile format.

      And from my experience with Turbo Imploder on the Amiga, binaries are indeed hunk-based, although the similarities to IFF are unknown to me. I've watched an executable with 30+ hunks be trimmed down to 5 or less hunks with Imploder.
      _______
      Scott Jones
      Newscast Director / ABC19 WKPT
      Commodore 64 Democoder

      --
      FC Closer
    63. Re:Ok, here we go again... by Tony-A · · Score: 1

      Any truth to the rumor that the real reason Bill Gates is fighting the breakup of Microsoft is that he would lose the ability to have the systems group somehow make the application actually sort of work?

    64. Re:Ok, here we go again... by Holgrave · · Score: 1

      The way you just described it, MS Office sounds like a virus.... ;-)

  5. .DOC by Old+Wolf · · Score: 1

    Um, this is lame. DOC format is specified on MSDN. I remember in a C programming course, learning how to read and display MS Word 6 .DOC files.

    How do you explain the various programs for Linux that all read and MS Word .DOC files perfectly well? For example, Corel's word processor, and all those DOC -> PS convertors.

    This article seems to be just FUD.

    1. Re:.DOC by frank249 · · Score: 3
      Corel does a pretty good job of converting doc files. In fact they have been certified by the American Bar Association as 100% MS Office compatable. They can do that since lawyers use mainly text documents. Conversion problems arise when you have complex documents with graphics, tables charts etc. Corel's conversion is not bad but still requires some minor editing. Lawyers love to receive files in doc format since they can go in and see the previous revisions of offers etc.

      BTW remember when Office 97 came out and could not save to an Office 95 .doc format? It actually saved to RTF but gave a .doc extension. Corel's WP could save to the real Office 95 .doc which made it more MS compatable than MS was.

      Perssonally I think MS is using its illegal monapolistic practices to make calls to secret windows APIs to give it an advantage.

      --

      Today's vices may be tomorrow's virtues.

    2. Re:.DOC by nagora · · Score: 2
      How do you explain the various programs for Linux that all read and MS Word .DOC files perfectly well?

      As a figment of your imagination. I've tried them all and they all fail almost as soon as you leave the area of text only docs. In fact none of them print even text only docs well enough for professional use.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    3. Re:.DOC by emir · · Score: 2

      hmm corels wordprocessor (wordperfect) wont read .doc's perfectly. sun's staroffice is even worse on importing .doc's.

      they are good at importing simple .doc files (.doc's mainly consisting of text) but are pretty bad at importing .docs that have images, tables and other stuff...

      --
      -- http://electronicintifada.net --
    4. Re:.DOC by Old+Wolf · · Score: 1

      Take for example a Word document that has an Excel graph embedded in it via OLE. This is great from a user's point of view. But if your WP is not equipped to read Excel objects then you will have a problem. The trouble is NOT with the Word part of it, it is with an embedded object.
      One single .DOC (or any other file for that matter) can contain all sorts of objects of other sorts embedded via OLE. All of the specifications and interfaces for this is published. It's just a lot of work for a package to handle embedded objects it wasn't designed for.

    5. Re:.DOC by emir · · Score: 1

      its not just embedded stuff in .docs that get fucked up when you try to read them in wordperfect/staroffice, even simple formating of the text gets screwed up from time to time.

      >But if your WP is not equipped to read Excel
      wordperfect suite has support for excel files but still excel graphs embeded into .doc's get screwed.

      >All of the specifications and interfaces for this is published
      there is a lots of documentation on .doc's but there are no real specs for .doc's. there is a huge difference between specifications and documentation.

      --
      -- http://electronicintifada.net --
    6. Re:.DOC by sqlrob · · Score: 1
      Why are you assuming that Word understands Excel files? It does not have to.

      THAT IS THE WHOLE POINT OF OLE. Know the protocol, you can use objects from any app (well, theoretically) that also understands the protocol.

      Of course, that means you need Excel to edit the files. There is a WMF stream in the file, so you can view/print it even if you don't have the app.

    7. Re:.DOC by DunLurkin · · Score: 3

      Let's not lose sight of the real goal here: that .DOC will become a quaint historical curiousity as Open Source file formats become the standard! Do your part by NEVER using MS's proprietary file formats. Even if you use MS at work, save your files as .RTF and advise your less-hip coworkers to do so as well. (I would say save as .HTM, except that Word produces EXTREMELY ugly HTML).

      --

      I am very much afraid that we live in interesting times.

    8. Re:.DOC by nagora · · Score: 1
      Is it Microsoft's fault that Linux word processors can't print text only documents well enough for professional use?

      Oh, try reading the posts before acting like a dick. Any system which can run TeX is better than anything M$ can produce for professional text only documents. With pstricks it's also better for everything else.

      I said that the Linux DOC file readers were not up to the job, not that Linux as a whole is incapable of professional documents.

      blah blah blah, drivel,as smoothly and productively as a Windows 3.1 app.

      The biggest waste of my time in the office is helping the Windows users get out of trouble with their crappy apps. The rest of the time I'm producing code and documents that I just couldn't be arsed trying to do under Windows. Some of us like an OS which only gets rebooted when new hardware is put in.

      Linux was basically a bunch of people mireing around in the past...

      Actually I agree, but the other option is to mire around in shit.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    9. Re:.DOC by Anonymous Coward · · Score: 2

      I've reverse engineered a number of Microsoft file formats.

      Several versions of the .DOC file format were only available by signing an NDA. The 97 format was released publicly, but the latest releases of the .DOC format have not been documented.

      I was somewhat involved in the reverse engineering of one of the .DOC formats when I was reverse engineering the .HLP file format. The person doing the .DOC format believed there would be some similarities in the two, so we worked on them together.

      It turned out that there were some very small similarities, but not enough to be very helpful to us.

      Reverse engineering a .DOC file would be fairly easy. It's also incredibly tedious.

      The best way to do it is to start with small files: Start with a file with 1 letter, then two letters, then three.

      Then make one of the letters bold, then make one italic, then make one bold italics. Then put each letter in a cell in a table, and so on ad infinitum.

      Between each step, do a hex dump and compare the files. Eventually every thing starts to fall into place.

      After that's done, then write a converter or dumper for a .DOC file. Then start testing that on a bunch of different .DOC files until you find files that break it. Look to see what's different about those files, fix you code, and repeat, again, ad infinitum.

      Depending on how diligent you are, you can probably get 99% of it.

      Personally, I've done about all the reverse-engineering that I want to do, so I'm not going to do it, but if someone wants to follow these instructions, it's probably the easiest way to go. Also, I'd keep the Word 97 specs handy so you can see any similarities that have been carried over from that version to the latest.

      Good luck.

    10. Re:.DOC by Old+Wolf · · Score: 1

      Glad you agree, even if I didn't express myself well. I guess the obvious question now is: is the WMF specification public? Again, there are several non-MS tools that deal in WMF files.

    11. Re:.DOC by sqlrob · · Score: 1
      The WMF format is detailed in MSDN. However, it does look like it would be a PITA to port to other platforms. It basically is a list of the calls to the GDI. The documentation is available here

      Again, as with the documentation for the .doc file, this may not be complete enough to implement something that does read a metafile in a non-Windows system

    12. Re:.DOC by Anonymous Coward · · Score: 1

      Actually, there's a program you can use for viewing BOTH AutoCAD and Word files... and a number of other formats. (www.autovue.com)

      Autovue does a decent job of displaying. The catch is, you can't *write* any of these formats. In the case of AutoCAD stuff this makes sense, but the funny thing is in terms of Word formats, it's a lot easier to *write* the format than to read it. That's because in order to read the format and display it properly, you have to render properly every nuance and combination, whereas when you write you can simply limit the writing capabilities to what you understand.

    13. Re:.DOC by biohazard99 · · Score: 1

      Getting rid of .doc by not using it is the best long-term options. Write your own html 4.x (loose with embeded styles), and you get single file, ascii plain text = minute sizes, just embed your images and sounds.

      Even powerpoint could be x(html|ml)ized using smil, inline frames/page refreshes, or MNG very soon

      The W3C browser/editor amaya handles this task fairly well, but there is no reason any wysisyg html editor couldn't perform this task.

      Praying for the day when NS6/Mozila is fast enough to be usable for this sort of task

      P.S. Is anyone else noticing that the combination of Win98SE, IE 5.5b, and checking your hotmail ocasionaly/randomly requires a browser restart to jump to a new site after you are done checking your mail. I think it is related to msn messenger but even if i enter my gaming windows profile (ctrl-alt-del only shows explorer) nothing in systray), the error still occurs

  6. A Site About File Formats by ekmo · · Score: 5
    --

    | Ceci n'est pas une pipe.
    1. Re:A Site About File Formats by Tha+Pope · · Score: 1

      Wotsit is good, here's another:
      http://www.halyava.ru/document/ind_form .htm

      it's russian based, but much info is in english.

  7. DMCA? by MostlyHarmless · · Score: 1

    Wouldn't this be made illegal under the DMCA? After all, we can only hack the .doc format by circumventing its encryption scheme.

    OK, OK, you are going to say that .doc's aren't encrypted. But even their encoding scheme could be regarded as a form of data hiding.

    Oh, I forgot one minor detail. The government is in bed with the movie industry, not the software industry. So it's ok to bypass the encryption on anything except mp3s and dvd.

    nuclear cia fbi spy password code encrypt president bomb

    --
    Friends don't let friends misuse the subjunctive.
    1. Re:DMCA? by doogieh · · Score: 1
      DMCA? Yes and no. If .doc format has -any- feature which purports to give copy protection to whatever file it is holding, then (at least according to what the MPAA is saying in the DeCSS case) the DMCA anti-circumvention provisions apply.

      Similarly, they could XOR-obfuscate the released code, and any attempt at REing the .doc format would be considered a violation of the DMCA.

      Of course, MS would never do that.

    2. Re:DMCA? by MostlyHarmless · · Score: 1

      Of course they wouldn't, and I doubt they could prove that .doc is copy protected. It's just that it's early in the morning, I have a stomach ache, and the cynicsm is really starting to get me.

      But that brings up a question that is perhaps a little more serious. What about unintential obfuscation? If XOR encryption is considered valid, what about a file format that's so obfucsated that it is nearly impossible to reverse-engineer it? Would that be considered copy protection, even if the intent was not to obfuscate?

      It makes you think.


      nuclear cia fbi spy password code encrypt president bomb

      --
      Friends don't let friends misuse the subjunctive.
  8. Two responses predicted by Amphigory · · Score: 2
    Response1: The .doc file format isn't proprietary, it's on the TechNet CD!

    Response2: Yeah! Let's just do it!

    This question misses the whole point. The problem (from following the AbiWord list for a while) is not that the .doc file format needs to be reverse engineered, it's that the format is such a piece of crap that you can't implement the spec.

    Basically, you have to emulate all of Word's bugs in handling it's own file format to get the expected results. And trying to copy 65,000 bugs is non-trivial. :)

    --

    --
    -- Slashdot sucks.
    1. Re:Two responses predicted by TummyX · · Score: 3


      Basically, you have to emulate all of Word's bugs in handling it's own file format to get the expected results. And trying to copy 65,000 bugs is non-trivial. :)


      Care to point out these 65000 bugs that relate to DOC formats?

    2. Re:Two responses predicted by Tha+Pope · · Score: 1

      I can't be sure, seeing as how I have no sense of humor, but I think that line may have been a joke.

      Maybe.

    3. Re:Two responses predicted by TummyX · · Score: 1

      Seeing as that was the only real point made in that post and the post got moderated to 3 insightful, I expected it not to be a joke.

    4. Re:Two responses predicted by Linux+Freak · · Score: 1

      My *GOD* man, why don't you just come out and admit which department in M$ you work for?

      (I would have thought your astroturf trainer would have taught you to be a little more subtle).

      Bye bye karma. (And yet this moron gets some "Insightful" mods. Wait until meta-moderation.)

    5. Re:Two responses predicted by (void*) · · Score: 2
      Well I remember trying to write a 6 page research paper during my college days using Word 6.0. I spent a lot of time tweaking the format, making sur that it would stay on 6 pages and not more. Then I brought that file to school to print, and when Word 7.0 opened it, BINGO, all the formatting was destroyed and it now took 6 pages and 2-3 lines! I tweaked it there and when it got sent to the HP laserjet, it came out as 6 pages and 2-3 lines again! Truly MS word is WYSIWYG!

      Can you please explain why MS can read the document graphics, but can't maintain format consistency? They seem to have improved a lot in this regard, but so what? All the other guys trying to write a compatible editor are exactly in the position MS itself was a few years ago.

      The point is that MS's .doc is a joke specification, if it ever was at all. Sure you could read the files, but the specification is NOT COMPLETE. That is why many people are having a hard time converting.

    6. Re:Two responses predicted by Penrif · · Score: 1

      Right, and that's what AbiWord is trying to do, write an XML converter. However, the problem has been (if I remember right from stalking AbiWord-dev) that in order to use the format of a Word document, you not only have to have similar features to Word, but you've got to have the same problems as Word. So, it would seem, in order to use Word format correctly, you need to be Word.

    7. Re:Two responses predicted by jacobm · · Score: 3

      Actually, I think that a post along the lines of:

      "Those things that you think of as bugs? Those are not bugs. They are actually hot grits. Which are in my pants."

      would have been considerably less lame than the actual post made. Just my two cents.
      --
      -jacob

      --
      -jacob
    8. Re:Two responses predicted by TummyX · · Score: 1

      Maybe I should have asked for only say 5 examples?

      The number was obviously supposed to be funny (W2K, 65K bugs and all), that much was obvious.

    9. Re:Two responses predicted by The_Messenger · · Score: 1
      The whole child molester thing is just in bad taste, even for shitty trolls.
      ...tell that to Jon Katz. =)

      Oh wow, I gotta do a repost soon.

      ---------///----------

      --

      --
      I like to watch.

    10. Re:Two responses predicted by spectecjr · · Score: 3

      Well I remember trying to write a 6 page research paper during my college days using Word 6.0. I spent a lot of time tweaking the format, making sur that it would stay on 6 pages and not more. Then I brought that file to school to print, and when Word 7.0 opened it, BINGO, all the formatting was destroyed and it now took 6 pages and 2-3 lines! I tweaked it there and when it got sent to the HP laserjet, it came out as 6 pages and 2-3 lines again! Truly MS word is WYSIWYG!
      Can you please explain why MS can read the document graphics, but can't maintain format consistency? They seem to have improved a lot in this regard, but so what? All the other guys trying to write a compatible editor are exactly in the position MS itself was a few years ago.

      The point is that MS's .doc is a joke specification, if it ever was at all. Sure you could read the files, but the specification is NOT COMPLETE. That is why many people are having a hard time converting.


      This is because Word 6.0 rendered according to screen metrics, Word 7.0 rendered according to printer metrics for better quality output at low font sizes, and Word 8.0 now renders according to *font design* metrics, which means that while it'll look reasonably like what you get on the printer, and obey margins, it will squish the fonts a pixel or two together at times to get the best fit.

      It's nothing to do with the .DOC format at all - it's all to do with the render layer.

      Simon

      --
      Coming soon - pyrogyra
    11. Re:Two responses predicted by Malcontent · · Score: 1

      Well no because you are refering to human beings and MS is not a human being. The bugs reference was about W2K which is also not a human being.

      Apparently you need to go to biology classes again.

      --

      War is necrophilia.

    12. Re:Two responses predicted by Baki · · Score: 1

      If everyone just used postscript, all rendering would be the same and there would be no such problems.

    13. Re:Two responses predicted by (void*) · · Score: 2
      So there is nothing in the specification about whether it should render according to screen, printer or font design metrics? The specification is thus INCOMPLETE, since users rely upon it to paginate, so that they can submit under _publisher_ standards of page counts. If the software gives users the illusion (WYSIWYG) that this can be done, when it cannot, it is an INCOMPLETE, and BUGGY implementation.

      Because of this I maintain that MS has a joke of a document specification.

    14. Re:Two responses predicted by jafac · · Score: 1

      duh!

      If it ain't broke, fix it 'til it is!

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  9. "Documented" part of format... by Alex+Belits · · Score: 2
    ...is just an umbrella to store the data that should be fed to never documented code that actually produces the layout. "XML-based" formats won't change it -- as long as no one knows how to display the formatted document, it's as good as never documented. A lot of programs can parse .doc, extract text, imitate Word formatting, etc., but since there is no precise description, what should be done to display the document (other than "just use Word itself", what works on Windows over COM and is touted as "openness" of Word by TummyX and other Microsoft supporters here), it can be only approximated unless it will be possible to force Microsoft to either write specs for that code (I am sure, specs never were written, because if they were, at least backward compatibility wouldn't break in every Office release), or, failing this, put the rendering code into the public domain.

    When I used StarOffice I have seen horribly broken formatting that was magically cured when I have installed Microsoft/Monotype fonts into my Linux box with StarOffice. This suggests that Word formatting is very inflexible regarding changing parameters of the media (as opposed to, say, TeX that will adapt to any size of anything as long as it makes sense), and every slight difference in algotithms (never documented ones, not "packaging") can cause horrendous miscalculation of the formatting.

    --
    Contrary to the popular belief, there indeed is no God.
    1. Re:"Documented" part of format... by sporkboy · · Score: 1

      just like HTML doesn't specify the display of its contents, and just offer suggestions. How is this any different?

    2. Re:"Documented" part of format... by Alex+Belits · · Score: 2

      HTML is never designed to fit some formatting "unit" pixel-to-pixel (or point-to-point) into piece of formatting that is specified by other formatting "unit". If font size is different in different clients, HTML renderer must format everything according to the font sizes available on the client, and relations between tables, paragraphs, images and page width won't change in any significant way. Not so with Word -- in Word formatting is based on the sizes of elements, and it breaks horribly if even one of them is not the same as it was expected when document was written. Since the procedures that generate formats are not documented, programming turns into a constant struggle for generating everything in precisely the same way as Word would do, or formatting breaks.

      --
      Contrary to the popular belief, there indeed is no God.
  10. hmm by int · · Score: 1

    Might become a problem when companies start patenting file formats, like this ASF patent

  11. LAOLA is GPL by Money__ · · Score: 1

    LAOLA looks like a good solution.
    ___

  12. DataViz has been doing this for years by kriegsman · · Score: 1
    DataViz has been doing this for years. They have reverse engineered hundreds of file formats and they sell stand-alone and integrated document converter software. The Windows product is ConversionsPlus and the Mac version is called MacLinkPlus. I have found that the translators are easy to use and work extremely well.

    Apple used to bundle MacLinkPlus with MacOS, so any Mac user could open any file from any program -- PC or Mac. (I used to annoy PC users by using my Mac PowerBook to translate files for them that they couldn't open, from programs that they didn't have and that weren't even available for the Mac, e.g., Lotus AmiPro. The stuff works.) Apple doesn't bundle it any more (?!) for their own inscrutable reasons.

    There is no Linux version (yet) of DataViz's translator package, but they do offer translation packages for Palm users, so there's some indication that they're open to addressing "non-traditional" platforms if they see a market. I have hope.

    1. Re:DataViz has been doing this for years by Drashcan · · Score: 1
      >Apple doesn't bundle it any more (?!) for their own inscrutable reasons.

      Wanna know why? Because MS ported Office to the MacOS. And every penny of extra revenue counts (especially if you (=Jobs) are offered a jet as fringe benefit) so you are forced to bundle only the stuff which is really of utmost importance or for which they pay you an immoral amount of money.

      The dangerously sick tarantula

      --
      The nice thing about Windows is: it does not just crash; it displays a nice little dialog box and let's you press 'OK'
    2. Re:DataViz has been doing this for years by IntlHarvester · · Score: 1

      DataViz allowed Apple to ship their converters in an attempt to get more users on their upgrade treadmill.

      It didn't work, and in fact less people paid for their software because it was free with the OS.

      Face it -- being able to open Ami or WPG files on the Mac is a extremely small nitch market. As a Mac user, I've got worse problems (like these damn FreeHand 3 files I made long ago, now unopenable by anything except FreeHand 4. Meanwhile, FreeHand 9 is shipping.)
      --

      --
      Business. Numbers. Money. People. Computer World.
  13. The Problem is MS's doc by (void*) · · Score: 1
    is that there are no real specifications for it! Sure there is documentation. But a documentation is not a specification. Documentation is the label on your VCR that says record, play, fast-forward, etc. But specification is more than that. Specifications detail the size of the VCR. It says that there should a readonly tab that the VCR should respect if it tries a record, etc.

    The problem with MS Word is that the way to see how a command or comment would work is to try it on the screen. Will inserting this graphic cause the rest on the page to lose alignment? Not sure? Try it out!

    This is fine for people using the software, but a nightmare for other people trying to write compatible software. Try it! Take out your old copy of MS 5.0 and write a fairly complex document (a 5-6 page research paper with graphs and annotation is a good example). Take that and use MS 6.0 to read it. Even MS themselves can't maintain consistency of conversion. That's becuase they basically made the document format up as they went along - no formal software engineering specs were ever written. If they were, then they obviously weren't detailed enough.

    Contrast that to TeX. You don't have to copy a single line of TeX source to create your own teX compiler. All you have to do is to examine the picky formatting tests, and ensure that you write your code to reproduce the desired tests. And the specs were designed sensibly, if a little idiosyncratically.

    The binary format of the .doc file is hardly the issue!!

    1. Re:The Problem is MS's doc by Schnedt+McWapt · · Score: 1

      Hell no you can't!

      Ever heard of EBCDIC?

      Ever heard of BAUDOT?

    2. Re:The Problem is MS's doc by Oloryn · · Score: 1
      no formal software engineering specs were ever written.

      Last I knew, Microsoft doesn't do formal software engineering specs. Bill thinks they're a waste of time. "No design documentation but the code itself" were the operative words, IIRC.

  14. Word for Lawyers by Anonymous Coward · · Score: 1

    Microsoft Word is made for lawyers. Try writing anything scientific with it and you're screwed. LaTeX is still the way to go.

    1. Re:Word for Lawyers by TummyX · · Score: 1

      Why would you be 'screwed'?

      Every tried using Microsoft Equation? (it comes with Word/Office).

    2. Re:Word for Lawyers by frank249 · · Score: 1
      There is a reason why so many lawyers use WordPerfect instead of MS Word. Word does not do the things they need. For example, Word's word count does not include footnotes in the count. Since certain submissions to courts have a maximum word count which include footnotes, they have to have a clerk manually count the words to make sure it does not exceed the maximum length. Some courts refuse to accept briefs created in MS Word due to this problem.

      Having different versions and revisions included in a file not only makes for a bloated file size but could also give the opposition insight that you do not want to give them.

      There is one thing that WordPerfect does superbly is creat tables of authourities which word sucks at.

      The DoJ just bought 55,000 seats of WordPerfect and even the ruling against MS was created in WordPerfect.

      --

      Today's vices may be tomorrow's virtues.

    3. Re:Word for Lawyers by Devil_Dog · · Score: 1

      You must not have noticed the checkbox right in the Word Count dialog that says "Include footnotes"? Amazing.


      Someday I'll make devildog.org into something.

      --

      Someday I'll make

    4. Re:Word for Lawyers by frank249 · · Score: 1

      True it includes footnotes if you are counting the whole document but for legal briefs you do not count certain portions. In WordPerfect you can simply block off the main body and use the word count. In word you do not have the option to include footnotes when counting words in a selected block.

      --

      Today's vices may be tomorrow's virtues.

    5. Re:Word for Lawyers by TummyX · · Score: 1

      I don't think the incomptabilities is really Word's fault.

      Like the new version of the equation editor must be able to load older versions' data streams, but isn't equiped to save as the older version.

      So word is helpless, it can only ask the EE to save, and the EE will save as the new data format...which can't be read by the old version.

  15. Why Corel, Lotus, et al haven't banded together... by miniver · · Score: 2

    The answer for why the big office suite vendors haven't banded together in the same manner as the OpenDWG Alliance seems pretty self-evident to me. I'm sure that each of these software manufacturers have at one time or another signed an NDA with regards to the MS Office file formats. Once they did that, they were precluded from sharing that information amongst themselves. End of question.

    As for why they signed those NDAs? Again self-evident: early access. If Corel or Lotus wanted to be able to support the new file formats in a timely fashion, they need to know what the spec is well in advance -- TechNet doesn't get that sort of new information fast enough. For that matter, when you subscribe to TechNet, you're signing a limited NDA with Microsoft; I'd check the fine print before I depended upon TechNet information...


    Are you moderating this down because you disagree with it,
    --
    We call it art because we have names for the things we understand.
  16. the real opendwg url... by T.Hobbes · · Score: 1

    ... is http://www.opendwg.org, not http:///www.opendwg.org as is given in the article.
    there should be a policy of the minimum number of cups of coffee the poster has to drink before posting...

  17. On The Subject Of AutoCAD Compatibility by Anonymous Coward · · Score: 2
    This is not meant to be a flame. Really. And it's kind of off-topic. Or maybe not. (Let the "moderators" be the judges.)

    For my purposes, and the purposes of the company for which I work: what good reverse-engineered DWG file formats if you still can't get a good, affordable CAD package on anything but Ms-Win? My company is presently standardizing on applications. And (I'm sure MS would be overjoyed to hear this) it looks like the Unix boxen are on their way out. Why? One of the reasons is AutoCrap. It's available only on Ms-Win. Our customers and vendors demand files in AutoCrap format. There are no price-competitive CAD packages available for Unix anymore. (Bentley has dropped support for MicroStation on Unix--in case you didn't know. Note to Bentley: you screwed up! By dropping MicroStation for Unix you removed any incentive for us to consider your product.) So bye-bye to our reliable, low-TOC Unix workstations and X-terminals :-(.

    So even though we're evaluating StarOffice to use instead of MS Office, and even though we're evaluating non-MS email clients and other non-MS client apps: even if these pan out the Unix environment is still probably doomed because of AutoCrap :-(. (Then there's Visio and other stuff.)

    IMO many vendors, by not making their apps available on non-MS platforms, are missing the boat by failing to differentiate themselves from the run-of-the-mill "Me too! I do Microsoft" crowd. With things happening like the surge of interest in Linux as potentially a viable workstation platform, Solaris for free and Sun hardware getting quite affordable: this seems to me to be narrow-minded. Particularly wrt to vendors like Bentley--who already had Unix versions of their products.

    Sigh...

    1. Re:On The Subject Of AutoCAD Compatibility by marfil · · Score: 1

      Certainly it's off topic - but the topic was pretty lame in anycase. However I agree with you - Bentley have messed up bigtime. MicroStation on a Unix box beats AutoCAD on WinNT hands down - what on Earth were they thinking? Still there's time for them to resurrect themselves - OpenSource it and make money from support and the add-on packages like 3D rendering Artlantis thingy. Get with it Bentley. Well off topic!

    2. Re:On The Subject Of AutoCAD Compatibility by Arker · · Score: 2

      Try LinuxCAD . At $99.00 it's quite competitive with AutoCAD for Windows.

      If you only do 2d work you can get off even cheaper with QCAD

      --
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Friends don't let friends enable ecmascript.
    3. Re:On The Subject Of AutoCAD Compatibility by malfunct · · Score: 1

      The other side of this coin is that it costs A LOT of money for a company to port software. Combine that with the fact that the market for non windows apps is next to nothing and you see the big problem. No the problem isn't that microsoft is evil. The problem is seated in basic economics. Its way more efficient to have one operating system and one set of software that everyone can use (from an economic standpoint) than to have lots of applictations that are all different. I don't know how to solve this, or if its even worth solving. The only thing that would be nice is having a good way to force the "status quo" company to fix its mistakes. Basically the economy of software sux to try and be different in even when the different products are better. Fix that and you will rule the world.

      --

      "You can now flame me, I am full of love,"

    4. Re:On The Subject Of AutoCAD Compatibility by L0-Tek95 · · Score: 1

      I've been around almost all of the CAD products on the market. My company uses 3d intensive apps (3D Studio-max, AutoCAD 2000)and also in the process of standardizing Applications. I've been working with Alias WaveFront's Maya and Studio Tools recently even though my background is mostly architectural. The DWG format will be required by any potential Engineering Software seeking the mid-market level (below the cost of Pro/E) that AutoCAD reigns. I've probably generated 30,000 dwg's since I started on CAD and they are staying at the ver.14 format!! r.2000 has nice utilities but the majority of my vendors can't handle the new format. I'm running Linux (RedHat)at home and have tried various Engineering apps with limited success. Is anyone out there working on an open sourced Linux based CAD project??? We have to be able to come up with something better! AutoCAD has the standard, Alias has the GUI...

  18. About OpenDWG by BoLean · · Score: 3
    OpenDWG started out when a competitor, Vivio stopped trying to make a competitor for AutoCAD called Intellicad and instead suddely quit and handed over the source code to what they had accomplished to the OpenDWG Alliance. Now the source code to IntelliCAD is essentially free (but restricted). It tends to be very bug prone but is getting better. Several proprietary DLL's are needed for it to render and function fully.

    As far as reverse engineering the file format, its all but impossible. Now that UCITA is here it will get even tougher. I just hope AutoCAD knows to not shooting itself in the foot by suing its own users. If the peoblem ever amounted to a threat to AutoCAD's market share there would probably be quite a backlash.

    1. Re:About OpenDWG by __aasmho4525 · · Score: 1

      just fyi:

      i've tried almost all the opendwg-based software and compared it to vdraft, and found all of the opendwg based decoders to be quite inadequate...

      the folks at vdraft did a much more capable job at the reverse engineering (imho).

      just my 0.02

      Peter

    2. Re:About OpenDWG by BoLean · · Score: 2

      I was gonna mention that but its kind of redundent. MS owns everything including 10% of the company I work for.

    3. Re:About OpenDWG by driehuis · · Score: 1
      Not to start another license flamewar, but a cursory glance at the OpenDWG website seems to tell me that source code to their library is not available. This makes the relevance of OpenDWG to the Linux/BSD community doubtful.

      From the tone of the website, I think they'd be willing to work this out on favorable terms, but it does not seem to bring us closer to having a decent CAD package for the home user in the short run.

      --

      Bert Driehuis -- All I asked was a friggin' rotatin' chair. Throw me a bone here, people.

    4. Re:About OpenDWG by driehuis · · Score: 1
      Okay, so I should have said advanced home user. :-)

      The only thing that comes close to being useful for, say, laying out your new bathroom on Linux/BSD is qcad. It suffers from a flakey DXF import/export facility, and as such would greatly benefit from the OpenDWG stuff.

      --

      Bert Driehuis -- All I asked was a friggin' rotatin' chair. Throw me a bone here, people.

  19. For those interested... by TummyX · · Score: 1

    Wordpad supports Word 97/2000 files, and so does MFC.

    The source code for MFC comes with Visual Studio and the Windows Platform SDK.

    You can also download the complete source code for Wordpad from MSDN (Under sample applications).

    1. Re:For those interested... by martin-k · · Score: 2
      This still does not give you a .DOC converter.

      WordPad calls upon the text import filters installed by Windows and Microsoft Office to convert .DOC files to RTF and then reads the RTF file.

      -Martin

    2. Re:For those interested... by Arker · · Score: 1

      Wordpad supports Word 97/2000 files

      Hrmm, nope, not on my 'puter it doesn't. Believe me, that's the first thing I try when I get a .doc file from someone I don't feel comfortable directing to resend the data in a standard format. It does work sometimes - other times I have to hit WvWare home page to get a readable translation.

      Now, since it's obvious from your other post you are this weeks official M$ apologist, and since I am in fact a paying M$ customer (getting closer and closer every day to ending my 10 years as such) - explain this for me. If M$ truly cares about producing software which serves the customers needs instead of just creating lock-in, WHY do they continue with this ridiculous .doc format in all of it's endless change-it-just-enough-to-break-the-converters versions, instead of switching to an open format like TeX?

      I remember when M$ at least felt a need to pretend to care about the needs of their customers - those days seem to be long gone now.

      --
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Friends don't let friends enable ecmascript.
    3. Re:For those interested... by IntlHarvester · · Score: 1

      One thing you might want to try is www.thinkfree.com. They have a Linux-compatible Java clone of MS Word that did a pretty good job converting my files.
      --

      --
      Business. Numbers. Money. People. Computer World.
    4. Re:For those interested... by TummyX · · Score: 1


      WHY do they continue with this ridiculous .doc format in all of it's endless change-it-just-enough-to-break-the-converters versions, instead of switching to an open format like TeX?


      Because they think that a modern extensible format like XML is more sensible?

      And behold! Office 2000 supports XML.

      And the output even looks good in mozilla.

    5. Re:For those interested... by martin-k · · Score: 1
      >> Wordpad supports Word 97/2000 files

      It sometimes does. With Windows 95/98/NT out of the box, WordPad only supports Winword 6.0.

      As soon as you put Microsoft Office on your computer, the setup program installs all kinds of import and export filters. And, voilà, THEN they are available in WordPad as well.

      -Martin

    6. Re:For those interested... by rhinoX · · Score: 1

      Since when does the MFC source code come with the MSDN docs? I have well over 1gb of this MSDN CRAP installed on my hard drive, yet still can't get a decent answer as to why the IWebbrowser control is SUCH A FUCKING USELESS PIECE OF SHIT. Let alone how it's written.

      --
      The copper bosses killed you, Joe. 'I never died', said he.
  20. Oh, sure, it's "documented" and "open" by 1010011010 · · Score: 4

    All you have to do is implement large portions of Windows, COM and Windows Apps to make it work. It uses OLE Structured Storage. OLE (COM/ActiveX) is a Windows thing. To make OLE Structured Storage work on other OSes, you have to make COM available, and use it to read and write the doc. Microsoft did this for the Macintosh, for example.

    So, to properly read and write .doc files, you either have to:

    1) run Windows and Word
    2) run MacOS and Word
    3) port COM to anither OS and write a Word-alike

    Yummy. Anyone written COM for Linux lately? TummyX's "it's open, it's open, stop whining" aside, .DOC is not open because the technology it depends on is not open. I'm sure the fellow who wrote a Word viewer in his C programming course did it on Windows, where COM and other Windows APIs are available.
    If he did it on Unix or BeOS or something, he should speak up.

    Open file formats are important for interoperability and choice. Non-open ones are important for limiting choice and maintaining control. Knowledge shared is power lost, as Aleister Crowley said.

    --
    Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
    1. Re:Oh, sure, it's "documented" and "open" by TummyX · · Score: 1

      Do you even know what COM is?

      You can still extract stuff out of DOC files without any COM apis. You only need COM when you want to display embedded COM objects.
      COM support wouldn't make displaying DOCs with COM objects perfectly easy either, since you'd need to support the COM objects that the DOC file contains. Guess what COM objects use? That's right Win32.

      BTW, COM is available on Linux and Unix. There are heaps of ports available (all commercial).

      There's a clone of COM done by some guys working on this project called mozilla too.

      But then, it comes down to the fact that you need Windows to display a Windows format.

      If you want to view your word generated document on something other than windows convert it to something like postscript.

      I usually insert visio diagrams in my word documents, and i certainly don't expect to be able to edit those diagrams when i open it up at university with staroffice.

    2. Re:Oh, sure, it's "documented" and "open" by Scriven · · Score: 1
      But then, it comes down to the fact that you need Windows to display a Windows format.

      I think the whole point to this discussion, twit, is that it's a FREAKING WORD PROCESSOR, writing "Dear Grandma" letters. WHY is the format that it uses tied to the operating system? WTF does the .DOC format give a flying f! that you're on Windows?

      Since you have admitted as such, in the above, and since you can't HELP but admit that Windows itself is closed, and since there's been a legal decision that states that it's undocumented and uncompetively proprietary, then you MUST come to the conclusion that the .DOC format is Itself closed, since it depends on closed things.

      This opinion has been stated in this thread numerous times, and each time you have jumped up to defend it. Since you just admitted that it's a Windows file format, NOT a documentation file format, how now can you defend it as open?

      Go back to the campus and get more mind-control done, it's obviously not working on you.


      This is my .sig. It isn't very big.
      --
      This is my .sig. It isn't very big.
      --An Oldie, but a Goodie!
    3. Re:Oh, sure, it's "documented" and "open" by TummyX · · Score: 1

      yeah, that's the problem with these conversion utilities.

      DOC is more than just a format for storing text. It's almost like an executable now. It embedds data streams from embedded objects (they are almost always os specific).

      If you have a simple document that just contains a letter with simple tables and images, converting isn't a problem (at least, i haven't found it to be).

    4. Re:Oh, sure, it's "documented" and "open" by 1010011010 · · Score: 4
      TummyX wrote:

      Do you even know what COM is?

      Yes. It's Microsoft's Component Object Model. A formalized descendant of Object Linking and Embedding, which was originally a method of making compond documents with Word and Excel.

      .DOC is an OLE Structured Storage format which can store data streams meant for other programs, like Visio. Those programs also do not have open formats.

      The practice of passing around Word documents in Email because "everyone must be able to read them, right?" is a problem. If someone sends you a document in their favorite proprietary format, you should send them back a document in your favorite proprietary format. Maybe them people will start to understand the need for open, well-documented formats.


      I usually insert visio diagrams in my word documents, and i certainly don't expect to be able to edit those diagrams when i open it up at university with staroffice.


      And isn't that a tragedy.

      --
      Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
    5. Re:Oh, sure, it's "documented" and "open" by earache · · Score: 1
      If you used your brain a little you would understand that the only thing that ties it specifically to windows is if you embed an OLE object into the document.

      For a document that is purely text, there isn't any reason why you couldn't write a reader for it for any other OS.

      People seemed confused by the term OLE Structured Storage, and seeing OLE they assume that somehow it requires OLE. OLE is the interface you use to build the files, and there are a means of embedding data regarding OLE objects in the file as well, however most documents don't use this feature very often.

      OLE Structured Storage is an API for creating documents that mimic a mini file system. You can have directories, seperate file streams, etc. all in one file and treat the file as if it were a big empty harddrive. It's actually a very convenient format for writing complex data in binary form.

      And, of course, Microsoft could give a shit if your on any other platform then Windows, I don't understand why they would. It's natural a commercial entity would favor their own platform over another's.

      That being said, I doubt it would be a major chore to reverse engineer the file format at it's highest level. If I'm not mistaking, there are a couple of libraries for Borland Delphi that reimplement Structured Storage without using Microsoft's API.

    6. Re:Oh, sure, it's "documented" and "open" by sheldon · · Score: 1

      I heard that when Bill Gates of Microsoft receives a file in .PDF format he sends back a polite request which says, "Please just use the industry standard Microsoft Word format."

      I heard that when Steve Jobs at Apple receives a file in .PDF format he has one of his underlings send back a polite request which says, "Can you please send this to me as a MacDraw bitmap... Steve only likes reading books with lot's of pictures."

      and when Bill Joy at Sun Microsystems receives a file in .PDF format he sends back a polite request which says, "Can you send this to me as ASCII text, otherwise I can't read it with vi."

      Didn't you stop and think that the CEO of Adobe might just be trying to plug his own product?

    7. Re:Oh, sure, it's "documented" and "open" by Malcontent · · Score: 1

      "Didn't you stop and think that the CEO of Adobe might just be trying to plug his own product? "

      Well sure but it's still a good idea.

      --

      War is necrophilia.

    8. Re:Oh, sure, it's "documented" and "open" by 1010011010 · · Score: 1

      At least PDF is a fully documented file format that anyone can make, without paying royalties to Adobe; PDF doesn't have "macro viruses" and doesn't need system fonts; it has several free multiplatform readers; and it preserves document layout. However, it's not a word processing format, so it's not a panaecea. It's difficult to impossible to edit.

      --
      Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
    9. Re:Oh, sure, it's "documented" and "open" by jafac · · Score: 1

      My company is a marriage of a traditional Unix shop and an NT software company. I'm from the NT side. I used to be delighted when people would send PDFs around, and nobody on the NT side had a problem reading them. Acrobat is freely available. It offered several advantages over Word, the primary one being, speed. You could open Acrobat in about 1/10th the time it takes to open Word on the same machine.

      People from our side of the company, of course, were standardized on .doc, and would send .doc out, and get complaints back from the other side, saying that they couldn't read them, they run Unix machines, don't have Word, etc. (this was pre StarOffice-Sun).

      Some people on my side of the company actually obtained Acrobat writer, and started changing over to PDF. There was even a movement to switch over to HTML (unfortunatley, if you have diagrams or graphics, it wasn't a single, self-contained file.)

      Of course, the tide has turned back for the worse now, as I'm seeing fewer and fewer PDF documents. People from the Unix side are now sending Word documents. It's a very sad thing.

      What ticks me off is that PDF was better in so many ways. It offered better security for documents that were being published from a single source, it was faster, and more universal. Word just sucks. At least our internal distribution of manuals and stuff is still PDF.

      What I really fucking hate is idiots who post technical specs on the web in .doc format. Instead of HTML or PDF. That fucking crashes my machine - so I have to carefully check file-formats before I click on links.

      If it ain't broke, fix it 'til it is!

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  21. Applix does convert doc. by rew · · Score: 1

    Applix does not convert DOC, it converts RTF

    Not true. Applix does convert DOC. It doesn't write doc however.

    At least the version that I use dayly.

    Roger.

  22. You are all both wrong and right. by Forge · · Score: 3

    .DOC is explained on the MSDN CD. There is also documentation on the web site for it. This is theory. In the real world following this documentation to the letter will allow you to read MSWord 6 files and RTF files only.

    The troth is that those specifications are inaccurate and incomplete with regards to word 97 and 2K. Every person who has tried to implement an import filter has ran into that problem. The end result is that you sit down and create word documents on one PC ( or virtual PC with VMWare ) then go through with a hex editor to figure out what symbol dose what.

    To put that all in perspective the two paragraphs above save to 1 KB ( minimum displayed file size on Win98 ) in HTM or text format. In MSWord .doc format it's 20 KB. This wouldn't be a problem if word simply inserted 20KBs of headers and footers but rather it splatters Irrelevant symbols all over the place. Even Word Perfect 8.0 only bloats it to 3 KB by adding 2 KB of headers, footers and font definitions.

    Everybody who dose this reverse engineering has to start from scratch. Every company that tries to read *.doc files has to put people to work doing it. A combining of efforts would be very prudent. Let's start by getting The Open Source teems together on this then we can invite IBM, Corel, Sun, etc... to join.

    We need someone to advocate the benefits of an LGPL or even BSD licensed library set to corps who must otherwise do it all themselves ? This is what ESR is useful for so go and call him.

    --
    --= Isn't it surprising how badly I spell ?
  23. If .doc is well so documented, why can't I use... by MrEfficient · · Score: 3
    files in that format? I've read a few comments from people who are defending Microsoft. These seem to come in two flavors:

    1. .DOC is documented, this question is lame FUD. Quit bashing Microsoft.

    Well, if its so well documented, then why can't I open a Word document in WordPerfect? And please don't tell me its because the Word document can contain embedded things like Excel and Access parts. I'm just talking about a regular word processing document with text and a little formatting. Our MIS guys tell me it does work but they apparently received this information from the WordPerfect 8 packaging rather than from experimentation because it doesn't work on my computer and they have been unable to show me where it works on their's.

    2. Why are you picking on poor Microsoft? Do you really think they would purposely obfuscate their own code and make it difficult not only on the rest of the world, but themselves as well? Do you really think they're purposely trying to make it difficult for other companies to use the .DOC format?

    Um, well yes, that's exactly what I think. What planet have you people been living on for the last 20 years. Of course Microsoft wants to make it difficult for other wordprocessors to use its format. They pretty much have a monopoly on in the Office arena and they want to keep it. If you could go out and buy WordPerfect for $100 less than Word and still be able to use the .DOC format perfectly, how would that help Microsoft? They have done things like this in the past and they will continue to do them as long as they can.

    On a more positive note, I'll say that I do think that Microsoft Office is a good product. I mean it works and it does alot of cool stuff(even though that makes it bloated). The problem is in the way which Microsoft has used the power that Office has given them, not in the product itself. And I'm not just bashing Microsoft. I fully believe that if Sun or Corel were in their place they'd be doing the same thing. The bottom line is that consumers are suffering because of proprietary formats. This is one of the big reasons why computers have not made us more productive (or at least as productive as we could be). I can't count the number of hours I've spent simply trying to convert documents from one format to another.

    --
    Check out AbiWord.
  24. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  25. Provoking PC users by leonbrooks · · Score: 1
    I used to annoy PC users by using my Mac PowerBook to translate files for them that they couldn't open, from programs that they didn't have and that weren't even available for the Mac, e.g., Lotus AmiPro.

    Likewise, feeding broken documents to the "strings" program to recover the text, or doing to a list with sed(/gawk) in one line and two seconds what would take a Word(/Excel) user weeks by hand. Highly amusing. (-:

    --
    Got time? Spend some of it coding or testing
  26. Re:'Everpresent Office monpoly'? by (void*) · · Score: 2
    Well if you work in an environment where people keep sending you that MS document in their emails, how much choice do you have?

    So because MS wants to keep out competitors, it is entitled to make you find another job simply because you wanted to exercise your choice in software. In my book, hurting innocent people is EVIL!

  27. DOC file converters by Hotaine · · Score: 1

    There was a company (I believe they were called INSO) that once upon a time made a set of Windows DLL's for file conversion. One of the things they supported was converting to/from .DOC '95 and '97 formats. '97 was the last one I saw, so they may have disappeared.

    I don't know if they had some sort of agreement with Microsoft or they came up with the converters on their own, but they were indeed out there.

    1. Re:DOC file converters by Hotaine · · Score: 1

      Yup, here's more info:



      Claims to support through Word 2000 Beta 2. Still don't know if they had an agreement with MS though...

    2. Re:DOC file converters by Hotaine · · Score: 1
  28. Why can't we reverse engineer HTML? by Rilke · · Score: 5

    The analogy is actually more apt than you'd think.

    The .doc file format is fairly well documented, as these things go, although there are some proprietary aspects, like the VBA streams. It's not that tough to open up a Word doc in your own program and parse the file correctly.

    The tough part comes when you actually want to display the document. Now all sorts of little details that aren't in the file format but are idiosyncrancies of MS Word pop up. And, as anyone who's used Office extensively knows, Word will display the document differently depending on which version you're using, what printer you have connected, phases of the moon, etc.

    Parsing and display are two different things. While half a million apps can parse HTML, no two of them seem to display it in quite the same way. The question here is a bit like pointing out that no browser displays things like (IE|Netscape). Well, no they don't, but that has nothing to do with an inability to reverse engineer the file format.

    1. Re:Why can't we reverse engineer HTML? by rgmoore · · Score: 2
      The tough part comes when you actually want to display the document. Now all sorts of little details that aren't in the file format but are idiosyncrancies of MS Word pop up. And, as anyone who's used Office extensively knows, Word will display the document differently depending on which version you're using, what printer you have connected, phases of the moon, etc.

      And arguably in both cases this is because people are asking the program/format to do more than it was ever intended to. Both html browsers and word processors were originally intended to format documents dynamically and squish them into shape using some fairly general parameters of window/page size, font, etc. The problem is that people are now turning around and trying to use both as detailed page description formats that place each letter or object precisely on the screen. Given the underlying assumptions of the renderer, it shouldn't be surprising that this doesn't work right. If you really want to fix the words onto the page, use a desktop publishing program or convert to PDF.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    2. Re:Why can't we reverse engineer HTML? by Cassandra · · Score: 1

      While half a million apps can parse HTML, no two of them seem to display it in quite the same way.

      Why should they? HTML was never aimed at being WYSIWYG (Although admittedly many people use it as if it was). If you are true to the spririt of HTML (ie. don't use explicit fontsizes etc), it should be up to the application how to render the page.

    3. Re:Why can't we reverse engineer HTML? by tiny69 · · Score: 1
      anyone who's used Office extensively knows, Word will display the document differently depending on which version you're using, what printer you have connected, phases of the moon, etc.

      You forgot to add, Word will display the document differently depending on which computer you are using. Even if the hardware and the versions of Office are identical. I encounter this at work on a daily bases.

      --
      Go not unto/. for advice, for you will be told both yea and nay (but have nothing to do with the question)
    4. Re:Why can't we reverse engineer HTML? by Phroggy · · Score: 1
      And arguably in both cases this is because people are asking the program/format to do more than it was ever intended to. Both html browsers and word processors were originally intended to format documents dynamically and squish them into shape using some fairly general parameters of window/page size, font, etc. The problem is that people are now turning around and trying to use both as detailed page description formats that place each letter or object precisely on the screen. Given the underlying assumptions of the renderer, it shouldn't be surprising that this doesn't work right. If you really want to fix the words onto the page, use a desktop publishing program or convert to PDF.

      You're exactly correct, except that people do use Word to do weird things like embed a spreadsheet or a bitmap with weird page layout elements and such, adjusting their margins a tenth of an inch at a time to make everything fit just perfectly on one page. They use Word. They don't use PDF.

      Sure, some people use PDF, and for them, everything works - but if everybody did that, this wouldn't be an issue.

      --

      --
      $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
      $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    5. Re:Why can't we reverse engineer HTML? by Nexx · · Score: 1
      In theory, an output from one vendor's fully-compliant HTML/CSS user agent should be identical to another vendor's. In practice, if they did that, then MS won't be able to make any money =P

      Sorry for the obligatory MS bashing, but <rant> Have you seen the crap they call html that gets barfed out by Word 2k?</rant>

    6. Re:Why can't we reverse engineer HTML? by mr3038 · · Score: 1
      In theory, an output from one vendor's fully-compliant HTML/CSS user agent should be identical to another vendor's.

      This works also in practise. But we still need to see first fully-compliant HTML/CSS user agent. Mozilla could be it after 8000 bug fixes.
      _________________________

      --
      _________________________
      Spelling and grammar mistakes left as an exercise for the reader.
    7. Re:Why can't we reverse engineer HTML? by driptray · · Score: 1

      Both html browsers and word processors were originally intended to format documents dynamically and squish them into shape using some fairly general parameters of window/page size, font, etc. The problem is that people are now turning around and trying to use both as detailed page description formats that place each letter or object precisely on the screen. Given the underlying assumptions of the renderer, it shouldn't be surprising that this doesn't work right. If you really want to fix the words onto the page, use a desktop publishing program or convert to PDF.

      There's a big difference between DOC and HTML that you're missing. With HTML the output medium is unknown - you don't know the user's browser, the browser's font and colour settings, the user's stylesheet, the user's colour depth, the user's screen resolution, the size of the user's browser window etc etc.

      With a DOC file the output medium is known, and it basically comes down to the page size. Its fixed - it doesn't change for each user - and therefore it should be pretty easy to render. In my experience DOC file converters (for WordPerfect etc) work pretty well, except where the documents contains fancy features like Word's "frames" or "text boxes".

      But hey, the world would be better off if people created structured documents instead of the hodge podge of formatting that is encouraged by Word.

    8. Re:Why can't we reverse engineer HTML? by rgmoore · · Score: 1
      With a DOC file the output medium is known, and it basically comes down to the page size. Its fixed - it doesn't change for each user - and therefore it should be pretty easy to render. In my experience DOC file converters (for WordPerfect etc) work pretty well, except where the documents contains fancy features like Word's "frames" or "text boxes".

      Well, that depends on what you're asking for. If you just want to preserve the structure of the document (i.e. alignment, line spacing, margins, etc.) then my impression is that current converters do pretty well. The problem is that some people expect for the document to look exactly the same, as it would if it were being rendered by PageMaker or Acrobat.

      In fact, though, to do that would essentially require that the renderer be bug compatible with Word. This is A) undesirable because it means rendering bugs correctly, B) impossible because there are several versions of Word with subtle and not-so-subtle differences, and C) impossible because it is observed fact that the same version of Word renders documents slightly differently on different computers, etc. In short, the problem that most people are complaining about isn't with the difficulty of dealing with niggling problems with fancy features, but with unreasonable expectations for what the program can and should be doing.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    9. Re:Why can't we reverse engineer HTML? by GeZ117 · · Score: 1

      HTML contain (nearly) no information about rendering. There are some FONT, *COLOR and other tags and attribute that allow to define how it must be rendered, but it's regarded as a very bad practice. Correct HTML should contain only information and meaning, letting DTD do the rendering, allowing a document to be rendered differently on different media.
      In fact, HTML is more like LaTeX than like DOC: it's a WYSIWYM approach.

      DOC files, in contrario, are meant to be -supposedly- WYSIWYG.

      That's why different browsers display web pages differently, but wordprocessor should display .doc files always in the same way.

      --
      sigmentation fault
    10. Re:Why can't we reverse engineer HTML? by Tony-A · · Score: 1

      Even Word users sometimes care about how their documents look. Very frustrating. Word has its own ideas. You can fight it but Word will win.
      The cheapest crappiest plastic ruler at a discount store is at least tolerably accurate and useable.

  29. Lawyers in love by leonbrooks · · Score: 1
    Lawyers love to receive files in doc format since they can go in and see the previous revisions of offers etc.

    So... they not only get the last word, but the previous few words as well?

    --
    Got time? Spend some of it coding or testing
  30. Voting with your wallet by leonbrooks · · Score: 1
    The DoJ just bought 55,000 seats of WordPerfect and even the ruling against MS was created in WordPerfect.

    Now that's what I call followng your convictions...

    Microsoft has performed an illegal operation and will be terminated.

    Is that what they call a genital protection fault?
    --
    Got time? Spend some of it coding or testing
  31. Diagrams/line art/etc by leonbrooks · · Score: 1
    it is pretty convinient they way MS have done it.

    I have no problem with storing an image (standard bitmap or scalable, e.g. (E)PS) in a document, plus a reference to the application that created it and the source file. Then you can have your convenient little diagram in a portable format, and when you double-click (or, right-click->edit) on the image the originating application is started with the sourcefile as the first parameter.

    That way no application means no editing rather than no picture, which is how dear old "we know what you want" MS have done it.

    --
    Got time? Spend some of it coding or testing
    1. Re:Diagrams/line art/etc by spectecjr · · Score: 2

      That way no application means no editing rather than no picture, which is how dear old "we know what you want" MS have done it.

      Actually, MS did it the way you described above as the way it should be done. It's called View Caching. And most converters don't bother because they haven't implemented all of OLE Structured Storage (or at least enough of it to be able to *use* that part).

      Simon

      --
      Coming soon - pyrogyra
  32. Better idea by SnapperHead · · Score: 1

    Why sit here and waste the energy with 10 million differant formats. Why not create a standard. I know MS would not like that, but, thats there problem. Yet, another reason why people should think about not using there products. I think a standard is long overdue for, documents, spreadsheets, etc.. Remeber 10 years ago where there was 10,000 differant types of databases. Foxpro, dbase, etc.. Now we have SQL, which makes life sooo much easier. I hate it when people use .doc for sending stuff to me. I had a company send me there price list in that format. I told them that unless they sent me it in plain text format, I refussed to buy from them. After a number of emails back and forth, I was told that I am using an OS that is garbage becuase its not MS. Don't get me wrong, I could have converted it. Bt, its not worth my time. PDF is another example. I won't use it, becuase its a stuiped properity format owned by 1 company. If it was an open standard, I would use it. If the file format goes with there product only I can understand that. But, if there is 10 products very simular to it, I think its a waste of my time and everyone else. Ok, I have stepped off my soap box now...

    --
    until (succeed) try { again(); }
    1. Re:Better idea by vsync64 · · Score: 2

      I agree with you about Word and Windows, but PDF? I like PDF. And there are free viewers (GhostScript) for it, too. I don't even have AcroRead on my box... I just use gv.

      --
      TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
    2. Re:Better idea by jstarr · · Score: 1

      PDF is still a proprietary format (even though it is descended from postscript). Additionally, Adobe is not interested in keeping it stable. The latest PDF files can not be read in gv or converted using programs like pdf2ps. Even worse, users are required to upgrade to new versions of acroread, so the format is barely backwards compatible. If only the viewers existed for all the platforms that Adobe supports, DVI and ps would do the job in a much more open manner.

    3. Re:Better idea by SnapperHead · · Score: 1

      Thats my point, is corperate controlled. I would rather see an open standard.

      --
      until (succeed) try { again(); }
  33. Actually, it's been done, and it's GPL... by Arker · · Score: 1

    ... by at least one project, WvWare which has a very functional word to html converter available online, and the routines behind it are all open source.

    The problems remaining are two - the .doc format keeps changing every release, and second, honestly, it sucks. Even converting it to a real format can be interpreted as giving it credence. I have used the link above in a couple of cases where it was really necessary, but generally, when I get sent a .doc document I reply please send me the data in a standard format. This usually gets the point across. It isn't like word can't output to rtf or txt formats, but for the rare occasion when you don't dare insist some PHB converts his data to a real format, this is a viable converter. And of course if you are writing a GPL Word Processor you are free to use the routines published here to create your own conversion filter...

    There are also links on the page to all sorts of resources related to the ms word document format.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  34. proprietory need not be easy to reverse-engineer by roman_mir · · Score: 2

    "why hasn't .DOC been reverse engineered? I would think that if this can happen to the DWG format then it can happen to any proprietary format."

    Not necesserily true. A format can be encrypted with PGP and a connection to the Internet may be required to read a document encoded with this format. Try and reverse engineer that.

  35. Microsoft Must Have Specs. by Effugas · · Score: 2

    You know, I've been thinking about this.

    The obfuscation isn't actually in the .DOC format; it's in the fact that Word itself reads the statements contained within the .DOC format in confusing and illogical ways.

    Yet, this readability has been maintained from Office 95 thru Office 97 to Office 2000. (Lets not even talk about Word for Mac!)

    This just isn't possible unless Microsoft has internal conformance specifications that they follow from revision to revision.

    We know the specs exist because it literally would have been impossible for Microsoft to have functioned without them.

    98% of Word documents don't use any advanced Word features. In fact, 98% of Word documents should be saved in RTF format, and lose nothing of value in the translation. With these specifications, the #1 thing companies could do would be to implement a DOC->RTF filter *at the mail gateway* and be done with 98% of Macro Virii.

    Will it happen? Nah. The Word Monopoly is just too critical to Microsoft's success. It really is.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com

    1. Re:Microsoft Must Have Specs. by edhall · · Score: 2

      Actually, Microsoft doesn't need to have specs at all. It just needs to carry along a bunch of legacy code that gets glued into successive versions, perhaps with some API modifications or a compatability layer. "Conformance" to such a non-spec can be determined by regression testing.

      A surprising number of software projects cook along for many years and through many revisions without ever having complete specs. And though the lack of specs may be bad, code re-use is usually a Good Thing, specs or no.

      -Ed
    2. Re:Microsoft Must Have Specs. by EvlG · · Score: 2

      I have heard of several corporations that banned the use of .DOC on their network; they make all employees transmit files via network storage/email/whatever in .RTF to eliminate virii and interoperability problems.

      Let's face it. Most people just don't need all that shit in their document. Bulleted lists and tables satisfy most people's needs. Whatever happened to optimize for the common case?

    3. Re:Microsoft Must Have Specs. by HermDog · · Score: 1
      And this spec-less method better supports features of Microsoft such as Spontaneous Information Hiding, which happens after you've spent three or four hours carefully constructing a multi-page document with tables of data and it turns into multiple pages of perfectly-shaped squares.

      I don't know if Spontaneous Information Hiding still exists in MS Word because I have since turned to using an Etch-o-Sketch for all of my important documents.
      --

      --
      JADBP
    4. Re:Microsoft Must Have Specs. by Effugas · · Score: 2

      The legacy code issue would be logical except for its surprising portability to alternate platforms, i.e. Macintosh.

      --dan

    5. Re:Microsoft Must Have Specs. by steveha · · Score: 1
      The legacy code issue would be logical except for its surprising portability to alternate platforms, i.e. Macintosh.

      Sorry; you're wrong, and edhall is right.

      I used to work at Microsoft, and I spent some time working in the Word development group. Word is a seething mass of legacy code. Sometimes it gets re-used, sometimes large chunks of new code get added on top, but every version carries everything forward.

      The Mac version is built out of the exact same source tree as the Windows version. In Mac Word97 they use some compatibility libraries, sort of like WINE; I don't know for sure about Word2K, but I'll bet it still uses them.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
  36. wv? by RossyB · · Score: 1

    What's wrong with the wv library - also known as wordview which was packaged with many Linux distributions. www.wv.com IIRC.

    The default translation is DOC to HTML - which sucks. However, it internally generates XML which can be transformed into anything - one of the examples is TeX.

    Ross

  37. First off, the word is obfuscate... by millia · · Score: 1

    Now that I've got that off my chest:

    Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?

    That's a good question I'd like to know the answer to. Having tried in the past to plumb its depths in order to output database reports in it, I really don't know.

    Are these formats really open, however, or are they like the shrinkwrap on the Kerberos license? Back when I got the format, I had to sign an NDA and promise not to use the format in a word processing program.

    Microsoft are moving to XML based everything.

    Yes, and have you looked at how they use it to save HTML? Biggest bunch of spaghetti known to man.

    Look, I'm not an agnostic here: I use Windows 98 everyday, and Office, too, and am an MCSE. But it is futile to defend a company who attempts to maintain their monopoly by making things complicated. I would defend them more if they merely trusted the quality of their own products, instead of doing their best to lock you in by means that go against computing's best practices: keeping things simple, and allowing you to move your data however you like.

    I think the quote on the OpenDWG Alliance page sums it up:

    Who should have control of your DWG files?
    You should.

    It is very scary that a good chunk of word processed documents are stored in an overly complicated binary format.

    --
    stored on computers from birth to the grave
  38. Proposal: An alternative posibility. by Tsujigiri · · Score: 1
    I have heard this type of discussion before and always end up looking at it in another way. The problem as I see it, is not that Microsoft format is a universal format and other companies have trouble using that mostly undocumented/un-recreatable[sic] (due to COM and other platform specific issues), but that all the other suppliers of office/wordprocessing software try to support that universality.

    These other suppliers (Lotus, Sun-Staroffice, Applixware and other opensource projects) all try to offer their own proprietary formats (ok, maybe not the opensource projects, but certainly the big boys) and also offer to support MS formats.

    An alternative solution that I believe may be the answer is to create an open community equivalent to the W3C (is that right, the body that maintains the HTML standard, whether or not it is followed, (it's late and my brain is getting foggy)) for office document formats. This way there would be an open, universal specification for document formats that all the main products could conform to and would be available to any newcomers. The bigger suites could support open standards and proprietary if they wished (for advanced formatting that they thought was important), but it would mean that if you wanted to ensure that your documents could be read by anyone, you could use the universal format. Also a document format such as this would (hopefully) be outside the control of any one developer.

    It's not a perfect solution by any means, but it could just be a step in the right direction .

    --

    "I'll take the red pill. No! Blue! AAAaaaahhhhhhhhh"
    - Monty Python meets the Matrix

    1. Re:Proposal: An alternative posibility. by gddavidson · · Score: 2

      An alternative solution that I believe may be the answer is to create an open community equivalent to the W3C (is that right, the body that maintains the HTML standard, whether or not it is followed, (it's late and my brain is getting foggy)) for office document formats.

      This is the only answer to the problem and W3C is a great example. It takes the control away from MicroSoft, a company that uses the spec as a means of driving upgrade sales and maintain their monopoly (the real purpose of .DOC these days) and places control with back with the consumer.

      The idea for a common doc format could be marketed successfully based on two points.

      First, a common doc format would allow companys and individuals could save large amounts of money by not having to upgrade to the latest verion of Word every two years. This would impact a company's bottom line.

      Second, a common doc format would provide companies and individuals with a level of "insurance" that older document types that hold important data would not at some point in the future become obsolete.

      Neither of these points even brings up the obvious benefit to the rest of us that use non-MS systems. It would increase competition in the Word Processing arena and would probably move use towards a world where .DOC and .HTML could be interchangeable. Based on the above points, many companies would require that their employees maintain company data using the new open standard.

    2. Re:Proposal: An alternative posibility. by mfnickster · · Score: 1

      Before I comment, I should state that I'm a Mac owner -- although I've used Unix a fair amount and (unfortunately) Windoze pretty extensively.

      I was really disheartened to see OpenDoc fail, since it was exactly the kind of multi-vendor open solution that would fit this situation. Sure, if your compound document contained proprietary parts that you didn't have an editor for, you wouldn't be able to modify them. But at least the document wouldn't break because of it.

      Better still, OLE was incorporated as a subset of OpenDoc so all those old MS formats could be rolled in. I guess it's just one more example of the Market deciding on a standard that is not technically the best choice.

      Maybe someone will create an XML-based format that offers all the advantages of OpenDoc, but I'm not holding my breath.

      - mfnickster

      --
      "Slow down, Cowboy! It has been 3 years, 7 months and 26 days since you last successfully posted a comment."
    3. Re:Proposal: An alternative posibility. by jafac · · Score: 1

      I think that an enforced independent standard would *eliminate* competition. Right now, nobody effectively competes with MS, because there's no point, they know they'll never dislodge MS.

      With an enforced independent standard, MS might become dislodged as the standard, but then, we'll have nothing left but a million shitty word processors, all of them happy to compete with a shitty offering - no vendor willing to invest a sufficient amount of money in the product to make it truly great, because they will be unable to gain dominance easily, and once they do, there will be no way to assure continuance of dominance, because any other company could just come along and knock you off your hill. You couldn't capture the market with proprietary standards any more, and enjoy the fruits of monopoly-ness. So why bother. The only incentive to even produce a word processor under those circumstances, would be to bundle it with something else that you COULD leverage. (sound familliar?)

      I'm just not convinced that open standards would be THE answer. While it would be a nice thing, and help stimey some of the abuses that are rife today - it would have the same effect as communism has on incentives to innovate.\
      While it's true, some folks DO innovate, just for the thrill of it, or the fame, or for the philanthropy of it all. But most people do it out of greed.

      If it ain't broke, fix it 'til it is!

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  39. It's very simple really... by Anonymous Coward · · Score: 2

    "Now, why can't Corel, Lotus, Sun, etc. band together and reverse-engineer Microsoft's file formats properly?"

    It's very simple really. Unlike Autodesk, which uses some form of logic to create their file formats, Microsoft uses heavy encryption seeded with a semi-random number.

    This number is based on the millions of dollars Bill Gates is worth at the year of release. In fact, the file formats for Office 95, 97, and 2000 are identical - it's just that Bill Gates has been worth more at the time of their release, so the file was encrypted differently.

    This is why it's so important for the Microsoft stock price to jump around, if it stood still then the file formats wouldn't change, which means people wouldn't buy the latest version of Office, which means the stock price woudln't change, and so forth in an infinite loop.

    ;-)

  40. we already have... by ocipio · · Score: 1

    strings file.doc ;)

  41. There is a standard by Arker · · Score: 1

    It's called TeX - check it out.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
    1. Re:There is a standard by SnapperHead · · Score: 1

      Heard of it, never used it, and never saw it used.

      --
      until (succeed) try { again(); }
    2. Re:There is a standard by SnapperHead · · Score: 1

      Let see, hmmm, how many places use PDF ? How many use Tex ??? You get where I am going with this ?

      --
      until (succeed) try { again(); }
  42. sure. by mattdm · · Score: 2
    Sure, this totally makes sense. For example, the Word document's description of a table is going to be based on the way Word renders tables. If your program makes tables a different way, a one-to-one conversion may not be possible. In order to do a lossless conversion, you'd need to incorporate the way Word does it into your app.

    --

  43. Good comment! by Arker · · Score: 1

    I think you are 100% right, and if I hadn't used up my last moderator point on a good post yesterday, I'd bump you a point myself. Not that your comment was particularly informative (no links to back up your point, shame shame) but it definitely qualifies as insightful. MSDoc format is an abomination, while it is a good thing there are in fact decent converters available (see WvWare) for those occasions when we just have to read a .doc file, but the goal should not be conversion of this disgusting format, but elimination of it in favour of open standards (text, TeX and html, depending on the document.)

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  44. nope by Lazy+Jones · · Score: 1

    Win98's Wordpad doesn't read most of the Office 2000 .DOC files I've been sent, at least not properly. Usually, these files will contain many spurious characters and the formatting will be lost.

    --
    "I love my job, but I hate talking to people like you" (Freddie Mercury)
  45. Re:Ok, here we go again... (or use wine) by Lennie · · Score: 1

    go take a look here: http://www.winehq.com/Apps/details.cgi?id=2097

    --
    New things are always on the horizon
  46. HTML by aengblom · · Score: 1

    Have "we" managed to get Netscape and IE and Opera (etc etc) to display HTML exactly the same yet? Of course there might be some minor disagreements on what each tag is exactly supposed to do, but it still must be easier than reverse engineering the DOC spec. I'll bet Microsoft couldn't even create a app to read the Doc spec just as Word does!

    --


    So close and yet so far from the world's perfect ID number
    1. Re:HTML by nagora · · Score: 1
      It is part of the design of HTML that it does not require user agents to display HTML in the same way as each other. CSS is more for that, but even then there is supposed to be some flexibility.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  47. StarOffice 5.1 works by prac_regex · · Score: 1

    i dont know what the big fuss is anyways.. because StarOffice 5.1 DOES decode .doc files. I dont know how, and i dont care. All i know is i can read my pesky marketing peoples files in Linux. boind...goinbb..
    pavementrocks.

    1. Re:StarOffice 5.1 works by Phroggy · · Score: 1
      If somebody creates a file in Word97, e-mails it to you, you edit it in StarOffice and make a minor change, then e-mail it back to them, I'll bet you something will be wrong with the formatting. Maybe not something really big and obvious, but something.

      --

      --
      $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
      $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
  48. Why? by tru+junglist · · Score: 1

    Why can't we reverse engineer .doc?
    The answer is 43.

    jungle is massive

    --
    jungle is massive
  49. Re:'Everpresent Office monpoly'? by Spoing · · Score: 2
    Well if you work in an environment where people keep sending you that MS document in their emails, how much choice do you have?

    Mmmm...send them files in TeX format?

    Seriously though, knowing what the person can read on the other end and sending them that is courtious. Unfortunately, too many people think that everyone uses what they do.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  50. Corel WordPerfect by paulproteus · · Score: 1

    Enough said. Corel WP is really the best WP out there, and it bothers me that people say, "Oh, that feature is in Word so it must only be in Word."

    I can't stand it any longer. Please, go help Corel out and buy a copy of WordPerfect Office 2000 -- it's better than Word (Reveal Codes!!!) and it's cheaper. Not to mention OEMs don't force it down your throat....

    --
    |/usr/games/fortune
  51. err..., not even m$ can do this by bcboy · · Score: 1

    I have the misfortune of having to work with Word on occasion, and my wife uses it in her consulting business. Formatting problems crop up constantly when trading files with other Word users. Wasted time, wasted paper are just part of using Word. If m$ can't make two versions of Word that format the same document the same way, or even one version of Word that runs on two platforms (MacOS and Windows) and formats the same document the same way, why is anyone surprised that no one else can get it to work? It should be nuked from space and forgotten.

    1. Re:err..., not even m$ can do this by Pachy · · Score: 1

      ...or simply to instances of the same version running computer with different printer/video drivers.

      You're perfectly right when you say not even m$ can do this. I tend to think Word versions are not intentionally incompatible: none of them works, that's all. Unfortunately the concept of reverse-engineering implies some kind of rigorous technical work has been done on the subject of the study.

      The tragedy here is .DOC is not really a printed document format, it heavily depends on Windows' font display system and tries to print it afterwards.

      Why people still use this crappy system, that has not progressed in any aspect in 16 years (since Mac Write 1.0) is beyond me.

      Of course, the correct approach is TeX's: having a page description and being able to display a graphical preview of it.

      I tend to think WYSIWYG word processors can't work at all, because of this dependency on the font displaying system. LyX is the closest thing to a WYSIWYG WP that works.

  52. One Single Way, One Single Format. by rubinelli · · Score: 1

    It's a daunting task to reverse-engineer the .doc format because even MS's developpers would be hard pressed to re-implement their engine.

    The best way to deal with .doc files, IMHO, is to make them irrelevant. That's why I totally agree with the people that say we should have one office documents format.

    I believe this format should be based on XML, so it would be very easy to extend documents. You might even put a whole site inside your doc! Doc size wouldn't be a problem: we just have to compress it in the end with a standard metod.

  53. Reading .DOC files by phutureboy · · Score: 1

    I have had to put a great deal of research into this because I'm doing a project for a client right now that requires converting .DOC files to HTML and inserting them into a MySQL db. So far I've found plenty of worthy solutions for converting the text, but none of them will handle the linked TIFF graphics in the documents.

    Here are a few of my bookmarks:

    WVWare - GPL library for reading .doc files, used by AbiWord, currently incomplete

    W3C's list of converters

    HyperNews' list of converters - really old

    Filtrix - Good commercial, closed-source converter, now available for Linux, great price, but doesn't handle linked TIFF files :P

    InfoAccess - Makers of HTML Transit, the Cadillac of closed-source commercial document converters, also exorbitantly expensive ($5000+) and AFAIK not avail for Linux

    KOffice (KDE2) filters page - not much here, but AFAIK they intend to ship with MS-Word import capabilities

    So, is anyone aware of any open-source MS-Word filter projects that I don't know about? Especially one that recognizes/converts linked graphics contained in the document?

    - phutureboy

  54. why would you want to? by Mao · · Score: 1

    I personally don't see anything particularly amazing in .doc that I don't see in other document formats. I only prominent "feature" i notice in it is the fixed page width thing, making it truly wysiwyg. However, over time i've begun to increasingly appreciate .lyx motto wysiwyM (m stands for "mean"). I realized how rarely i need a document to be truely wysiwyg. The most common reason i can think of that people need to have a rigid text/page appearance is for making flyers and etc, which i guess a program like gimp of photoshop may be better suited, since many flyers feature lots of images and stuff anyway. The argument for .doc may be that lots of people are using it, and it'd fascilitate things better if say a windows user can read something from a linux user, or vice versa. But the root of this problem is simply that big word again "STANDARDS." Why reverse engineer a proprietry format, when one could spend the time promoting and developing open standards like html or xml, or even TeX? Another problem with .doc is that it is not a typesetting language. My favorite scenario is that: you make one typo in your resume. You are in a situation that all you have is a telnet program that allows you to connect to thr server where your resume is stored. If your resume is done in a something like html of TeX, u can fix the typo in no time. I'd love to know what i can do in this situation if my resume is done in .doc. Thanks for listening.

    1. Re:why would you want to? by josepha48 · · Score: 2
      It is more of a need I think. For some reason, people over time have moved to Microsoft Office. I think it is sort of a domino effect. One company uses it, then they do business with another company, and so on. As they send documents between each other it ends up bing in either .doc or some other format. Because one person starts using .doc and the only way to see these files really as they were meant to be seen everyone involved ends up using .doc. Now we have a large percentage of people all over the planet using .doc. I get them at work through our office mail all the time informaing us of this or that. How do I read them? With word. The only reason to make this transulation system is so that people can still read word docs without having word.

      Now I have tried wordperfect 8 for Linux, and the word filter does not work on more than half the documents that i have. StartOffice 5.1 does a pretty good job of this and from what I hear is it is getting better. However I know that if you start doing some complex things in word then startoffice may not read all of the document. They are working on this though. Apparently startoffice 5.2 is supposed to have pretty good support for word files.

      On another note their are several project that are open source that are working to reading these formats, on of which Ibelieve is called AbiWord. Although it's native output will not be word, last time I talked with them they were working on a word filter.

      send flames > /dev/null

      --

      Only 'flamers' flame!

  55. Well, I'll be darned.... by ZoneGray · · Score: 4

    At one time, you could download the specs for the binary file format. Now, according to:

    http://support.micro soft.com/support/kb/articles/Q211/6/41.ASP

    You need to write to an e-mail address and explain why you want it. It also says that the formats for earlier versions of Word are no longer available.

    For what it's worth.

  56. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  57. Re:What a silly question by nafmo · · Score: 1
    Disallowing reverse-engineering of software products or file formats is illegal under the laws of many countries; Microsoft just wouldn't be able to sue you, unless you actually infringe on any of their copyrights or patents (i.e use their code).

    Disclaimer: I am not a lawyer, and I don't know the laws of many countries, above is what I understand of Swedish and Norwegian law.

  58. HTML is too vague. by rlowe69 · · Score: 1

    I believe the reason why HTML documents look different on any given browser is because the spec is too vague.

    For example, in Netscape 4 you can't put something right in the top-left corner without using frames. Is this right? No. Is this wrong? Where is it in the spec?

    Another example is the amount of space that the individual browsers use for frames themselves. If you specify a speicific height for a horizontal frame at the bottom of a page (see http://www.wired.com for an example of this), it will be different heights for IE and NS. Where is this in the spec? Exactly ...

    Clearly, HTML isn't up to the task. We need to design a markup language specific enough so that it won't choke when a designer makes a neat and interesting design.

    Graphic designers are increasingly becoming web developers. They shouldn't have to worry about the different versions of different browsers and why the same shit looks different all over the place.

    If people keep falling back on the "if you don't like it, post a PDF" reply, then nothing will be done! We clearly need a new markup language.

    rLowe

    --
    ----- rL
    1. Re:HTML is too vague. by prizog · · Score: 1

      HTML describes the structure of a document, not it's look. It was NEVER intended for graphic designers. Lynx (actually, Links, another text-mode browser, is better) can render HTML perfectly. Yes. Perfectly. But "perfect" just means that it interprets the tags that it chooses to, and displays stuff. But it doesn't necessarily do it the way hot-shot web designers want it to.

      That's 'cause hot shot web designers aren't usually trained to design for the web - they design for print and hope it comes out OK on the web. Which is one of the reasons that a lot of sites are damn hard to use. But I digress. The point is that HTML doesn't describe look / feel - it describes structure. Using tables for layout is abusing the intent of tables (that said, I do it on my personal page).

      If you don't like the intent of HTML, make some new standard. Don't use PDF - it's proprietary, and supporting proprietary standards gets us into shit like the gif and mp3 affairs.


      -Dave Turner.

    2. Re:HTML is too vague. by rlowe69 · · Score: 1

      If you don't like the intent of HTML, make some new standard

      I believe you're agreeing with me, so I shall add a few points.

      1. True, HTML just describes structure. Browsers *should* be able to interpret that structure in any way they please, given HTML's vagueness (ie. how high is a <BR>?).

      2. "Web Designers" need a multi-platform standard upon which they may design web applications. If GUIs are going to make life on the web easier, they we need some universal way to make GUIs. Flash and Shockwave are nice in that they use vector graphics (such as those used in common graphics programs like Corel Draw and Illustrator), but they aren't as universal as they should be. I believe vector graphics will be the future of web design, since they can be resized on the fly, and are sometimes much smaller in size.

      3. True, hot shot web designers aren't trained for the web, but they understand good design - whether it is a car, a magazine ad, a room or a web page. These are the people we should be using to make our GUIs, because they understand not only color and layout, but also the important elements of UI design.

      I think we are in agreement that a new standard should be born. The only question is, who is going to do it? MS? The open source community? ISO?

      --
      ----- rL
    3. Re:HTML is too vague. by mr3038 · · Score: 1
      1) ...ie. how high is a <BR>?).

      What? BR-tag marks forced line break. AFAIK spec says that repeated BR-tags should be replaced with only one. You cannot design HTML pages - and you aren't supposed to - with absolute values in everything. Viewer possibly doesn't have even font you would want to use or cannot use color you defined. Possibly he/she cannot even see the page - I hope your ALT-tags are correct.

      As stated before HTML (or at least should) describes only document structure (no matter there is formatting tags/attributes in spec) and CSS should be used for ALL formatting. Note that those tables are not for formatting. Though I use those for formatting too. But I do it only to support those brain dead browsers. Any Netscape 5- users out there?

      About point 2) I think that vector graphics should be distributed in EPS (Encapsulated PostScript) format. It doesn't support animation (at least that I know) but I yet have to see page that has any extra information in vector graphics animation. (No, if your menus roll, blink and zoom when you come visible it is NOT extra information I want.) There could be better format but it needs to be open.

      Item 3) in your list is true but I don't think good design has anything to do with absolute positioning and stuff like that. We should NOT mess up with GUI and UI either because as I see it HTML is not limited to 2D displays with paper like formatting (think about blind people for example). After all it's only headers and text paragraphs.
      _________________________

      --
      _________________________
      Spelling and grammar mistakes left as an exercise for the reader.
    4. Re:HTML is too vague. by rlowe69 · · Score: 1

      Care to elaborate?

      --
      ----- rL
  59. Re: never used it... by Arker · · Score: 1

    Your loss.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  60. MS Word File Format is here by QBasic_Dude · · Score: 3

    At Wotsit. Microsoft Word 6.0, 8.0, Word 97, and Palm Pilot doc files where all reverse engineered.

    1. Re:MS Word File Format is here by dmccarty · · Score: 2

      FWIW, the Palm DOC file format (which has nothing in common with the MS Word .doc file format except its name) was invented by Richard Bram, and has been open source since day 1.
      --

      --
      Have fun: Join D.N.A. (National Dyslexics Association)
  61. Specification != Documentation(Samba-like problem) by MythosTraecer · · Score: 3

    I believe there are actually 2 problems here:

    1) As I think several people have touched on, the problem here isn't the documentation, since Microsoft through MSDN etc. has documented the Word file format. The problem is that the only specifications on how to correctly render the Word documents are the Word rendering engine itself. Without the ability to see the exact logic that Word uses to render certain formatting codes (read: source code), it is impossible to reverse-engineer a 100%-compatible converter/viewer. It is a similar situation to what the Samba team faces: the SMB/CIFS protocols have been documented by Microsoft, but the only implementation of those protocols is Windows NT/2000, so Samba in reality must be coded to re-implement NT, not implement the CIFS specifications. The difference here, of course, is that CIFS apparently has a complete spec that Microsoft simply ignores, rather than the Word situation where they purposefully keep people in the dark on how things should be done.

    2) the reason that you can't just watch what the Word rendering engine does and duplicate it is because it's stupid. From my experience working with Word itself and wvWare to convert Word files to HTML, it's obvious that Word just throws odd formatting codes where ever it pleases, and never bothers to clean them up. Often tags to end bold formatting (converted to </b> by wvWare) are just randomly placed in the document, nowhere near where any bolding is supposed to occur. The same goes for font sizing/coloring: Word seems to place odd, irrelevant font codes in places, only to override them with the correct codes a few lines later (often without canceling the first codes). In other words, it's a mess. With the Word source code, one may be able to figure out the (supposed) logic behind the mess; without it, I fear anyone is simply grasping at straws, especially since MS continuously changes to Office keeps everyone guessing about what Word is actually doing underneath it all.

    My US$0.02 of course.

    --

    --Mythos
  62. Question is rather: why no coordinated efforts? by Colonel8 · · Score: 1

    Summing up we've this situation: 1. MS uses a proprietary storage format: Ole Storage. It's structure is meanwhile well known, one can retrieve the actual application dependent documents easily. 2. The application dependent documents are partially documented by MS, partially by others. 3. Documentations aren't complete anyway. The binary documents contain most relevant undocument data portions.(It's obviously due to automatic serialization strategies applied by MS: easy to apply but practically not documentable; not even by MS themselves. This leads to the funny situation that people reverse engeneering the file formats understand them better than MS ;-)). I'm working on Word, Excel and PowerPoint intensively for about six years now and can say: it is possible to understand all of these portions. 4. The WMF/EMF/PICT image formats are not sufficiently supported on alien platforms. Even this: on Macs xMF looks ugly, on Windows PICT drawings look ugly. Not a too big problem compared to the rest, but it's not yet solved. 5. MS XML support simplifies the understanding of the docformats even more. 6. Quite a bunch of information is not stored in the documents but in the application; only the variations from default are stored in the documents. It requires quite some efforts to rebuild this data yourself, but is is possible. Summed up: The knowledge about document formats is no longer a problem. The problem is rather to get the knowledge focused on free applications. I'm afraid it requires management actions from this side. PS: Did you know that MS stores GIF files as PNGs in their documents? :)

  63. More info on .DOC format by dvt · · Score: 2

    There is a lot of confusion here about whether or not the .DOC format has been documented, because there are two layers to the file format. First, there is the Word document format itself, which Microsoft has published in some MSDN CD versions. It also available from places like www.wotsit.org. This specification is inaccurate in places but close enough to make Word document conversion possible. Caolan McNamara has a very good start on a Word-to-HTML converter at www.wvware.com. The Word document format changed in the transition from Word 6 to Word 97, and is the same in Word 2000.

    However, Word documents since version 6 are wrapped in OLE Compound Documents, which Microsoft also uses for .XLS files. The Compound Document format is not officially documented anywhere in Microsoft documentation, as far as I can tell. (But see below for a patent that might disclose this structure...) The MSDN library samples invariably use Windows system calls to access data in Compound Documents, and reveal nothing about the file format.

    There have been some efforts to reverse-engineer this format:
    http://arturo.directmail.org/filtersweb/ and
    http://snake.cs.tu-berli n.de:8081/~schwartz/pmh/guide.html,

    A Compound Document contains a tree structure of data streams, which seems like a simple enough structure but it is implemented using a very complex file format. The lack of complete documentation of this format is a major impediment to development of robust open-source code that will access the Microsoft Office file formats.

    A second potential impediment is a nest of patents that Microsoft has built around the Compound Document format. These are just a few:
    US5467472: Method and system for generating and maintaining property sets with unique format identifiers
    US5715441: Method and system for storing and accessing data in a compound document using object linking
    US5506983: Method and system for transactioning of modifications to a tree structured file
    US5706504: Method and system for storing data objects using a small object data stream

    There are a fair number of patents (IBM seems to have some possibly related ones as well). You can find them here: http://patent.womplex.ibm.com/home. A search for "((compound document) and microsoft)" lists 24 patents. It would not be surprising if a serious effort to provide open-source access to Microsoft Office documents ran into legal threats because of these patents.

    Interestingly, the last one looks like it might disclose the Compound Document format, which Microsoft would have to disclose to satisfy the patent office. The description looks right, but the diagrams do not seem to be available from the IBM site. Looks like I'll have to dig some more -- anyone know how to get the full text and images for U.S. Patent 5,706,504?

    1. Re:More info on .DOC format by sgifford · · Score: 1

      Try the patent office Web page:

      http://www.uspto.gov/patft/

  64. But the point is... by Arker · · Score: 1

    I don't want a word processor that spits out files that are readable only with extraordinary effort. I do want a word processor that will produce files in a standard format readable on any machine. The much maligned EMACS is in this respect far superior to MS-Word. I don't believe MS would have any problem putting out a far superior product, but they refuse. They would rather keep using an ever-changing format whose only "virtue" is that it is NOT readable to anyone on a different platform. This is "progress?" This is "innovation?"

    It is not, but the only way to turn Microsofts admittedly great collective talent to doing things right is for the users of their software to send them a collective and loud message that we will not put up with this crap anymore.

    Will this happen? I don't know. I'm not betting on it. I'm learning to use Linux. I'm about 90% there. Once I learn ITCL and figure out how to set up a linux box as a usable multimedia platform I'll be all the way there. Anyone want to help?

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  65. The Turth is out there... by sheldon · · Score: 1

    Microsoft developed an open specification for sharing of word processing documents years ago.

    It is called Rich Text Format(aka RTF), the specifications are available from MSDN.

    It is not uncommon for application vendors to support open specifications for data sharing, but not provide the details of their internal file formats. There are a lot of reasons for this, and not all of them are "evil".

    For instance, searching the Corel website I can't seem to find the file format for WordPerfect.

    Obviously Corel must be an evil corporation. Oh wait, they can't be, they're going bankrupt and supporting Linux.

    1. Re:The Turth is out there... by frank249 · · Score: 1
      The file format for WordPerfect has not changed since version 6 yet Microsoft Word cannot save to .wpd format. Why should they when they can use illegal monopolistic practices to control 95% of the office suite market? Word 97 will save to WordPerfect 5.x for Windows. Hmm thats funny. I thought the first WordPerfect for Windows was ver 6?

      BTW, Corel has only 5% of the corporate market but has 30 % of the retail market. Customer satisfaction surveys consistently report that while more people use MS Word, they are not generally satisfied with it. WordPerfect users consistently report high satisfaction levels.

      Well there are about 85 days to go until the restrictions kick in. We will see if Corel does better then when the playing field is leveled.

      --

      Today's vices may be tomorrow's virtues.

  66. Stimpy - Yee-u eediot! by Gat1024 · · Score: 1

    The very nature of a component object model makes transferring document across different platforms, even different computers on the SAME platform aggravating at times.

    You just HAVE to have some component that can interpret a stream available to completely decode a document. This is true of ANYa component model. You want to see how difficult decoding a compound document is? Try grabbing the dead OpenDoc spec at look at their bento container. It's design goal is exactly like *.doc. And that was designed from the get-go to be cross platform.

    Think of it as component hell. And it is unavoidable no matter who does it. This goes for KOffice as well. Complexity is a run away train. I should say entropy. Since we're tending towards chaos here.

  67. And while your at it... by T. · · Score: 1

    And while your at it take a crack at the Visio file format! (Which just recently got swallowed up by Microsoft, damn it all.)

  68. Why .DOC? by Compuser · · Score: 1

    Reverse engineering a format tailor made for
    a specific application to a point where even
    upgrades to said application break format
    compatibility may be futile or a waste of
    resources.
    What is needed, IMHO, is an education campaign
    to use RTF for file exchange. If a few big
    corporations adopt a policy of only accepting
    RTF files for communications and only generating
    RTF files for communications, then it may start
    propagating in the corporate world. The only
    reason I think this is realistic is because there
    is clear financial insentive to do so for everyone
    except MS itself.

  69. Re:Component Hell by Arker · · Score: 1

    This is true. Pull in a TeX document which contains a figure which is in a format your 'puter doesn't understand, you'll see blank space. Same thing with a .doc file in the same situation, right?

    Explain to me just how the proprietary .doc format is superior to the open TeX format then. This obviously is not it, because they both react the same to this situation.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  70. -1 (Offtopic) by Chops · · Score: 5
    Do you think MS is the only multi-million dollar business to lie and cheat? I've got news for you. THEY ALL DO. However, MS does it to enforce a monopoly, while other companies do it to try to get a monopoly. That's why it's wrong. The problem is once you get to being a monopoly you have to stop doing all the things that got you there. But don't talk about MS like they are so much worse than other companies. They aren't. They are just the biggest, and most documented.

    Right you are, sir! In today's "free" market, there are a slew of businesses which wield monopoly power, but which they don't want you to know about it. Consider:

    Cisco Systems has a market value comparable to Microsoft's, and has even exceeded it at times, by maintaining a total stranglehold on the network hardware market. Although they would have us believe that Cisco's strategy is "providing a reliable, top-quality product and good support," a number of internal memos have recently been leaked indicating that Cisco plans to start including support for the "upgraded" IPv6 "extension," putting them in a position to use the "embrace and extend" strategy to leverage their large market share into an almost total monopoly on the Internet's physical infrastructure.

    The Lego corporation has a long history of introducing new block designs which render the old blocks almost totally useless from an aesthetic perspective. "I spent all my lawn-mowing money on the medievel set," said a sniffling little boy who asked not to be identified, "but then the Technics came out, and all my spears and stuff wouldn't fit anywhere on the walking robot I built unless I mixed those brown spear-holder blocks in, and then my robot looks yucky." He also pointed out, as is well known, that Lego has broken Technics color-compatibility with their new Mindstorm upgrade, by switching red dye #5 for #8, and yellow #2 for #7. Alas, the legal hassles that await anyone foolish enough to reverse-engineer Lego's proprietary block-connection protocols have ensured that Lego has reigned unchallenged as the only source for toys you can build cool shit with, despite their inferior product. The "accidental" death of Abe Fromage and the subsequent collapse of Tinkertoys spelt the end of competition, even before Lego started blatantly cloning "CPU" and "robotics" technology from the computer industry for use in their "innovative" Mindstorm toys.

    Furthermore, Red Lobster, Denny's, and other chain/corporation/restaruant/franchise establishments regularly use unconscionable terms in the dining agreements they make with their patrons. As a large corporation, they play from a position of strength: With their high-priced lawyers and large bankrolls, they can freely impose their will on the consumer (commonly by the use of so-called "walk-through" agreements: the restaurant posts it dining agreement on its wall, you and are considered to have "agreed" simply by choosing to dine there, regardless if you have read or even noticed the sign). Examples of this include:

    • "Shirt and shoes required" -- usually extended at the whim of the management to cover any situation that might cut into their bottom line. You must keep your shirt buttoned, shoes and feet off the table, wear pants (although it says nothing of this in the dining agreement), and wear all clothing "correctly" (again, at the whim of the management) -- even if you're wearing shoes, placing your socks on your ears will earn you a quick ticket to the street.
    • Even though you have paid in full for the meal, none of it is "yours" to do with as you see fit -- only licensed to you. You cannot throw your potato. You cannot hold a puppet show with your broccoli. You cannot gargle anything. And don't even think about trying to take "your" plate, ashtray, silverware, or table out the door with you -- if you read the fine print, you'll find that these items were only "licensed" to you for the duration of the meal!

    It is sad, but the powermongering megacorporations who really run our country also have merciless teams of wedgie-men and noogie-goons at their command, and they have bamboozled the media and the government into abusing Microsoft to benefit their own bottom line. What with communistic government interference, backlash from the misinformed public, and the software piracy that is rampant in today's industry, Microsoft can barely stay afloat, let alone research more of the innovative, professionally engineered products the software community has come to expect from them, like Microsoft Bob, the dancing Office paper clip, and email clients that do it all at the click of a mouse! Yay Microsoft! Go Bill! One world, one web, one program!

    1. Re:-1 (Offtopic) by zigzag · · Score: 1

      I second the "That ruled" comment.

      Have you considered writing for the Wall Street Journal?

    2. Re:-1 (Offtopic) by Nerds · · Score: 1

      Wow, that kicked ass. The worst part is that it is the funniest thing I've read in a while and I can't tell anyone about it because they'd just look at me like...well, like they always do when I try to tell them about something I read on Slashdot...

      Oh well, it still ruled.

      --
      My other .sig is 'The Art of Computer Programming'
  71. Not just .DOC by mpe · · Score: 1

    There are other formats, such as .XLS and .PUB which also lock people into using Microsoft stuff.

  72. Re:'Everpresent Office monpoly'? by Phroggy · · Score: 2
    Yes, it really is a monopoly. The last time I submitted a résumé to a temp agency, I e-mailed it as a PDF. I was asked to re-send it in Word format. This sort of thing is VERY common.

    --

    --
    $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
    $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
  73. Re:Component Hell by Gat1024 · · Score: 1
    Explain to me just how the proprietary .doc format is superior to the open TeX format then. This obviously is not it, because they both react the same to this situation.

    Not that I like .doc format or anything but the principle of the .doc format is the same as any component object container.

    So, what makes .doc better is also what makes the KOffice format better or any other component storage format. If memory serves me, Tex is more closely related to Postscript than a document format. Sure it can create Postscript output, but it can just as easily drive a typesetting machine directly. The Tex engine is hardcoded to understand certain formats of resources like fonts or graphics. I think there are macros that get around this, but I'm not sure. (It's been a while so if this isn't true anymore, then I stand corrected.)

    A compound file and the program that interprets the streams is generic. It doesn't know anything about the streams in the file. It defers that knowledge to the reader/writer which may have a better way of storing it's document than the generic format. The gist is that new formats can be added without changing any of the underlying mechanisms.

    Functionally, there probably isn't any real difference between a Tex file and and a compound doc file. I just think that with a compound file layout, things that you have to explicitly state in Tex is handled by the low level machinery automatically. Sort of comparing C to C++. Example, because of the class of stream in a COM or OpenDoc file, your application menu changes to allow you to edit the stream.

    It boils down to the fact that no matter what format you have, without the necessary renderers on your machine, you're out of luck. (As you indicated.) Tex is cool because documents are specified using Tex's macros. But it forces your application to think in frames and columns and runs when a tree or a graph might be better.

  74. Great! by fishexe · · Score: 1

    But you can't save it in another format, can you?
    Nor can you edit it. All you can do is view, and only if you own windows. So once somebody has done their work in Office the only choice is to either a)commit to the scarily expensive and frighteningly short upgrade cycle and be doomed forever to keep draining out cash or b) do that work over in another program.

    Note dumb-ass m$ defenders that none of us would give a shit if they wouldn't keep changing the format and forcing upgrades. But evidently bill needs more money than he already has so he can buy another house or something.

    Ever get the impression that your life would make a good sitcom?
    Ever follow this to its logical conclusion: that your life is a sitcom?

    --
    "I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
  75. Read the Halloween Documents and come back by ink · · Score: 1
    http://www.opensource.org/halloween/

    That pretty much sums it up, and from a Microsoft VP, no less. You can pretend that Microsoft is a benevolent company all you want, but that doesn't change the facts.

    The wheel is turning but the hamster is dead.

    --
    The wheel is turning, but the hamster is dead.
    1. Re:Read the Halloween Documents and come back by spectecjr · · Score: 2

      That pretty much sums it up, and from a Microsoft VP, no less. You can pretend that Microsoft is a benevolent company all you want, but that doesn't change the facts.

      The facts being that Vinod Vallipolli wasn't a Microsoft VP, nor even anywhere near that. He was a grunt.

      Now if you'd said he was a Microsoft V V, then I'd have to agree with you.

      I can sit down right here and now and write a document that claims the best way for Microsoft to make money is to take Linus, strap him to a chair with electrodes on his testicles, and fry him like a bug on a hotplate.

      This document would get leaked.

      Does this mean that this is happening in real life? Well, goddamnit YES! Linus is strapped to a chair! Right now! With electrodes on his testicles!

      Funny how you never saw the leaked document which says that Microsoft would be better off if they gave all the lower-level peter-principle'd management a good kicking, stopped the infighting, and stopped the use of brute force in their development practices.

      Simon

      --
      Coming soon - pyrogyra
    2. Re:Read the Halloween Documents and come back by Omnifarious · · Score: 1

      That's because nobody actually believed they'd actually do anything mentioned in that one.

  76. Bento in OpenDoc Re:Stimpy - Yee-u eediot!) by treedragon · · Score: 2
    Gat1024: Try grabbing the dead OpenDoc spec at look at their bento container. It's design goal is exactly like *.doc.

    I worked on Bento. I was not the designer. Jed Harris was the designer (Ira Reuben the coder). Jed said Bento was an experimental first cut prototype that was pushed into production, and I agree with this view.

    The design goal was only rather similar to *.doc. Unfortunately, since Bento was a version one prototype, it never had a redesign for ease in reading and writing until I designed one.

    Gat1024: And that was designed from the get-go to be cross platform.

    It was technically cross platform, but Bento was very unfriendly as a clearly understandable format. It's big mistake was to use phsyical stream embedding instead of logical embedding, so the recursive flow of control was a nightmare to analyze. The format had physically discontiguous streams embedded inside other physically discontiguous streams, which would give almost anyone the shudders.

    Gat1024: Think of it as component hell. And it is unavoidable no matter who does it. This goes for KOffice as well. Complexity is a run away train. I should say entropy. Since we're tending towards chaos here.

    You are correct that every open format can embed opaque content that cannot be understood, so all component systems suffer from the risk of component hell.

    I would not accept any amount of money to reverse engineer the Office doc format as a regular job, because it would tend to be too hard and frustrating to deal with the complexity under ongoing changes.

    Furthermore, I would not trust any junior engineer who did accept such a job, so I would avoid the product based on such work, under the theory it would be fragile and buggy. Am I a pessimist, or what?

    David McCusker, former Bento guy

    --
    Values have meaning only against the context of a set of relationships.
  77. Re:-1 (Offtopic) (Should be +1, Funny) by superkorn · · Score: 1

    That ruled. I would mod you up but I have no points at the moment...

  78. Half a step up by fishexe · · Score: 1

    For that matter, there's also text format! That's perfectly well documented and 100% open!
    Truth be told RTF is only about a half-a-step up from text. In fact I prefer to send and recieve text because I can deal with it in emacs or ae. That and when I tried d/ling abiword its rtf filter wouldn't work.

    Ever get the impression that your life would make a good sitcom?
    Ever follow this to its logical conclusion: that your life is a sitcom?

    --
    "I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
  79. Restating the question, one more time. by Arker · · Score: 2

    Not that I like .doc format or anything but the principle of the .doc format is the same as any component object container.

    Functionally, there probably isn't any real difference between a Tex file and and a compound doc file.

    I really think this was my entire point. They both do the same thing. The only difference is that one (TeX) is an open format dating to the early 80s, while the other (.doc) is a proprietary format that changes every 2-3 years. They both do the same thing, so what possible justification could there be for using the second? Assuming, for a moment, that M$ is, as they claim, concerned with producing real benefits to their customers, I don't see any point to .doc. Do you? If so, please explain it.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  80. I've noticed the same thing by fishexe · · Score: 1
    With word html conversions, every single line has a
    in it, even if it's just blank. And blank lines are always double height because a carriage return becomes

    &nbsp

    instead of just
    . The placement of their tags is just bizarre!

    I usually just use that filter only for long documents in order to not have to retype all the text (they hand us things in .doc), then go back and edit the HTML by hand to clean up.

    Ever get the impression that your life would make a good sitcom?
    Ever follow this to its logical conclusion: that your life is a sitcom?
    --
    "I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
  81. non-conformant XML by Nexx · · Score: 1

    Which means that at some point they'll start changing the definition of XML to close out competitors. They've always taken this approach, why do you think they won't this time?

    MS already does this, with HTML^H^H^H^Hrubbish that gets spat out from Word2k, when you do a "save as html". It's rather frightening, actually, to see the actual code.

    1. Re:non-conformant XML by spectecjr · · Score: 2

      MS already does this, with HTML^H^H^H^Hrubbish that gets spat out from Word2k, when you do a "save as html". It's rather frightening, actually, to see the actual code.

      Office output is fully XML/XSL transform compliant - which is why Opera can handle it perfectly fine.

      Also, a lot of the stuff in there is for round-tripping; it doesn't get used by a browser for display - the XSL transform just deletes it to all intents and purposes.

      Simon

      --
      Coming soon - pyrogyra
    2. Re:non-conformant XML by GeZ117 · · Score: 1

      > Office output is fully XML/XSL transform compliant - which is why Opera can handle it perfectly fine.
      AFAIK, they make their HTML file hard to retranslate in anything else. See here for further details.

      --
      sigmentation fault
  82. Catdoc by SlapAyoda · · Score: 1

    Well. You can always use Catdoc to view doc files in console. Just catdoc and it outputs it in plaintext.
    Just making a note of a neat little utility I found.

    Cheers,
    SlapAyoda.
    # SlapAyoda
    # SlapAyoda@yahoo.com

    --
    # wrote sig.txt, 23 lines, 31337 chars
  83. I see - Re:Restating the question, one more time. by Gat1024 · · Score: 1

    Does the format provide any real benefit to the customer? I don't know. The format seems to be geared more towards programmers (MS's) than customers. Which (in an ideal world) would lead to more efficiently designed applications.

    I do know that the format is a red herring. Look at how MS is so excited about XML. It's all the components that need to be duplicated. A Office2000 document in TeX would be no more translatable than the .doc one it produces right now.

  84. Re:'Everpresent Office monpoly'? by roystgnr · · Score: 2

    If its compatibility we are looking for here, why would expect MS to do it?

    Because their customers expect them to make decisions that make their software better for the user, particularly when those decisions would come at little or no (or negative, in the case of maintaining a consistent document format) cost to Microsoft. The fact that Microsoft repeatedly changed the Word format costs themselves and their competitors money for additional programming work on filter and import/export code, and costs their users money for repeated unnecessary upgrades, incompatibility hassles with other programs. Looking at the Microsoft+competitors+users system as a whole, there is no benefit to anyone for Microsoft to use a poorly documented, convoluted format without an accurate public specification.

    However, looking at MS, competitors, and users independently, it's obvious that while the value of the system as a whole is reduced by Microsoft's decisions, the handicap that it gives to competitors and the additional revenues it generates from users causes more of that value to end up as cash in Microsoft's hands.

    This isn't the way a free market is supposed to work. If someone makes an inferior product, I'm supposed to be able to switch to a different producer and not be adversely impacted by said product. (and as a side effect, my readily available choices encourage all producers not to produce inferior products) Unfortunatly, when you add network effects, i.e. the requirement that my new product be compatible with the old, suddenly Microsoft has the ability to use an existing large marketshare as it's own "benefit", to make it self-sustaining, to reduce or eliminate that choice.

    I'm not saying that, after thinking about it, it doesn't make sense for Microsoft to do just that. I'm just saying that, to consumers used to having a wide selection of companies competing solely based on price and quality for their purchasing dollars, it certainly counts as "unexpected".

  85. Re:I see - Re:Restating the question, one more tim by Arker · · Score: 1

    I don't believe this is quite true. If Office2000 was putting out TeX documents then any problems in translating them would be perfectly opaque. Programmer time could be allocated most effectively in that case, to real problem points, rather than to red-herrings.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  86. Re:Word 97 RTF format by Old+Wolf · · Score: 1

    This was a bug in Word 97 RTF handling, which has since been fixed.

  87. Re:'Everpresent Office monpoly'? by HermDog · · Score: 1
    The last time I submitted a résumé to a temp agency, I e-mailed it as a PDF. I was asked to re-send it in Word format.
    Me too! Were you able to resist the urge to include a Word macro virus? The one that deletes all the other resumes on the computer?
    --
    --
    JADBP
  88. Why is this legal? by led_belly · · Score: 1

    Why is the reverse engineering of DWG any different from the DeCSS? What are the legal ramifications?

    1. Re:Why is this legal? by mrBoB · · Score: 1

      Because CSS is a format agreed upon by hundreds of money-hungary (rich) folks _who've got nothing better to do with their time_. Plain and simple, CSS was not a format meant to protect the IP of movies, it was meant as a tool to enforce legal restrictions on copying movies. Heaven forbid a person tryed to watch a DVD on a free OS.

      bob

  89. There *is* a standard for rendering HTML... by yoz · · Score: 1

    ... if you combine it with CSS.

    All the replies here seem to say that not only do all browsers render HTML differently, but that's how it should be. However, that's not the case when CSS accompanies a document - in fact, it's just the opposite. CSS performs all the page-layout and style description that HTML wasn't meant to do. Also, there are specific standards for how HTML+CSS is meant to be graphically rendered; check the various test suites available at http://www.mozilla.org/newlayout/te stcases/css/. (Yes, I know that different browsers have different levels of CSS conformance, but that's to do with buggy and incomplete implementations, and nothing to do with lack of clarity or general unsuitability of the standard. There's only one released browser that has full CSS1 apparently, and that's IE for Mac.)

    Please bear this in mind when reading the other posts in this thread regarding graphic designers, especially those that suggest that we need a completely new format. We don't. We just need proper implementation of an existing one.

  90. Hey! TummyX! by zigzag · · Score: 1

    What's the difference between a mercenary and a prostitute?

    1. Re:Hey! TummyX! by Tony-A · · Score: 1

      The prostitute has better morals.
      Something like Larry Flint being the social conscience of Washington.

  91. The empire is evil. by Bushwacker · · Score: 1

    The reason for this is probably pretty simple: no one has ported the .doc format entirely becase it is fundamentally a DOS API. Unless you have the core code for Word, it would be theoretically impossible. Also, since the practically OWN the software industry (GNU excluded, of course), they can boss everyone arround and assimilate the competition. If the Tech industry was StarTreck, M$ would of couse be the Borg. Windoze software boxes would be their cubes.

    --
    -----------------------------------------
    Perversely greped and groped by PowerPenguin
  92. I can only assume most of you haven't used Corel2K by sabaco · · Score: 1

    I've been using Corel Office 2k at home and at work exclusively since it first came out. And I have not yet had one problem with it's conversions. In fact, there are a number of errors that can occur that make the file unreadable in MS Office, but it still opens perfectly in Corel. So while the 2 might not offer identical renderings, I think Corel does if anything a better job at displaying .Doc than M$ does.
    -- Braeus Sabaco
    Member of the Roman Legion
    Customer/worker at Phenomenal Internet Solutions

    --
    This is SO educational! -- Kintaro Oe
  93. Easy handling of .doc files: by lifebouy · · Score: 1

    Two steps:
    1. Your .sig should say something to the effect of: "All files ending in .doc are from my account before opening due to a. it being a proprietary file format which my machine does not understand, and b. the likelyhood of viruses embedded in them. Take your Pick. Please use .rtf when sending your data."
    2. filter out any emails with an attachment named *.doc.
    Hey no more .docs! The "I'm ignoring you!" tactic. children have used it for years. Because it works;)
    3. (I can't count) With access to a prompt, simply write a script to email your favorite .doc junky once a week asking them to post in another format.(for instance, all those parts manuals that are in .doc: email the company) Be sure and be very polite, and be sure and check to make sure when they post in RTF or your favorite non-proprietary format. The idea is to rattle the can, not to ring a gong.
    These are very simple pro-active steps that you can take that will help change the digital world.
    Also, make sure you change email addresses once in a while on that script, so they don't just filter you. Spam is a tool for good, too;) Just dont be obnoxious, then you won't get what you want.

    --
    Drop me a line at:
    Key ID: 0x54D1D809
  94. What's the problem? by evilviper · · Score: 1

    I must have missed something... StarOffice 5.2 beta PERFECTLY converts from almost any format known to mankind. You can open a M$ office 2000 document without any loss of formatting at all. There is only a slight loss of formatting when going from StarOffice formats to M$ formats, and that's only in the extremely advanced features that M$ office doesn't have (like rotated text, transition effects that powerpoint doesn't have) So if I missed something, let me know! But as far as i can tell, this case was closed before it opened.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  95. Re:I see - Re:Restating the question, one more tim by Gat1024 · · Score: 1

    The problem isn't the format. It's the components. How a .doc file is created and read is pretty well documented and understood. The problem is that all the streams or virtual files inside of them are created by components that change from version to version.

    Even in TeX you have to hunt down the correct macro includes (a "component" in the TeX sense) if they're not installed inside the document. 'Twould be a lot of duplicated macros and wasted space if every document carried with it the requisite macro libraries that it needed to draw itself. That's why TeX has a bunch of standard macros that you don't have to send along with the document.

    So let's say you only embed the macros that the document uses -- not every single one in a library. What do you have? A document that is renderable but not 100% editable. Because the macros that support some additional features are not present.

    As soon as you start relying on external libraries (macros for TeX, components for COM or whatever) you run into component hell. The problem isn't figuring out the file, it's providing the required library/component functionality that a document needs. Each component has it's own operating context that itself depends on the document context.

    All of this is why you get blank glyphs in TeX. And why translaters can only get 85% there for .doc files.

  96. Smartest people? by Troy+Roberts · · Score: 1

    "Why would a company with the smartest people in the world make life more difficult on themselves by making their own formats hard to read?"

    Why do you think Microsoft has the smartest people in the world? There is no convincing evidence of that. In fact they have spent many years and have not been able to produce a stable operating system.

    They have, however, marketed the hell out of what they have managed.

    Compared to places like IBM's Thomas J. Whatson Labs or Lucent. Microsoft has done not significant research or produced anything original. They have been very good at seeing computing trends and buying companies that have products that do what they want.

    Really, I suspect you know nothing of recent computing history.

  97. good luck by cfleming · · Score: 1

    shit, micros~1 can't even get new versions of Word to read old doc files correctly. micros~2's own filters seem to fall off exponentialy with time. I swear they don't even test those things.

    Conversion % = exp(-dt)

  98. Objective critique of .doc by tjstork · · Score: 2

    A bit of background:

    Word is a COM object and uses COM extensively. OLE was at the roots of COM but these days OLE is just another set of COM objects that one either implements or uses. So.. from the get go, one needs to implement COM, and also IStorage/IStream, on Linux, to get at Word. This would be ok if COM were an open standard, but it ain't. Where in MSDN is the VTABLE format for COM? It isn't there.

    Strike 1 against Microsoft. By strike I mean that they are doing the usual evil empire thing by not opening up COM.

    Philosophically, IStorage / IStream are a set of COM objects (read Libraries), for divving up a file into its own directory mechanism. The rationale for doing this is that end users want to copy documents as entire entities, and not deal with 200 or even 2000 subdirectories or small files that might comprise a total document. In Microsoft speak, a document must be a moveable entity, and in that regard, COM library based documents are entirely defensible. However, what goes into each of those subdirectory entries, or streams, is free to remain largely undocumented. It is the design intent of COM to ensure interopability between closed interfaces. At this, COM does stunningly well. You can script against COM in any language... but the medium for interchange is an application that you must always have in order to view the document.

    Strike 2 against Microsoft. COM IS an excellent piece of software engineering, but it is engineered to do the hypocritical thing. The easiest way to make things interoperable is to post the source...

    Much ado has been about Word changing file formats. The critiques of Word say that it is unnecessary to change file formats between releases. This is non-sensical hogwash. New features mean new data requirements, and new data requirements mean new file formats. Every other application on the planet has versions of formats and downward compatibility problems. Have you tried looking at a style sheet page in Netscape 2.0? That Word changes file formats is reasonable.

    Hit: Microsoft.

    Some criticism has been made about how a Word document changes appearance based on the display or print device. This is in keeping with the philosophy of Windows - which is to enable software features only if the hardware is present to support them. This is radically different from Unix, but this hardware-centric approach of Windows IS defensible on many merits.

    Hit: Microsoft.

    Word has, in effect, an autoexec scripting mechanism with no sandbox and no security besides that which the user security context of the OS offers. Since Windows 98 effectively runs everyone as root, the vast majority of Windows Word users are flying blind into a cliff.

    Strike Three: Microsoft.

    The bottom line is this. The .doc format and the entire idea of files within files has a lot of merit, as does the concept of only dealing with content supported by ones hardware. However, given the lack of openness by Word file formats, by COM, and the lack of security, Microsoft strikes out.

    The bottom line is this:

    If Microsoft had opened the Word file format, then Word files would have been the defact web page of the Internet, not HTML. That we are doing HTML and HTML rendering engines is testimony to how badly Microsoft missed a golden opportunity with Word. To protect their Word Processing IP, they made sure a non-Word file format (HTML), would become the lingua fraca of the Internet. That by itself is a compelling argument in favor of open file formats.

    --
    This is my sig.
  99. Re:For that matter, why can't we reverse.... by sgifford · · Score: 1
    Postscript wasn't reverse engineered, it was re-implemented based on the published PostScript standard. See the "PostScript Language Reference Manual" (ISBN: 0201181274).

    As for a UNIX version of Windows, see WINE:
    http://www.winehq.com/

  100. Monopolistic practices? by sheldon · · Score: 1

    You really didn't answer my one question, which is where is WordPerfect's file format located on Corel's website?

    Similarly is the Lotus WordPro file format located on the IBM website?

    I guess both Corel and Lotus are using monopolistic practices to...

    Oh wait a minute, you really don't know what you're talking about do you?

    Personally I've never liked Office, but I don't think my like or dislike for a product should influence this discusison. Unfortunately your dislike for a product has blinded you to the reality of the industry.

    Oh, and I don't know what difference restrictions will make. Even if they were to survive the appellate court.

    1. Re:Monopolistic practices? by fb · · Score: 1

      >You really didn't answer my one question, which is where is WordPerfect's file format located on Corel's website?

      Here:
      http://www .corel.com/partners_developers/ds/CO32SDK/docs/ff/ A_FRNTFF.htm

      --

      --
      fB
    2. Re:Monopolistic practices? by frank249 · · Score: 1
      You really didn't answer my one question, which is where is WordPerfect's file format located on Corel's website?

      As noted above, the WP file format is here.

      I guess both Corel and Lotus are using monopolistic practices to...

      Oh wait a minute, you really don't know what you're talking about do you?

      Personally I've never liked Office, but I don't think my like or dislike for a product should influence this discusison. Unfortunately your dislike for a product has blinded you to the reality of the industry.

      Hey man, wake up. If you have not noticed, Microsoft just went through a two year antitrust court case and lost. The findings of fact are now in the books forever. Any appeal might change the penalties but the fact is that Microsoft used illegal business practices to bone the consumers. Note that there are now 135 civil law suits pending against them.

      Oh, and I don't know what difference restrictions will make. Even if they were to survive the appellate court.

      Do you know that originally some of the states wanted the case to be about the way Microsoft used the bundling of Word with Excel to be the main focus of the case? Corel and Lotus have not sued yet but they would have a hell of a good case especially now that most of the work has been done in this trial.

      When the restricitons come into play, other companies like Corel will have a chance to win contracts based on the merits of their product as the companies will not have to worry about Microsoft taking revenge against them.

      --

      Today's vices may be tomorrow's virtues.

  101. Re:I see - Re:Restating the question, one more tim by benedict · · Score: 1

    What do you mean by "opaque"?

    --
    Ben "You have your mind on computers, it seems."
  102. A Brief History of IntelliCAD by Bernal+KC · · Score: 3
    So why hasn't .DOC been reverse engineered?
    As one who used to make my living building new versions of AutoCAD, I think I have something to say about this. Even if the /. attention span has moved onto more immediate stimuli.

    By now we know that both the DWG and DOC format have been reverse engineered. We also know that it really does not matter. Autodesk/MS control the data formats. Their rendering of the data is the reference implementation -- and they both change the format at will. They both exploit run-time and new version peculiarities in their rendering of the data.

    When it comes time for a company to decide which product to invest in, when it's time to choose if they want to use the proprietary product or some wannabe cheap-o competitor, the answer is alway the same. Go with the standard bearer. And that really is the correct answer. The price differential is completely and totally irrelevant. Corporations invenst a lot more in labor and data than they invest in any one version of a software product. The "open source" factor is -- if not irrelevant -- not appreciated. It is secondary at best.

    Look at IntelliCAD. They attempted to commoditize R12 AutoCAD. Supposedly nobody wanted any of the features crammed into post-R12, post-multiplatform AutoCAD. R13 was a bitter pill for AutoCAD customers and loyalists. Supposedly IntelliCAD would allow drafter/designers to draw basic 2D engineering drawing just as well as R13++ for half the price. More importantly, they thought they had given companies that had huge investments in DWG data a viable alternative -- a way out. They could jump from the ship they were supposedly dissatisfied with and seek alternatives.

    But you know what? Nobody took the offer.
    Not before IntelliCAD was "open source" and not after.

    It turns out that Autodesk was able to pull off R14 and salvage their reputation Turns out customers were not all that dissatisfied with Autodesk -- which they correctly saw as a well entrenched, healthy (==rich) partner, committed to investing in both AutoCAD and other forward looking design products and technologies. Turns out AutoCAD is very capable of getting the drafting job done. Besides, IntelliCAD was for shit. Still is. And when Visio sacked the original ItelliCAD development team - a very idealistic and motivated group -- because ICAD was released prematurely with bugs and feature gaps -- any idealism or customer loyalty went out the window. ICAD was exposed for what it had become -- a cheap knock off with no future. The so-called open sourcing of IntelliCAD was just window dressing. The fact was that Visio had interred it's mistake in preparation for acquisition by MS. (It also parted ways with the folks that had inspired IntellCAD, FWIW.)

    So what does this have to do with .DOC?

    You could come out with a .DOC compatible word processor without a super-human effort. But wihtout the VBA, without the quirky rendering, without all the nuances and endless litany of features of Word it would be nothing more than a knock-off. It would have to beat Word on functional terms in order to be attractive. That would be a very tall order. Like it or not, Word and AutoCAD are very mature products. Maybe they attempt to do too much. Maybe they are bloated with features that any one customer does not want or need. But a whole lot of customers are well served by these products. They get the job done for a broad spectrum of customers.

    They are both going to be very, very hard to disslodge.
    It's their game to loose.
    Beating them on the merits will be damned hard, and possibly not enough.

    And, just to goad anyone still reading, being "open source" or not has nothing to do with it.

    If open source is a strategic advantage, it will hvae to do with stamina and longevity. Eventually MS/Autodesk will find it hard to keep milking their cash cows. Eventually they will find it harder and harder to justify continued investment in these products. Eventually the WinX platforms both producst are married to will fade. At that point, when Word and AutoCAD stagnate, they may be vulnerable to an open source comminity that can run endlessly on no cash, that can build bridges to newer, more current technologies.

    I'm not holding my breath.

    In fact, I've changed jobs to get out of the CAD industry. The action is elsewhere. I may not live long enough to see AutoCAD take a fall. It may never happen.

    PS: In the CAD space, the most intersting open source activity is not IntelliCAD. The Matra folks have a more interesting offering. IntelliCAD is a corpse. OpenDWG may prove useful if and when the action moves beyond AutoCAD. If that future is to involve open source, it will more likely be centered on Matra than OpenDWG.

  103. Re Odd Formatting by Danious · · Score: 1

    "it's obvious that Word just throws odd formatting codes where ever it pleases, and never bothers to clean them up"

    That's almost just what it does do :-) I remembr reading an article in a mag a while back where an anonymous MS code slave explained the basics of how .DOCS are created. This is vastly simplified and from an failing memory, so no flames please where I screw up :-)

    From memory, he said that .doc is sort-of a diff file. You start with an (almost) empty file, into which Word inserts your text and formatting as you type and click. So far, so good, but when you go back to change things, it doesn't do the obvious thing and change it in the file, it actually appends your changes onto the end of the file.

    When you reload the file, it sort-of starts from the beginning again, and applies your changes in the order they occurred. That's why you find the format commands all over the place: the file holds location details for the target and the action to be applied. This is also why old, frequently edited documents get so large (and so slow to load).

    Try it. Create a document. Do lots of editing, make lots of changes. Save it. See how big it gets. Now use 'Save As'. Watch it shrink as Word goes through and trys to clean up the mess it's made.

    My favorite quote from the MSloth: "Word docs are mostly space". Sort of sums M$ up nicely, don't you think?

    Of course, I could be wrong, it's a frequent occurence...

    John.

    1. Re:Re Odd Formatting by MasterAlex · · Score: 1

      From memory, he said that .doc is sort-of a diff file.

      Yes, it is. But only if you activate the switch to speed up file saving! (It is switched on by default, so I think about 90% of the Word files are created this way, but you _can_ switch it off.)

  104. Intellisense an innovation? by btempleton · · Score: 2
    Doing stuff like this was developed in the early 80s in projects like the Cornel Program Synthesizer.

    I myself developed a syntax directed editor in 1985 called ALICE -- see this page to download it for DOS or Linux -- which still 15 years later does more than Intellisense.

    There are some MS innovations but this is also 20 year old stuff.

    --
    Has it been over a year since you last donated to the Electronic Frontier Foundation
  105. Re:Flamebait? by Malcontent · · Score: 1

    I have noticed that all the pro MS posts always get moderated up pretty good. Atleast three usually five. Something is going on. I think all the astroturfers are moding each other up.

    --

    War is necrophilia.

  106. Re:Component Hell by John+Allsup · · Score: 1
    A critical problem with editing TeX, for which realistic solutions are only now arriving, is that it is difficult to track the implications if various macro definitions and font definitions.

    It is very difficult to have a setup with real time feedback as to changes you make to a TeX document.

    Furthermore, it is even harder to write an editor for TeX that allows you to use the extensibility without the possibility of breaking the document (i.e. the TeX file the editor spouts needn't be able to fit through a TeX compiler)

    The strict structure enforcement possible with XML together with the possibility of database backends makes for a far brighter future. (p.s. suppose you want to search through a bunch of TeX documents and extract all definitions, say. The ability to do this at all requires discipline from the document author, and so realistically limits itself to a single author set-up or a closely knit group. (SG/X)ML with DTD's doesn't suffer the same fate, and can still use a TeX formatter as the backend.)

    I'll end with a quick point, very worthy of note.
    TeX is for typesetting, not wordprocessing and general document production. It is only the clever design and extensibility of TeX that makes it even suitable for such tasks.

    John
    --
    John_Chalisque
  107. Bundled s/w simply didn't sell enough add'l h/w by kriegsman · · Score: 1
    Several execs at Apple told me that when Steve Jobs came back, he told them to take a new look at everything they were spending money on. The "perceived added value" of the bundled software simply wasn't responsible for enough incremental hardware sales to justify the basic financials of the bundling deals. Fundamentally, Apple is in business to sell hardware (hence 'no more clones', too).

    Plus, the guys from NeXT (a software company) had a mild allergy to paying for and bundling software from other companies, and that hastened the termination of some of the preexisting bundling deals.

  108. Re:Too bad DOJ did not ask to open up MS file form by DavidOgg · · Score: 1

    Good Idea.

    But make them document the Win32 API as well, and make Source Licenses available for their products, doesnt have to be free, but AVAILABLE, like in the Unix days.

    Why the hell is the default formatting HTML for posting messages, what the hell is slashdot thinking?

    --
    Fear the government that fears your guns. Fear the government that fears your computers. Remove them from my email.
  109. First things first by BigBadaboom · · Score: 1
    Now, why can't Corel, Lotus, Sun, etc. band together and reverse-engineer Microsoft's file formats properly?

    If they were to do that, then I'd like to see them to put their money where there mouth is and open all their own proprietary formats first.

  110. Its bloody well pretty much done... by caolan · · Score: 3
    Listen again and again this comes up, and again and again I make the point that my wv does read .doc format. Abiword uses this for their .doc import. KWord uses a munged copy of it too. It is not perfect, but it does support versions 6, 95, 97 and should handle 2000 as well.

    Its GPLed, granted it needs work. So scoot onto the abiword mailing list and cvs down the latest version, get hacking on it and sort it out.

    ole2 is fully sorted out with libole2, excel is being handling by gnumeric.

    What is not handled by wv is not by lack of documentation or design, its simply a matter of spending some time at it. Easy peasy. Info on the MSDN docs can be got from here. They can be gotten off the MSDN 1998 July cd, or you can get some of them from wotsit.org. I even wrote ivt2html for you to convert the office.ivt file into html. Like what else do you need.

    90% of all the hard work has been done, wv can parse fast and simple with no bother to it, which was a nightmare to do, it can construct the correct PAP (paragraph properties) and CHP (character properties) for a given run of text. Feed you the correct characters and charset and font, the TAP (table properties), graphic properties and handle to graphics. The correct OLE handle for embedded objects. Document properties etc. There is an example html conversion program included for reference (wvHtml).

    I put together libwmf to convert wmf file into something useful as well. Theres a half done implementation of an Escher (the graphics for Office) importer floating around in there as well.

    Theres also an implementation of a Summary Stream displayer for all ole2 documents.

    I even bust my ass and dragged together the right bunch of motivated people to help implement the decryption module for word 97, 95 and 6, and that was not fun at all to say the least

    The hard work is done, if you want something improved you have a very very solid base to work from. Yes the spec is confusing, yes its not a great format, yeah is sort of moves over time, but in a fairly rational way that can be supported with some work. There are any number of equally crap formats with weak documentation supported in various tools.

    There is just this false myth that the Microsoft formats are inpenetrable and/or not available. Just download wv, fair enough there might be problem documents, if there are, just debug wv and get onto the abiword list and work it out with them. If something fails it can be fixed and improved, its not a case of "ah well, its a MS format, nothing can be done". If you truly want to handle Microsoft formats there are a number of people working on it that you can help.

    So its right there for the right bunch of motivated people to work on. C.

    --
    I sometimes write stuff
    1. Re:Its bloody well pretty much done... by caolan · · Score: 1
      I might as well follow up to myself with some cynical commentry. I was far too late to meet the slashdot moderators so there are 300+ useless moanings about the lack of a doc converter sitting up nice and high while an attempt to draw attention to an existing GPLed function converter just screaming for helpers languishes in the doldrums

      Oh the bitter irony of it all

      C.

      --
      I sometimes write stuff
    2. Re:Its bloody well pretty much done... by brix · · Score: 1

      Ahh, the advantages of sorting newest postings to the top. You'd think more moderators would get the hint by now that the best (and worst for that matter) postings don't always appear early.

  111. Re:Specification != Documentation(Samba-like probl by Bazzargh · · Score: 1

    This is really a side issue. When you say wvware converts bold (style) to bold tags you're only looking at the html converter. Html conversion is always going to be poor because there isnt a one-to-one mapping to word features. (which is no excuse for word 97 converting 'heading 1' to html font tags, and 'h1' to bold style!). However, wvware's real strength is the conversion to a neutral xml format which you can mess with to your heart's content. You're generally better off starting with the xml then using a (XSL-T) stylesheet to get nice html out of it - and write your own CSS.

    BTW I've contributed code to wvware and there were, last time I looked, features of the spec which remain unimplemented (I was only doing optimisation patches so I don't know if the features really were undocumented - but Caolan had put in comments to the effect that he didn't know what some flags were for).

    Frankly I don't care if wvware doesn't make the document look (in html or whatever) like the original, which a lot of people seem to want; its real job is to extract the data from that crazy format into one which mortals can use. If we can extract the style tags, then convert them to something sane in a.n.other tool, what more do you need?

  112. Re:'Everpresent Office monpoly'? by (void*) · · Score: 2
    Why should they have to make it easier for you not to use their products?

    This the exact kind of attitude that should turn people away from MS. Why ? Because it is Bill Gate's explicit goal (and he goes to TV to say this) that MS wants to bring computing to the masses.

    Pretend you are him, and you want to achieve this goal. By what means should you use? Closed file formats with lousy specifications? How does that bring computing to the masses when they are prevented from speaking to the Unix Priesthood?

    If you, as a MS lackey and worshipper, believe that this is not MS's responsibiilty, then please go take it up with Bill, your prophet. He has stated publicly and many times that this is his goal. Remind him that MS's duty is to the stockholders and they should make as much money as possible. Please tell him that, and also tell him to STOP LYING to the American public.

  113. Why? by Hard_Code · · Score: 2

    "Now, why can't Corel, Lotus, Sun, etc. band together and reverse-engineer Microsoft's file formats properly?"

    Because the formats suck...?

    --

    It's 10 PM. Do you know if you're un-American?
  114. DWG has not been totally reverse engineered by Quack1701 · · Score: 1

    As someone who has spent much time trying to convert drawing between AutoCad and Microstation, I can tell you that there is not a single product on the market that comes close. None even attempt to guarentee they can convert the files.

    Yes, the basics of the DWG files are well understood. But I guarentee you if you take a professional engineering AutoCad drawing with all the different layers and all the "smarts" attributed to the different elements, and you try to convert it to any format, you WILL lose most of the "smart" information stored in the DWG file. You will also probably have problems with the different layers and linewidths, colors, etc...

    DWG is not a good example unless all you trying to do with the .DOC file is read the plain text. However if you wish to recreate the entire rich document with all annotaions, footnotes, headers, footers, graphics, etc... your going to be in for a major task. Microsoft even has difficulties converting large, rich word files.

    Quack

  115. Re:Flamebait? by Tony-A · · Score: 1

    I think you are right. Completely in line with Microsoft's practices and ethics, or lack of same.

  116. Quoteth the EULA by inetd · · Score: 1

    in short, by agreeing to "run" and "install" this application you are not permitted legally to reverse engineer it.
    same thing applied to the "ellison challenge" at comdex a year or so ago, users of sql are permitted to publish benchmarks without explicit permission from microsoft.

    1. GRANT OF LICENSE. This EULA grants you the following rights:
    Applications Software. You may install, use, access, display, run, or otherwise interact with
    ("RUN") one copy of the SOFTWARE PRODUCT, or any prior version for the same operating system,
    on a single computer, workstation, terminal, handheld PC, pager, "smart phone," or other digital
    electronic device ("COMPUTER"). The primary user of the COMPUTER on which the SOFTWARE PRODUCT
    is installed may make a second copy for his or her exclusive use on a portable computer.

    Limitations on Reverse Engineering, Decompilation, and Disassembly. You may not reverse
    engineer, decompile, or disassemble the SOFTWARE PRODUCT, except and only to the extent that
    such activity is expressly permitted by applicable law notwithstanding this limitation.
    Separation of Components. The SOFTWARE PRODUCT is licensed as a single product. Its
    component parts may not be separated for use on more than one COMPUTER.

  117. URLs of software that opens Office docs by dan_bethe · · Score: 1
    Here is a list of applications that can open Microsoft's proprietary file formats. But first, I ask you all what good even an open standard is from a company who champions most of the world's business and personal document formats, if that company doesn't follow their own standard? We must script one copy of Office such that it acts as a cgi-bin, converting all submitted proprietary docs into an open standard.
    • http://www.wvWare.com/, maybe the best open source Word converter? Formerly "mswordview", it's a library and a front-end app, which is currently AbiWord's converter.
    • word2x
    • AbiSource, a company producing an open source, cross platform, comercial office suite. Their motto was "SHOW ME THE SOURCE!!!", which we had to scream at the March 1999 Linuxworld Expo in order to get their t-shirt.
    • Adobe FrameMaker for Linux -- Not sure if it does Office, but it's a commercial word processor!
    • VistaSource / ApplixWare -- Cross platform, partially open source, complete office suite and integrated development environment in the form of either a local app, or as a Java-based thin client plus app server architecture. Compare to StarOffice. My experience has been that you can send an un-convertable Office document to Applix's closely-monitored community support mailing list, and they will attempt to modify Applixware's import filters around it, and send you a patch. How cool is that?
    • S un StarOffice. Very good as well. Complete office suite. StarOffice and Applixware are capable of replacing Microsoft Office for literally most people.
    • Corel Wordperfect -- See also Corel's Linux distribution.
    • KDE's KOffice -- Open source office suite.
    • Freshmeat.net's index of office apps
    Here is a list of how to buy books for tutoring you on how to use these products, including reviews and price comparisons, and free shipping from Buy.com. In order of my personal preference. Any others? Perhaps some that are embedded in the ton of entries in Freshmeat's office index? Let's hear some authors pipe up! Slashdot's html submitter seems to be busted, so try to fix the above urls by removing their spaces!
  118. Open Source vs. Open Standard by grue23 · · Score: 1
    It is sort of odd to me that people are talking about this issue in terms of 'Open Source'. Coming from the network world, it is fairly plain to me that the issue here is whether or not the format that the content is saved in is an open standard or not.

    Fortunately, in the networking arena, customers value whether or not the products they are buying support open standards. That way they know that the switch they buy from Foo Inc. will be able to talk to switches from Bar Corp. in case Foo Inc. goes out of business.

    This is not to say that there are not proprietary formats used in networking. There certainly are. However, proprietary functionality in the networking world usually comes in the form of additional features that are built on top of existing standards. If you have devices from Foo and Bar talking to each other, they just won't be able to use those extra features that Foo devices provide rather than causing the entire system to break.

    In the case of word processors, we have a much different situation. None of the proprietary document formats are supersets of an open standard format. This means that in order to have absolute confidence that you will be able to read data saved by M$, you better have a M$ reader.

    Network device providers of course have more of an incentive to follow standards because network admins can't take 20 minutes with every packet editing them by hand to convert from the Foo to the Bar format. ;)

  119. Re:For that matter, why can't we reverse.... by gmac63 · · Score: 1

    there were a few that weren't. As for reverse engineering. I was refereing not to a Linux implementation, rather a standalone implementation.

    thanks

    --

    INSERT INTO comment VALUE('Doh!') WHERE user='you';
  120. web designers != graphical designers by rlowe69 · · Score: 1

    A lot of graphical designers are bad web designers, but that it only because they misunderstand the medium. Truly good overall designers will understand the medium they are presenting on and design something suitable for it.

    In my opinion, design doesn't just include graphics as some people think, but also navigation and element placement on UIs and such. Careful design in these areas seperate good designs from ones that are like "broken windows with nice curtains", so to speak.

    Anyway, the point that I was trying to make in the original post is that HTML is unsuited for the "mainstream web of the future". You know, the one people use for e-commerce, news, sports, the TV and movie theatre of the future and all that. Geeks will still have HTML-based web sites, but the mainstream ones need something with a bit more kick and functionality built-in.

    Plain text is boring and outdated. We need to look ahead a bit and formulate a new solution ...

    --
    ----- rL
  121. Fucking idiot moderators, mod this up by raph · · Score: 1

    Someone is doing something about free tools for the .doc file format, rather than whining, and he only gets a +2 (as of the time of this writing)? Have the moderators recently had a brain transplant operation, on the donor side?

    Meept!

    --

    LILO boot: linux init=/usr/bin/emacs

  122. Applixware DOES import native .DOC files. by gzub · · Score: 1

    This is a comment from our Product Manager for Filters, Joe Dunbar(joe@vistasource.com):

    Applixware imports both MS Word formats, RTF (Rich Text Format) their easy to use ASCII format, and .doc their native binary format (both DOS and Windows, all five versions from 2.0 - 2000). Both formats support all features including embedded MS Office OLE objects which our Applixware filters import. Applixware exports to RTF, but does not export to MS Word native .doc due to the complex proprietary format and our choice to go with Open Standards. Note RTF supports all data, layout and formatting information like the .doc binary format. Since there are no features to be gained by directly exporting to their proprietary .doc format, why do two when one will do?

    Applix continues to work with Microsoft who have made past versions of their file formats (or at least portions) available to us, however we are still waiting for their latest 2000 formats which they said would be available soon as of last January. While it would be helpful to have the latest Microsoft file formats, Applix is commited to open standards and shared file formats like HTML and XML which will be useful to all applications and users.

    Joe Dunbar
    VistaSource, Inc.,
    subsidiary of Applix, Inc.