Slashdot Mirror


Alternatives To .DOC As Standard WP Format?

D. C. Sessions asks: "I'm on the Software Task Group of a standards body (JEDEC) which is, among other things, responsible for the DDR memory standard. You may have heard of it. Currently standards drafts must be submitted in an editable word processing format, which right now is interpreted as FrameMaker or MS Word. I find it not only offensive, but dangerous, given that these standards -should- outlive the current MS software that can manipulate them. I've gotten some sympathy on 'bit rot' from the rest of the committee based on showing what current flavors of Word do to documents saved with older versions, but the problem is this: What do I propose as a replacement?" Two that come to mind right off the top of my head are LaTeX and, of course, HTML. Any other formats that can work just as well as .DOC in most situations and are cross-platform to boot?

"It should (obviously) be an open file format, preferably with an open source tool to access it. It absolutely must be usable on LoseBlows, should be usable on Mac, and (for my own sake) on Linux and Solaris. It must be capable of structured documentation, numbering, tables, and embedded vector graphics. I just don't know of such a beast at present."

89 of 205 comments

  1. No. by Eloquence · · Score: 4
    • HTML print results are unpredictable, formulas are hard to layout, and page design is impossible.
    • LaTeX is bad at handling images, and there are no easy editors for the Windows platform.
    • RTF has been killed by Microsoft with dozens of different implementations. (Some of them omit important things like footnotes.)
    • SDW (Star Office) is just as proprietary as Microsoft's DOC, but supported by fewer platforms.
    • PDF is a print format, text extraction is more difficult, and it's bad for PDAs.
    • TXT is insufficient for most tasks.

    XML may be a way out, but there's no XML-based document format on the horizon. (I don't know about this Open E-Book stuff, though.) All in all, the OSS community has failed to provide an open, flexible document format that could compete with MS Word. I'm as unhappy with that as you are, but if you want to change it, all word processor developers must get together and formulate a standard. Is this ever going to happen? Note that most closed-source word processors want to bind their users to their product by using a proprietary, closed format.

    1. Re:No. by QuoteMstr · · Score: 2
      SDW (Star Office) is just as proprietary as Microsoft's DOC, but supported by fewer platforms.
      Ahem? Staroffice (well, Openoffice) can parse its own files and is GPLed. Its format is documented as well.
    2. Re:No. by Eloquence · · Score: 2
      Surely, if you expect the average user to type commands like

      \raisebox{-12.8mm}{%
        \setlength{\unitlength}{1truemm}
        \includegraphics[width=50 pt,height=50 pt,keepaspectratio=true]{logo2.bmp}
      }

      you're right, but I don't. Positioning, scaling and using graphics within LaTeX is far from easy. And we don't have to discuss in which ways MS Word sucks -- it will never find its way onto my harddisk. (I personally use TXT, LaTeX, HTML and StarOffice, depending on the task at hand.)

      The question is not whether something is possible in LaTeX, the question is how long it takes the average user to do it.

    3. Re:No. by Eloquence · · Score: 2
      What about Emacs/XEmacs?

      Show me an out-of-the box installation of Emacs for Windows that not only does decently install & configure the program without much user interaction but also gives you all the info you need to know to write letters, including an easy interface to select templates for common tasks.

      No, you can't expect the average user to acquire this info by themselves. Emacs is even too much for a geek like myself.

  2. Re:Postscript.... by Ethan · · Score: 2

    Postscript is a spectacular presentation language for the final product, but it isn't much for editing. This guy wants something for "living" documents that people are going to have to edit...

  3. Re:ASCII by pen · · Score: 2
    In case you're wondering, DocBook is here. Or you can read the text only version.

  4. Re:.RTF could have been it ... by ksheff · · Score: 2

    I seem to remember that MS developed RTF as a way of exchanging documents between Macs & PCs. As the original poster stated, MS has changed RTF quite a bit over the years, usually to follow the changes that they've made to Word. But at least the changes have been documented and are available on the web. A quick search with Google will turn up several of the RTF specs. Most word processors that I know of support RTF, and there is at least one open source word processor (Ted) that uses RTF exclusively. I've used it and it's pretty good.

    --
    the good ground has been paved over by suicidal maniacs
  5. what's wrong with PDF? by iso · · Score: 2

    i've asked this so many times in threads like this, but i always seem to get in too late to get any responses. i'd like to ask once again what's wrong with PDF documents?

    it's my understanding the PDF is an open format. in fact, i've even heard that part of the reason why Apple used DisplayPDF in MacOS X is because they would have had to license Postscript from Adobe while PDF was royalty-free. if this is the case, why is it that opensource advocates hail Postscript, but denounce PDF?

    i know that PDF is the format when you want to ensure that pages are printed correctly. that being said, they're still able to store text-content, they're compressed so they're a reasonable size, and they're cross-platform (lots of programs can read PDFs these days, not just Acrobat)

    now for the topic at hand, i understand that standards definitions would be best presented in a format that doesn't waste so much space on presentation: content is what should matter, which is why a Framemaker file format or XML might be best. but for casual documents, why don't we use PDF? it's surely a lot better than DOC.

    so i'm asking: what's wrong with PDF? why can't more programs write to PDF as an export option? why can't more programs read PDFs for editing? am i missing something here?

    please, somebody knowledgeable: enlighten me.

    - j

  6. Re:I like RTF the best. by Rigid_Glitch · · Score: 2

    I like RTF too - but did Microsoft really author it? I recall first seeing RTF around 1991 as an internet-centric file format. MS seemed to push the .DOC format over .RTF, and I thought that was because .DOC was spawn of MS, while RTF was not. I could be wrong.

  7. Latex is the right tool for another job ... sorta by OmegaDan · · Score: 2
    Latex is a great program -- but latex is for typesetting not for word processing ... it's true most latex users combine the two distinct phases into one messy process ...

    There is however a windows tex word processor whose name escapes me -- but it reads / writes tex files as its native format and allows you to write latex files in an interactive format, which IMHO is a lot better than the "edit and compile" paradigm ...

    This would really be the best of both worlds ... the unix heads can have their programming language latex, and the windows-bred secretaries can have a program that they can work with as well ...

  8. But you still need to be able to _use_ Word! by update() · · Score: 2
    IMHO, most of the responses here are galactically missing the point. The question is (or, at least, should be) "What is a format we can move to that everyone can read?" and not "What text editor and markup language should we force everyone to start writing in?"

    All the posts arguing for TeX, DocBook, XML, Star Office or Pathetic Writer are forgetting that a group that demands submissions in .doc format is obviously receiving them from people using Word and turning them over to other people using Word. Forcing everyone to use LaTeX or XML (or to write LaTeX or XML in Word) is a guarantee that the whole thing will grind to a halt.

    HTML is an option; XML is not until Microsoft adds it as a "Save As" option.

  9. Re:XML and SVG by Anonymous Coward · · Score: 2

    Yes, XML appears to be the most viable alternative; and if I recall correctly Linus Torvalds suggested two years ago that it should be the native format for any Linux word-processor. However, since then KWord and others have come out, and I still haven't seen any attempts to support it... :(

    In my own little start-up project, I am in desperate need of an Excel-file format replacement. I am contemplating over XML, but besides being a lousy programmer I am even worse at reading specs... :(

    But anyhow, XML seems to be coming. My work will abandon development of Java GUIs (on top of C++ programs) in favor of XML GUIs! That way any program can be called from within browsers, without the need for platform specific virtual machines!

    My suggestion is: go for XML!

  10. The IETF standard is ASCII by Madwand · · Score: 2

    The Internet Engineering Task Force (IETF) publishes all its standards (the RFCs) for the Internet in American Standard Code for Information Interchange (ASCII). You can also submit the document in PostScript, but the ASCII is the primary reference.

    ASCII is searchable, printable, indexable, and forward compatible essentially forevermore. Everyone can display it correctly, anywhere. There is no better format for any kind of International standard. The IETF debates the question of superseding ASCII as the standard format about every other year, but we've yet to identify any other format that has ASCII's advantages.
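
    A minimal sketch of what "plain ASCII" means in practice, assuming a submitted draft arrives as raw bytes. The function name and the sample draft are made up for illustration; this is not an IETF tool.

```python
def non_ascii_positions(data):
    """Return (line, column, byte) for every byte outside 7-bit ASCII."""
    problems = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over bytes yields integers in Python 3.
        for column, byte in enumerate(line, start=1):
            if byte > 0x7F:
                problems.append((line_number, column, byte))
    return problems


# Hypothetical draft: the second line contains a Latin-1 plus/minus sign.
draft = b"1. Scope\nTiming is 5 ns \xb1 1 ns.\n"
clean = b"1. Scope\nTiming is 5 ns +/- 1 ns.\n"

draft_problems = non_ascii_positions(draft)
clean_problems = non_ascii_positions(clean)
print(draft_problems)
print(clean_problems)
```

    A check like this is about all the tooling a pure-ASCII rule needs, which is part of the format's appeal.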

    HTML might one day replace ASCII in this capacity, but it needs to be stable for longer than it has been, and the HTML generators out there never generate correct HTML (ever looked at web pages in iCab? It has a built-in syntax checker, and even slashdot comes up with errors, all the time). Until those problems are fixed...

  11. OpenOffice is the answer by IGnatius+T+Foobar · · Score: 2

    One of the things that the Sun/StarOffice project is doing, is to create this "OpenOffice" set of standards, with the current StarOffice codebase being the reference implementation. The set of standard document formats you wish for is one of their specific goals. Formats rich enough to handle the needs of business documents, but open enough to be vendor-neutral. Initially, the OpenOffice formats will be implemented by both StarOffice (the open source office suite) and StarPortal (Sun's upcoming online version).

    I think this would be a good place to start. To make it even more buzzword-compliant, the OpenOffice formats will be XML-based. I'd be very happy to see the OpenOffice formats adopted not only by Star/Sun, but also by Abi, Gnumeric, K-office... who knows, maybe someone could even write a plug-in for MS Office to load and save documents in OpenOffice format. (If it's successful enough, MS will eventually have to do it themselves.)
    --
    Tired of FB/Google censorship? Visit UNCENSORED!
  12. Re:Staroffice 5.2 by SurfsUp · · Score: 2
    Just FYI, Word 2000 document format is backwards compatible with 97. And yes, it runs under Win95.

    Microsoft makes this claim with each new generation of its office products. It has always turned out to be a lie.
    --
    Life's a bitch but somebody's gotta do it.
  13. Re:Staroffice 5.2 by SurfsUp · · Score: 2
    --
    Life's a bitch but somebody's gotta do it.
  14. Re:TeX is what you want. by dougmc · · Score: 2
    TeX (and the LaTeX frontend) runs on about as many platforms as linux.
    Actually, TeX and LaTeX run on far more platforms than Linux ...
    many people think the learning curve is high, but this isn't necessarily so.
    Trying to make the masses learn TeX or LaTeX is a big mistake -- they'll just go back to Word. The trick would be to write a WYSIWYG word processor that saves documents as TeX. There are already a few out there, but they're not ready to take on Word yet.

    TeX is a good idea. XML is probably better, and far more likely to actually happen. Of course, there's a zillion different ways that a paper could be stored in XML, so XML alone isn't the magic bullet. But it's a good start.

  15. The answer is obvious ... by taniwha · · Score: 2

    Specify .doc AS the standard .... and start a standardization process on it .... take it out of M$'s hands so that it becomes a non-issue

  16. Re:Something for the Non-Coders out there by Mr.+Slippery · · Score: 2
    In a market place, it doesn't work. You do not, in fact, have a "solid piece of wood in your hands" -- if you don't give people what they want and what they believe to be most useful, they will go elsewhere.
    As is often the case, what people believe they want and what they really want are two different things. (Consult any Zen master, or software requirements analyst, for further enlightenment.) It can take some forceful education and interrogation to get people to realize this and tell you what they truly want and need.

    People working on large documents need tools and formats that focus on document structure. A bunch of very smart people looked deeply into the problem years ago and came up with the idea of markup languages.

    If you want to displace .doc as a standard, you have to be willing to give people the tools they want to use, and not the tools you think they should use.
    Actually .doc got to be a de-facto "standard" exactly because managers gave their employees the tool the manager thought they should use.

    Tom Swiss | the infamous tms | http://www.infamous.net/

    --
    You cannot wash away blood with blood
  17. MS has this cornered for a reason by Animats · · Score: 2
    None of the alternatives to .DOC solves the problem.
    • HTML seems reasonable, but HTML documents are collections of files, not a single file, which makes moving them around a problem. There's also no entity-oriented graphics system (a "draw program") for HTML. (Well, there's Flash, but that's overkill for simple diagrams.)
    • TeX and its derivatives are too programmer-oriented and don't handle images, diagrams, or tabular material easily.
    • DocBook is text only.
    • StarOffice has the right feature set, but doesn't have enough market share. That may change. (I was a StarOffice fan for a while, but the early version sucked, and I had to give in and buy Word 97.)

    There's a window to do something about this right now, because Microsoft is tightening the screws on Office 2000 pricing. The amount of money companies have to spend on Microsoft Office is about to increase substantially. Technically, documenting the StarOffice format very well and encouraging other efforts to use it would be a good way to get started on the problem. From a business perspective, VA Linux or Red Hat ought to try pushing a desktop distribution that takes one install and provides the tools needed by, say, 70% of office workers. (Hint: support the first few companies that try this in a big way, to find out what's needed.)

  18. openoffice is working on this by kervin · · Score: 2

    Openoffice is seeking to address this. This may be of no consequence for someone needing a solution today, but I thought I'd mention this.

    the link is xml.openoffice.org. Draft formats are available for download, and you can follow the development on the mailing list there as well.

  19. Staroffice by QuoteMstr · · Score: 3

    Staroffice has all the features you describe, and is portable.

    1. Re:Staroffice by jreilly · · Score: 2

      As far as I know, staroffice is not available on a PPC platform, only x86 and sparc. The author specifically said he wanted it to work on MacOS

      --

      Freedom's just another word for nothing left to lose
  20. .RTF could have been it ... by martin-k · · Score: 3
    RTF (Rich Text Format) could have been a sensible cross-platform, cross-application solution, were it not for Microsoft continually "enhancing" this format by globbing on new features in uncoordinated ways.

    -Martin

    1. Re:.RTF could have been it ... by Bouncings · · Score: 2

      Unfortunately though, RTF cannot be structured, at least as most programs use it.

      --
      -- Ken Kinder ken@_nospam_kenkinder.com http://kenkinder.com/
  21. Re:Just a bit reactionary? by dbarclay10 · · Score: 2

    Did XML kidnap your cat?

    Nope, just trying to clear up some issues.

    I think it's safe to assume that defining a DTD was implied. It's simply easier to say "Use XML" than to say "Write a good DTD to use with XML"

    I don't think it was implied. It was mentioned casually. But that wasn't my point. Choosing to use XML is like choosing either a binary or text document format. Just saying "use XML" doesn't mean a whole lot. The format itself is really the DTD that's used. Whether or not writing a good DTD was implied, it is certainly a whole lot more complicated than the poster was making it out to be. XML is no magic wand.
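
    To illustrate why "use XML" alone says little, here is a rough sketch. It assumes a made-up content model (the `ALLOWED_CHILDREN` table and every element name in it are hypothetical) that stands in for what a real DTD would declare. It is not a DTD parser, only a demonstration that two well-formed XML documents can differ in whether they fit the agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
# This plays the role a DTD would play; it is not a DTD parser.
ALLOWED_CHILDREN = {
    "standard": {"title", "clause"},
    "clause": {"title", "para"},
    "title": set(),
    "para": set(),
}


def check(element, errors=None):
    """Return a list of violations of ALLOWED_CHILDREN under element."""
    if errors is None:
        errors = []
    if element.tag not in ALLOWED_CHILDREN:
        errors.append("unknown element <%s>" % element.tag)
        return errors
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append("<%s> not allowed inside <%s>" % (child.tag, element.tag))
        else:
            check(child, errors)
    return errors


# Both inputs are well-formed XML; only the first fits the content model.
good = ET.fromstring(
    "<standard><title>T</title><clause><title>C</title><para>p</para></clause></standard>"
)
bad = ET.fromstring("<standard><para>stray</para><memo>hi</memo></standard>")

good_errors = check(good)
bad_errors = check(bad)
print(good_errors)
print(bad_errors)
```

    The parser accepts both documents happily. Everything that makes the second one wrong lives in the content model, which is the part somebody still has to design.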

    How could it possibly be device-dependent? This is just text, we're talking about.

    It's waaay too easy to make things device-dependent. For instance, think about printing a modern, full-featured HTML page. It is a device-dependent language; it's meant to work within a browser, of a certain size range, with a certain colour depth, etc., etc.. It will look great in your browser, but it doesn't lend itself well to printing. So you have to choose your language/DTD carefully.

    Easy rendering has nothing to do with the XML DTD or document, that's the responsibility of the XSL that would accompany it, or the application that parses the document.

    Okay, sorry. So, if you want to use XML, you'll need a good DTD, *and* a good XSL (or a good application). :)

    Easy editing is pretty straightforward. Just edit it. This goes along with comprehensive. A good DTD can be comprehensive, but it can also leave room for extension without breaking that document. It is, after all, the extensible markup language.

    Now, I'm only going to argue semantics on this one. "Easy" is subjective. You're right, it's easy to look into the document and edit it, but that doesn't make editing easy. I can easily look into a MS Word document and edit it. That doesn't mean I'll do anything useful, nor does it mean it'll be fast.

    I wouldn't say XML without a DTD is useless, but I will say XML without a DTD is silly. It's a simple, logical assumption that if you're writing XML documents, they should have a DTD, so you know what's allowed. Like I said before, it seems like this would be implied.

    Well, you obviously know what you're talking about :) The reason why I replied to that post was that while it might be implied to you and me, it might not be implied to everyone. The tone of that post struck me as, "Use XML - it's easy and simple," whereas using XML is not necessarily so simple nor easy. Lots of work to be done if you'll be writing your own DTD, and lots of learning to do if you don't.

    Thanks for the reply, though :)

    Dave

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  22. SGML/XML/DocBook by tobyjaffey · · Score: 5

    Use a nice SGML/XML application like DocBook. Tools for manipulation are free, anyone can write DocBook, with or without specialist tools (it looks a lot like HTML to the layman).

    Don't use HTML, at least use XHTML making sure that you segregate style from content. If you must use HTML, use stylesheets so that formatting is consistent.

    But, my recommendation would be to use DocBook (SGML) and use stylesheets and nice free parsers to output TeX, ASCII, RTF, HTML and whatever else people want.
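
    A rough sketch of the "one source, many outputs" idea, assuming a tiny DocBook-style document. The sample content and the two renderer functions are invented for illustration; real DocBook processing would go through proper stylesheets rather than hand-written Python.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny DocBook-style document (hypothetical sample content).
SOURCE = """<article>
  <title>DDR Timing Notes</title>
  <section>
    <title>Scope</title>
    <para>This draft covers refresh timing.</para>
    <para>Values below 5 ns are out of scope.</para>
  </section>
</article>"""


def to_html(article):
    """Render the article as a minimal HTML fragment."""
    out = ["<h1>%s</h1>" % escape(article.findtext("title", ""))]
    for section in article.findall("section"):
        out.append("<h2>%s</h2>" % escape(section.findtext("title", "")))
        for para in section.findall("para"):
            out.append("<p>%s</p>" % escape(para.text or ""))
    return "\n".join(out)


def to_text(article):
    """Render the same article as plain text with numbered sections."""
    title = article.findtext("title", "")
    out = [title, "=" * len(title), ""]
    for number, section in enumerate(article.findall("section"), start=1):
        out.append("%d. %s" % (number, section.findtext("title", "")))
        for para in section.findall("para"):
            out.append("   " + (para.text or ""))
        out.append("")
    return "\n".join(out)


article = ET.fromstring(SOURCE)
html_output = to_html(article)
text_output = to_text(article)
print(html_output)
print(text_output)
```

    The source says only what each piece is (title, section, paragraph). Section numbering and layout are decided by each renderer, so adding another output format does not touch the document.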

    1. Re:SGML/XML/DocBook by dsplat · · Score: 2

      Anything with source that is plain text (HTML, SGML, XML, RTF) and that is based on a published standard should be the requirement. That guarantees two things. The first is that there will be tools in the future that can read it even if the format itself is abandoned at some point by all of its users. The second is that it is documented in a publicly accessible way for the whole world to see.

      TeX doesn't meet that second requirement as much as I love it.

      --
      The net will not be what we demand, but what we make it. Build it well.
    2. Re:SGML/XML/DocBook by dsplat · · Score: 2

      My reason for suggesting a published standard was not a slur on TeX. As I said, I love it. The advantage of standards is that, in theory at least, they are not under the control of a single person, company, or reference implementation.

      I agree with you about TeX's stability. After using several different incompatible tools through the 80's for my resume, I finally put it in TeX and stayed with TeX for a decade. I'm considering HTML or XML now, but I haven't made the switch.

      --
      The net will not be what we demand, but what we make it. Build it well.
    3. Re:SGML/XML/DocBook by tobyjaffey · · Score: 3

      HTML is great, XHTML (or at least HTML >= 4) is better.

      The problem with HTML is that it was designed to be a markup language for simple documents, so it has heading, subheadings, titles, paragraphs etc. However, as people wanted to do more and more stylistic things with it, the language was extended by the w3c. But, most people kept just bastardising it by using heading tags to make things big and bold tags to emphasize things.

      HTML is a big, nasty mix of structured document and stylistic tags. What HTML 4 strict does is to say that HTML is just a structure language with no formatting info. Then you use CSS or XSL to do the style work, which is a much more sane and portable approach.
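
      One way to see the structure/style split in practice is to scan a page for presentational tags. The sketch below uses the standard library's `HTMLParser`. The set of tags counted as presentational is my own assumption for the example, not a list taken from the HTML 4 specification.

```python
from html.parser import HTMLParser

# Tags treated here as presentational; the exact list is an assumption
# made for this sketch, not taken from the HTML 4 specification text.
PRESENTATIONAL = {"font", "center", "b", "i", "u", "big", "small"}


class StyleTagFinder(HTMLParser):
    """Collect presentational tags found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in PRESENTATIONAL:
            self.found.append(tag)


def presentational_tags(html_text):
    """Return the presentational tags in html_text, in document order."""
    finder = StyleTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


mixed = "<h1>Spec</h1><p><FONT size='5'><b>Important</b></FONT> text</p>"
clean = "<h1>Spec</h1><p><em>Important</em> text</p>"

mixed_tags = presentational_tags(mixed)
clean_tags = presentational_tags(clean)
print(mixed_tags)
print(clean_tags)
```

      A document that comes back with an empty list leaves all of its appearance to the stylesheet, which is the property being argued for here.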

    4. Re:SGML/XML/DocBook by Whelkman · · Score: 2

      There are numerous things that make HTML a poor choice for documentation.

      First, there's the aforementioned kluge of the HTML standards, but if one is writing documentation, he should stick to pure structuring (at least at first) anyway. If I write an entire document using <p> and <hX> tags, sure it'll be portable, 100% compatible with the W3C guidelines and so on, but there's more than that.

      HTML, unlike many other more complicated mark-up languages, has poor support for "book" features. Headers, footers, generation of table of contents, page numbering, margins, cross-platform printing support. The list goes on and on, but if you're doing anything but looking at it in a browser, HTML is not a good choice for documentation.

      So that's why HTML is not the best choice for documentation, not because of any grandiose "stylistic perfection" ideas. Furthermore, HTML is no more or less open than many other mark-up standards (e.g. SGML, XML, TeX), and they're all roughly on the same line in terms of portability (if you get the right tools, that is).

      Basically, HTML makes a good "quick and dirty" documentation tool, but if you want your options open (wide open), SGML (or maybe XML in a few months) is the way to go.

    5. Re:SGML/XML/DocBook by Doc+Hopper · · Score: 2
      I must add my hearty approval. As the maintainer of the Bugzilla Guide (http://www.trilobyte.net/barnsons) I am immensely enjoying writing documentation in SGML instead of some lame proprietary format.
      Another positive benefit of using SGML: All Department of Defense (IIRC) documentation must be SGML. So if you're ever going to have to maintain government documents, SGML is a great choice!

      Matt Barnson

  23. Re:It may seem incredibly redundant... by dbarclay10 · · Score: 2

    I apologize, and you're right :) The poster didn't mention that a good DTD would need to be written (a lot of work, I might add), and I didn't mention that a good set of XSLs would need to be written :)

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  24. Re:Staroffice 5.2 by slickwillie · · Score: 2

    I just happened to buy StarOffice 5.2 for $40 two days ago. Then I went to work yesterday to discover that the company documents were now in Word 2000 format. I still have a Windows 95 box at work for MS Outlook and Word. No one was sure if Word2K would run on Win95, so that meant I would have to a) upgrade to Win98 or Win2k, and b) upgrade to a new machine. So I installed StarOffice instead, which supports word2k format. I installed it on Win95, Linux, and Solaris. I can even use it from my FreeBSD box as an X application on Solaris.

    Try that with Word (97, 2k, 2.001k, etc etc).

  25. Re:It may seem incredibly redundant... by dbarclay10 · · Score: 2

    Watch your tone. We're having a good discussion here. Of course, I'm assuming that you're posting anonymously because you don't like cookies - not because you're a troll.

    The poster didn't answer the question that had been asked very well. They talked about XML as a good thing, but they didn't talk about the bad things (which you must know about when trying to make an informed decision). I was just trying to clear the issue up a bit. The bad things about XML being that you've got to write a good DTD, and good XSLs, etc., etc..

    Dave

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  26. documentation isn't the problem by q000921 · · Score: 2
    Microsoft documents the .DOC format, probably as well as they understand it themselves, and there are reasonably good converters/readers for it, some even open source (OpenOffice contains one of them).

    The problem with .DOC is the typical Microsoft problem: Microsoft beats other people to market by "just getting the job done". They hack up what needs to be done, if it works 90% it's OK, and maybe they document it later. They are even proud of that and seem to think it's the right way to go because they actually beat everybody else to market; let's hope the customers will wake up to this and stop buying.

    The latest .DOC format is supposedly XML (with embedded binary). That will help somewhat, in that it will at least make the text and other important information accessible without a complex OLE infrastructure. But taking full advantage of the information found in .DOC will still not be possible. The .DOC format often contains scripting and all sorts of other extensions, and the actual semantics of those can depend heavily on the environment or be buried deep inside some MS Word module.

    Note that inside Microsoft, there now seems to be another approach, NetDocs. If it delivers what it claims to, a fully XML-based standards-compliant, end-user document and information management system, I have my doubts that that will catch on--it is way out of character for Microsoft.

  27. tex by Dungeon+Dweller · · Score: 2

    I think that tex would be the best format as such. I would rather see it be tex than HTML, and certainly rather see it be tex than doc. You don't want to give a company control over the format, especially not for a hardware setup. DOC is WAYYY too volatile. It also seems a bit bulky. As for HTML, also kind of bulky, good at what it's used for, but certainly not a replacement for tex.

    BTW, I wouldn't think of it as a replacement for the DOC format, I would think of it as doing things right from the start. Doc is good for what it does, but what I think you are describing is MUCH more suited to tex.

    --
    Eh...
  28. Re:It may seem incredibly redundant... by iapetus · · Score: 2

    Agreed. LaTeX is bound to be a frequently suggested alternative, but IMHO it's the wrong answer: XML has been designed for exactly this purpose, and it fits your needs perfectly.

    XML can easily (dare I suggest it, trivially) be transformed into LaTeX documents - in fact, this is the approach my current employers take for a number of types of business documents - XML is the format for representing the data, and LaTeX or HTML or whatever can become the format used for making this available to the user - XSL transformations allow us to take a language-independent set of data and translate it into a document in a suitable format.

    If you want true independence from proprietary data formats (and open source applications can have data formats that are just as restrictive as closed source applications to most users) then XML is the only real choice right now - a well defined XML document should be readable even *without* a parser, and with a well-defined DTD and a series of appropriate XSL files, you can select your own viewer application. What could possibly be better? Certainly not Word, StarOffice, LaTeX or any of the other competitors in this arena.

    --
    ++ Say to Elrond "Hello.".
    Elrond says "No.". Elrond gives you some lunch.
  29. LaTeX and HTML by the+eric+conspiracy · · Score: 2

    Both LaTeX and HTML suffer from the fact that they are evolving standards. You would have to also pick a version, and face the fact there might come a day when there is a loss of backwards compatibility.

    I think the best idea is something that is extra simple, and unlikely to change in the future; that is ASCII.

  30. Let TeX die. by Ars-Fartsica · · Score: 2

    Sorry, but as an old TeX-head, I can tell you with satisfaction that writing one document in it does not make you literate in its varied commands. TeX does have a ridiculously high learning curve, and added to which it is only a display markup language - it does nothing to infer context and meaning in the content itself, which we've all learned by now is something you want to preserve.

  31. Re:Just a bit reactionary? by Cardinal · · Score: 2

    Now, I'm only going to argue semantics on this one. "Easy" is subjective. You're right, it's easy to look into the document and edit it, but that doesn't make editing easy. I can easily look into a MS Word document and edit it. That doesn't mean I'll do anything useful, nor does it mean it'll be fast.

    Granted. I imagine things will get easier as editors become more widespread that are geared towards editing XML documents. The editor could make sure you stay within the DTD, speed up the writing time involved.. Until then it's all being done by hand.

    Well, you obviously know what you're talking about :) The reason why I replied to that post was that while it might be implied to you and me, it might not be implied to everyone. The tone of that post struck me as, "Use XML - it's easy and simple," whereas using XML is not necessarily so simple nor easy. Lots of work to be done if you'll be writing your own DTD, and lots of learning to do if you don't.

    Yeah, looking over the original post again, he probably should've been more clear. It sounds like he's been using XML for awhile, and forgot about the issues involved in actually learning it. :) Fortunately most of the work is initial stuff.

    Been fun. :)

  32. Re:It may seem incredibly redundant... by frantzdb · · Score: 2
    The difference is in the purpose of the markup - XML is (generally, with a good design) syntactic markup. LaTeX is entirely structural markup, specifying not *what* a particular element is, but how it is to be displayed.

    I think you are confused. LaTeX *is* designed with generalized structural markup in mind. (OTOH TeX focuses on specific markup.) In LaTeX you use commands like \section and \chapter and \emph, and (generally) not layout markup commands like ``italics'' or font sizes.

    ``LaTeX is, to a large extent, an example of a `generic markup language' (GML). Thanks to the class file mechanism, the visual style of the various document elements are described in a single place outside of the source document itself'' (The LaTeX Companion, 7).

    I hope that clarifies things.

    --Ben

  33. Some Suggestions by bhurt · · Score: 4

    Consider using TeX/LaTeX, postscript, or an XML/SGML variant, like DocBook or HTML.

    Basically, what you want is a format that fits the following criteria:
    1) The original text can be easily gotten out of the format. This way even if the programs that read the file go the way of the dodo, future programs could still recover the data.
    2) The specification is fully open and documented, and preferably stable and mature.
    3) At least one open-source program handles displaying/converting the format. I would recommend storing a copy of this program in the same place as the standards themselves, including shipping source with standards CDs.
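
    Criterion 1 can be made concrete with a short sketch. It assumes the draft is stored as XML; the element names (spec, section, para, term) are invented for illustration and do not come from any real DTD. With nothing but a stock XML parser, the text comes back out:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a standards draft; the element names are made up.
DRAFT = """<spec>
  <title>Example Memory Standard</title>
  <section>
    <heading>Scope</heading>
    <para>This document defines <term>timing</term> parameters.</para>
  </section>
</spec>"""


def plain_text(xml_source: str) -> str:
    """Return the text content of an XML document with all markup removed."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text node in document order, ignoring the tags.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)


print(plain_text(DRAFT))
```

    The same recovery is possible by eye or with a text editor, which is the point of the criterion: the words survive even if every tool that understood the markup is gone.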

    You've gotten over the hardest part already: you've realized you have a problem.

    Brian

  34. Re:postscript isn't editable by spitzak · · Score: 2

    You mean people edit those Microsoft Word documents at the byte level? I didn't know that, I was always under the impression that they cheated with some program called "Word". Well, apparently such cheater programs are not allowed...

  35. Re:Something for the Non-Coders out there by johnnyb · · Score: 2

    You don't want it working exactly like Word, that would destroy the whole idea. Having a nice GUI is fine, great in fact. However, it should work semantically. And, no, the people typing don't have to worry about DTD restrictions and XSL - that's handled by the application. Look on freshmeat for Morphon to see a good XML application for end-users.

  36. Re:TeX is what you want. by jbf · · Score: 2

    As previous posters have pointed out, TeX runs on more than just Linux =)

    There are a few problems with using TeX/LaTeX. The first is that TeX tries to do paragraph-by-paragraph layout, and often winds up in tight spots that it doesn't need to. The average user wouldn't have a clue about what to do with overfull hboxes.

    Another problem is that it's not really possible to do WYSIWYG, and those people who use lots of spaces instead of tabs (even with variable width fonts, heh) will have a rough time adjusting to that. People will complain about things like "well in Word the line wraps this way..." BTW this is a problem with Word itself; its figure placement is really screwy.

    Finally, those of us in academia who write papers in LaTeX can no longer look down on those whose use of Word is obvious by the terrible aesthetics of their papers.

    Obviously, there are lots of advantages, and for Microsoft, possibly the nicest thing about TeX is that there are no known bugs. (not that Microsoft will have any problem adding some...)

  37. Re:Dangerous and Offensive??? What is standard? by Detritus · · Score: 2
    Most biz owners, small biz or enterprise, usually have the latest version of word.

    My experience is the opposite. Where I work, Office 97 is the standard, along with Windows 95 and Windows NT 4.0. The company (Fortune 50 corporation) is conservative about upgrading to new versions of software. They don't want to spend money on new software unless there is a compelling reason to do so. I don't know anyone who has Windows 2000 or Office 2000 on their work PC.

    --
    Mea navis aericumbens anguillis abundat
  38. Re:RTF could have. . . (I think it is!) by Fantastic+Lad · · Score: 3
    I still love RTF despite the flaws.

    For straight wordprocessing where no layouts are required, it's great. It's ascii with the expressive power of italics and extended symbol recognition. For straight word smithing, that's all anybody needs.

    Here's what I do:

    1. Do all wordprocessing using a very basic text editor which saves very basic RTFs.

    2. Import those files to whatever layout program is needed. (Quark, Pagemaker, whatever.)

    The stability of RTFs lies in two areas: first, there will ALWAYS be a selection of homemade editors available upon which to do your writing, and second, there is no financial incentive for big software manufacturers to make RTFs un-importable to their suites and layout packages.

    This means that doing all your basic work in RTF will make files readable for a long time to come.

    In any case, particularly in the print publishing field, today's software is finally about as good as it needs to get. There's no real reason to switch tools. Unlike with graphics technology, Times Roman simply doesn't need to motion blur and bump map for a writer to work his or her craft. As long as we all keep our old copies of WordPerfect and Pagemaker, everything is fine.

    Fantastic Lad

  39. Re:Dangerous and Offensive??? by Spoing · · Score: 2
    Anywho, DOC is a biz standard. This isn't going to change.

    Till the next version of Word is released, then...it changes!

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  40. Re:Staroffice 5.2 by Ross+C.+Brackett · · Score: 2

    Just FYI, Word 2000 document format is backwards compatible with 97. And yes, it runs under Win95.

  41. Re:Reverse engineer the thing by QuoteMstr · · Score: 2

    That's true, but how often do people use these things?

  42. HTML has problems by Gorimek · · Score: 2

    1. HTML itself is not a stable standard, and the different dialects are evolving continuously.
    2. A document can be rendered quite differently by different browsers.
    3. You can't even get things like page numbers in HTML documents.

  43. Re:It may seem incredibly redundant... by dbarclay10 · · Score: 5

    I'm sorry, but I have to disagree with you.

    XML is nothing more than a concept - you store data and text within "tags". The tags can be of pretty much any name. The data can be anything. This isn't a standard, it's not even a format.

    Basically, XML boils down to: store it in a text file, delimit data, fields, and content by tags. Sorry, that doesn't cut it. You have to do more.

    No, if you want to think about using XML for this, you need to talk about the DTD, not XML itself.

    So, the question becomes, which DTD? In order to compete with the competition (LaTeX, HTML, PostScript), it has to be: device-independent, easily rendered, easily edited, and extremely comprehensive.

    Don't shout "XML!!". XML, without a DTD, is almost useless, especially for this application. The DTD has to be all those things I mentioned, plus (for this application), it needs to be standard.

    Dave

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)

  44. No no no by Devil_Dog · · Score: 2

    Just a slight correction, the DoD standard for documents is the 2 latest revisions of MS's .DOC format.

    --

    Someday I'll make

  45. Intro/Tutorial on DocBook? by Speare · · Score: 3

    I hadn't heard of DocBook, so I went fishing on docbook.org for some basic info.

    The state of the documentation for this product is fairly lacking. (Hey, it's a DOCUMENT application!) There's no "getting started with DocBook" stuff. There's no official tutorial.

    The closest thing to a tutorial I found is this page: DocBook intro. I'll excerpt the front page.

    • DocBook intro
    • Here is my tutorial on DocBook. I never completed it, but it is still useful, since others don't focus on a complete beginner tutorial.

      Last modified: Mon Jul 27 11:19:57 1998

    Frankly, this sums up my issue with many Open Source projects: making a technically superior tool is not enough to generate wide user acceptance. There has to be an easy migration path from what the user's already got.

    DocBook needs at least ONE of the following to get people going:

    RTF/DOC/FrameMaker/TeX to DocBook converters, supporting at least a good 75% of basic features,

    A usable migration tutorial that assumes the user already makes RTF/DOC/FrameMaker/TeX documents,

    A usable editor that shows the results, even if it has to be two-paned to show both source and results.

    I'm not flaming Open Source in general, but this is not the first time I have heard of a tool that would fit my needs exactly, except they put very large barriers to entry in my path.

    --
    [ .sig file not found ]
    1. Re:Intro/Tutorial on DocBook? by FattMattP · · Score: 3

      Maybe if you had bothered to look around docbook.org a little more you would have noticed that there is an entire O'Reilly book about DocBook and how to use it available online for free. You can also purchase the dead trees version from your local bookstore.

      --
      Prevent email address forgery. Publish SPF records for y
  46. Re:It may seem incredibly redundant... by iapetus · · Score: 3

    The difference is in the purpose of the markup - XML is (generally, with a good design) syntactic markup. LaTeX is entirely structural markup, specifying not *what* a particular element is, but how it is to be displayed. As a result, XML is easier to tailor to a particular application's needs - from the XML document you can trivially create the equivalent LaTeX document. The same does not hold true the other way round.

    --
    ++ Say to Elrond "Hello.".
    Elrond says "No.". Elrond gives you some lunch.
  47. Re:Something for the Non-Coders out there by Mr.+Slippery · · Score: 2
    The whole document writing process has to be as transparent as selecting fonts, size, justification, etc. with a simple mouse clic on an icon or scrolling menu.

    No. First, we start with unlearning past mistakes. It is often handy to have a nice, solid piece of wood in your hands at this point, as we teach "You do not want to change fonts and sizes. You want to think about your document's structure and mark it up accordingly."

    Yes, we don't have to beat that into "the average John and Jane Doe" or "the average secretary" who just wants to type up a one page letter, but when people are creating real documents structure should be in the front of their minds. Otherwise they're fscked from the start, regardless of technology choices.

    Tom Swiss | the infamous tms | http://www.infamous.net/

    --
    Tom Swiss | the infamous tms | my blog
    You cannot wash away blood with blood
  48. Re:HTML...Niagra falls by Mr.+Slippery · · Score: 2
    There's a running joke at my office on my constant threats to start doing all wordprocessing in HTML,
    Why leave it as a joke? Last contract I had, I wrote up all my intra-team proposals and documents in HTML. (These were, I should note, short documents, three or four printed pages tops, so lack of large-structure layout wasn't a problem.) Didn't have to leave the comfort of my Emacs window, didn't have to worry that they'd be unreadable two years from now when M$-Word was no longer backwards-compatible, didn't have to deal with dancing paperclips or crashing Macs.

    Tom Swiss | the infamous tms | http://www.infamous.net/

    --
    Tom Swiss | the infamous tms | my blog
    You cannot wash away blood with blood
  49. Re-inventing the Wheel: A good idea?! by Alien54 · · Score: 2
    The only way that DOC files could be made a standard would be the public release of the internal file specifications so that everyone can use them.

    . . . . right.

    I can see M$ going for this one right now. (HA HA HA!)

    This means that the file format would have to be made a part of the public domain.

    IANAL, but I think this would take a prodigious amount of legal wrangling.

    I personally prefer a format like XML or HTML where you can see the tags, etc. and figure out what is going on, if someone made a mistake. Mind you, this is just me, just a personal preference.

    I also wonder about designing a file format for the future, given the various changes in technology. As an example, there is a new technology that has been demonstrated providing 3d displays in shocking detail, no special glasses needed. Not a Moving Picture yet, but you get the idea. How to incorporate this? The file format has to be scalable and adaptable.

    MS Word does really horribly at things like books, where it is better to use a page layout program like Pagemaker.

    So it looks like we have to re-invent the wheel here, and include all of those features that make the best sense. Yet another Open Source project for the masses.

    Don't look so enthuthiastic now!

    ;-)

    --
    "It is a greater offense to steal men's labor, than their clothes"
  50. LoseBlows by aphr0 · · Score: 5

    Thanks for showing the maturity everyone has come to expect from the linux community.

    Hey linsux users - grow up.

  51. Re:HTML...Niagra falls by griffjon · · Score: 2

    When I'm at a conference or using a laptop in general, I do write in html because a) it gives you massive cred to any shoulder-surfers and b) it can be less power-intensive and less prone to laptop mouse problems.

    --
    Returned Peace Corps IT Volunteer
  52. I apologize in advance... by Gendou · · Score: 2
    ...but you've missed the point of XML (yes, it IS a standard and yes, it IS a format - I think you'll find that all tools for working with XML are very consistent) and an important mention in my post.

    Certainly, you have to assign meaning to tags in order for data to be formatted correctly. The whole point of XML is that data carry traits and structure (which of course, can be inherited).

    This is where the concept of a template would come in. I had mentioned this but you must have overlooked it.

    You have a set of rules defined that determine what certain tags do. Very similar to HTML now (table, p, b, div, etc. are all assigned functionality). With XML, these templates can even be a part of the document with tags that flag them as such. The trick is to put as little of this in the hands of the word processor itself.

    I never said "XML! XML!" all by itself. XML is fairly abstract. Obviously we need everything that works alongside it, and I'm talking about all supporting technologies if I'm talking about XML. If you read the article again, you'll notice the question was about document formats, not whether or not we'll need templates to go along with our XML formatted data.

    :-P

  53. You miss my point; this doesn't *need* gingerbread by Christopher+Thomas · · Score: 2

    Yeah.. I can see how it's easily portable with graphics, doing chapters, PAGE BREAKS, headers and footers on pages (which may or may not be common) and have you ever pulled in HTML to edit on any of the above?

    Ok, I'll bite.

    Firstly, you seem to be missing the main point of the question. This isn't about finding a generic format for page layout - it's about how to best transfer specification documents so that they can be written anywhere and read anywhere. HTML works wonderfully for this.

    Secondly, _yes_, you can do all of the above, when it makes _sense_ to do so.

    Page breaks? Easy. Have a set of linked documents instead of one big page. This is useful under some circumstances (like dividing a large document into sections).

    Chapters? Um, you _have_ the tools to emphasize chapter headings, and you _have_ page breaks if you really feel you have to use them. Where's the problem?

    Graphics? If I need a figure, that's what the image tag is for. If I want to have anything fancier than an image in a box... then I should have someone else write the standards document. Again, we aren't making magazine articles here - the goal is to find a format suitable for a *technical description*. Visual gingerbread is _counterproductive_; it distracts the reader.

    Headers/footers? Frames work fine for that, if you have a real reason to use them. I personally can't think of any, for this application. For my own documents, if I'm writing something that must be pretty, I use a script to prettify things consistently.

  54. Problem with Word? by Fervent · · Score: 2

    What, exactly, is the problem with the Word document format? Couldn't open source programmers add features after Microsoft stops using it (analyze it, "emulate" it, build off it)?

    --

    - I don't care if they globalize against free speech. All my best free thoughts are done in my head.

  55. Re:It may seem incredibly redundant... by frantzdb · · Score: 2
    If you want true independence from proprietary data formats (and open source applications can have data formats that are just as restrictive as closed source applications to most users) then XML is the only real choice right now - a well defined XML document should be readable even *without* a parser, and with a well-defined DTD and a series of appropriate XSL files, you can select your own viewer application. What could possibly be better? Certainly not Word, StarOffice, LaTeX or any of the other competitors in this arena.

    I'm not sure why you include LaTeX in this list. I'm not sure which, LaTeX or XML, would be best for the proposed use, but LaTeX most certainly *is* readable even without a `parser.' The other aspects of XML and LaTeX are where the two formats differ but both are designed as structured markup saved in ASCII.

    --Ben

  56. Just a bit reactionary? by Cardinal · · Score: 2

    Did XML kidnap your cat?

    No, if you want to think about using XML for this, you need to talk about the DTD, not XML itself.

    I think it's safe to assume that defining a DTD was implied. It's simply easier to say "Use XML" than to say "Write a good DTD to use with XML".

    So, the question becomes, which DTD? In order to compete with the competition (LaTeX, HTML, PostScript)

    That's just the point. It doesn't need to compete with other formats. The process goes something like this: Write a good DTD that fulfills all your needs, and allows for easy extension and specialization later on. Then, write XSL for exporting the format to whatever other formats are useful. HTML obviously for web display, PostScript for printing, maybe one for PDF, even. (Though encouraging the use of PDF probably isn't any better than encouraging the use of Word's DOC)

    it has to be: device-independent, easily rendered, easily edited, and extremely comprehensive.

    How could it possibly be device-dependent? This is just text we're talking about.

    Easy rendering has nothing to do with the XML DTD or document, that's the responsibility of the XSL that would accompany it, or the application that parses the document.

    Easy editing is pretty straightforward. Just edit it. This goes along with comprehensive. A good DTD can be comprehensive, but it can also leave room for extension without breaking that document. It is, after all, the extensible markup language.

    Don't shout "XML!!". XML, without a DTD, is almost useless, especially for this application. The DTD has to be all those things I mentioned, plus (for this application), it needs to be standard.

    I wouldn't say XML without a DTD is useless, but I will say XML without a DTD is silly. It's a simple, logical assumption that if you're writing XML documents, they should have a DTD, so you know what's allowed. Like I said before, it seems like this would be implied.
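
    To illustrate "so you know what's allowed", here is a rough sketch of the kind of check a DTD makes possible. It is not DTD validation: Python's standard library does not validate against DTDs, so a hand-written table of permitted child elements stands in for the DTD, and every element name in it is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD:
# parent element -> child elements it may contain.
ALLOWED_CHILDREN = {
    "spec": {"title", "section"},
    "section": {"heading", "para"},
    "title": set(),
    "heading": set(),
    "para": set(),
}


def structure_errors(xml_source):
    """List every element that the content model above does not allow."""
    root = ET.fromstring(xml_source)
    errors = []
    if root.tag not in ALLOWED_CHILDREN:
        errors.append(f"unknown root element <{root.tag}>")
    # Visit every element and compare its children with the allowed set.
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


GOOD = "<spec><title>T</title><section><heading>H</heading><para>P</para></section></spec>"
BAD = "<spec><title>T</title><blink>oops</blink></spec>"

print(structure_errors(GOOD))
print(structure_errors(BAD))
```

    Both inputs are well-formed XML; only the table of allowed elements tells them apart, which is the gap a shared DTD is meant to fill.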

  57. Re:20 year-old problem by Doc+Hopper · · Score: 2
    The key issue, IMHO, this company needs to decide is what they want most from documentation: presentation or content.
    Microsoft's .doc format (and StarOffice's .sdw format) is very presentation-centric: the only thing that matters to it is how the printed page will look. PDF, PS, and many other formats share this limitation. Ideally one should focus on the content of the documentation, and allow it to expand without massively reformatting the page every time.

    My company has run into this issue already. We open up our Product Requirement Documentation to modification as needed, and thereby lose all the formatting the Product Management staff has worked hard to get in there. Ever tried adding a paragraph on a page with an image anchored to a page position in MS Word? You get my drift.

    If you choose to use the DocBook DTD (Document Type Definition) with XML (Extensible Markup Language) or SGML (Standard Generalized Markup Language), you can use an off-the-shelf DSSSL (Document Style Semantics and Specification Language) stylesheet or create your own to customize how the "compiled" raw SGML/XML should look.

    An earlier poster said there is no good documentation on DocBook and SGML/XML. Bull Hockey, there's a full-fledged guide on how you can create standards-compliant, flexible DocBook available as the "LDP Author's Guide" at http://www.linuxdoc.org.

    Matt Barnson

  58. WP formats by Todd+Knarr · · Score: 2

    My two favorites: Rich Text Format (.rtf) and HTML. HTML has obvious advantages, but the disadvantage that it really wasn't designed for word processing as such. RTF is a format that Microsoft published as an interchange format for word-processor documents. I've found it does most everything needed to keep formatting and such intact, and it's readable and writeable by most WP software (MS Word, WordPerfect and StarOffice that I've confirmed by use). It's also a plain-ASCII format: if you've no word processor you can pull it up in a text editor and get at the actual text if you really have to. And it hasn't had changes made to it in many years; stability is a definite plus for a long-term storage format.

  59. XML and SVG by Cato · · Score: 2

    You might want to investigate XML, with a suitable DTD, e.g. DocBook for technical manuals. Also, SVG is an XML-based format for vector graphics, which always seemed to be the point at which SGML-based efforts had trouble.

    Tool support for this combination may not be so good or inexpensive, but you can be fairly sure the content will survive and be usable in many different environments.

  60. LaTeX by Weezul · · Score: 2

    Seriously, how can anyone consider anything different from LaTeX for serious writing (unless they have a publisher with trained monkeys to rewrite it in TeX)? Hyperlinks are the ONLY feature missing from LaTeX, but LaTeX is about the only system with a good clean way to handle the old-fashioned hyperlinks (i.e. index, figure numbering, etc.).

    The point is that you must use LaTeX if you want your work to ever appear respectable in print, so the question is: does your publisher want to TeX it themselves from your draft or do they want you to TeX it, i.e. it's a question of money. If you're an autonomous institution which does its own publishing and does not have ass loads of money then you really need to make people TeX it.

    Now, there are SGML systems which produce TeX and HTML, but they may not handle pictures properly, so you should be very careful. Actually, there are ways to include hyperlinks in LaTeX. The resulting dvi file can be compiled to an HTML file. This is almost surely the very best way to typeset your documents since you can write a TeX macro to treat images properly for conversion to PostScript OR HTML. It would work something like this: your images would be compiled to both .eps and .jpg, the TeX macro would embed the .eps and the URL for the .jpg into the dvi, and the dvi could be converted into both .ps and .html without losing the pictures. There are some issues regarding the placement of the images when converted to HTML, but nothing a LaTeX hacker could not fix.

    Jeff

    --
    The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
  61. XML is verbose and ugly, but it IS useful... by q000921 · · Score: 2
    XML isn't just "text files with tags". XML comes with standards for describing the grammar of those files, with a standard language for describing transformations, and with standards for performing physical layout. In terms of tools, there are standard libraries for many languages to read and write XML, and standard APIs for manipulating XML once it has been parsed.

    XML is pretty verbose and ugly. It's not the most convenient format to type in. But, in some sense, it finally extends the traditional UNIX approach to more complex data types. UNIX used to give you scanf, printf, and plain text files with fields. XML now extends that to parsing, generating, and transforming tree-structured types. That's really great, and it is really useful.
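
    As a small illustration of "parsing, generating, and transforming tree-structured types", the sketch below uses Python's standard xml.etree.ElementTree module. The document and its element names are invented; the point is only that a program gets a tree to walk and modify rather than lines of text to scan.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the element names are made up for the example.
SOURCE = "<spec><section><heading>Scope</heading><para>Text</para></section></spec>"


def outline(element, depth=0):
    """Return one indented line per element, depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Parse: the text becomes a tree of elements.
root = ET.fromstring(SOURCE)

# Transform: add a new paragraph to the first section.
ET.SubElement(root[0], "para").text = "Added by a program"

# Generate: walk the tree for an outline, then serialise it back to XML text.
print("\n".join(outline(root)))
print(ET.tostring(root, encoding="unicode"))
```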

  62. TeX is not what you want by q000921 · · Score: 2
    TeX output is pretty good, and LaTeX markup is pretty good, too.

    Where TeX falls way short is in the way it is programmed and extended. The TeX processor is more like a machine language, with registers, lots of side effects, hooks, and global variables. Doing non-trivial transformations in TeX is incredibly hard, and even the best macro packages often don't get it quite right.

    XML's approach is both more modern and much simpler: you describe transformations on the parse tree. XSLT and XSL-FO are what corresponds to the programmable guts of TeX.

    Most likely, what is going to happen is that many documents will be authored in XML, many document styles will be described in XSLT and XML Schema, and TeX will be used not for defining macro packages, but merely for performing the last stages of physical layout.

  63. Re: SO, DocBook, RTF, and DOC by dominator · · Score: 2

    Hi,

    I'm a maintainer/lead coder on a couple of OS Office Projects: AbiWord (http://www.abisource.com) and wvWare (www.wvware.com). I've written quite a few import/export filters for AbiWord.

    AbiWord is an excellent OS word processor which already handles lots of the existing formats that you speak of: DOC, XHTML, DBK, RTF, et al. They each have their own merits, advantages, and disadvantages.

    XHTML is not a good layout language. It has all of the same problems that HTML and thus web pages have: i.e. WYSIWYG formatting is next to impossible to achieve.

    DBK is nice, except that DBK wasn't really meant to be a WP file format. Its tags carry with them lots of semantic information that WPs generally just don't care about. Its layout tags leave much to be desired for a WP. There just isn't a clean mapping of DBK->WP tags.

    RTF is really slick (even though it is kinda old). Basically, anything that the AbiWord format can represent, RTF can do. This is a really good choice for your format.

    ABW (or your WP's favorite native format) is always nice because it maps neatly onto your feature-set.

    DOC support isn't really all that bad anymore. If you know what wvWare is (if you don't, see www.wvware.com), you know that it can convert DOC files into just about any format that you'd like. It can do this through either the command-line version or through the associated 50KB library. AbiWord uses wv to import DOCs. The importer is about 1100 LOC. I'm currently also writing DOC export support into wvWare (and thus AbiWord). Our DOC importer is *significantly* better than the one that OO has released. That will probably change soon, since Sun hired wvware's ex-maintainer to work on OO ;) Our DOC exporter currently exports something that looks like DOC at about 10 paces - i.e. it's not really DOC format, but it's getting there.

    Anyway, hope that this helps,
    Dom

  64. 20 year-old problem by maggard · · Score: 2
    Lots of places (esp. US DOD & auto-industry) faced this problem years ago and came up with a stable, reasonable solution:
    SGML
    It's open, cross-platform, flexible & has a long heritage. If you want to embed graphics call out to a PostScript file.

    Framemaker speaks it, WordPerfect speaks it, I dunno about MS Word, and of course it can be pumped out into lots of other formats (eg HTML, XML, etc.)

    It's not a perfect solution but it's widely available and fairly future-proof. Your specs should be about content anyway, let the reader concern themselves with presentation.

    --
    I don't read ACs: If a post isn't worth so much as a nom de plume to its author then I wont bother either.
  65. XML, DTD & XSL can be edited with some neat tools. by crovira · · Score: 2

    The advantages of using cross-platform and implementation-independent standards are threefold:

    1) XML separates your content (XML) from your structure (DTD) from your presentation (XSL) leading to far more concise and rational documents.

    2) They are open standards, unlike Word, FrameMaker or other proprietary formats.

    3) The tools for document creation are open (and closed,) cross-platform and not dependent on the largesse of any single source.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  66. "Save as HTML" is Your Friend. by Christopher+Thomas · · Score: 2

    I vote for HTML. Yes, it isn't great for fine layout control, but you don't *NEED* perfect layout control. We're writing standards specs here, not doing graphic design.

    The advantages: HTML is readable on any platform under the sun (and quite a few in caves), and most word processors can export using it.

    If the documents have figures, they can be saved as one of .gif/.png/.jpg, and read by most browsers.

    This is the only way I've found to get MS Word-users to give me readable documents, among other things.

  67. Use DocBook by FattMattP · · Score: 2
    I'd use DocBook. DocBook is a system for writing structured documents using SGML and XML. DocBook provides all the elements you'll need for technical documents of all kinds. A number of computer companies use DocBook for their documentation, as do several Open Source documentation groups, including the Linux Documentation Project (LDP). With the consistent use of DocBook, these groups can readily share and exchange information. With an XML-enabled browser, DocBook documents are as accessible on the Web as in print. The format is used by O'Reilly and Associates and they were one of the original creators of the specifications. You can find more information at these links:
    --
    Prevent email address forgery. Publish SPF records for y
  68. DocBook Resources by Bob+Clary · · Score: 3

    DocBook is your friend

    DocBook is a lot to digest at one time, but it is well worth the effort. Personally I prefer DocBook XML and use Norm Walsh's XSLT stylesheets to transform the XML to anything I want... HTML, PDF, whatever.

    Here are some resources for your reading pleasure.

    DocBook is Open Source, freely available on all platforms of interest, can be used for simple documents to complex books, separates presentation from content, and is extensible. What more could you want from a document format?

  69. One word... by Greyfox · · Score: 2

    PostScript. You can get programs to read it free for any OS, it's device independent, ANYTHING can render into it, and it gives you superb control over the content of your document. It is truly the lowest common denominator.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  70. Re:Reverse engineer the thing by QuoteMstr · · Score: 2

    Reliable reverse-engineering of the doc format *has* been performed. Both StarOffice and AbiWord can work with doc files just fine.

  71. Re:It may seem incredibly redundant... by iapetus · · Score: 2

    Your examples really aren't relevant in this case. Firstly, it's actually easier for an automated process to generate text from RTF than the other way round, since the RTF document must include additional mark-up. Your second example is even further from the mark - you obviously can't create the required PNG file from a text file (or not easily), so it's not a good storage mechanism. However, given a well-defined XML document structure it becomes trivial to use XSL to transform it into a LaTeX document representing the same data, or an HTML document, or some other custom format. And XML is intended to be used for this very purpose.

    I've actually been through this process at work - we shifted from using a proprietary file format for our invoices to using an XML representation which can then be used to generate a range of views, from HTML for viewing on the intranet to LaTeX for printing and sending to customers. It's a great solution, and it means that we're not tied down to using LaTeX - at a later stage we can change to another document format, and all that needs to be changed is the XSL document for all of our invoices to be available in the new format. If the documents were originally stored in LaTeX format, we would not be able to do this - a change in the output format would require all the invoices to be re-entered (as was the case when we switched *from* the first proprietary format) or a large amount of custom code to be written.
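
    A minimal sketch of the "one XML source, several views" idea described above. It uses plain Python functions as stand-ins for the XSL stylesheets, since the Python standard library has no XSLT engine. The invoice element names, attributes, and function names are all made up for illustration and are not the poster's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice format; element and attribute names are assumptions.
INVOICE_XML = """<invoice number="1001">
  <customer>Example Corp</customer>
  <item qty="2" price="9.50">Widget</item>
  <item qty="1" price="20.00">Gadget</item>
</invoice>"""


def load_invoice(xml_text):
    """Parse the XML once into a neutral structure shared by every view."""
    root = ET.fromstring(xml_text)
    items = [
        (item.text, int(item.get("qty")), float(item.get("price")))
        for item in root.findall("item")
    ]
    return {
        "number": root.get("number"),
        "customer": root.findtext("customer"),
        "items": items,
    }


def to_html(invoice):
    """Intranet view. No escaping of special characters is done here."""
    rows = "".join(
        f"<tr><td>{name}</td><td>{qty}</td><td>{price:.2f}</td></tr>"
        for name, qty, price in invoice["items"]
    )
    return (
        f"<h1>Invoice {invoice['number']}</h1>"
        f"<p>{invoice['customer']}</p>"
        f"<table>{rows}</table>"
    )


def to_latex(invoice):
    """Print view. No escaping of special characters is done here."""
    rows = "\n".join(
        f"{name} & {qty} & {price:.2f} \\\\"
        for name, qty, price in invoice["items"]
    )
    return (
        f"\\section*{{Invoice {invoice['number']}}}\n"
        f"{invoice['customer']}\n\n"
        "\\begin{tabular}{lrr}\n"
        f"{rows}\n"
        "\\end{tabular}"
    )


invoice = load_invoice(INVOICE_XML)
print(to_html(invoice))
print(to_latex(invoice))
```

    Switching to a new output format then means writing one more function like `to_latex`, while the stored invoices stay untouched.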

    --
    ++ Say to Elrond "Hello.".
    Elrond says "No.". Elrond gives you some lunch.
  72. It may seem incredibly redundant... by Gendou · · Score: 4

    ...but I think XML is the clear answer here. XML is already very mature, can be used in a number of situations, and can incorporate more than just text.

    You can even embed binary data in an XML document (with a tiny bit of creativity) for all those people who like to populate their files with custom fonts, clipart, graphs, etc. (This is accomplished through something, say... <BINARY CLIPART><DATA>[image data]</DATA></BINARY CLIPART>. You get the idea.)
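
    One way the embedding idea could look in practice, as a rough sketch in Python. XML element names cannot contain spaces, so this version uses a single `binary` element with a `name` attribute, and it assumes base64 as the text encoding. The element name, attribute names, and helper functions are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(parent, name, payload):
    """Attach payload bytes to parent as a base64-encoded child element."""
    node = ET.SubElement(parent, "binary", {"name": name, "encoding": "base64"})
    node.text = base64.b64encode(payload).decode("ascii")
    return node


def extract_binary(node):
    """Recover the original bytes from a base64-encoded element."""
    return base64.b64decode(node.text)


doc = ET.Element("document")
ET.SubElement(doc, "para").text = "See the attached clipart."
fake_image = bytes(range(256))  # stand-in for real image data
embed_binary(doc, "clipart", fake_image)

# Round-trip through text to show the binary data survives serialization.
serialized = ET.tostring(doc, encoding="unicode")
restored = ET.fromstring(serialized).find("binary")
assert extract_binary(restored) == fake_image
```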

    How about special configuration parameters? You could incorporate tags that would handle the way a document is viewed by different people ("are you a techie, marketing drone, webbie, etc" -> certain data becomes visible).

    The biggest advantages here are obviously the standards provided by XML (thank you W3C). Its uses are broad. It's got high quality interpreters on ALL platforms (especially JAXP for Java - it's a joy to work with *g*).

    The only standards we'd really have to focus on would be which tags would be considered "key" tags.

    What else do you need? Doesn't OpenOffice already use XML as its standard document type?

    Sure I could be wrong on this, so don't berate me too much. I've just had a lot of positive experience working with XML for sooo many different applications.

  73. Re:Dangerous and Offensive??? What is standard? by elenchos · · Score: 2

    At the university writing lab where I "work" (well, they pay me, and sometimes I do things that involve effort), students are constantly bringing in Word documents that they can't open. There are a million reasons. They wrote it in Works. They took the disk out while Word had the file open. Or the #1 reason: they wrote it in Word 2000. So they tour the university computer labs in search of something that will open their document. They try Word 97 on our machines, Word 2000 on the ones upstairs, a different Word 2000 on the NT boxes in the engineering building. It goes on and on. Sometimes it works, sometimes not.

    Often I try to convince the English department to teach their students to use something more compatible, like HTML or RTF. There is little demand for images and tables, and when there is it is really part of a Powerpoint sedative or spreadsheet. But they always say, we have to use Word because it is the standard. Meaning that it is the universally compatible format that everyone can read. Now am I just crazy here? Don't answer that. If Word were in fact any kind of standard, why do we have the Tower of Babel with all these incompatible Word documents? Word may be the standard word processor, but there is no standard Word format. There are a dozen different Word formats.

    Everyone might as well use whatever weird word processor they like, and pass along a second copy in plain text every time they try to move the document to another machine. The net effect would be the same.



  74. Try Paper by Anonymous Coward · · Score: 2

    Lasts hundreds of years, and OCR scanning keeps getting better and better.

  75. It's all about PDF by fooguy · · Score: 2

    I ran into this at work last year. We let people submit to us as DOC/WPD/PS (PS usually come from LaTeX) and we convert them all into PDF. There is an Acrobat reader for Win32/MacOS/Linux/Solaris etc etc. It has served us very well, as there is already a cross platform suite of tools available, and it allows for embedding graphics with text (which was a huge concern for us).

    --
    "All I ever wanted was to see Larry Wall give Bill Gates a Perl necklace."
    http://www.eisenschmidt.org/jweisen
  76. Re:XML, DTD & XSL can be edited with some neat too by rtaylor · · Score: 2

    How do you embed graphics, usable independent spreadsheet imports, and audio, among other things required for presentations / documents?

    XML might be great for text but keeping binary data in it isn't the best of plans. Size being the other issue, but if always saved / restored with gzip or something...
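
    The size concern can be checked with a quick experiment. This is only a sketch under assumptions: base64 as the text encoding, random bytes standing in for an already-compressed image, and a made-up wrapper element. Base64 grows the data by a third, and gzip on the whole document should win back most of that growth, though not all of it.

```python
import base64
import gzip
import os


def measure_sizes(raw):
    """Return (raw, base64, XML, gzipped XML) sizes in bytes for a payload."""
    encoded = base64.b64encode(raw)
    # Hypothetical wrapper element; only the relative sizes matter here.
    xml_doc = (
        b"<document><binary encoding='base64'>"
        + encoded
        + b"</binary></document>"
    )
    packed = gzip.compress(xml_doc)
    return len(raw), len(encoded), len(xml_doc), len(packed)


payload = os.urandom(30_000)  # stand-in for already-compressed image data
raw_len, b64_len, xml_len, gz_len = measure_sizes(payload)
print(f"raw={raw_len} base64={b64_len} xml={xml_len} gzipped={gz_len}")
```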

    --
    Rod Taylor
  77. TeX is what you want. by gimpboy · · Score: 3

    You shouldn't have to worry about the standards outliving TeX - Donald Knuth designed it with that in mind.

    TeX (and the LaTeX frontend) runs on about as many platforms as Linux.

    the output of a tech document is quite frankly spectacular. you just don't get this kind of quality with the word processing programs that are out there today.

    many people think the learning curve is high, but this isn't necessarily so. my advisor says that LaTeX has a one paper activation energy. ie it takes you about one document to learn most of what you need to know to get things done... and once you use it you will find it hard to use anything else in the future.

    use LaTeX? want an online reference manager that

    --
    -- john
  78. Re:Reverse engineer the thing by hugg · · Score: 2

    The way most Word documents are embedded with objects, you'd almost need to reverse-engineer the entire Windows OS. Embed this Visio graph, this equation, this COM object. Bleh!