Slashdot Mirror


Opera CTO Hits Back at Microsoft's Standards Push

Michael writes "Opera CTO Håkon Wium Lie hit back today at Microsoft's push to fast track Office Open XML into an ISO standard, in a blistering article on CNET. He also took a swipe at Open Document Format: 'I'm no fan of either specification. Both are basically memory dumps with angle brackets around them. If forced to choose one, I'd pick the 700-page specification (ODF) over the 6,000-page specification (OOXML). But I think there is a better way.' The better way being the existing universally understood standards of HTML and CSS. Putting this to the test, Håkon has published a book using HTML and CSS."

246 comments

  1. Yes. Well... by Frosty Piss · · Score: 2, Funny

    Opera CTO Håkon Wium Lie sort of "bitch slapped" a picture of Bill Gates and splashed some white wine around...

    Ok... Cheese, anyone?

    --
    If you want news from today, you have to come back tomorrow.
  2. fsck'n ugly by Anonymous Coward · · Score: 5, Insightful

    Yeah, but that "book" is fsck'n ugly. It doesn't even compare to a professionally typeset book, or something produced in LaTeX. I hope that isn't the "solution" to this standards "problem". Let's face it, the average Joe is going to use whatever Microsoft pushes at them. Case closed.

    1. Re:fsck'n ugly by AKAImBatman · · Score: 5, Insightful

      Yeah, but that "book" is fsck'n ugly. It doesn't even compare to a professionally typeset book, or something produced in LaTeX.

      You don't typeset with Microsoft Word, either. Which makes the entire argument specious. Word processors like MS Word and OOo Writer are for creating common documents like letters, memos, and maybe the occasional flyer. Neither one is particularly good at anything even close to professional publishing work. Even the book authors just use Word (or surprisingly, OOo Writer!) to do the text content. That text is then exported to a more sophisticated program, where the actual typesetting and page layouts are done.

      I think this fellow's point is that HTML/CSS formats can store any information that a Word Processor might need to store, with no need to invoke new technologies. To a certain extent, he may be correct. Unfortunately, HTML/CSS may make a good intermediary format, but it is not particularly good from a performance or usability perspective. Then again, XML formats in general are fairly poor choices for the same reason.

      I think if we want to break this conundrum, the industry is going to have to learn how to keep local data stores that are of high performance, while exporting intermediary formats when emailing or uploading to external computers. The only problem is finding a way of doing this so that it's completely transparent to users. The mythical "mom" doesn't want to worry about emailing a document in the right format, or having the right program to read the attachment she received. She just wants it to do what she tells it, with no bloody prompting with questions she has no answers for.
    2. Re:fsck'n ugly by Anonymous Coward · · Score: 3, Informative

      You're entirely right. Word/OOo aren't used for pro typesetting and page layout. But if we exclude that, then we still have many, many other formats, like RTF too (or why not even BBCode while we're at it?). Yes it's quite ugly, but I don't see (x)html + css as being the answer either:

      -too many versions of html (4, and perhaps 5 soon) and xhtml (1.0, 1.1, strict, transitional, etc)
      -different versions of CSS, browser support for it varies quite a bit (and is pretty much non-existent for CSS3)
      -too many rendering engines, css hacks required so the content displays the same in most of them, etc
      -html/css sucks at MANY things - how about a self-updating TOC? (don't even try to say some javascript parsing the DOM for header tags with certain IDs to generate it dynamically!) Hell, how can you even tell the page numbers in a html "document" anyways?
      -while word/OOo formats aren't real typesetting (like InDesign CS2 would do), at least they have half-way decent typography. Yeah, no fancy glyphs or super precise kerning, but it's still usable. On the web there's only a handful of "just OK" fonts one can use (unless everything is rendered server-side as images).
      -if people use html/css, there would basically be no standards *at all* or anything even resembling it (much like anything we see on the web). And I'm not sure the W3C is really going to help much here... Not that their recommendations are implemented very quickly (so many nice standards, but with basically no support e.g. xforms). And I'm not sure they're really being too helpful anymore either - more like slow and misguided IMO.

      At least with the new formats you're starting fresh, with the chance to have most features (like a Table of Contents), and have them implemented properly. Mind you I'm not saying the new word/OOo XML formats are perfect - nor even the answer to the problem in the first place...

      And yeah, it's not like (x)html has angle brackets either ;)

      Looks to me like Opera has only one tool: a hammer (or is that a web browser?) and everything is strangely starting to look an awful lot like a nail?

    3. Re:fsck'n ugly by EvanED · · Score: 5, Informative

      html/css sucks at MANY things - how about a self-updating TOC? (don't even try to say some javascript parsing the DOM for header tags with certain IDs to generate it dynamically!)

      This would have to be done by the tool displaying it, same as a self-updating TOC in a Word or OpenOffice Writer document. The information is present in a correctly-structured HTML document in the form of Hx tags.

      Hell, how can you even tell the page numbers in a html "document" anyways?

      The same way you would in a Word document. It doesn't make sense if you're looking at it as a web page in your browser, but if your editor used HTML it would work the same way. (This also partially alleviates the rendering issues.)

    4. Re:fsck'n ugly by Anonymous Coward · · Score: 5, Insightful

      I don't see (x)html + css as being the answer either:
      Only because you can't tell the difference between "XHTML + CSS" and "web pages".

      -too many versions of html (4, and perhaps 5 soon) and xhtml (1.0, 1.1, strict, transitional, etc)
      So? Pick one as your word-processor standard, and rule all the others out. The existence of too many versions of MS Word doesn't seem to have hurt the .doc format.

      -different versions of CSS, browser support for it varies quite a bit (and is pretty much non-existent for CSS3)
      What does browser support have to do with word processing? We're talking about word processors, not web sites.

      -too many rendering engines, css hacks required so the content displays the same in most of them, etc
      And this is different from word processors how? Microsoft's XML format is absolutely crammed full of hacks to duplicate obscure rendering features of obsolete versions of Word, WordPerfect, etc. And it would surprise me very much if the rendering of ODF was pixel-identical between all the products that support it.

      -html/css sucks at MANY things - how about a self-updating TOC? (don't even try to say some javascript parsing the DOM for header tags with certain IDs to generate it dynamically!)
      You're thinking of web pages, not HTML. HTML used for a document could easily have an auto-generated table of contents. Remember that we're talking about using HTML as the file format for a word processor. A word processor can trivially parse the DOM for header tags and update a table of contents without requiring any JavaScript at all. It's kind of what word processors are for.
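
      To make the claim above concrete, here is a minimal sketch of how an editor might collect the header tags and turn them into a table of contents. It uses only Python's standard html.parser; the names HeadingCollector, build_toc and format_toc are made up for illustration, and a real word processor would presumably work on its own in-memory document model rather than re-parsing text.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def build_toc(html):
    """Return the document's headings as a list of (level, text) pairs."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


def format_toc(headings):
    """Render the headings as plain text, indented two spaces per level."""
    return "\n".join("  " * (level - 1) + text for level, text in headings)
```

      Re-running build_toc after every edit would keep the table in sync; page numbers would still have to come from whatever layout engine paginates the document.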

      Hell, how can you even tell the page numbers in a html "document" anyways?
      By looking at the little "Page N of N" display in your word processor, I would assume.

      -while word/OOo formats aren't real typesetting (like InDesign CS2 would do), at least they have half-way decent typography. Yeah, no fancy glyphs or super precise kerning, but it's still usable. On the web there's only a handful of "just OK" fonts one can use (unless everything is rendered server-side as images).
      What does "on the web" have to do with word processors? We're not talking about the web here. We're talking about word processors, which will have access to all the fonts the user owns, just like any other application.

      -if people use html/css, there would basically be no standards *at all* or anything even resembling it (much like anything we see on the web).
      Why not? We're talking about word processors, not the web. We're talking about computer-generated HTML, not something some 13-year-old hacked together by copying-and-pasting examples into Notepad. It would be trivial to enforce valid XHTML 1.1 + CSS2.1, for example.
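
      As a rough illustration of the first step of such enforcement, the sketch below checks that a saved file is at least well-formed XML, using Python's standard xml.etree.ElementTree. This is an assumption-laden simplification: well-formedness is weaker than validity against the XHTML 1.1 DTD, the check says nothing about the CSS, and the function name is_well_formed is invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise.

    This only checks well-formedness (matched tags, quoted attributes,
    a single root element); it does not validate against any DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

      An editor that refuses to save anything failing this kind of check (plus a real schema or DTD validation step) would never emit the tag soup seen on hand-written web pages.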
    5. Re:fsck'n ugly by Anonymous Coward · · Score: 0

      "Let's face it, the average Joe is going to use whatever Microsoft pushes at them. Case closed."

      The average Joe isn't Microsoft's target. It is the business environment.

    6. Re:fsck'n ugly by Anonymous Coward · · Score: 0

      CSS3 isn't a standard yet. Why would any browsers really care about supporting a moving target?

      It's actually counter-productive, because people will write 'draft-css3' webpages that work in today's browsers, and expect the obsolete code to properly display in future browsers.

      A very BAD idea.

    7. Re:fsck'n ugly by vtcodger · · Score: 2, Insightful
      ***I think this fellow's point is that HTML/CSS formats can store any information that a Word Processor might need to store, with no need to invoke new technologies. To a certain extent, he may be correct. Unfortunately, HTML/CSS may make a good intermediary format, but it is not particularly good from a performance or usability perspective. Then again, XML formats in general are fairly poor choices for the same reason.***

      The M in HTML stands for MARKUP. And it means it. HTML is NOT a layout language. Never has been, and apparently never will be despite unending attempts to use it for page layout. In fact, HTML documents look different in every browser -- which is not, I think, a characteristic that most users are going to desire for a large subset of documentation. How, for example, can you specify an OCRable form, if the rendering program is free to move the damn boxes around?

      If someone would like to propose a standards-based HTLL that focuses on document layout, they have my support. I don't care if it is XML based. Just that it works, is reasonably concise, everyone uses it, and that it replaces PDF as a vehicle for specifying documents that need to be rendered pretty much exactly as the author specified them.

      While I'm sure that an HTLL specification would be lengthy, I don't think it needs to embody every quirk of every version of Microsoft Word or Open Office.

      --
      You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
    8. Re:fsck'n ugly by Lost my low ID nick · · Score: 5, Insightful

      So, McSmarty, how do I
        - position an image on page 4 of my document?
        - add footnotes?
        - embed fields (date, last editor...)?
        - mark the embedded TOC as TOC so that it gets regenerated on reload?
      etc.

      And on the CSS side, there are quite a lot of shortcomings, too.

      Of course, all of this would work with custom XML tags or special id/class conventions, BUT then you'd have to specify those. And getting this below 700 pages won't be easy.

      So repeat after me:

      HTML is *not* a description language suitable for word processing in its current state, and it is unclear it can be made so without sacrificing device independence.

    9. Re:fsck'n ugly by dom1234 · · Score: 1

      Let's face it, the average Joe is going to use whatever Microsoft pushes at them. Case closed.

      There is a contradiction here : to close the case is a very strange way to face it, isn't it ?

      You said it at first : let's face it. The fatalist way (i.e. the "case closed" attitude) is incompatible with progress.

    10. Re:fsck'n ugly by lahvak · · Score: 2, Interesting

      The M in HTML stands for MARKUP. And it means it. HTML is NOT a layout language. Never has been, and apparently never will be despite unending attempts to use it for page layout. In fact, HTML documents look different in every browser -- which is not, I think, a characteristic that most users are going to desire for a large subset of documentation. How, for example, can you specify an OCRable form, if the rendering program is free to move the damn boxes around?


      I think that's why he says HTML/CSS. HTML takes care of the markup, while CSS supposedly takes care of the layout. I am not absolutely convinced that it actually works, but notice that the basic idea behind LaTeX is exactly that. LaTeX is supposed to be really just a markup language (I know it doesn't actually work that way), and document styles and packages are supposed to specify the layout.

      If someone would like to propose a standards-based HTLL that focuses on document layout, they have my support. I don't care if it is XML based. Just that it works, is reasonably concise, everyone uses it, and that it replaces PDF as a vehicle for specifying documents that need to be rendered pretty much exactly as the author specified them.


      The question is, do we really need to replace PDF? I think the article was about replacing wordprocessor formats, not PDF. They are two very different things.
      --
      AccountKiller
    11. Re:fsck'n ugly by TheRaven64 · · Score: 5, Interesting
      I had a little go at using HTML for this kind of thing a few years ago. One thing that you might not be aware of is that CSS has a few things related to pagination. While you can't say 'put this image on page 4,' you can say 'if you need to put a page break in, put it before or after this div, so that this text and this image are on the same page.' For the table of contents, I wrote some ECMAScript that scanned the DOM tree for h1-4s and built a set of nested lists to display it, with links to the real headings. It didn't print the page number because, although this is possible with CSS it wasn't implemented in any browsers when I tried it. The embedded fields are already supported by meta tags in the document head. Footnotes, however, are a tremendous pain to get right with HTML.

      I just dug out the template I wrote, and the pagination and ToC worked fine in Safari. The auto-numbering of headers, however, didn't. This is due to a lack of support for counters in generated content, and the same problem with Mozilla was a significant reason for abandoning the whole idea in the first place; the only browser everything worked in was Opera.

      Another significant reason for abandoning this idea (not entirely relevant when talking about document formats being generated by tools) was that HTML is a huge pain to type, and XHTML is even worse. Something semantically equivalent to XHTML but using S-expressions would have been fine, but typing XHTML just involves spending far too much time hitting > and < keys (not to mention the redundancy of close tags having the full tag name). I turned to LaTeX, which is easier to type and also (being a Turing-complete programming language) much easier to extend than HTML.

      --
      I am TheRaven on Soylent News
    12. Re:fsck'n ugly by EsbenMoseHansen · · Score: 4, Informative

      So, McSmarty, how do I
      - position an image on page 4 of my document?

      You don't, nor do you want to. But you can anchor, float or bind the images to the text easily enough. This would be handled by css... for the HTML side, it would just be div and object tags --- not that you would ever see them, since this is a word app.

      - add footnotes?

      <p class="footnote">My footnote</p> with the appropriate CSS rule (presumably something like float: page or whatever.)

      - embed fields (date, last editor...)?

      Using XML entities, presumably

      - mark the embedded TOC as TOC so that it gets regenerated on reload?

      Regenerated on reload? Come on, have some ambition.. it should be in sync at all times. Anyway, by keeping track of the header tags, presumably.

      HTML is *not* a description language suitable for word processing in its current state, and it is unclear it can be made so without sacrificing device independence.

      XHTML+CSS would need some expansions... but probably not much. A good layout program probably doesn't care about the device, but if it did, there are already @media rules to handle these situations. There are also a couple of other truly dedicated layout namespaces on w3 to consider.

      But all this matters not. This is politics. Sadly.

      --
      Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
    13. Re:fsck'n ugly by Mateo_LeFou · · Score: 3, Insightful

      "The mythical "mom" doesn't want to worry about emailing a document in the right format, or having the right program to read the attachment she received. She just wants it to do what she tells it, with no bloody prompting with questions"

      No offense, but I'm getting sick of this line of reasoning. You're right, mom wants the computer to read her thoughts, know exactly what she really meant when she said X, anticipate every need she might have, and pre-calculate its complexity out of existence.

      In other news, my boss would like this entire website built in one hour ($40), never need support, and scale to 300,000 users.

      At a certain point IT's job goes from "give every user what he/she wants" to "educate users about what is feasible in the current technological situation."

      --
      My turnips listen for the soft cry of your love
    14. Re:fsck'n ugly by SnapShot · · Score: 1

      I think you touch upon a point that I'd like to reiterate even more clearly: word processing and page layout are two different things. Even after reading the article, I'm not really sure what's being proposed here; are they proposing HTML+CSS as a page layout format or as a word processing format? I think the technology as it currently stands could make a reasonable word processing format but would be a stretch as the foundation for a page layout application.

      However, I'd like to remind everyone of two points of history. One, HTML itself evolved from SGML which has more than enough flexibility to handle Navy service manuals (which is where I saw it used). They weren't exactly fun reads, but the level of complexity of the documents exceeds what you are going to see in your average magazine layout and they were being handled just fine by early-90s technology parsing huge SGML documents. Two, word processing documents have been "markup" documents since the very beginning. What was WordPerfect for DOS but a way to mark up a text document (/b for bold right, I can't even remember any more)?

      --
      Waltz, nymph, for quick jigs vex Bud.
    15. Re:fsck'n ugly by bigpat · · Score: 1

      HTML is *not* a description language suitable for word processing in its current state, and it is unclear it can be made so without sacrificing device independence.

      Tell that to over 100 million websites. Yes, most of them are ugly, but so are most Word docs. The only thing inherently lacking in html/css is a standard way of making the collection of files (image, text/html, css) that make up a document into a compressed portable document. Something like a standard .htd extension put onto a zip compressed file that contains all of the files that make up a document. Though maybe there is something already out there like this that just needs more standard application support such as standard behavior when you choose "save as" from a web browser. And it seems that every other so called shortcoming would be better handled by a good browser/word processing application.
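
      The zip-container idea can be sketched in a few lines with Python's standard zipfile module. Everything specific here is an assumption made for illustration: the .htd extension is only the hypothetical name floated above, the archive layout (index.html plus a stylesheet and an images folder) is invented, and pack_document / unpack_document are made-up helper names.

```python
import io
import zipfile


def pack_document(files):
    """Bundle a {archive_path: bytes} mapping into one zip archive.

    Returns the archive as bytes, ready to be written to a single
    file (for example with the hypothetical .htd extension).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, payload in files.items():
            archive.writestr(path, payload)
    return buffer.getvalue()


def unpack_document(blob):
    """Read a packed document back into a {archive_path: bytes} mapping."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

      The packaging itself is the easy part; what would need standardizing is which file inside the archive is the entry point and how the others are referenced from it.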

    16. Re:fsck'n ugly by beyondkaoru · · Score: 1

      ok, i don't know much about computer generated html, but of the little computer generated xhtml i've seen, none of it has ever validated. xhtml is a lot simpler than html, but for some reason people have a hard time doing it. web browsers, even firefox, don't do a lot of the xml stuff that they should do, like xlink embed. ie afaik doesn't do xml at all. this kinda saddens me, since xlink is pretty cool, and could do a lot of the stuff people use javascript for, but without having to use javascript.

      --
      the privacy of one's mind is important.
      you do have something to hide.
    17. Re:fsck'n ugly by TheoMurpse · · Score: 1

      I think this fellow's point is that HTML/CSS formats can store any information that a Word Processor might need to store, with no need to invoke new technologies.
      There's a lot that the current HTML/CSS standard cannot do (I'm not including CSS 3.0, since it's not even finished yet, let alone supported). In particular, there is no vertically wrapping text. You cannot have multiple columns in HTML and CSS which will dynamically wrap to a second (or third) column. That's a pretty important thing in word processing.
    18. Re:fsck'n ugly by TheoMurpse · · Score: 2, Funny

      Looks to me like Opera has only one tool: a hammer (or is that a web browser?)
      Actually, I think it's a high-pitched voice capable of shattering glass.
    19. Re:fsck'n ugly by TheoMurpse · · Score: 3, Insightful

      So, McSmarty, how do I
          - position an image on page 4 of my document?
          - add footnotes?
          - embed fields (date, last editor...)?
          - mark the embedded TOC as TOC so that it gets regenerated on reload?
      I'm on your side in this debate, but as a web dev I have knowledge of these things which you apparently do not. To embed a field, how about <meta name="author" content="TheoMurpse">. As for marking the embedded TOC, how about <div id="TOC">? For positioning an image on page 4, well, I don't know if you've ever looked at a DOC or ODT file, but the file itself says nothing about where page 3 ends and page 4 begins. Instead, you see that once the word processor has rendered the file. Thus, I see no difference between HTML and any other format. Hell, I don't even know if you can say "put this on page 4" in a LaTeX document. First of all, you'd never want to put it on page 4. Instead, you'd want to put it in between other elements, which may end up placing it on page 4, but then when you update your text on page 3, it may cause the image to need to be on page 5.

      Footnotes are easy, too: Text Text that needs a footnote.<div class="footnote">This is the footnote</div>. That's the same concept as in LaTeX, the best typesetting software out there.
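
      For what it's worth, reading those conventions back out of a file is not hard either. The sketch below, using Python's standard html.parser, pulls <meta name=... content=...> fields and <div class="footnote"> bodies out of a document. It assumes exactly the conventions used in this comment (the class name "footnote" is not defined by any standard), and FieldAndFootnoteReader is a made-up name.

```python
from html.parser import HTMLParser


class FieldAndFootnoteReader(HTMLParser):
    """Collect <meta> fields and the text of <div class="footnote"> blocks."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # meta name -> content
        self.footnotes = []   # footnote texts in document order
        self._depth = 0       # div nesting inside the open footnote; 0 = outside
        self._parts = []      # text fragments of the open footnote

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            self.fields[attrs["name"]] = attrs["content"]
        elif tag == "div":
            if self._depth:
                # A div nested inside a footnote: track it so its
                # closing tag does not end the footnote early.
                self._depth += 1
            elif "footnote" in (attrs.get("class") or "").split():
                self._depth = 1
                self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.footnotes.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)
```

      Feed it a document with reader.feed(html) and read reader.fields and reader.footnotes afterwards. The catch is that every word processor would have to agree on the same class names for this to interoperate.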
    20. Re:fsck'n ugly by AKAImBatman · · Score: 1

      The M in HTML stands for MARKUP. And it means it. HTML is NOT a layout language. Never has been, and apparently never will be despite unending attempts to use it for page layout.

      CSS *is* a layout language, however. It works in conjunction with HTML. In theory, at least, pure CSS/XHTML should lay out the same in every browser. (From a practical perspective, of course, no one fully implements the CSS standard.)

      In any case, that's beside the point I was trying to make. My point is that a 200 page HTML/CSS file is an unwieldy beast that will take forever to make changes to. Which is not surprising, given that markup languages are serial formats. Microsoft Word gets around this problem by using a binary format that can be updated without rewriting the entire file. (Thus why Microsoft Documents have a nasty habit of getting bigger rather than smaller.) Markup formats don't have this advantage, and rely on heavy processing power and fast I/O to make up for their deficiencies.

      That's why I say that (X)HTML/CSS makes a great intermediary format, but it's not so good as a local data store.
    21. Re:fsck'n ugly by AKAImBatman · · Score: 1

      No offense, but I'm getting sick of this line of reasoning.

      Too bad. Because it is a real problem. When "mom" wants to give someone a movie, she only needs to give them a DVD. That's a standard. When "mom" wants to listen to music, she only needs a CD player. That's a standard. When "mom" wants to send a letter, she uses U.S. Letter paper with English words on it that fits into a Standard Size Envelope. That's a standard.

      When "mom" wants to send a document through email, she either has to use the MS Office pseudo-standard forced upon the industry, or she needs to thoroughly understand which format is which in an attempt to ensure that the person on the other end knows how to read the file. When she receives a file, if it's not in the same MS Office pseudo-standard, she can't read it.

      Like it or not, that's a real problem and Microsoft has the industry by the balls. If a standard interchange format existed, then "mom" would be able to use just about any computer and program, with the expectation that the computer will be able to read/send in that interchange format.
    22. Re:fsck'n ugly by jfengel · · Score: 1

      I used to do book evaluations for a major technical publisher. I got to see the manuscripts as the author wrote them.

      I reviewed one book by Jef Raskin, one of the designers of the Mac and an expert on user interface design. How did he write his book? Double-spaced Courier and hand-drawn illustrations.

      They were going to typeset it later and have the illustrations done by a pro anyway. If I'd known that I'd have saved myself all the work I put into formatting mine, which Word butchered on a regular basis.

    23. Re:fsck'n ugly by Mateo_LeFou · · Score: 1

      I understand your argument but think it needs fine tuning.

      "When "mom" wants to give someone a movie, she only needs to give them a DVD"

      True in the most common situation: sending within the DVD region. But mom might have kids in other regions, at which time she has to learn a tiny bit about formats. Music and paper letters also require you to learn something before you click on the "advanced" tab, so to speak.

      "When "mom" wants to send a document through email, she either has to .."

      See I think of this as an uncommon case. 90% of the time, mom's email is plain text. 90% of the 10% of the time that she attaches something, it's a single image file. Our current crop of technologies does a *good* job of making the basic 99% of use-cases into no-brainers.

      Don't get me wrong; I am 100% in favor of a great, open document standard being widely dispersed. But I don't consider it "a big problem" insofar as I don't believe mom is often sending out spreadsheets with complicated macros in them. It's a problem for power users, and they'll probably solve it this year.

      --
      My turnips listen for the soft cry of your love
    24. Re:fsck'n ugly by John Nowak · · Score: 1

      Too bad. Because it is a real problem. When "mom" wants to give someone a movie, she only needs to give them a DVD. That's a standard. When "mom" wants to listen to music, she only needs a CD player. That's a standard. When "mom" wants to send a letter, she uses U.S. Letter paper with English words on it that fits into a Standard Size Envelope. That's a standard.

      Yes, very good.

      When "mom" wants to send a document through email, she either has to use the MS Office pseudo-standard forced upon the industry, or she needs to thoroughly understand which format is which in an attempt to ensure that the person on the other end knows how to read the file.

      Nope. When your mom wants to send a file, she attaches it to an email via a standard method. This isn't the problem.

      The problem with your logic is that your mom probably doesn't want to send a file in the first place. She wants to send the content. An email editor could (should?) have an "insert document" button that would use the copy of Word on your machine to convert the file to RTF (which generally works quite well) and send it as an rtf email (which is fairly well supported). Alternatively, attaching a Word file could present a dialog: "Send unaltered" or "Convert to compatible format and send" (with better text for those two options). The problem isn't the standard per se -- the problem is the email interface.

    25. Re:fsck'n ugly by jfinke · · Score: 0

      As far as I know, and I may be wrong, there is no way to specify a page break in html or any of its variants. While I realize that this is a typesetting measure, it is still pretty needed.

    26. Re:fsck'n ugly by AKAImBatman · · Score: 1

      The problem with your logic is that your mom probably doesn't want to send a file in the first place. She wants to send the content. An email editor could (should?) have an "insert document" button that would use the copy of Word on your machine to convert the file to RTF (which generally works quite well) and send it as an rtf email (which is fairly well supported). Alternatively, attaching a Word file could present a dialog: "Send unaltered" or "Convert to compatible format and send" (with better text for those two options). The problem isn't the standard per se -- the problem is the email interface.

      That's exactly what I was getting at. There should be a common interchange format. It shouldn't even ask whether the file should be converted ("mom" doesn't know the answer), it should just do it. If everyone agrees on the format, then it will Just Work(TM). Which makes "mom" very happy indeed. :P
    27. Re:fsck'n ugly by Anonymous Coward · · Score: 0

      You don't typeset with Microsoft Word, either. Which makes the entire argument specious. Word processors like MS Word and OOo Writer are for creating common documents like letters, memos, and maybe the occasional flyer.
      And the "book" in question is too ugly to serve as a letter or memo, much less as a flyer, a white paper, or even a weekly status report to the boss. It sucks shit through a straw.
    28. Re:fsck'n ugly by cecil_turtle · · Score: 1

      Text that needs a footnote.<div class="footnote">This is the footnote</div>

      Then all the new word processors need to agree on the terminology for these new pseudo-elements (css classes in your example) that aren't already defined in HTML. Sounds an awful lot like adding another layer on top of this fictional xhtml+css proposal, i.e. a new standard. Then Microsoft wants to use a different class name in their word processor, or thinks they should be wrapped in spans instead of divs, and we end up with another standards war. Rinse, repeat.

      I'll stick with ODF, thanks.

    29. Re:fsck'n ugly by Anonymous Coward · · Score: 0

      Yes you are wrong. You can specify page breaks in css with page-break-before or page-break-after

    30. Re:fsck'n ugly by tangohotel · · Score: 1

      ever heard of an inline stylesheet? It doesn't matter what the class definitions are called as the inline stylesheet embedded within the document will have that information.

    31. Re:fsck'n ugly by vtcodger · · Score: 1
      ***The question is, do we really need to replace PDF? I think the article was about replacing wordprocessor formats, not PDF. They are two very different things.***

      That's certainly a fair question.

      Answer -- I'm not sure. The indictment against PDF is fairly lengthy. Adobe's PDF offerings are slow, bloated and not especially reliable. XPDF often doesn't work. For that matter, neither does Foxit, but at least Foxit will run on an older PC without consuming all the available resources. Many PDFs aren't text searchable. Text recovery from PDF documents that permit text copy is often a total shambles -- looks more like OCR with insufficient queues to the OCR software than anything reasonable

      I think that PDF is used primarily for two things. It is used to specify documents where the author doesn't want things moved. Basically -- Here's the layout, now will you please (Godammit) try to adhere to it as closely as is physically possible when rendering/printing? Perfectly reasonable request. I'm not sure that PDF is a particularly good answer. It's just better than the alternatives. That's where I think a real layout language might help.

      The second use for PDF is here's a document that was produced for some other purpose -- e.g. for a scientific journal or whatever. How to make it available digitally without rewriting it from scratch? I don't think PDF does that at all well, but it may be better than the alternatives. I'm not even sure what the alternatives are.

      --
      You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
    32. Re:fsck'n ugly by cecil_turtle · · Score: 1

      An inline stylesheet? Are you kidding? The xhtml+css proposal was supposed to be easier than ODF / OOXML, thank you for proving my point that it isn't.

      Even with an inline stylesheet there would still need to be an agreement as to what, in our current example, a "footer" is. So that when you go to Format | Page or whatever in any word processor they would be able to identify that element as a footer. I thought I was pretty clear that the GP's example just happened to use classes but the same theory would apply no matter what your method (2nd e.g., inline styles).

    33. Re: fsck'n ugly by frisket · · Score: 1

      Exactly. Håkon is almost completely correct, but for the wrong reasons. If we exclude the few people who really understand pointy-bracket markup, the traditional author just wants to be able to type stuff and edit it. They don't give a tinker's spit about file structure or formats and never will, so if Microsoft or OpenOffice want to push their undistinguished XML formats as "the" save-formats, so be it. There are currently no XML editors suitable for use by people who do not grok pointy bracket markup. Only when we have an editor capable of detecting the author's intentions and silently adding the appropriate markup will any use of meaningful XML become possible at that level (claimer: yes, this is my PhD topic :-)

  3. Classic quote for the books, gotta love XML play by Tablizer · · Score: 5, Insightful

    "Both are basically memory dumps with angle brackets around them."

  4. Is it mature enough? by Goalie_Ca · · Score: 3, Interesting

    HTML and CSS are quite capable of rendering and displaying webpages. What happens with a simple thing like a file header showing page number and author name. Footers with footnotes? How about dealing with table of contents etc. How would a page in a document be broken down? Anyone who's tried to print HTML knows there are many issues with layout. What's sad though is that even HTML and CSS is not supported the same in all browsers.

    I'm a latex junkie. Latex though is a PITA to create templates and styles for. Someone willing to take up the task to modernize latex or completely replace it?

    --

    ----
    Go canucks, habs, and sens!
    1. Re:Is it mature enough? by willy_me · · Score: 5, Informative

      I'm a latex junkie. Latex though is a PITA to create templates and styles for. Someone willing to take up the task to modernize latex or completely replace it?
      Done. It's called ConTeXt.
    2. Re:Is it mature enough? by Anonymous Coward · · Score: 0

      Latex though is a PITA to create templates and styles for.

      With great power comes great learning curve.

      As for the rest of the post, I agree. Even as designed, there are lots of things CSS is incapable of on websites that are realistic. People like to thump their chests and talk about how tables are obsolete, but I have yet to see an honest replacement for tables for tabular data, and the table CSS is incredibly weak. If I'm generating data on the fly, I have to cache the entire data so that I can ensure the columns are the correct widths to display the data, since there is no notion of "fit to data" beyond "just make random columns that never look the same way twice". Of course, the measurements I make are meaningless should the user override my font setting, leading to the tables looking even shittier. I could just use proportional columns, on a widescreen display that just leads to shit like a column containing a two digit number being 3 inches wide, but hey, at least the table has rounded corners.

    3. Re:Is it mature enough? by lewp · · Score: 1

      http://www.alistapart.com/articles/boom describes how they handled that stuff using CSS2 and proposed CSS3 features.

      I'm not writing a book any time soon, and if I were I wouldn't take this approach, but it is an interesting read.

      --
      Game... blouses.
    4. Re:Is it mature enough? by 8-bitDesigner · · Score: 1

      CSS is mature as a standard, though it's lacking a good implementation of many of its features. Web browsers really haven't had much reason to properly implement CSS print standards and layouts, so most browsers are optimized for printing table-based layouts.

      I have no doubt that you could print a book with CSS/HTML or manage a Word style document, but you'd need a platform that holds the spec much tighter than the current crop (and yes, that includes Firefox).

    5. Re:Is it mature enough? by the_womble · · Score: 1

      Context has its weaknesses too.

      For example, it cannot produce print and HTML versions of the same document. This may not matter to everyone, but it was something I needed, so I stuck to Latex.

    6. Re:Is it mature enough? by tomstdenis · · Score: 2, Interesting

      The trick to using LaTeX safely is automation. The less TeX twiddling you have to do manually the better.

      For me, I write my user manuals [for my FL/OSS projects] in LaTeX because the layout is much better, and the process much simpler than wrestling with a word processor.

      Why anyone writes books in anything else is beyond me.

      My first book [math text] that was published was all LaTeX, and while it wasn't all super simple the vast majority of the layout and setting work was handled by TeX itself. My second book [crypto text] was written in Word [required by publisher] and the tables were not set properly, equations look like shit, etc.

      Word processor == memos, letter home to Grandma.

      Typesetter == Papers, Books, Print material

      Tom

      --
      Someday, I'll have a real sig.
    7. Re:Is it mature enough? by tomstdenis · · Score: 1

      Little tip, if your book has any technical edge to it, learn LaTeX and do the layout/setting yourself.

      Being in control of the layout and setting is very important if you value your creation at all. ... Just saying. not bitter.

      Tom

      --
      Someday, I'll have a real sig.
    8. Re:Is it mature enough? by sweetooth · · Score: 2, Insightful

      Keep in mind this was published by a bigwig at Opera. The Opera web browser tends to stay way ahead of the other browsers in terms of standards compliance. This includes things like the ability to use the page elements to force page breaking and to help create layouts useful for things like books, reports, etc. Opera is a great engine for rendering HTML & CSS, I personally just can't get past the UI.

    9. Re:Is it mature enough? by indiechild · · Score: 4, Informative

      Tables are not obsolete. Tables are still used for tabular data, which is what they were originally intended to be used for, and that has not changed.

      Tables shouldn't be used for page layout -- that's what CSS is for. It's as simple as that.

    10. Re:Is it mature enough? by Dutch+Gun · · Score: 0

      It is possible to build a new format on top of the universally understood HTML and CSS. Awesome. So he wants us to use a language originally designed to display text, images, and hyperlinks in a context-sensitive manner as a document format? That's just what the industry needs... another house built on a foundation of sand.

      To show that it's possible, Bert Bos and I published a book using HTML and CSS. File | Export to Web? Am I missing something here? What does that prove?

      Funny, he didn't mention that he "wrote" the book in HTML, just that he "published" it in HTML. I can publish my book to HTML, PNG, text, paper, Braille, and Morse Code. It doesn't make that an appropriate universal document format though.

      Look, I agree with his derision of MS and their hypocritical format games. But he really should have known where to quit. The specifications are just binary dumps with brackets around them? I think he stretches his credibility a bit there.
      --
      Irony: Agile development has too much inertia to be abandoned now.
    11. Re:Is it mature enough? by EvanED · · Score: 2, Informative

      File | Export to Web? Am I missing something here?

      Yes, the fact that he used a program called Prince to generate a reasonably professional-looking "book". Not "printed web page". Book.

      Funny, he didn't mention that he "wrote" the book in HTML, just that he "published" it in HTML.

      "It is now possible, even feasible, to use HTML as the document format for books." (Granted, that's two links off the /. summary.) But "To prove how powerful it can be, the authors decided to use CSS in the production process" is following only one link.

      That PDF posted above was generated entirely from an HTML + CSS document.

    12. Re:Is it mature enough? by MrNaz · · Score: 4, Funny

      You mean you display tabular data *without* tables? Dude, you missed the point in a big way. Like say for example Andre Agassi was serving a tennis ball at you, by "missed" I mean he was serving the ball on a court in California while you were standing waiting to receive on a court in Florida.

      --
      I hate printers.
    13. Re:Is it mature enough? by Anonymous Coward · · Score: 1

      That must be one of the crappiest websites and presentations I've ever seen.

      What do they want? Me downloading and checking out their list of files that they dumped on that site? How about some info or web design instead?

      Oh and that color. Must...not...get...eye...cancer.

    14. Re:Is it mature enough? by Anonymous Coward · · Score: 2, Informative

      If you want to stay in Latex use the memoir document class.
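
      A rough sketch of what that might look like, using memoir's own page-style commands for a running header with page number and author name, plus a table of contents and a footnote. The style name "manual" and the author name are placeholders made up for this example, not anything from the thread.

      ```latex
      \documentclass{memoir}

      % Define a custom page style called "manual" (hypothetical name).
      \makepagestyle{manual}
      % Arguments: style, left, centre, right.
      \makeevenhead{manual}{\thepage}{}{A. N. Author}
      \makeoddhead{manual}{A. N. Author}{}{\thepage}
      \pagestyle{manual}

      \begin{document}
      \tableofcontents

      \chapter{Introduction}
      Text that needs a footnote.\footnote{This is the footnote.}
      \end{document}
      ```

      Chapter opening pages keep their own style by default, so the custom header only shows up on the pages that follow.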

    15. Re:Is it mature enough? by lahvak · · Score: 1

      There isn't actually any inherent reason in ConTeXt that would prevent it from doing that. It just that the tools have not been created yet.

      --
      AccountKiller
    16. Re:Is it mature enough? by lahvak · · Score: 1

      HTML and CSS are quite capable of rendering and displaying webpages. What happens with a simple thing like a file header showing page number and author name. Footers with footnotes? How about dealing with table of contents etc. How would a page in a document be broken down? Anyone who's tried to print HTML knows there are many issues with layout. What's sad though is that even HTML and CSS is not supported the same in all browsers.


      All of these are problems with browsers, not the actual file format. What's the difference between hjkhkjlh l and \footnote{hjhjklhljk}?

      I'm a latex junkie. Latex though is a PITA to create templates and styles for. Someone willing to take up the task to modernize latex or completely replace it?


      http://www.latex-project.org/latex3.html

      --
      AccountKiller
    17. Re:Is it mature enough? by TheRaven64 · · Score: 2, Informative

      Why anyone writes books in anything else is beyond me.

      I couldn't agree more. I am currently writing a book, and I can't imagine how people use tools like Word. It has a lot of technical content, particularly code snippets. With LaTeX, I can easily insert a few lines from a code file, and have it automatically syntax highlighted. I never have to worry about copy-and-paste errors, since the source code is included directly from the source files, which I can compile and test.

      I can also define short commands like \code{} for inline code snippets (e.g. variable or structure names) and then decide how I want them typeset later. I have a \note{} command defined, that puts the note in square brackets, blue underlined when I compile a draft, and doesn't display it at all when I compile a final version.

      The other nice thing about LaTeX is that it works with all of my standard development tools. I can keep my document in subversion, and have human-readable output to svn diff. I have a Makefile that generates all of my inclusions (e.g. graphs from gnuplot, images from OmniGraffle) and then typesets my document.

      The only problem with LaTeX is that it's not really a well-defined format. A LaTeX document is basically a program that generates a document. My source code is pretty easy to read, because I use English words for typesetting commands and then define them in my document class. Without the document class file, someone would be able to extract most of the semantic information from my source, but they would find it hard to generate my output.
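
      For anyone wondering what commands like \code{} and \note{} could look like, here is a minimal sketch. The switch name \ifdraftnotes and the exact formatting are assumptions for illustration, and in a real setup the definitions would probably live in the document class file rather than the preamble.

      ```latex
      \documentclass{article}
      \usepackage{xcolor}

      % Hypothetical switch: true for drafts, false for the final version.
      \newif\ifdraftnotes
      \draftnotestrue   % change to \draftnotesfalse before the final build

      % Inline code: decide once here how it is typeset everywhere.
      \newcommand{\code}[1]{\texttt{#1}}

      % Notes: bracketed, blue and underlined in drafts; dropped otherwise.
      \ifdraftnotes
        \newcommand{\note}[1]{\textcolor{blue}{\underline{[#1]}}}
      \else
        \newcommand{\note}[1]{}
      \fi

      \begin{document}
      The \code{count} field is updated on every call.\note{check this}
      \end{document}
      ```

      This also shows the portability problem: without these definitions, someone else can still read \code{count} and guess what it means, but cannot reproduce the output.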

      --
      I am TheRaven on Soylent News
    18. Re:Is it mature enough? by tomstdenis · · Score: 0, Offtopic

      Ah, hubris of the troll. Congrats you're upset in public. What did that accomplish? People don't need folk like you to make up their minds about others. Or are you saying everyone on slashdot is stupid and needs your deep insight to figure out what's what?

      Give me a call when you grow up kid. Maybe you have a fighting chance at being a decent human being, but right now it's hard to tell.

      Tom

      --
      Someday, I'll have a real sig.
    19. Re:Is it mature enough? by bcrowell · · Score: 1

      How is ConTeXt better?

    20. Re:Is it mature enough? by VENONA · · Score: 1

      Only place I disagree is "Word processor == memos, letter home to Grandma."

      Neither of these typically needs word processor overhead, unless you need graphics (logo or whatever) in your memo. I have templates for both, for use in my text editor, which is pretty much always running, anyway. The memo template never gets used any more, though. It's been completely subsumed by email.

      Plain old text is also going to remain readable for much longer than any word processor file format. If any of those letters might need to be read fifty years from now, even (maybe especially) from sentimentality, a lot of people might be wishing they were ASCII.

      About the only thing I use a word processor for any more is printing envelopes--and that's a legacy thing I just haven't bothered to fix. A few OOo envelopes for stuff that has to go by snail mail, monthly.

      --
      What you do with a computer does not constitute the whole of computing.
    21. Re:Is it mature enough? by Deadguy2322 · · Score: 0

      Decent or not, I'm less of a conceited asshole than you are. Oh, by the way, if you want to sign your bullshit posts, use the fucking signature for it. You are an ignorant maritimer stallmanite cocksmoker, and I hope your tiny, pathetic genitals slowly rot away. P.S. Fuck off and die in a fire.

      --
      Check out my foes list to see who is so retarded that they can't use the signature line!!!
  5. huh? by User+956 · · Score: 4, Funny

    Putting this to the test, Håkon has published a book using HTML and CSS.

    Uhm. I'm no expert, but isn't a book that uses HTML and CSS called a website?

    --
    The theory of relativity doesn't work right in Arkansas.
    1. Re:huh? by 8-bitDesigner · · Score: 5, Informative

      Actually one of the highlights of the CSS spec is support for non-standard display types, such as screen readers, projectors, PDA, and yes, print. CSS is a rather brilliant standard, but since W3C hasn't really seen fit to publish a reference platform for it, there's no real compliance checking in the major browsers.

    2. Re:huh? by EvanED · · Score: 1

      Not really when it's rendered into, say, a PDF formatted like any book. Based on that sample, I'd say that if that's not a (proof-of-concept) book possibly nothing you download is. It looks pretty reasonable.

      Did you RTFA?

    3. Re:huh? by natrius · · Score: 1

      Isn't a book that uses Microsoft Word's .doc format called a Word document?

      A document doesn't turn into a physical book until you hit print. The book itself is about the content, not the physical form. Dive into Python is a book that I just happened to read online in web page form.

    4. Re:huh? by hixie · · Score: 1

      Given how much difficulty all the browser vendors (who are working on this full time) have had getting their CSS implementations right, I don't see how we could expect the W3C to make a perfect implementation... Some things are just hard to get right. Dynamic typographic layout of the kind CSS allows is one of them.

    5. Re:huh? by kestasjk · · Score: 2, Insightful

      CSS would be a great standard, but it leaves too much to the people who implement it; is this a block type or inline? What should the default for this nonstandard tag be? etc, etc.

      If they spelled everything out without any ambiguity it would make a better standard.. but then it would be another "600 page long" standard which is what he seems to be against in the first place.

      --
      // MD_Update(&m,buf,j);
    6. Re:huh? by Anonymous Coward · · Score: 0

      FTA:
      Less than a year after CSS (Cascading Style Sheets) became a W3C Recommendation, Microsoft co-submitted the competing XSL (Extensible Stylesheet Language) to the World Wide Web Consortium.

      I wouldn't buy a book from someone who spews out crap like this: he clearly doesn't understand these 2 technologies well enough to appreciate their differences.

    7. Re:huh? by CaymanIslandCarpedie · · Score: 1

      Agreed with all that CSS is very useful, but as much as I like the Opera browser it seems this guy is suffering a bit of "if you're a carpenter you think every problem can be solved with a hammer" syndrome.

      --
      "reality has a well-known liberal bias" - Steven Colbert
    8. Re:huh? by atamido · · Score: 1

      HTML4, and even more so XHTML, most certainly have all of the block/inline information well defined in the spec. It may not be very readable, but it is all there. What isn't there are default values for stuff like padding and margins. One of the results of this is that Firefox and Internet Explorer have different methods to indent list items, and both are technically correct.

      That said, there are a number of features in any basic word processor that simply aren't feasible to get done. Granted most of these would work much better in a real typesetting program, but your average user really just needs/wants what is offered in Word/etc.

    9. Re:huh? by General+Wesc · · Score: 1

      'What should the default for this nonstandard tag be?'

      Your objection is that the standard doesn't specify defaults for non-standard tags? How about no default, because it's not part of HTML?

    10. Re:huh? by daft_one · · Score: 0

      Well that's just stupid. Sometimes you need a maul.

  6. Borat's here by cntlzed · · Score: 1, Funny

    FTA: Kazakhstan recently joined the relevant ISO group.

    OMG, Borat is teaming up with Steve Ballmer to spew out 6000 page docs!! Run for cover!

    1. Re:Borat's here by User+956 · · Score: 0, Offtopic

      OMG, Borat is teaming up with Steve Ballmer to spew out 6000 page docs!! Run for cover!

      You should run for cover, because he's spewing chairs with those documents.

      --
      The theory of relativity doesn't work right in Arkansas.
  7. CSS for Documents? by zaydana · · Score: 5, Insightful

    Having a word processor act more like a web browser would be awesome. Ever since I started using word processors (which for me was a long time after I started using web browsers), i've always thought, why doesn't updating this style make all text with that style update? Why do I always have to change the same thing over and over again?

    While turning word processors into web browsers would be stupid, things like CSS would be awesome to have in word processors.

    1. Re:CSS for Documents? by athakur999 · · Score: 1

      WordPerfect has had a feature like this for years called "show tags" (I think). It'd show you where formatting markers started and stopped (similar to an HTML source listing). It was pretty useful. I'd love to see OpenOffice incorporate a feature like this (if it doesn't already).

      --
      "People that quote themselves in their signatures bother me" - athakur999
    2. Re:CSS for Documents? by Coryoth · · Score: 3, Informative

      Ever since I started using word processors (which for me was a long time after I started using web browsers), i've always thought, why doesn't updating this style make all text with that style update? Why do I always have to change the same thing over and over again?

      Such things exist. TeX provides a decent base for such things, so it's a matter of finding a TeX centric editor. LyX would be a good example, and indeed it has the sort of functionality and general approach to document creation that you seem to be after. Of course it doesn't necessarily have all the other features that other word processors might have (like mail merge or what have you).
    3. Re:CSS for Documents? by the_womble · · Score: 2, Insightful

      Latex: it's not that hard to learn.

      Lyx provides a GUI front end, but you lose a lot of flexibility.

      Texmacs might work for you as well, although I found it very clunky.

    4. Re:CSS for Documents? by eelke_klein · · Score: 1

      Word processors have had such features for years. They are called styles. If you apply to each chapter title the same style and later change the style all the chapter titles will be changed immediately. You can even couple things like chapter numbers to such styles.

    5. Re:CSS for Documents? by EvanED · · Score: 1

      ...i've always thought, why doesn't updating this style make all text with that style update? Why do I always have to change the same thing over and over again?

      Both OpenOffice and Word have this, they're called styles. Word has had it since at least version 6 and maybe before, though at least before 2007 (which I haven't used so can't comment on) they haven't done much to bring attention to the feature. OO has had it since the beginning, and puts rather more emphasis on it.

    6. Re:CSS for Documents? by EvanED · · Score: 1

      Hmm, looks like I was beaten to the punch. Feel free to mod me redundant.

    7. Re:CSS for Documents? by Antique+Geekmeister · · Score: 3, Interesting

      Indeed: LyX is extremely handy for providing to undergraduates or research assistants whose thesis advisors insist on using TeX or LaTeX, who lack the time to learn yet another language. LyX is the difference between having slightly more elegant .tex files, and getting an hour more of sleep a night when writing your thesis because you can edit in a GUI and don't have to debug your .tex files.

      I am finding myself wishing that OpenOffice had pursued putting a vastly better interface on TeX and LaTeX, rather than writing their own standard. It would probably have been faster and certainly would have been a lot more stable. Microsoft couldn't have even thought about it: its clean, open standards would not have lent themselves to the proprietary "extend" part of Microsoft's "embrace and extend" approach, or Microsoft's software licensing models.

    8. Re:CSS for Documents? by Anonymous Coward · · Score: 0

      i've always thought, why doesn't updating this style make all text with that style update

      In current versions of Word, at least, it does. If you open the Styles sidebar and modify a style, it applies to the whole document. If you click a formatting widget, you're making a local change, so the correct behavior is to change just that text.

    9. Re:CSS for Documents? by fredboboss · · Score: 1

      I've been searching for an easy and fast way to publish documents.
      In the places where I worked I saw people were losing too much time
      with text processors. Instead of focusing on content they were fiddling
      with fonts, colors, ... style. This usually led to poor content.

      What is needed is separation of content and style :
      text then style, content then presentation.

      Opera CTO is right with HTML and CSS,
      but they might not be appropriate for publishing.

      PDF is a good format for publishing and it is widely used but
      it is missing the template feature.

      Lyx can do the trick but template customization and generation
      is not easy for people who are not (La)TeX aware (non programmers).

      I've searched for such a means to separate text and style: you just type text,
      then your style is automatically applied, check my web page :
      http://fredboboss.free.fr/pyfpdf/index.php

      At last you get a PDF ready for publishing, this is just a proof of concept,
      but the idea is that you can keep your content and change the style
      whenever needed.

      I don't know what the ODF format consists of, but a good document format
      for producing documents ready for publishing would allow separation of
      content and style à la CSS stylesheet so that users can just concentrate
      on text.

      fred

    10. Re:CSS for Documents? by Haeleth · · Score: 2, Informative

      Latex: it's not that hard to learn.
      But it is tricky to use for any language other than English. Out of the box, it's English or nothing. Other European languages are complicated; more complex languages like Arabic, Hindi, or Chinese require some very involved hacks indeed.

      It can be done, some of the time, but it's very, very easy to mess up. I have tried numerous times to get Japanese support, using one of the several special Japanese versions that exist (it seems it simply can't be done with standard TeX), and only once did I manage to generate a DVI - which I was unable to convert to a usable format, because doing so always stripped out all Japanese text, for some reason I never managed to fathom.

      And this is all fair enough, because TeX was written to scratch Knuth's itch, and therefore it does what Knuth needed very well: it's brilliant for typesetting English and mathematics. Unfortunately that doesn't make it the solution to all the world's typesetting problems.

      I hate to say it, but "inferior" products like MS Word, OpenOffice.org, etc. have supported Arabic, Hindi, Chinese, and Japanese perfectly for as long as I can remember. Largely because they use Unicode internally, rather than one of the numerous inadequate and non-standard encodings that TeX and its derivatives rely on.

      To be fair, there's a Unicode version of TeX called Omega or some such. I'd doubtless have found it very useful if I'd ever managed to get it to work at all.
    11. Re:CSS for Documents? by Constantine+Evans · · Score: 1

      LyX is the difference between having slightly more elegant .tex files, and getting an hour more of sleep a night when writing your thesis because you can edit in a GUI and don't have to debug your .tex files.

      That depends. LaTeX does have a very steep learning curve, but an adroit user can write in LaTeX itself far faster than in LyX (and faster, I believe, than in nearly any other system), especially for papers with heavy math content. The \def and \newcommand commands are extremely useful in this regard: simply define things that you will need often, and then use them. This can mean the difference between:

      \begin{equation}
      r = 1 + \epsilon\sum_{n=0}^{\infty}\sum_{m=-n}^{n}
      \left( F_{nm}(t)\mathcal{Y}_{nm}(\theta,\phi)\right)
      \end{equation}
      and

      \beq
      r = 1 + \e \sumni\sum_{m=-n}^{n} \lt( \F{nm} \Ynm{\t,\p} \rt)
      \eeq
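
      The shorthand version only works once the shortcuts are defined somewhere in the preamble. A minimal sketch of what those definitions might be follows. The macro names come from the example above, but the expansions are guesses reconstructed from the long form, so treat them as assumptions.

      ```latex
      \documentclass{article}

      % Shortcuts for the equation environment.
      \newcommand{\beq}{\begin{equation}}
      \newcommand{\eeq}{\end{equation}}

      % Frequently used pieces of the formula.
      \newcommand{\sumni}{\sum_{n=0}^{\infty}}
      \newcommand{\lt}{\left}
      \newcommand{\rt}{\right}
      \newcommand{\Ynm}[1]{\mathcal{Y}_{nm}(#1)}

      % \def overwrites silently; note that \t normally means the tie accent.
      \def\e{\epsilon}
      \def\t{\theta}
      \def\p{\phi}
      \def\F#1{F_{#1}(t)}

      \begin{document}
      \beq
      r = 1 + \e \sumni\sum_{m=-n}^{n} \lt( \F{nm} \Ynm{\t,\p} \rt)
      \eeq
      \end{document}
      ```

      With these in place the short form should typeset the same equation as the long one. The cost is that the source no longer reads as standard LaTeX to anyone who does not have the definitions.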
    12. Re:CSS for Documents? by zsau · · Score: 1

      To be fair, there's a Unicode version of TeX called Omega or some such. I'd doubtless have found it very useful if I'd ever managed to get it to work at all.

      Take a look at XeTeX. It installed without a hitch on my computer (ppc debian) once I altered the Debian control stuff to compile against the TeXLive TeX packages rather than teTeX. Or if you run on a more normal platform (x86 ubuntu/debian/SuSE, MacOS X, maybe Windows) there's precompiled packages for you. It will use any OpenType (or TTF, or on OSX those Apple fonts) font you've got installed on your computer with complete (low-level) access to the special features, or higher-level access to most stuff via the fontspec LaTeX package. I quite happily no longer bother pissfarting around with stupid font packages. The only disadvantage I've found from XeTeX is that because it uses xdvipdfmx to convert to PDF you can't get special features from pdfTeX, and that it has the potential to make your input files platform-specific.

      As for Omega, it's a dead end; AFAIK what it's given us will at some point be integrated along with the scripting language lua and pdfTeX into something called LuaTeX. I might be mistaken on that front.
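
      A minimal sketch of the fontspec route, assuming the file is saved as UTF-8 and compiled with xelatex. The font name is only an example and has to be replaced with a font that is actually installed and covers the script you need.

      ```latex
      % Compile with: xelatex example.tex
      \documentclass{article}
      \usepackage{fontspec}

      % Hypothetical choice: any installed OpenType or TrueType font works.
      \setmainfont{IPAMincho}

      \begin{document}
      日本語のテキスト
      \end{document}
      ```

      Because the font is looked up by name on the local system, this is also where the platform-specific input files mentioned above come from.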

      --
      Look out!
    13. Re:CSS for Documents? by zsau · · Score: 1

      In Word, modify your formatting toolbar. Get rid of almost everything from it, except for lists and the first dropdown (and the button before it). Click the button before it. Now you have a setup like mine (when I'm forced to use a word processor--I much prefer TeX). Use the styles. When you think "this would be better in red", just create a new style and format it as red.

      I've been doing this since Word 6.0, when I first used a GUI wordprocessor. Stylesheets aren't by any means a new thing: They're just one of the many features of them that most people don't seem to know about.

      --
      Look out!
    14. Re:CSS for Documents? by the_womble · · Score: 1
      Incidentally, the site linked to from my sig is generated from a latex file. I have some TCL scripts that parse the Latex and generate more Latex files for the index pages.

      I did it this way so that I could also do a print version from the same source document.

    15. Re:CSS for Documents? by Antique+Geekmeister · · Score: 1

      Sir or madam, I understand your point. I can write in C far, far faster than most object-oriented programmers can write in C++ or Java, and get far better performance out of fewer lines of code. But to do that, I had to learn C. Expecting the many casual document authors to write in a programming language instead of being able to "click on this and click on that" to make a statement in bold text, or change its font, or even make a list of elements, is asking a lot from a casual user.

      This stuff needs to work for casual users who are already pressed for time, or they won't use it.

    16. Re:CSS for Documents? by 1u3hr · · Score: 1
      , i've always thought, why doesn't updating this style make all text with that style update? Why do I always have to change the same thing over and over again?

      The idea of styles didn't originate in CSS, it was used in page layout decades before the web. I use Ventura, which features this heavily, but PageMaker, Quark, etc all have styles.

      And actually, Word does too, but using styles correctly in Word is fraught with difficulty. The method of updating styles is capricious. I do DTP for a living, and when I get a new document from an author, I generally have to spend time, up to hours, sorting out the mess of styles into something logical. Users can hardly be blamed, the online help is atrocious, and none of the "how to" books I've looked at teaches how to use styles consistently. But this site has some pointers.

    17. Re:CSS for Documents? by 1u3hr · · Score: 3, Insightful
      though at least before 2007 (which I haven't used so can't comment on) they haven't done much to bring attention to the feature

      Word DOS (version 4 at least) had it back almost 20 years ago. And actually it was much easier to use styles back in the DOS version. Current versions try so hard to second guess you in the quest for user-friendliness and layering features on top of features that you can change or create new styles without knowing or intending to. Old-school required you to RTFA, but then you could use styles very efficiently. Now styles are much more sophisticated, but hardly anyone uses them correctly. I get documents from all kinds of people, including many university lecturers. None, out of hundreds over the last 15 years, has had a clue of how to style their documents. Headings are "Normal" with font commands to make them large; body text is "Heading 1" converted to 12-point Times; bulleted and numbered lists are a minefield, tables are a quagmire of hacks, spaces and tabs, etc...

    18. Re:CSS for Documents? by Kjella · · Score: 3, Insightful

      Having a word processor act more like a web browser would be awesome. Ever since I started using word processors (which for me was a long time after I started using web browsers), i've always thought, why doesn't updating this style make all text with that style update? Why do I always have to change the same thing over and over again?

      Every word processor I've seen like forever has support for styles. The problem is:

      1) It's impossible to avoid creating a million new styles by accident. Try looking at the styles list and you'll see it's full of junk
      2) It's impossible to clean up a document with such a bunch of styles, for example say you have a document which has been completely fucked up with pseudo-styles. You've set "Normal" to be what the bulk text should be, and "Headings" to what they should be. What happened last time I tried it? Well, it was impossible to easily apply it without killing any bullet lists, bold, italics or any other intended variation of the normal text. Headers and numbering went berserk. Trying to do the same with the bullet list style led to numbers going completely nutzoid; for some reason it thought everything in the same style belonged to the same list, so later lists would start at some random number.
      3) If you for some reason are stuck copying between different versions of Word (Norwegian and English come to mind) then you'll have double the number of styles, which obviously aren't in sync.

      So to sum it up what I would like:
      1) Don't auto-create styles
      2) This sentence does not contain three styles
      3) Sane "apply style" functions
            - Particularly directed at fixing a mess
      4) Make styles have an ID, at least for the default ones make them international so header 1 is header 1 in every language
      5) Ability to "style-lock" documents for things like company standards, you can create new styles but not just randomly change around sizes and fonts
      6) More visible styles (OpenOffice does this, MS Word doesn't) because people don't see them

      --
      Live today, because you never know what tomorrow brings
    19. Re:CSS for Documents? by aproposofwhat · · Score: 1
      Yow owd git!

      I too was using Word back in the DOS days - from Word 2, in fact - and have to admit that it was a damn sight easier to produce decently formatted and styled documents before all the WYSIWYG crap arrived.
      It was so much easier when you had to think about how your final output would look before you started to write, rather than trusting (program x) to render whatever guff you had put on screen correctly.

      Having said that, a standard mark-up language for documents is a good idea, and given the choice of an open standard or a proprietary standard, the preference is obvious.

      --
      One swallow does not a fellatrix make
    20. Re:CSS for Documents? by TheRaven64 · · Score: 2, Informative

      But it is tricky to use for any language other than English. Out of the box, it's English or nothing. Other European languages are complicated; more complex languages like Arabic, Hindi, or Chinese require some very involved hacks indeed.

      Really? All of my LaTeX files are UTF-8, and most include some non-English characters. I tend to use the raw unicode, rather than the LaTeX sequences because they are easier to type on a Mac. I'm not using a custom version of LaTeX although I vaguely remember having to include a package that told LaTeX to use UTF-8. Things like Greek letters and accents just work. I've not tried Arabic, Hindi or Kanji, however.
      --
      I am TheRaven on Soylent News
    21. Re:CSS for Documents? by TheRaven64 · · Score: 1

      This is really obvious when someone uses Acrobat to generate a PDF from a Word document. Acrobat uses some style information to fill in the table of contents in the PDF bookmarks metadata section. If you don't use headings correctly, then you get all sorts of strange output from the PDF.

      --
      I am TheRaven on Soylent News
    22. Re:CSS for Documents? by jfengel · · Score: 1

      I've come to the conclusion that there is not a correct way to use styles.

      I've struggled with Word for years. I'd love to be able to define a hierarchical set of styles and manipulate them. But I've been unable to grasp what's really going on inside a style, and how the various dialogue boxes manipulate what elements of the style.

      Maybe there's a way to produce a clean set of styles and set them in a .dot file and never touch them again, but I've never seen it.

      It's vastly clearer in CSS, where the style is sitting in front of me, though it's a pain to have to cycle through an edit-render step trying to get what I want and the layout scheme is not well suited for anything except fixed-width pages (and doesn't even flow well there much of the time; I sure wouldn't want to write a book in it.)

      I finally gave up on Word and went to OpenOffice. It's even worse, but at least the price is right. And most of the time I've simply concluded that the proper thing is to simply do less with styling and just write the text.

    23. Re:CSS for Documents? by jbengt · · Score: 1

      and I've had Word documents crash and burn because the simple act of cutting and pasting in numbered lists can silently create new styles that are exactly the same as the copied style. Word only displays about 8 styles in its little style dialog, though it can hold up to 255. When you get above 255 styles, Word gets very confused.

    24. Re:CSS for Documents? by metamatic · · Score: 1

      Ever since I started using word processors (which for me was a long time after I started using web browsers), i've always thought, why doesn't updating this style make all text with that style update?

      It does. Every word processor I've used has this feature--Apple Pages, Microsoft Word, OpenOffice Writer, even AppleWorks. You just have to learn to use it.

      Hint: Look up "style" in the help. (In OOo, hit F11 for a starting point.)

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    25. Re:CSS for Documents? by Stewie241 · · Score: 1

      You shouldn't. And you don't. Even in Word and in OpenOffice you can mark a heading with a particular style and then just change the style. Look up "Styles and Formatting". This is the beginning of the difference between semantic and syntactic markup. Most people use their word processors and mark up syntactically. They say, bold this selection of text and italicize that. There is then no way for the word processor to know what is a title, what is a heading 1 or heading 2, and thus no way of knowing how to build TOCs.

      If people use the tools properly, this is all possible. In short, there is something similar to CSS in word processors.
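
      To make the semantic-versus-syntactic point concrete, here is a minimal Python sketch. Everything in it (the `Paragraph` class, the `STYLES` table, the `build_toc` helper and the style names) is made up for illustration and is not how Word or OpenOffice actually store documents. The idea is only that once paragraphs carry a style name instead of raw font settings, a table of contents falls out of the data, and restyling is one change in one place:

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    style: str  # hypothetical style name, e.g. "Heading 1" or "Normal"
    text: str


# Hypothetical style table: appearance lives here, not on each paragraph.
STYLES = {
    "Heading 1": {"size": 18, "bold": True},
    "Heading 2": {"size": 14, "bold": True},
    "Normal": {"size": 11, "bold": False},
}


def build_toc(paragraphs):
    """Return indented TOC lines from paragraphs whose style is a heading."""
    toc = []
    for p in paragraphs:
        if p.style.startswith("Heading "):
            level = int(p.style.split()[-1])
            toc.append("  " * (level - 1) + p.text)
    return toc


doc = [
    Paragraph("Heading 1", "Introduction"),
    Paragraph("Normal", "Some body text."),
    Paragraph("Heading 2", "Background"),
    Paragraph("Normal", "More body text."),
    Paragraph("Heading 1", "Conclusion"),
]

toc = build_toc(doc)
print("\n".join(toc))

# Restyling every level-1 heading is a single change to the style table.
STYLES["Heading 1"]["size"] = 24
```

      If the headings had instead been "Normal" text made big and bold by hand, `build_toc` would have nothing to go on, which is exactly the problem with syntactic markup.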

    26. Re:CSS for Documents? by Daengbo · · Score: 1

      This brings up a policy which I really hate that is in both MSW and OO.o -- changing a style doesn't remove other font attributes. If I copy something over from another source (quoting or even my own work) and some font attributes come with it (10pt angsana new), changing the style to "text" or "none" doesn't change the font or size. I wish that it did. Either use a style or don't. The only things that should be kept are bold, italics, and underline.

    27. Re:CSS for Documents? by lifebouy · · Score: 1

      The word processor you're looking for is OpenOffice and the key you'll be wanting to press is F11. Once upon a time the styles window was displayed by default, but the concept of styles confused the clueless, and so in more recent versions, you have to go looking for it.

      --
      Drop me a line at:
      Key ID: 0x54D1D809
  8. I don't know that I agree completely by Evardsson · · Score: 5, Insightful

    While I do agree that the ISO doesn't need more than one standard for printable documents, I don't think that Håkon Wium Lie is on the right track with HTML/CSS for print.

    Sure, it works, with enough tweaking, and CSS3, and a $350 download of a product to turn HTML/CSS3 into a PDF. This is better how? What about LyX, LaTeX, or even OpenOffice if you are just going to convert to PDF?

    The whole HTML/CSS-to-print thing shoots the real argument in the foot.

    --
    Death looks every man in the face. All any man can do is look back and smile. - Marcus Aurelius
    1. Re:I don't know that I agree completely by Rosyna · · Score: 1

      Sure, it works, with enough tweaking, and CSS3, and a $350 download of a product to turn HTML/CSS3 into a PDF. This is better how? What about LyX, LaTeX, or even OpenOffice if you are just going to convert to PDF?

      Yes, exactly. Instead of taking one of two specifications created just for rich document formats, he suggests making a brand new specification by extending CSS/HTML to do something it doesn't yet seem ready to do.

    2. Re:I don't know that I agree completely by EvanED · · Score: 1

      Instead of taking one of two specifications created just for rich document formats, he suggests making a brand new specification by extending CSS/HTML to do something it doesn't yet seem ready to do.

      Did you RTFA? He's not suggesting making NEW specifications because THEY ARE ALREADY MADE. There is a proof-of-concept "book" linked from the page that demonstrates that the technology to do this at least reasonably well already exists.

    3. Re:I don't know that I agree completely by nick.ian.k · · Score: 1

      Yes, exactly. Instead of taking one of two specifications created just for rich document formats, he suggests making a brand new specification by extending CSS/HTML to do something it doesn't yet seem ready to do.

      This talk of not creating new standards is ludicrous: there are already existing XML schemes geared towards this sort of task. Why should HTML/CSS be extended for publishing non-web documents when the work's already been done elsewhere? It gets even more ridiculous when you stop to think about all the bullshit that's been edged out of HTML over the past decade or so that we're *still* struggling to get people away from.

    4. Re:I don't know that I agree completely by Antique+Geekmeister · · Score: 1

      PDF would have been a candidate, but Adobe's licensing and that of its ancestor, PostScript, are awkward to deal with. That's hindered their acceptance in other uses, such as PostScript display systems. (It could have been a superior display system to X, and much easier to display remotely.)

      But it hardly takes a $350 tool to handle: PDFcreator, available over at sourceforge.net, and the old Ghostview viewer both rely on Ghostscript to process PDF and work more quickly and reliably than Adobe's conversion tools, especially with mixed language documents. And they're both freeware.

    5. Re:I don't know that I agree completely by larry+bagina · · Score: 1

      PostScript (and PDF) have the adobe problem, but there is a better format that doesn't: DVI (the device independent format created by Donald Knuth). Just as NeXT took PS and made Display PostScript, and Apple took PDF and made Display PDF, the DisplayDVI project has been working on a windowing display system based on DVI. It's still beta, of course, but I've been using it without problems. It includes a rootless X-Window client, so legacy X-Window apps can be run with native DVI apps.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    6. Re:I don't know that I agree completely by Evardsson · · Score: 1

      But it hardly takes a $350 tool to handle: PDFcreator, available over at sourceforge.net, and the old Ghostview viewer both rely on Ghostscript to process PDF and work more quickly and reliably than Adobe's conversion tools, especially with mixed language documents. And they're both freeware.

      I won't dispute the availability of free PDF creation tools, but do they understand and parse CSS3? That was the key element that made this particular implementation work. Considering that CSS3 isn't even a finalized specification yet, I would be more than a little surprised.
      --
      Death looks every man in the face. All any man can do is look back and smile. - Marcus Aurelius
    7. Re:I don't know that I agree completely by Kalriath · · Score: 1

      They usually don't parse anything. Usually, they are printer drivers which just print off whatever the browser hands them.

      --
      For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
    8. Re:I don't know that I agree completely by panaceaa · · Score: 3, Insightful

      Why is anyone even talking about the opinion of a CEO? Opera is an HTML company -- they make HTML browsers. Why would the CEO of Opera have anything objective to say about OOXML or OpenXML? He wouldn't, which is why he pushes his own company's core competency: HTML. While Opera doesn't have a huge market share, if the market for HTML viewers grows, his company's likely to take a piece of that pie. But it's completely bunk because HTML's a mess of different standards, with many people using HTML 4.01 Transitional to this day, and the idea of people adopting CSS3 and writing documents using HTML is pretty far-fetched. But you would never hear that from the CEO of an HTML browser company.

    9. Re:I don't know that I agree completely by Antique+Geekmeister · · Score: 1

      How interesting: does it have anything resembling the necessary performance for graphics to handle games?

    10. Re:I don't know that I agree completely by TheRaven64 · · Score: 2, Informative

      PostScript (and PDF) have the adobe problem, but there is a better format that doesn't: DVI (the device independent format created by Donald Knuth).

      The DVI format doesn't even have the capability to include bitmap images. LaTeX cheats and uses the comment section to point to an external encapsulated postscript file. dvips will read this and include the EPS, and so will some DVI viewers but this can lead to all sorts of hard-to-track-down bugs. I ditched latex for pdflatex a while ago, and haven't looked back.
      --
      I am TheRaven on Soylent News
    11. Re:I don't know that I agree completely by lahvak · · Score: 1

      Why is anyone even talking about the opinion of a CEO?

      Maybe because, surprising as it may be, part of what he says makes a lot of sense. I completely agree with him that the two proposed "standards" are both complete crap. And I also agree that HTML/CSS combination has, in principle, a lot of merit.

      A good portable document format should not contain anything about the internal representation of the document in memory, nor should any specific software, or even a specific version of such software, be mentioned in it. One thing that is great about HTML/CSS in principle is the separation of content and layout. I agree with you that, mostly for historical reasons, HTML and CSS are currently a horrible vomitous mess, but so are current word processor document formats. Taking HTML/CSS, cleaning it up, and fixing some blatant problems and omissions could very well produce a format that is far superior to both OO's and MS's formats.

      People won't write documents in HTML/CSS now because there is no good tool that would make it easy. Should there be word-processor-like software that encouraged users to create well structured documents (something Word fails at miserably, and Writer makes only a partially successful attempt at) and that saved documents in HTML/CSS, I bet people would use it.

      --
      AccountKiller
    12. Re:I don't know that I agree completely by Anonymous Coward · · Score: 0

      the link doesn't seem to exist... and google doesn't know about it. Where might I find it?

    13. Re:I don't know that I agree completely by Threni · · Score: 1

      > Why would the CEO of Opera have anything objective to say about OOXML or OpenXML?

      And the CEO of a company who tried to sell a browser! Talk about not having your finger on the pulse of the tech world.

      I'm sure his mother likes Opera and paid for it. It does a lot of things Firefox can't do. Right?

  9. indeed by game+kid · · Score: 1

    ...and sig'd in tribute.

    Even more classic perhaps, 'The "layer" element?!' Sure raised my eyebrow; a huge change from "Netscape engineers are weenies!" by any metric. :)

    --
    You can hold down the "B" button for continuous firing.
  10. No, NO. by game+kid · · Score: 0, Offtopic

    OMG, Borat is teaming up with Steve Ballmer to spew out 6000 page docs

    Steve squirts them out, remember?

    --
    You can hold down the "B" button for continuous firing.
  11. I don't think he gets it. by 8-bitDesigner · · Score: 2, Insightful

    Hmm... both of these standards suck. I know what, we need another choice!

    Somehow I don't think that's going to fix the problem. Oh, and pointing out that the Microsoft letter doesn't validate? Isn't that a little petty?

  12. How come? by ShaunC · · Score: 5, Funny

    If forced to choose one, I'd pick the 700-page specification (ODF) over the 6,000-page specification (OOXML).
    So I'd ask Håkon, "how come?" :)
    --
    Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
    1. Re:How come? by im_thatoneguy · · Score: 1

      I don't know. I would also like to know how you can evaluate the strengths and weaknesses of a system based solely on its size.

      Besides, how often is a human planning on parsing the files manually? If you ask me, the only purpose these open document file formats serve is to be opened by other word processors, which means as long as it's standardized it could probably look like Chinese and it wouldn't faze me in the least.

    2. Re:How come? by vmcto · · Score: 1

      So you don't think an application developer wanting to do something as simple as text searches to provide document integration capabilities should be considered? Let the word processor people make our documents as incomprehensible as possible?

    3. Re:How come? by _Shad0w_ · · Score: 2, Insightful

      My speculation would be that no-one wants to sit and read a 6,000 page specification. 700 pages is far more palletable.

      It's a crap way of judging the relative merits of specifications, but human nature will out.

      --

      Yeah, I had a sig once; I got bored of it.

    4. Re:How come? by martin-boundary · · Score: 1

      "Fly aeroplane."

    5. Re:How come? by PCM2 · · Score: 4, Informative

      So I'd ask Håkon, "how come?" :)

      Since nobody gets it, I'll spoil it: That's how Håkon advises people to pronounce his name. It's even on his business card.

      --
      Breakfast served all day!
    6. Re:How come? by jlarocco · · Score: 1

      I don't know. I would also like to know how you can evaluate the strengths and weaknesses of a system based solely on its size.

      I don't think he was evaluating the strengths and weaknesses by their size. He plainly said both standards suck, regardless of their size. Given that, if forced to choose which one to implement, it makes more sense to suffer through a "mere" 700 pages instead of 6000.

      Besides, how often is a human planning on parsing the files manually? If you ask me, the only purpose these open document file formats serve is to be opened by other word processors, which means as long as it's standardized it could probably look like Chinese and it wouldn't faze me in the least.

      There are a whole bunch of reasons why the standard should be as simple and small as possible. Not so much that requirements get left out or "blurred", but there shouldn't be a bunch of unnecessary complexity.

      The most important reason is that the actual implementations will be done by several groups of developers, working on several unrelated products. The more complex or unclear the standard is, the more likely it becomes that different groups will interpret the standard differently. Since the point is to make an interoperable, standardized document format, it defeats the point if different products implement the standard differently.

      Also, the size and complexity of the standard is likely to have a direct impact on the number of bugs in the implementations. Microsoft can't get backward compatibility with Word 95 right, yet it's in their standard. Other projects wouldn't have a chance.

      Overall, I agree with the guy from Opera. Both standards are basically memory dumps with brackets. Using HTML+CSS for publishing doesn't make sense, though.

    7. Re:How come? by 1u3hr · · Score: 1
      My speculation would be that no-one wants to sit and read a 6,000 page specification. 700 pages is far more palletable.

      Yep. You could certainly fit almost nine times as many on a standard pallet.

    8. Re:How come? by RzUpAnmsCwrds · · Score: 1

      My speculation would be that no-one wants to sit and read a 6,000 page specification. 700 pages is far more palletable.


      You don't need to. The only people who read entire specifications are their authors and the standards bodies. As a designer, you only care about the parts of the specification that you are responsible for implementing.

      ODF is unfortunately rather incomplete in some areas. There are no specifications for which spreadsheet formulas have to be implemented, or how they are implemented. Tables aren't allowed in presentations. There is no scripting language. Nor is there any mathematical layout description (ODF relies on MathML, which is another 665 page specification).

      Once you add the missing functionality and specification dependencies in, ODF starts to look a lot more like OOXML.
    9. Re:How come? by manastungare · · Score: 1

      Occam's Razor: "All other things being equal, the simplest solution tends to be the best one." As detailed at Groklaw, ODF reuses other standards such as SVG, Dublin Core and XLink, while MS tries to bring up new ones. I believe much of this contributes to the bloat in the spec, and will translate to bloat in an actual implementation.

    10. Re:How come? by Anonymous Coward · · Score: 0

      The problem is that ODF doesn't lead to a cohesive standard. For one thing, those 700 pages include references to thousands of pages of other standards, so that page count thing just doesn't fly. Can you even list all of the standards that it references (and the ones those reference)? I bet you can't produce a list of links to download everything I'll need.

      For another thing, those standards were all written in a vacuum, so they don't interrelate. If you want a fancy box around your math equations, you're in trouble because SVG doesn't support arbitrary text blocks (certainly not MathML), nor does it support the ability to have an element sized to fit its contents (i.e. you want the box to grow if you make the equation bigger). What if you want to track changes in your equations? MathML doesn't support that either.

      Of course you could always extend these standards to support every feature you have, but then you're not compliant with the standards anymore. Plus, you probably end up with one way to add comments to MathML, another way to add comments to SVG, and a third way to add comments to regular text.

      Microsoft's single cohesive standard actually looks pretty good in this light. At least I have only a single document to reference and a single model to understand.

      dom

    11. Re:How come? by im_thatoneguy · · Score: 1

      My guess, however, is that neither all 6,000 nor all 700 pages are *required* for every document. I would assume the complexity scales with the complexity of the document.

      Sure you might not be able to perfectly reproduce the document with only 500 or 600 lines of each standard, but it should be enough to get the general idea.

      Take this example from a simple MS-Word document.

      After only about 20 lines defining all of the classes. It hits the document:

      Hello World. This is a short sentence to test Word XML.Here is a new paragraph.

      After that it goes on to define a million other things such as my margins etc.

      If you wanted to, say, "search a document" as another poster requested, just search for the text that *isn't* in brackets. There you go, now search that. "" Simple enough. Include that in your basic spec.

      Looking at the XML document: about 75% of this example is either A) defining HTML-safe font equivalents or B) defining the parameters of the document's styles such as "header", "normal", "title 1", etc.

      I'm no programmer but it wouldn't take me a whole lot of time to write a basic parser.

    12. Re:How come? by im_thatoneguy · · Score: 1

      Whoops missed preview, that was inept mouse work on my part.

      >>After only about 20 lines defining all of the classes. It hits the document:

      <w:body><w:p><w:pPr><w:ind w:firstLine="720"/></w:pPr><w:r w:rsidR="00E94934"><w:t>Hello World.  This is a short sentence to test Word XML.</w:t></w:r></w:p><w:p><w:r w:rsidR="00E94934"><w:tab/><w:t>Here is a new paragraph.</w:t></w:r></w:p><w:sectPr w:rsidR="00E94934" w:rsidSect="008A7339"><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docGrid w:linePitch="360"/></w:sectPr></w:body>

      >>After that it goes on to define a million other things such as my margins etc.

      P.S. it did illustrate though how easy it would be to parse a document and search for text.  HTML parsing does it automatically.  The horror!
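
      For what it's worth, the "search the text that isn't in brackets" idea can be sketched in a few lines of Python. This is only a rough illustration: the `strip_tags` helper is a made-up name, the sample is a trimmed copy of the snippet above, and a regex like this ignores entities, CDATA and anything else a real XML parser would handle.

```python
import re

# Trimmed copy of the WordprocessingML fragment quoted above.
snippet = (
    '<w:body><w:p><w:pPr><w:ind w:firstLine="720"/></w:pPr>'
    '<w:r><w:t>Hello World.  This is a short sentence to test Word XML.</w:t></w:r></w:p>'
    '<w:p><w:r><w:tab/><w:t>Here is a new paragraph.</w:t></w:r></w:p></w:body>'
)


def strip_tags(xml_text):
    """Crudely drop everything between < and >, then tidy the whitespace."""
    text = re.sub(r"<[^>]*>", " ", xml_text)
    return re.sub(r"\s+", " ", text).strip()


plain = strip_tags(snippet)
print(plain)
print("new paragraph" in plain)
```

      That is enough for a plain "does this file mention X" search, which is all the original request asked for. It says nothing about where in the document the text sits.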

    13. Re:How come? by guruevi · · Score: 1

      Well, I know your pun was intended, but for the people that don't understand the play on names here goes his answer:

      Because OOXML is NOT an open standard. ODF is a truly open standard which everybody can use, implement and extend. With OOXML you're again at the whims of a large industry monopolist, and nobody wants that anymore, do they?

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    14. Re:How come? by jlarocco · · Score: 2, Insightful

      I'm no programmer but it wouldn't take me a whole lot of time to write a basic parser.

      Well, the basic parser isn't really an issue. I haven't investigated either standard in any detail, but assuming they're actual XML, or even reasonably close, there are a million libraries that can handle the parsing. Expat, Xerces, Arabica, the Qt XML parser, and the Java library XML parser come to mind.

      The majority of the work is interpreting the tags and actually laying out the document in a standardized way. I can already load a Word *.doc file into OpenOffice and have it look relatively close to how it looks in Word. The reason it only looks "close" is that the Word .doc format isn't documented, so developers for competing products have to make some assumptions and ignore some stuff they can't figure out. Having a standard is supposed to rectify that.

      Ideally, with a standardized format, a document would look identical in any word processor that supported all the features used by the document. That's the whole point.

      Off-topic: I absolutely hate when people make statements like "I'm no programmer, but I could write that software in very little time." Contrary to popular belief, programming isn't trivial. Sure, you may be able to write the parser in very little time, but would other people want to maintain your code? Would people reading your code even be able to tell what you're doing? Would it have fewer bugs than code written by actual programmers? Would it be fast enough? Would you know what to do if it wasn't? Sorry, it's just a pet peeve.

    15. Re:How come? by jlarocco · · Score: 1

      P.S. it did illustrate though how easy it would be to parse a document and search for text. HTML parsing does it automatically. The horror!

      Searching for text in an XML file is trivial. You don't need the standard at all. grep or any text editor will be more than happy to find a string in text. The difficult part is searching for text in a meaningful way. How would you find "Hello World" in a heading? How would you find it in a footer? How about in a list nested in a paragraph on the second page? The only way to sensibly do that is to have software that understands the tags.

      You're vastly underestimating the amount of work that would have to be done for the 700 page spec, much less the 6000 page spec.

      As you pointed out, for a simple, one sentence document there are "about 20 class definitions", and "a million other things" that the software would need to understand. It's undoubtedly more complex for larger documents.

      As a side note, the effect you noticed wasn't HTML parsing, it was /. stripping out everything between a less than and greater than sign.
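
      As a rough illustration of "software that understands the tags", here is a small Python sketch using the standard library's ElementTree. The sample document, the `find_in_style` helper and the "Heading1" style id are invented for the example, and the namespace URI bound to the `w:` prefix is an assumption on my part, so treat this as a sketch rather than a description of either spec:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the w: prefix; adjust to whatever the file declares.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Invented sample: the same words once in a heading and once in body text.
xml_text = (
    f'<w:body xmlns:w="{W}">'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
    '<w:r><w:t>Hello World</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Hello World again, in body text.</w:t></w:r></w:p>'
    '</w:body>'
)


def find_in_style(root, needle, style):
    """Return the text of paragraphs that use `style` and contain `needle`."""
    hits = []
    for para in root.findall("w:p", NS):
        style_el = para.find("w:pPr/w:pStyle", NS)
        para_style = style_el.get(f"{{{W}}}val") if style_el is not None else None
        # A paragraph's text may be split across several runs; join them.
        text = "".join(t.text or "" for t in para.findall(".//w:t", NS))
        if para_style == style and needle in text:
            hits.append(text)
    return hits


root = ET.fromstring(xml_text)
heading_hits = find_in_style(root, "Hello World", "Heading1")
print(heading_hits)
```

      A plain string search finds the phrase twice in that sample; only code that knows what `w:pStyle` means can tell you which occurrence is the heading. Footers, nested lists and page positions each need their own piece of that knowledge.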

    16. Re:How come? by CrimsonScythe · · Score: 1

      That's how Håkon advises people to pronounce his name.

      Really? I wonder if he knows that they are pronounced completely differently.

      Though, come to think of it, they may sound alike if spoken with a really heavy Norwegian accent.

      --
      The view was horrible and the smell was even worse; Julie severely regretted becoming a proctologist.
    17. Re:How come? by im_thatoneguy · · Score: 1

      Continuing off topic...

      I agree the code would be garbage; I probably wouldn't even be able to make heads or tails of it in a few months, and nobody would want to use it. However, my point is that the task is so simple it can be understood by someone with a cursory understanding of the subject.

      For instance: a man wanders into a restaurant bleeding profusely from a wound. You might expect someone to exclaim "I'm no doctor but we should probably put a bandage on that wound and try to stop the bleeding!" I'm sure a surgeon could do a better job even with the equipment at hand but it's a perfectly valid remark. I know it's nigh on impossible, but perhaps that is one pet peeve that could benefit from some reexamination.

    18. Re:How come? by jlarocco · · Score: 1

      For instance: a man wanders into a restaurant bleeding profusely from a wound. You might expect someone to exclaim "I'm no doctor but we should probably put a bandage on that wound and try to stop the bleeding!" I'm sure a surgeon could a better job even with the equipment at hand but it's a perfectly valid remark. I know it's nigh on impossible, but perhaps that is one pet peeve that could benefit from some reexamination.

      That's a flawed analogy. They'd put a bandage on and then presumably call an ambulance or take the guy to the hospital precisely because they're not doctors. The doctors and surgeons at the hospital would be the ones deciding if the bandage was enough and fixing the problem if it wasn't.

      The entire software industry has a hard time estimating schedules and complexity. But hey, maybe we're all wrong, and your cursory understanding is enough for you to know better.

    19. Re:How come? by PCM2 · · Score: 1

      Really? I wonder if he knows that they are pronounced completely differently.

      No, it's pretty close. The A with a ring over it in Scandinavian languages isn't really pronounced like an A. It sounds more like a kind of O.

      --
      Breakfast served all day!
    20. Re:How come? by im_thatoneguy · · Score: 1

      You're in a spacecraft with 2 small children. The spacecraft is leaking air. You exclaim "I'm no engineer, but I bet a piece of duct-tape would patch that up." You apply a strip of duct tape and the leak is stopped.

      This is the exact same analogy. If the air stops leaking, you've found a solution. I'm not suggesting it's a good one, but that individual *has* found a solution.

      Now if you were an engineer, I'm sure you would wander in and exclaim: "See, now this is the problem with the world: all of these non-engineers failing to understand the difficulty of designing a high-quality, long-term, maintainable solution to a problem!"

      The point is, the amateur found a solution. The whole point of the "I'm no ___ but ______" statement is not to imply that the be-all and end-all solution has been found; it simply means, "If I can figure this out, I'm sure someone with more understanding and experience should be able to manage just fine." Which was my point all along, and which you have missed two times now.

      Specifically, my original comment was in response to someone who exclaimed something to the effect of 'Woe is me! I wouldn't even be able to write a simple text search algorithm with these new standards!' If that were the case, that individual would be incompetent if I could manage well enough.

      So sit on your high pedestal of superiority. But let me tell you if I can find a solution, I will sure as hell expect a professional to do as well or better.

    21. Re:How come? by jlarocco · · Score: 1

      Specifically, my original comment was in response to someone who exclaimed something to the effect of 'Woe is me! I wouldn't even be able to write a simple text search algorithm with these new standards!' If that were the case, that individual would be incompetent if I could manage well enough.

      You just don't get it. It's *not* a simple text search. Your solution is useless. When you do a search in Word or OpenOffice, does it tell you that it found the text on line 597 of the xml file? No, it doesn't because that would be stupid and irrelevant. It has to know that line 597 of the xml file is actually inside a list, inside a paragraph, on page 3 of the fully laid out and formatted document.

      To know that, it has to understand the document format, all those "20 class definitions" and "million other things" you completely ignored while suggesting your "solution" a few posts up. If it doesn't take those things into account, it's not a solution. It's not even "almost" a solution. That's why the person you responded to was complaining, and why people are making a big deal out of the size of the specs. Implementing a spec that large is very difficult and a whole lot of work.

      So sit on your high pedestal of superiority. But let me tell you if I can find a solution, I will sure as hell expect a professional to do as well or better.

      Look, I'm not trying to put myself on a pedestal or make myself seem superior. I'm a computer programmer, not a rock star. But actual "professionals" are telling you the task is really difficult, and you're telling them that it isn't. That's usually a good indication that you're wrong.
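
      The difference between a raw text search and a structure-aware one can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `doc`, `page`, `list`, `item` and `p` elements below are a made-up toy format, not ODF or OOXML, and a real implementation would also need layout information to know page numbers at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified document; real office formats are far richer.
DOC = """<doc>
  <page n="1"><p>Nothing to see here.</p></page>
  <page n="3">
    <list>
      <item><p>Find the <b>need</b>le in this list item.</p></item>
    </list>
  </page>
</doc>"""


def naive_search(raw, phrase):
    """Grep-style search: report raw line numbers that contain the phrase."""
    return [i for i, line in enumerate(raw.splitlines(), 1) if phrase in line]


def structured_search(raw, phrase):
    """Search the visible text of each paragraph and report where it sits."""
    root = ET.fromstring(raw)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for para in root.iter("p"):
        text = "".join(para.itertext())  # joins text split across inline tags
        if phrase not in text:
            continue
        ancestors = []
        node = para
        while node in parent_of:
            node = parent_of[node]
            ancestors.append(node)
        page = next(a.get("n") for a in ancestors if a.tag == "page")
        in_list = any(a.tag == "list" for a in ancestors)
        hits.append({"page": int(page), "in_list": in_list})
    return hits


print(naive_search(DOC, "needle"))
print(structured_search(DOC, "needle"))
```

      The raw search finds nothing because the bold tag splits the word in the file, while the structure-aware search finds it and can say it is in a list on page 3. Even this toy version has to know what every element means, which is the part that grows with the size of the spec.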

    22. Re:How come? by CrimsonScythe · · Score: 1

      I realize that I should have added that I'm Norwegian, and that I've lived in the US for many years. "How" does not remotely sound like "hå", unless spoken with a heavy accent. The "å" sound is very close to the "oo" sound in "door". The closest I can come up with right now must be "hoakohn", though the last "o" isn't used much in the English language. Hope this clears it up.

      --
      The view was horrible and the smell was even worse; Julie severely regretted becoming a proctologist.
    23. Re:How come? by alnicodon · · Score: 1

      Out of curiosity, does this help:

              http://sk.idm.fr/skxml/

      If you were to look for text in nodes with particular attributes, I'd suggest you go for XPath (quick, even for large documents, at least in MSXML).

      The SkXML link above lets you do just that, but can take text inflections into account, and works on large XML document collections. Well, it requires batched indexing, though :(

      Al.
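
      A minimal sketch of the XPath idea, using Python's standard `xml.etree.ElementTree` (which supports a limited XPath subset) rather than MSXML or SkXML. The sample markup, the `class` attribute values and the `find_in_class` helper are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """<doc>
  <p class="note">See the appendix.</p>
  <p class="body">See chapter two.</p>
  <section><p class="note">Check the errata.</p></section>
</doc>"""


def find_in_class(root, css_class, word):
    """Return the text of every <p class=css_class> that contains word."""
    # ".//p[@class='...']" selects <p> elements at any depth with that attribute.
    matches = root.findall(f".//p[@class='{css_class}']")
    texts = ["".join(p.itertext()) for p in matches]
    return [t for t in texts if word in t]


root = ET.fromstring(DOC)
print(find_in_class(root, "note", "the"))
```

      The attribute predicate does the structural filtering and the text test is ordinary string matching, so nothing here handles inflections or indexing across a collection.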

    24. Re:How come? by PCM2 · · Score: 1

      Still, after about the 100th person walked up to you and said, "Hey, how's it going, Hack-On?? Or should I say Hake-On?" you'd probably think of something, too.

      --
      Breakfast served all day!
    25. Re:How come? by CrimsonScythe · · Score: 1

      Being as my last name is unpronounceable in English, I know the feeling very well. ;-) Still, teaching people just another wrong way of saying his name doesn't make much sense to me. Then again, if he's happy with that, good for him. In my case, I just gave up and adjusted.

      --
      The view was horrible and the smell was even worse; Julie severely regretted becoming a proctologist.
  13. The proper term is by rdwald · · Score: 1

    Reveal Codes

    WordPerfect on Linux

  14. Validation is relevant by SanityInAnarchy · · Score: 2, Insightful

    Only problem is, the Oasis page itself doesn't validate. However, it seems Wikipedia does...

    But if the Oasis pages did validate, the basic argument goes like this: "How can they claim to care about standards if they can't even bother to support that most universal standard of standards, HTML?" And indeed, I could still make that argument -- just look at the sad, sad state of affairs that is Internet Explorer's CSS [mis]handling.

    --
    Don't thank God, thank a doctor!
    1. Re:Validation is relevant by Bert64 · · Score: 1

      Many things which people might want to do on a website, and which are easily accomplished using standard code, simply don't work on IE... Thus, many sites deviate from the standards to support IE users, since there are rather too many of them out there. The alternative is sticking to the subset of the standard that IE does support, and having a reduced site.
      Look at the CSS homepage - http://www.w3.org/Style/CSS - and select the blue shadow style. That used to be the default, but because it gets completely mangled by IE6 (not tried 7), they changed it: people thought the site was broken.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    2. Re:Validation is relevant by SanityInAnarchy · · Score: 1

      Which would make it all the more poetic if my argument held. Standard code doesn't work on IE, and Microsoft completely ignored the issue until Firefox started to be a threat.

      --
      Don't thank God, thank a doctor!
  15. XML by infonote · · Score: 1

    Both standards follow XML, because XML helps documents become universally available whatever the device. HTML and CSS have limitations.

    --
    Visit http://www.kaizenlog.com
    1. Re:XML by 00lmz · · Score: 1

      Both standards follow XML, because XML helps documents become universally available whatever the device. HTML and CSS have limitations.



      Yes, because XHTML is not a form of XML and there are more implementations of Yet Another XML Format for "whatever the device" than there are implementations of HTML/XHTML and CSS for "whatever the device"...

  16. Is there a FOSS way to make PDF from XHTML/CSS? by Tracy+Reed · · Score: 1

    Prince is a commercial product. I have a minor need to produce PDFs from XHTML/CSS and I really don't want to deal with licensing. I would need to run it on a server where multiple people can access it, which means I would have to pay $3800 for Prince. Ouch! I don't need to do this that badly. Is there any way to do this with Free/Open Source software?

    1. Re:Is there a FOSS way to make PDF from XHTML/CSS? by vtrac · · Score: 1, Informative

      I don't know if there's an automated way, especially because you run into the problem of differences in rendering. But, if you are on Linux, just install CUPS-pdf or on Windows, use PDFCreator (http://sourceforge.net/projects/pdfcreator/). Both are print drivers so you can use the HTML/CSS rendering engine of your choice (pick a browser), then print.

    2. Re:Is there a FOSS way to make PDF from XHTML/CSS? by Tracy+Reed · · Score: 1

      Yeah, I have considered that. Unfortunately for my application it would have to be a bit more automated/scripted which I don't think I could do using the renderer from a web browser. Thanks!

    3. Re:Is there a FOSS way to make PDF from XHTML/CSS? by Spliffster · · Score: 1

      HTML + XSLT + XSL-FO. The XSL-FO processor, FOP, is an Apache project (or at least was 3-4 years back when I used it).

      Cheers,
      -S

    4. Re:Is there a FOSS way to make PDF from XHTML/CSS? by smoker2 · · Score: 2, Informative
      CSS notwithstanding, you can use HTMLDOC to produce PDFs from HTML pages. If you are creating reports etc. dynamically anyway, just create a temporary HTML file and convert it through HTMLDOC. I use Perl to generate reports and interface with HTMLDOC, but YMMV.
      An example of the HTMLDOC specific code used in the conversion :

      # Run HTMLDOC to provide the PDF file to the user...
      system "htmldoc --continuous --browserwidth 800 --landscape --size A4 --header ... --left 1in --embedfonts -f $fileref.pdf $filename";
      (the command is all on one line) This is running on RHEL 3 through Apache 2 and Perl 5.8.0
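
      For anyone scripting the same conversion outside Perl, a rough Python equivalent might look like the sketch below. It reuses only the HTMLDOC options visible in the command above and leaves out the elided `--header ...` argument; the function names and file paths are hypothetical.

```python
import subprocess


def htmldoc_command(html_path, pdf_path):
    """Build the HTMLDOC argument list; options are copied from the Perl example."""
    return [
        "htmldoc",
        "--continuous",
        "--browserwidth", "800",
        "--landscape",
        "--size", "A4",
        "--left", "1in",
        "--embedfonts",
        "-f", pdf_path,   # output file
        html_path,        # input file
    ]


def convert(html_path, pdf_path):
    """Run HTMLDOC; raises CalledProcessError if it exits with an error."""
    subprocess.run(htmldoc_command(html_path, pdf_path), check=True)


print(" ".join(htmldoc_command("report.html", "report.pdf")))
```

      Passing the arguments as a list instead of one interpolated string means file names with spaces or shell metacharacters are not reinterpreted by a shell.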
    5. Re:Is there a FOSS way to make PDF from XHTML/CSS? by Anonymous Coward · · Score: 0

      Although not the best solution ever designed, it is in fact quite doable.

      If you have, say, a php web-based back-end, you can easily have the user form-upload the desired doc to the server, and then use the com functions of the language to invoke either ms word or IE-mshtml objects and order them to open the uploaded doc, print it to the ghostscript printer and die, and feed back to the user the generated file.

      Things to take care of:
      - mshtml refuses to print without displaying the print dialog. You have to use the wshell COM object to send it keypress events. It works best on a virtual machine, where no one ever logs out and auto-logon is enabled
      - locking issues: with the print dialog above popping up and not going away until finished, you have to serialize user requests for PDF conversion
      - security: for added security, the generated file should only be served to the same user session that uploaded it, thus via PHP again, not giving the user a direct link to it
      - users pushing the stop button while PDF conversion is underway. Your best bet is to have PHP ignore user aborts while in the critical section

      If the workflow is acceptable, you might opt instead to set up a queue of documents to be printed, and when they are done mail back to the submitter a link to the

      BTW: I have such code developed and in use. If you are interested, post back..
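
      The "serialize user requests" point can be sketched independently of PHP and COM. The Python below is an assumption-laden illustration, not the poster's code: `run_serialized` and `fake_convert` are hypothetical names, and the real conversion step (driving Word or mshtml) is replaced by a stand-in function.

```python
import queue
import threading


def run_serialized(jobs, convert):
    """Run convert(job) for each job on one worker thread, strictly in order."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            job = pending.get()
            if job is None:          # sentinel: no more work
                break
            results.append(convert(job))

    thread = threading.Thread(target=worker)
    thread.start()
    for job in jobs:
        pending.put(job)
    pending.put(None)
    thread.join()
    return results


def fake_convert(name):
    """Stand-in for the real print-to-PDF step."""
    return name.rsplit(".", 1)[0] + ".pdf"


print(run_serialized(["a.doc", "b.html"], fake_convert))
```

      Because only one thread ever calls the converter, two print dialogs can never be open at once, which is the property the locking bullet asks for. A long-running server would keep the worker alive instead of joining it.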

    6. Re:Is there a FOSS way to make PDF from XHTML/CSS? by zakkie · · Score: 1

      A while back there used to be html2pdf and similar utilities - I am not sure about their level of css support, but piping output of links through it might yield usable results. I'd opt for the perl version over the identically-named php one though.

    7. Re:Is there a FOSS way to make PDF from XHTML/CSS? by Jorophose · · Score: 1

      If you don't mind making the PDFs yourself and uploading them (rather than having them dynamically created as needed), OO.o has an "Export as PDF" function. Couldn't you just open the (X)HTML file and export as PDF?

    8. Re:Is there a FOSS way to make PDF from XHTML/CSS? by maxume · · Score: 1

      I don't know for sure, but it seems likely that those operations can be automated through the scripting interface.

      --
      Nerd rage is the funniest rage.
  17. Can != Should by gbulmash · · Score: 2, Insightful

    Been a long time since I typeset anything, but I used Adobe PageMaker when I typeset a couple of college magazines in the mid-90s and FrameMaker when I was maintaining courseware in the late '90s for Nortel.

    HTML + CSS vs. Word vs. OO.o seems to me to be an argument related to formatting documents, not a "book". It's not that you couldn't do it, but I'd consider using Quark or InDesign (what seems to be Adobe's successor to PageMaker) or even TeX and its variants (haven't used any TeX-based stuff, but heard wonderful things) for typesetting.

    Arguments about standards aside, proof of concepts aside, I'd think that the real issue when it comes to any job is using the best tool for it. It's not a question of whether you can use these tools to typeset a book, but if you should.

    The point of the proof of concept is to prove that the system is flexible or capable enough to go beyond its original intended use. I get that. But proving a chainsaw can be used to spread butter, doesn't mean it's inherently superior to a coping saw.

    - Greg

  18. to kill a mockingstandard by mennucc1 · · Score: 2, Insightful
    An extract of H. Wium Lie's arguments:

    ODF is an XML-based dump of the internal data structures of OpenOffice, while OOXML is an XML-based dump of the internal data structures of Microsoft Office.

    In 2006, a year or so after ODF entered the fray, Microsoft submitted OOXML to the standardization process. Are we seeing a pattern here? Is Microsoft undermining standards by submitting them? Could it be that it wants both ODF and OOXML to fail?
    So Wium proposes to build a new standard from scratch, starting from HTML and CSS; but, recognizing that they would not cover all "Office" documents, he goes on to say

    Additional semantics (say, formulas in spreadsheets) can be encoded as attributes, as do microformats, and CSS 3 offers advanced features for printing (e.g., footnotes and header and footers).
    My thoughts:
    • Suppose Microsoft were to listen to Wium (which they won't). Guess what? Those additional fields containing formulas (and anything else that makes {MS,Open}Office much more useful than HTML) again would be just an XML-based dump of the internal data structures of so and so.
    • More generally, I don't like this article. Wium is saying that Microsoft is proposing OOXML to kill ODF, and at the same time he is proposing to kill ODF in favour of a non-existent extension of HTML+CSS. It is like the guy saying, "I don't like the power plugs in my new house, let's tear the house down and rebuild it", and at the same time saying, "Why are they taking so much time to build the house?" Suppose Microsoft used arguments like Wium's to convince ISO to reject ODF and then start a new draft based on HTML, drafted in cooperation between Microsoft and other partners (including OpenOffice). That would really kill any hope of an ISO standard for "office" documents.
  19. He gets it. by Per+Abrahamsen · · Score: 1

    > Hmm... both of these standards suck. I know what, we need another choice!
    >
    > Somehow I don't think that's going to fix the problem.

    Depends on what you define the problem as: that there are too many "standards", or that all of them suck. If the latter, defining a new standard that does not suck solves the problem.

    1. Re:He gets it. by TheRaven64 · · Score: 1
      He is not proposing a new standard that doesn't suck. He is proposing a new standard that sucks, but is already partially supported in a variety of slightly different, not quite compatible, ways.

      To anyone who doesn't think XHTML/CSS sucks, look up how many ways there are of saying 'red' in CSS. I was implementing a partial CSS parser a while back, and the specification seems to have been written by document authors with no thought to implementers.
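
      To give a feel for the implementer's side, here is a small Python sketch that accepts five spellings of red that CSS 2.1 allows: the keyword, short and long hex, and integer and percentage `rgb()`. It is a toy under stated assumptions: `parse_color` is a hypothetical helper, only the one named colour is included, and newer notations are ignored.

```python
import re

NAMED = {"red": (255, 0, 0)}  # only the single keyword this sketch needs


def parse_color(value):
    """Normalise a CSS colour value to an (r, g, b) tuple of 0-255 integers."""
    v = value.strip().lower()
    if v in NAMED:
        return NAMED[v]
    m = re.fullmatch(r"#([0-9a-f]{3})", v)
    if m:  # "#f00": each hex digit is doubled
        return tuple(int(ch * 2, 16) for ch in m.group(1))
    m = re.fullmatch(r"#([0-9a-f]{6})", v)
    if m:
        h = m.group(1)
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    m = re.fullmatch(r"rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", v)
    if m:  # integer form, clamped to 255
        return tuple(min(int(n), 255) for n in m.groups())
    m = re.fullmatch(r"rgb\(\s*(\d+)%\s*,\s*(\d+)%\s*,\s*(\d+)%\s*\)", v)
    if m:  # percentage form, clamped to 100%
        return tuple(round(min(int(n), 100) * 255 / 100) for n in m.groups())
    raise ValueError(f"unsupported colour: {value!r}")


SPELLINGS = ["red", "#f00", "#FF0000", "rgb(255, 0, 0)", "rgb(100%, 0%, 0%)"]
for spelling in SPELLINGS:
    print(spelling, parse_color(spelling))
```

      Each extra notation is one more branch that every parser has to carry, and this is for a single property value type.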

      --
      I am TheRaven on Soylent News
  20. Why not HTML for books? by zerblat · · Score: 1, Insightful

    HTML sucks for books. The reason is simple. HTML was designed for web pages. HTML does a fairly good job of covering the things you need when you create a web page (although, why is there no , and a bunch of other stuff that need to be fudged by using elements that don't really fit). In HTML there is no , no , no , no . Also, with HTML, one file == one document. If you're writing a book, it would be nice to be able to for example have one file per chapter and include them all in a master file (assuming you're writing your HTML by hand, of course). That's not possible with HTML.

    It would be possible to extend HTML to include such features or to create an HTML-like format that is more suitable for books (cf. DocBook). I agree that "word processors" today are a horrible mess, and we definitely need something like a modernised LaTeX, but HTML isn't it.

    --
    Please alter my pants as fashion dictates.
    1. Re:Why not HTML for books? by Anonymous Coward · · Score: 0
      XHTML2 fixes most problems. I've only browsed through the proposed standard, so I might be wrong.

      • It has <section>-elements that can be used to specify chapters
      • Elements can have a "src"-attribute, that can include content
      • Headers/footers/pages can be defined using CSS3
    2. Re:Why not HTML for books? by howlingmadhowie · · Score: 1
      how about

      ?

      I would however concede that footnotes would be a lot more difficult using this method.

    3. Re:Why not HTML for books? by Anonymous Coward · · Score: 0

      This is new? In the late 1990's, I used a technical publishing system which had the features you are asking for:
      automated Index, automated Table of Contents, Font selection, Graphics capability, Mathematical Formulas, etc.

      Basic use of the product took very little training, yet advanced features were available.
      And in 1999, it began storing documents in XML ...

      See: http://www.encyclopedia.com/doc/1G1-55614156.html

    4. Re:Why not HTML for books? by maxume · · Score: 1

      Is that for a long paragraph or for a short chapter? "div" would seem more appropriate.

      --
      Nerd rage is the funniest rage.
  21. I think it is an important point by blind+biker · · Score: 1

    I think it is important that a document format is humanly-readable and understandable, so that one can get at least some idea of the layout of a document by reading the content of the file. I understand very well when he says "memory dump in angle brackets". Besides, anything that is humanly-"parsable", can be parsed by software, while the other way around is not usually the case.

    --
    "The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
    1. Re:I think it is an important point by Anonymous Coward · · Score: 0

      Not really. English is humanly-"parsable" but it's too difficult to write software to understand English. File formats should be easy to parse by software and not necessarily by humans.

  22. Re:Classic quote for the books, gotta love XML pla by nigels · · Score: 1

    I also liked: "Microsoft--please--if you think standards are so important, why not start using them?"

  23. fonts by cybpunks3 · · Score: 2, Informative

    The problem with using HTML for publishing is that to this day there is no viable downloadable font system. So you are limited to a lowest-common-denominator list of 2-3 fonts like Verdana and Times New Roman. With Flash and PDF you can do a lot more, but obviously authoring becomes a problem.

    1. Re:fonts by nick.ian.k · · Score: 1

      With Flash and PDF you can do a lot more, but obviously authoring becomes a problem.

      Well, maybe. Have you ever looked at font licenses? There are more than a few digital type foundries using licenses that expressly prohibit embedding outlines of the glyphs in other files. This is because the outlines of the glyphs (mathematically represented by the font software) are the one part of the font that's actually copyrightable; the actual glyphs themselves are not (this is why so many foundries have their own individual versions of Helvetica and so on). Not everybody's bad, but Linotype and Adobe have been rather outspoken about this over the last few years, though admittedly I haven't really seen them actually enforcing it too much.

    2. Re:fonts by foniksonik · · Score: 1

      This is entirely flawed logic. There is no 'downloadable' font system, true. BUT. You can assign any font you care to use within your CSS and ANYONE who has that font on their system will see the correct font being used. The standard for this is to assign your custom font and follow it with a common font and then a generic font.

      This is not a browser problem. It is a copyright and licensing problem. An average font costs $25 - $40 per typeface. If you were to be charged every time someone came to your web page you'd be screwed, regardless of whether such a system was implemented efficiently or not.
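
      The fallback order described above (custom font, then a common font, then a generic family) can be sketched as a tiny Python helper that emits the CSS declaration. The `font_family` function and the example font names are assumptions for illustration only.

```python
# The five generic families defined by CSS 2.
GENERIC = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def font_family(custom, common, generic):
    """Build a font-family declaration: custom font, common fallback, generic."""
    if generic not in GENERIC:
        raise ValueError(f"not a generic family: {generic!r}")

    def quote(name):
        # Names containing spaces are quoted to keep them unambiguous.
        return f'"{name}"' if " " in name else name

    return f"font-family: {quote(custom)}, {quote(common)}, {generic};"


print(font_family("Gill Sans", "Verdana", "sans-serif"))
```

      A reader who has the first font installed sees it; everyone else falls through to the next entry, which is why the generic family goes last.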

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
  24. 6000 Pages, say 30 pages/day, =200 days by Anonymous Coward · · Score: 1, Insightful

    So say you can absorb 30 pages a day; that's 200 days to read the spec.

    Oh, and the spec is defined to fit an existing product, so that product fits the spec, and there are unspecified patent hurdles attached to it. Wow, which idiot would fall for that one?

    1. Re:6000 Pages, say 30 pages/day, =200 days by Anonymous Coward · · Score: 0

      "So say you can absorb 30 pages a day, thats 200 days to read the spec".

      At which time the spec is obsolete. However there is a new spec of 7000 pages.

  25. Too true by iamacat · · Score: 3, Insightful

    700 pages is not understandable by anyone but the authors. The "C Programming Language" book is 1/3 the size, has endured for 20 years, and was instrumental in solving many more problems than word processing. Also, creating an ODF document is a minor function in most applications and is not worth the effort of understanding such a huge standard. Proponents of both standards should come up with a modular design instead. At the base level, stick with basic HTML: bold and italic tags, fonts and sizes, paragraph breaks. Define many extensions that can be implemented independently or in any combination, in a manner convenient for both computers and, in a pinch, humans. The Opera guy is biased as well: while basic HTML is great at its limited function, CSS is not very readable by humans. Nor does it solve pagination, collaborative editing, resolution independence, color profiles for printing...

    1. Re:Too true by le_lotus_604 · · Score: 0, Offtopic

      aaahh Kernighan, Ritchie .. the black cover, always to remember !!! HTML, it stands for HyperText right !!

  26. I wrote my thesis book this way by Rudd-O · · Score: 3, Informative

    And it worked out great.

    http://software-libre.rudd-o.com/

    Used MediaWiki to write the chapters, wrote a small Python proggie (available there) to consolidate the wiki into a single HTML file (mostly conforming to the Boom! microformat), then used Prince and Håkon's book CSS to generate the PDF.

    Great typesetting, collaborative book editing, screw LaTeX!

    Håkon was right.
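
    For readers curious what the consolidation step might involve, here is a rough sketch. It is not the script linked above: the `consolidate` function, the `chapter` class and the `chapter-N` ids are hypothetical and do not follow the Boom! microformat, and each chapter is assumed to be already rendered to an HTML fragment.

```python
from html import escape


def consolidate(title, chapters):
    """Join (heading, html_fragment) pairs into one HTML document string."""
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
    ]
    for number, (heading, fragment) in enumerate(chapters, 1):
        parts.append(f'<div class="chapter" id="chapter-{number}">')
        parts.append(f"<h1>{escape(heading)}</h1>")
        parts.append(fragment)          # trusted, pre-rendered HTML
        parts.append("</div>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


book = consolidate("Thesis", [("Intro", "<p>Hello.</p>"), ("Method", "<p>Steps.</p>")])
print(book)
```

    The resulting single file is what a print-CSS stylesheet and a formatter such as Prince would then be pointed at.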

    --
    Rudd-O - http://rudd-o.com/
    1. Re:I wrote my thesis book this way by AlXtreme · · Score: 2, Insightful

      Great typesetting, collaborative book editing, screw LaTeX!

      Those who don't understand LaTeX are doomed to reinvent it... poorly.
      --
      This sig is intentionally left blank
    2. Re:I wrote my thesis book this way by mithras+the+prophet · · Score: 1

      Your book looks nice, but apparently some part of the toolchain scrambled all the text into some kind of indecipherable gibberish. ;)

      --
      four nine eighteen twenty-7 thirty-nine forty-7 fiftyeight sixty-nine seventy-9 eighty-8 one-hundred-and-nine one-twenty
  27. Scribus? by Anonymous Coward · · Score: 0
    1. Re:Scribus? by greenguy · · Score: 1

      Good, glad to see I'm not the only one wondering why Scribus hasn't been mentioned (even if the other person is an AC).

      I publish a PDF magazine, and I wouldn't use anything except Scribus to lay it out. I'm considering a print-CSS version, but I'm confused why anyone would make a print-CSS version and then convert it to PDF. To me, this seems to miss the point of both formats.

      --
      What if I do the same thing, and I do get different results?
  28. Maybe... by Bellum+Aeternus · · Score: 1

    If HTML+CSS offered a better model instead of the box model (for example, the point-line model) and offered some way of doing basic data structures, I'd agree. The current box model is very limiting in its layout abilities.

    Modern documents have so many binary data types inserted in them (images, fonts, etc.) that HTML+CSS isn't enough. It isn't even enough on the web, and that's why JavaScript and Flash are so prevalent. There needs to be another specification to support all the needs/wants of the users (who are not willing to go backwards for any ISO standard).

    --
    - I voted for Nintendo and against Bush
    1. Re:Maybe... by mattyrobinson69 · · Score: 1

      Modern documents have so many binary data types inserted in them (images, fonts, etc.)

      Firefox and Opera (iirc) both support base64-encoded binary data being stuffed inside img src. I can't remember if IE supports it or not, or if it's in the actual standard though. Not that I agree with using HTML for typesetting.
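
      The technique mentioned here is the `data:` URI. A minimal Python sketch of building one follows; `data_uri` and `img_tag` are hypothetical helper names, and the six-byte payload is only the GIF signature, not a displayable image.

```python
import base64


def data_uri(payload, mime_type):
    """Encode raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(payload, mime_type, alt=""):
    """Wrap the data: URI in an <img> element."""
    return f'<img src="{data_uri(payload, mime_type)}" alt="{alt}">'


gif_signature = b"GIF89a"  # first six bytes of a GIF file, used as sample data
print(img_tag(gif_signature, "image/gif", alt="sample"))
```

      Base64 inflates the payload by roughly a third, so this suits small inline images better than large ones.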

  29. !RTFA by Anonymous Coward · · Score: 1

    Okay, I may have read a tiny bit, but it's irrelevant.

    The question is `what does it do and how well does it do it?'

    Should we desire screen-displayed content to be equivalent to printed or published content? No.

    Do we desire for the three to have accessible translations between each other? Yes. And note that it goes both ways. I want to be able to throw an e-mail on a webpage and throw a doc on an e-mail and throw all three in a book and so forth. As more technology becomes available and therefore new mediums develop I want to be able to throw stuff in them and throw them in stuff.

    Now sure, a book isn't that accessible to go back to digital (yet). Why not? Throw a barcode system in the books. They have an index and a table of contents and chapters and a bibliography. Why not a `printed information to digital information code' (PITDIC)?

    But we're talking digital document formats.. which means that there should be what? What is the equivalent of having a barcode system in a book to allow a quick scan to give you all the data contained therein as a digital contents with relevant tagging/metadata? Uhh, dumbass. It's the metadata and the content, together!

    Which is what XML and ODF and whatever dreck Microsoft has tried to override ODF with are. Now, I have not studied the specific formats (so yes, you can call me an ass for being biased against MSFT), but if they are properly designed then they are a step in the right direction. What is properly designed? It's when they are simple and basic and elegant enough that we can change down the road and not break the old.

    We're not children anymore, humanity. It's time we think about how to do that. And before you bitch at me, note there's a difference between being obsolete/technologically inferior and broken. Broken is when I can't access my old stuff. Obsolete/technologically inferior is when I don't want to.

  30. ODF a memory dump of OOo? by renoX · · Score: 1

    I wonder if it's true, after all there are two implementations of ODF: OOo and KOffice, it'd be interesting to hear KOffice developers on the subject.

    Recently I heard a criticism of ODF by Miguel de Icaza: that ODF doesn't reuse standards like SVG as much as it should.

    1. Re:ODF a memory dump of OOo? by k8to · · Score: 1

      If you include all programs that have reimplemented some amount of their own code to work with the ODF format, you get about 15. If you include programs with fairly good support, you still get at least Abiword, Gnumeric, OpenOffice, KOffice, Zoho, ajax, scribus.

      Granted that only KOffice and OpenOffice seem to have implemented the vast majority of the format, but the number of implementers is continuing to grow. Calling it an OO native format is specious at this time. It did start out that way at one point, but rounds of standards process and implementation have filed down the edges and made it (apparently!) workable enough for various parties to reimplement without crying in their beer.

      Calling it a memory dump is insulting.

      --
      -josh
  31. Open Office Heresy (sold here) by IBitOBear · · Score: 5, Interesting

    I use OpenOffice. I support Open Document Format over MS/XML and .doc.

    That said, ODF kind of blows. Really.

    I write novel-length "books" and it is FREAKING IMPOSSIBLE to do some very basic things in any/every ODF based word processor I have tried to date.

    Exercise for the Interested:

    Make a "Book" with an automatic table of contents, said table to contain an "Authors Note", "Prologue", auto-numbered chapters 1 to N with their associated chapter titles (where the actual chapter number is the chapter number internal variable), and finally "Epilogue" all at the same level of the index.

    This simple task is essentially impossible. The flaw is caused by the fact that everything goes through the "styles" and the styles don't inherit their list membership properties. You should be able to make a style "TOC Entry" that is assigned to a particular table of contents level (e.g. level 1) then make a sub-style "Chapter Heading" based on "TOC Entry" but with the chapter numbering magic attached, and in so doing, create "different styles" that go to the same level/point in the list.
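
    The inheritance behaviour being asked for can be modelled in a few lines. To be clear, this is a sketch of the wished-for design, not of how ODF or OpenOffice actually work; the `Style` class, its fields and `build_toc` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Style:
    name: str
    parent: Optional["Style"] = None
    toc_level: Optional[int] = None   # None means "inherit from parent"
    numbered: bool = False            # chapter-numbering magic on or off

    def effective_toc_level(self):
        """Walk up the parent chain until some style declares a TOC level."""
        style = self
        while style is not None:
            if style.toc_level is not None:
                return style.toc_level
            style = style.parent
        return None


def build_toc(headings):
    """Turn (style, title) pairs into (level, text) table-of-contents rows."""
    rows, chapter = [], 0
    for style, title in headings:
        level = style.effective_toc_level()
        if level is None:
            continue                  # style is not part of the TOC
        if style.numbered:
            chapter += 1
            title = f"Chapter {chapter}: {title}"
        rows.append((level, title))
    return rows


toc_entry = Style("TOC Entry", toc_level=1)
chapter_heading = Style("Chapter Heading", parent=toc_entry, numbered=True)

book = [
    (toc_entry, "Author's Note"),
    (toc_entry, "Prologue"),
    (chapter_heading, "Arrival"),
    (chapter_heading, "Departure"),
    (toc_entry, "Epilogue"),
]
for level, text in build_toc(book):
    print(level, text)
```

    Because "Chapter Heading" inherits its level from "TOC Entry", numbered and unnumbered entries land at the same depth while only the chapters consume numbers.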

    Exercise for the Interested:

    Make a "Book" with each chapter, and the prolog, and the epilog in separate sub documents. The linkage thing is a mess, it is hard to move "the pile of files" around especially if you want to use subdirectories (etc). If you have a custom style in the master document style list you have to _USE_ it in the master document if you want it to be pushed into the created sub-documents. Once the sub-documents are created it is a royal pain (read effectively impossible, or "supremely hidden feature required") to update those styles in those sub documents if you change that style.

    Exercise for the Interested:

    Put three separate "outlines" into one ODF Document. In ODF the outline is a function of the style headers, they only exist as implications of structure instead of first class abstractions. This is largely the fault of Microsoft Word, since the Word folks totally messed this up when they supplanted WordPerfect (which did this inset outline/object sort of thing right).

    ODF was, IMHO, poisoned by the slavish attempt by someone trying to make a Word killer instead of a "good word processor."

    And there are stacks more of these issues.

    And all that said, I *STILL* use ODF (Open Office etc) because I CATEGORICALLY REFUSE to _RENT_ the right to access my own work from a third party. Microsoft has plainly stated that such rental model is their intended business plan, which makes them a non-starter.

    In my opinion, having used both Word and OpenOffice for years, and having used WordPerfect and WordStar before them, ODF is a "workmanlike effort" to create a document format suitable for "normal business purposes". There is a reason that the legal profession never moved over to Word, and they likewise will not move to ODF: when you need to get to a tightly prescribed document format, both Word and ODF have a "you can't get there from here" fundamental limitation. Both formats simply refuse to represent some things because the designers "know" that a different format is better. Neither ODF nor Word has any allowances for _art_, professional or poetical.

    So, governments should use ODF because it is "no worse" than Word in terms of the ability to represent the documents it can represent, and given that congruence, the shorter, 100% open standard is, or should be, a hard minimum requirement.

    In terms of ODF being the be-all and end-all of document representation, I'd have to say "hardly!" I looked into the OpenOffice code base a while back to see if adding/changing the format to allow for "a book" would be reasonable. It didn't appear to be. Too many of the original StarOffice assumptions about document structure seemed pathologically uninspired. It was like looking at a big pile of Visual Basic. Everything in the standard is way too global, nothing "nests organically" it all nests pedagogically. (Every

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
    1. Re:Open Office Herecy (sold here) by digitect · · Score: 1


      Insightful comments, I've rarely heard the argument made so well. You obviously ARE a writer. ("Renting the right to access your documents", great way to put it.)

      --
      There is no need to use a SlashDot sig for SEO...
    2. Re:Open Office Herecy (sold here) by Blahbooboo3 · · Score: 1

      There is a reason that the legal profession never moved over to Word, and they likewise will not move to ODF: when you need to get to a tightly prescribed document format, both Word and ODF have a "you can't get there from here" fundamental limitation.
      Uh, just an FYI: the legal profession uses MS Word. For years they did keep to WordPerfect in the USA, but the 20 lawyer friends of mine at Tier 1 USA law firms all use MS Word. WordPerfect died for the legal profession when it failed to produce decent Windows 3.1 and 95 versions. Lawyers have no idea what a document format is beyond the word processor they use. :)
    3. Re:Open Office Herecy (sold here) by Requiem · · Score: 1

      Sounds like you ought to be using LaTeX. It does all of those things very easily.

    4. Re:Open Office Herecy (sold here) by Anonymous Coward · · Score: 0

      A writer who can't spell 'heresy'.

      No doubt OpenOffice has a good spellchecker, at least.

    5. Re:Open Office Herecy (sold here) by doktor-hladnjak · · Score: 1

      Actually, a lot of lawyers do still use WordPerfect. Apparently, there have even been judges who rejected legal documents presented in Word format, demanding WordPerfect instead. I've been told the reason for this is that WordPerfect has historically had better document comparison features (how are these two contracts different? what clauses were changed in this version?)

  32. output, at most by Anonymous Coward · · Score: 1, Interesting

    HTML/CSS is at most only an output format, i.e. one uses it as the final presentation format, like PDF or ps. To actually write and edit something (even through some GUI), then save, reload and make significant changes is absolutely horrible with it. This was supposed to get better when CSS was introduced, but somehow the format specification failed miserably (IMHO, of course).

    I'm not a fan of the two mentioned formats either: way too much bloat. Somehow that seems to be normal with XML-based formats (but that's probably just a pet peeve of mine).

  33. Google docs by edxwelch · · Score: 2, Interesting

    He wants HTML/CSS documents? Isn't this what Google Docs does?
    Anyway, it sounds like a good idea to me. I often have to share documents and I don't like having to force people to install a specific application just to read them.

  34. Published a book...? by Anonymous Coward · · Score: 0

    Håkon invented CSS, and is largely responsible for making most of the designs you see online possible.

  35. uh, SGML anyone? by Anonymous Coward · · Score: 0

    If only someone had thought of such an implementation in the 1960's... oh wait, they did. Any more wheels need reinventing?

  36. Um... NO by salesgeek · · Score: 4, Informative

    ODF is not about web pages or word processing. It's a standard for office documents including spreadsheets, presentation and word processing. That's a big difference from what Opera's CTO is talking about. CSS/HTML might make a good format for one part of the suite (word processing) with a lot of work on the standard. The issue: that's not what is needed for a standard. It's about doing for office documents what HTML did for websites. ODF is actually an opportunity for opera - extend the browser to support ODF so people can post ODF documents, make dynamic applications render to ODF and so on. It takes the web to the next level and further erodes the big monopoly.

    --
    -- $G
    1. Re:Um... NO by The+Cydonian · · Score: 1

      It's a standard for office documents including spreadsheets, presentation and word processing.

      For your presentation needs.

      You are on your own when it comes to spreadsheets, though.

      ODF is actually an opportunity for opera

      The Hakon dude also came up with the CSS specification. I'm fairly certain it was this hat he was wearing, not his CTO, Opera Inc hat (which, truth be told, is a great red hat and pays his bills, but that's not the real reason you ought to listen to him).

    2. Re:Um... NO by monomania · · Score: 2, Interesting

      Isn't the idea of an "office" document a fiction? The idea that these disparate representations or methods of manipulating data (spreadsheets, publications, A/V presentations) should (or even could) be subsumed within a common file format is rather boffy. The whole paradigm is derivative of the way applications that fulfill these functions have been bundled. If MS had dubbed their flagship application product "MS Administrative Assistant" we'd all be referring to a common format for 'Administrative Assistant' documents. I still don't buy this model... We took the wrong fork in the road some miles back. Is it too late to ask for the RIGHT directions?

  37. OK, then here's a challenge for the Opera CEO by alizard · · Score: 1

    Build a word processor that uses html/CSS with options and flexibility comparable to OpenOffice.

    Actually, I have every confidence that Opera can. . . I've been a happy Opera user since 1999.

  38. Why is this an issue? by sonofagunn · · Score: 1

    How about Microsoft and OpenOffice just keep their own XML formats? One of the great things about XML is that you can use XSLT to transform one XML document into another one with different syntax. As long as both products can open, display, and convert the other format then I don't really see the need for a standard in this situation.
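
    To make that concrete, here is a rough sketch of the kind of mapping I mean. It is plain Python rather than XSLT (the Python standard library has no XSLT processor), and it only handles paragraphs: each ODF text:p becomes a WordprocessingML w:p/w:r/w:t. The function name and the sample input are made up for illustration, and real documents carry styles, lists, tables and much more that this ignores.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the two formats.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def odf_paragraphs_to_wordml(odf_xml: str) -> str:
    """Map every ODF <text:p> to a <w:p><w:r><w:t> paragraph (hypothetical helper)."""
    src = ET.fromstring(odf_xml)
    ET.register_namespace("w", W_NS)  # emit the usual "w:" prefix on output
    body = ET.Element(f"{{{W_NS}}}body")
    for para in src.iter(f"{{{TEXT_NS}}}p"):
        p = ET.SubElement(body, f"{{{W_NS}}}p")
        r = ET.SubElement(p, f"{{{W_NS}}}r")
        t = ET.SubElement(r, f"{{{W_NS}}}t")
        # Flatten inline markup such as <text:span>; formatting is dropped.
        t.text = "".join(para.itertext())
    return ET.tostring(body, encoding="unicode")


# Illustrative input only, not taken from a real document.
sample = (
    f'<text:section xmlns:text="{TEXT_NS}">'
    "<text:p>Hello World.</text:p>"
    "<text:p>Hello <text:span>Universe</text:span>.</text:p>"
    "</text:section>"
)
result = odf_paragraphs_to_wordml(sample)
print(result)
```

    A real converter would be a proper stylesheet and a great deal longer; this only shows the shape of the idea.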

    A standard is going to limit innovation in word processors unless you specifically allow extensions in the standard, which kind of defeats the purpose of a standard.

    If the goal is to send out a document that anyone can read, then convert to PDF or a web page. "I shouldn't have to convert b/c I'm a stupid user" you say? Don't expect a 600-6000 page standard to solve this problem.

  39. Latex here and now? by mlewan · · Score: 1
    Thanks for the XeTeX link. I'm sure that will be useful for many of us (even though the installation failed for me).

    However, I think the parent post's main point was that LaTeX is not here and now usable across the globe. With MS Word or OpenOffice, I can type and mix Japanese, Korean, Russian and French in one and the same document, and I can share it with millions of users on different platforms.

    With the default installations of LaTeX, that is impossible.

    1. Re:Latex here and now? by zsau · · Score: 1

      With the default installation of LaTeX, that is difficult. With XeLaTeX, it's no harder than with Word or OpenOffice, and you have the advantage that it will look the same. You type a UTF-8 file, use OpenType fonts, and get a PDF that people who can't process XeLaTeX can still read.

      I'm sure that will be useful for many of us (even though the installation failed for me).

      I suppose the regular disclaimers like "make sure you have all the dependencies installed" apply. It's a pity GNU/Linux distributions are still like that...

      --
      Look out!
    2. Re:Latex here and now? by mlewan · · Score: 1
      I suppose the regular disclaimers like "make sure you have all the dependencies installed" apply. It's a pity GNU/Linux distributions are still like that...

      Yes. This is going a bit off topic, but in case anyone is interested...

      I downloaded the Mac version of the program and clicked on the installer. I got a generic error message, that did not tell me what was wrong.

      I then browsed the readme, and learnt that I needed to install Tex separately.

      I downloaded the recommended i-Installer, and found that it simply was a tool to install Tex - it did not contain Tex, which I had hoped (but not really expected).

      So I followed the instructions to use i-Installer to install Tex, which was not in the list of installable programs. However, there was a product called gwTeX with six different packages, each of which might be what I wanted. I took the first one, "gwTeX based on TeX Live". It took well above five minutes, perhaps a quarter of an hour, to install. It hung with the sliders at 100% for another five minutes, but when I tried to close it, it assured me that it was still doing things. In the end I got a message that all was fine.

      I could now finally successfully launch the XeTeX installer. Hooray! I am done! I can start typing!

      Not really.

      There was no program icon for XeTeX to be found. Now, I know, but my grandmother does not, that anything with TeX often is a command line tool. So I went back to the XeTeX readme, where they suggested a number of different tools to use with XeTeX. I went for the recommended TexShop. It is a drag and drop installation, and I sighed with relief.

      In TexShop I selected UTF8 and XeLaTeX, settings that were not retained unless I also changed them in the Preferences, by the way.

      I then tried opening several of the sample files that came with XeTeX. All of them failed with... yes! a generic error message!

      This is where I gave up. If I had had a file in front of me, which I knew was written by an 18th century pirate, who was describing where he had hidden his treasure, I would have gone on trying, and I am sure I would have succeeded in the end. But my more generic curiosity has faded away, and I will let this lie for another five years, before I try again with updated versions.

  40. Compilation by Vexorian · · Score: 1

    "2 standard formats is not a good idea"

    "That's the reason I'd wish to add a third"

    I wish he hadn't ruined the entire opinion with all those HTML/CSS pipe dreams. They are extremely unrealistic, quite apart from the mess it would be to adapt them in a useful way for office formats. Really this was kind of a shame.

    I could also write a book in .txt but that doesn't make it any better of a format for office software.

    --

    Copyright infringement is "piracy" in the same way DRM is "consumer rape"
  41. And wrong at that by temcat · · Score: 1

    Because only one of the specifications is really akin to a memory dump - the Microsoft one.

    1. Re:And wrong at that by SEMW · · Score: 0

      Because only one of the specifications is really akin to a memory dump - the Microsoft one.
      You're right! I know you are, even though I haven't actually read through either specification, or seen the XML either one outputs -- because you must be right! Because Microsoft are evil; and, errr... Yeah! -- Really! Very evil, very, uh, bad, and, err... --And 6000isabiggernumberthan700! And OOXML is a longer acronym than ODF. So yeah, there you go. Only the MS one is a memory dump. Evil! Very evil. And... something.

      ...

      Nope, I got nothing.

      ...

      And I suspect you don't either...
      --
      What's purple and commutes? An Abelian grape.
    2. Re:And wrong at that by temcat · · Score: 2, Informative

      I suggest that you read a bit about both formats and how they were developed. And actually look at the XML samples of both. Google it, it's not so hard.

  42. My idea for a document format by Mihai+Cartoaje · · Score: 1

    My idea for a document format is something like Tex, but with dynamic a.k.a. just-in-time compilation. This way it would be fast enough to be interactive.

    Packages would be translated into binary VM code. When a document is opened, they would be dynamically compiled or interpreted and would be able to respond to mouse or keyboard actions.

    1. Re:My idea for a document format by theguywhosaid · · Score: 1

      I think there's always going to be value in having applications and documents as distinct types. Letting documents execute arbitrary code means you must either trust the author more or build incredible security into the VM. That's why TeX files are not for publishing, and DVI files are. I have yet to hear of a virus that exploited the DVI format to spread.

    2. Re:My idea for a document format by Anonymous Coward · · Score: 0


      You would be very interested in a SourceForge project that I am also interested in: TexPerfect. If only they had more developers!

      TexPerfect on SourceForge: http://sourceforge.net/projects/texperfect/
      TexPerfect homepage: http://texperfect.sourceforge.net/

  43. SICP by Nicolay77 · · Score: 1

    http://mitpress.mit.edu/sicp/full-text/book/book.html

    Is this a book? or a Website?

    Actually it is both, but to me it's more a book than a website. A book is defined by its contents, the sequential flow of text where you keep reading it from start to end. With coherent writing and good spelling.

    So I think that a book that uses HTML and CSS is still called a book. An online book, but still a book.

    --
    We are Turing O-Machines. The Oracle is out there.
    1. Re:SICP by dhasenan · · Score: 1

      A book is defined by its contents, the sequential flow of text where you keep reading it from start to end. With coherent writing and good spelling.
      Never heard of James Joyce, I take it.

  44. LaTeX by Nicolay77 · · Score: 1

    It looks like you would enjoy working with LaTeX.

    You can specify everything to the smallest detail.

    I like the syntax, however some people can't get used to it. YMMV.

    --
    We are Turing O-Machines. The Oracle is out there.
    1. Re:LaTeX by vidarh · · Score: 1

      I'm sure you can. However I've yet to see anything typeset with LaTeX that didn't look like it was produced by a complete amateur... I'd love to be proven wrong, though.

    2. Re:LaTeX by dhasenan · · Score: 1

      What were your complaints? Were they different for each document, or were there consistent errors? If the errors are consistent enough, then the next LaTeX specification should probably correct them.

      For my part, all the LaTeX I've seen either looks mostly the same, or diverges from the standard appearance and looks somewhat crappy. Probably because the people coding it tried doing so with repeated local changes rather than global changes.

  45. You're using the wrong tool. by Luke · · Score: 3, Insightful

    Using a word processor to write a book is like using stone tablets and an abacus for spreadsheets. You really ought to look at markup-based typesetters like LaTeX or DocBook or software specifically designed for book production.

  46. Complicated? by Per+Abrahamsen · · Score: 1

    Other European languages are complicated

    Uh, I have two additional lines in the pre-amble:

    \usepackage[danish]{babel}
    \usepackage[latin1]{inputenc}
    The first specifies that I want Danish typesetting conventions, and the second specifies what character set is used in the text. Doesn't seem complicated to me, especially since it is the same in every document (and I use the "latin1" one in my English documents as well).

    One more addition compared to US users:

    \documentclass[a4paper]{article}
    The rest of the world use A4 paper.
    1. Re:Complicated? by Anonymous Coward · · Score: 0

      But to actually get *decent* results for Danish, you also need

            \usepackage[T1]{fontenc}

      because otherwise hyphenation is going to be screwed with when you use
      words containing "å".

      Now, obviously I, as a long time TeX nerd, and probably you, as the author/maintainer of auctex, will know why this has to be added, but to the rest of the world there is a *much* easier solution: just use Word and forget about this shitty, weird, nerdy "typesetting" system that never works right anyway.

  47. suspicious use of the 'i' word .. by rs232 · · Score: 1

    'A standard is going to limit innovation in word processors'

    A novel argument if I ever heard one. For argument's sake, let's say innovation in GPS is being limited by a lack of multiple standards .. pause to cogitate .... nope, still doesn't make any sense. How is a document format going to affect a display format? I always suspect where the argument is coming from when they have to invoke the 'i' word.

    'If the goal is to send out a document that anyone can read, then convert to PDF or a web page'

    No the goal is to send a document that anyone can edit, display and print the same on any Word Processor. The real reason the file formats change with each new version of msOffice is to force us back each time for some more 'innovation'. Your best bet is to send them an OpenDocument file and point them to the Open Office site.

    Or else .. receive a word doc, send it as Word 2000 file to someone who has word XP who sends it to someone who has Word 2003 and saves it as DOCX who sends it back to me, who can't open it, who saves it as RTF, which then loses all the lovely shaded boxes.

    http://www.openoffice.org/

    was: Why is this an issue?

    --
    davecb5620@gmail.com
    1. Re:suspicious use of the 'i' word .. by sonofagunn · · Score: 1

      That sounds nice in theory, but really, how often are people passing around documents to other people for shared editing, when these people don't have access to the same word processor? We can already share documents for display and printing - it's not a problem (pdf, html, etc.). Multiple people editing the same document have access to the same word processor in just about every scenario I can think of (corporations, school). If you are going to create some document for shared editing around the world by people in different places who may or may not have the same word processor as you, then by all means, start the document off in Open Office. If someone doesn't have OO, they can easily download it.

      Now, for your terrible GPS analogy. I really can't think of a single reason why a government GPS signal is analogous to a word processing file format. GPS signals are extremely simple and putting your own GPS satellites in orbit isn't practical for many people. A word processing file format standard is 600-6000 pages long apparently and anybody who can write software can create an implementation of it. Plus, even if GPS signals and file formats were analogous, it would still be a bad argument because one can easily see how GPS companies could constantly be improving (innovating) GPS signals and associated products if it weren't a standard.

      Assume for a minute that we had a standard word processing file format before the internet was around. Now, fast forward until a little while after Al Gore invents the internet. Now the guys at Open Office decide to add the ability to embed a hyperlink in a word processor document, but there is no standard way to do this. What happens? If the standard leaves open a method for declaring vendor specific extensions you use that method - but this arguably makes a standard pointless when common features are implemented in a vendor specific way. The other option the guys at Open Office would have would be to go to the standards organization and try to get it on the next draft. This would take a long time, and would give every other word processor company the time to implement the feature as well.

      I just think a standard file format in this scenario is unnecessary. If you are a company, buy all your employees the same word processor! If you share documents elsewhere (for editing, not just printing and display) then make sure everyone has the same word processor as you or choose to use a free one, such as Open Office.

    2. Re:suspicious use of the 'i' word .. by Graymalkin · · Score: 1

      That sounds nice in theory, but really, how often are people passing around documents to other people for shared editing, when these people don't have access to the same word processor?


      All the time.

      We can already share documents for display and printing - it's not a problem (pdf, html, etc.). Multiple people editing the same document have access to the same word processor in just about every scenario I can think of (corporations, school).


      Not every company decides to mass-upgrade to the latest and greatest version of Office as soon as it is released. Most companies wait until their next buying cycle comes up and grab what is available then. It is not unheard of for systems in the same company to be running different versions of software packages. The situation is similar in schools: institution machines are likely running whatever was purchased in the last upgrade cycle, while the freshman class starting in the fall has whatever new software was available when they got their shiny new laptops. In both corporations and schools any number of versions of Office may have to coexist.
      --
      I'm a loner Dottie, a Rebel.
    3. Re:suspicious use of the 'i' word .. by sonofagunn · · Score: 1

      Good thing Office versions are always backwards compatible. I work in a 9000 or so person company and it's never a problem sharing Word documents. So a word processor standard format does nothing for us.

  48. LaTeX3 by backwardMechanic · · Score: 1

    I'm a physicist so LaTeX is the only sensible way to write reports/papers, but it's starting to look a little long in the tooth and could do with a scrub.

    Has the LaTeX3 project produced anything? I get the (perhaps completely wrong) impression that the project has sunk into navel-gazing, in a search for the perfect solution. I'd be happy with the next iteration, if it comes out soonish. Release early, release often?

    1. Re:LaTeX3 by lahvak · · Score: 1

      I use PdfLaTeX for pretty much everything, and with a good set of packages and a nice font, it works for me really well most of the time. Two things that I have trouble with are changing margins in the middle of the document, and paragraphs flowing around pictures (the picinpar package works fine for ordinary paragraphs, but breaks lists).

      I had the same impression as you about LaTeX3, but recently there appeared to be some development. In the last Tugboat, there was an article on page design in LaTeX3, and I believe there were also some talks about that at TUG 06 in Marrakesh.

      I am also eagerly awaiting luaTeX, which seems to be chugging along at a similarly slow pace. They are supposedly going to release something later this year.

      I would at least partially switch to ConTeXt, but what holds me back is the lack of good ConTeXt support in Vim. I am a Vim junkie, and editing LaTeX documents using Vim latex-suite is really easy and fast. If there was something like that for ConTeXt, I would probably use it for a lot of my documents.

      --
      AccountKiller
    2. Re:LaTeX3 by Anonymous Coward · · Score: 0

      If you're looking for something that releases often and something that could manage your layouts in an easier way, try ConTeXt (30-50 releases per year on average). The ConTeXt developers have learned a lesson from LaTeX: the version IV will probably come out soon (following version II and skipping the dangerous III ;)

      To lahvak: margins are usually controlled on macro level, not on low-level, so luaTeX probably won't solve your problems with margins. It will be easier to handle fonts, non-latin scripts & do tricky things, but layout will still be controlled in the "old standard way" (unless a LaTeX package writer jumps in and extends an old package or writes a new one to suit your needs better).

      About vim: any suggestions and improvements are more than welcome. There is some really basic ConTeXt support in vim 7, but in no way as extensive as in latex-suite. This is partly because LaTeX has an older tradition and there are more users who master both vim and LaTeX (those who master both ConTeXt and vim, and are willing to contribute, would probably fit on the fingers of someone's hand[s]). If you're a vim-addict, the ConTeXt community would warmly welcome your contributions to vim.

  49. Re:Classic quote for the books, gotta love XML pla by Ronald+Dumsfeld · · Score: 1

    XML is simple... It's like violence. If it didn't work, you didn't use enough of it.

    --
    Where's the Kaboom?
    There's supposed to be an Earth-shattering Kaboom.
  50. Hello World in Office XML by im_thatoneguy · · Score: 1

    He's telling me that this is really difficult to understand and parse?

    Sure it's not as clean as HTML for such a small bit of text, but it's not impossible to wield, unless you want pixel accuracy, in which case, CSS is difficult as well.

    Office XML document:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
    <pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512">
    <pkg:xmlData>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
    </Relationships>
    </pkg:xmlData>
    </pkg:part>
    <pkg:part pkg:name="/word/_rels/document.xml.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="256">
    <pkg:xmlData>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
    </Relationships>
    </pkg:xmlData>
    </pkg:part>
    <pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
    <pkg:xmlData>
    <w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:o12="http://schemas.microsoft.com/office/2004/7/core"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">

    <w:body>
    <w:p><w:pPr><w:ind w:left="720"/></w:pPr>

    <w:r w:rsidR="00F4543A">
    <w:t>Hello World.</w:t>
    </w:r>

    <w:r w:rsidR="00F4543A"><w:br/></w:r>
    <w:r w:rsidR="00F4543A"><w:br/>
    <w:t>Hello Universe.</w:t>
    </w:r>

    </w:p>

    <w:sectPr w:rsidR="0074581F" w:rsidSect="008A7339"><w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>

    1. Re:Hello World in Office XML by PPH · · Score: 1
      Either example is parseable with similar levels of effort. The problem I have with Office XML is: When my application gets to the following tag

      <w:r w:rsidR="00F4543A">
      what does it do with the rsidR attribute value? What sort of licenses must I accept, NDAs must I sign and deals with the devil must I make to unearth the meaning of this?
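
      To be clear about where the effort actually goes: pulling the attribute out is the easy part. Here is a small Python sketch that reads rsidR from a run, using a cut-down fragment modelled on the sample above. The helper name is hypothetical and the fragment is shortened for illustration; the code only retrieves the value and says nothing about what an application is supposed to do with it.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Shortened fragment based on the posted sample (illustrative only).
FRAGMENT = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r w:rsidR="00F4543A"><w:t>Hello World.</w:t></w:r>'
    "</w:p>"
)


def runs_with_rsid(xml_text: str) -> list:
    """Return (rsidR value, run text) pairs for every w:r element."""
    root = ET.fromstring(xml_text)
    found = []
    for run in root.iter(f"{{{W_NS}}}r"):
        # Namespaced attributes are keyed as "{namespace-uri}localname".
        rsid = run.get(f"{{{W_NS}}}rsidR")
        text = "".join(t.text or "" for t in run.iter(f"{{{W_NS}}}t"))
        found.append((rsid, text))
    return found


runs = runs_with_rsid(FRAGMENT)
print(runs)
```

      Getting the string is mechanical. Knowing whether I can safely ignore it, must preserve it, or have to generate my own is the part that needs the specification.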
      --
      Have gnu, will travel.
  51. Hello World in ODF by mhesd · · Score: 1

    <?xml version="1.0" encoding="UTF-8"?>
    <office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
    xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
    xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
    xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
    xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
    xmlns:ooo="http://openoffice.org/2004/office"
    xmlns:ooow="http://openoffice.org/2004/writer"
    xmlns:oooc="http://openoffice.org/2004/calc"
    xmlns:dom="http://www.w3.org/2001/xml-events"
    xmlns:xforms="http://www.w3.org/2002/xforms"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    office:version="1.0">
    <office:scripts/>
    <office:font-face-decls>
    <style:font-face style:name="Times New Roman"

  52. PDF? by gravis777 · · Score: 1

    Putting this to the test, Håkon has published a book using HTML and CSS.
    Strange, when I click on the link, it takes me to a page with several links. The top one is a book about using HTML and CSS, and is distributed as a PDF. Seems to me that if you are wanting to make a point about using HTML and CSS to distribute data, you would distribute your paper arguing this case in the same format.
    1. Re:PDF? by howcome · · Score: 1

      Seems to me that if you are wanting to make a point about using HTML and CSS to distribute data, you would distribute your paper arguing this case in the same format.
      The HTML and CSS source files are available from that page. The PDF file is also there to show, in a frozen format fit for a printer, what can be generated directly from HTML/CSS.
  53. It's called "programming" by Ivan+Matveich · · Score: 1

    Object-oriented programming concepts should have sorted out this mess years ago.

    1) Write a class for each document type, with methods to construct the logical document structure (eg, add paragraphs to a report, define the author name, whatever).

    2) Define a set of standardized rendering interfaces (screen, printer, audio, etc).

    3) Write some renderers for various (document-class, rendering-interface) permutations (eg, one that renders articles to the screen, or books to the printer).

    You're welcome. :)
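
    The three steps above can be sketched roughly as follows. This is only an illustration in Python, not anything from the post: the names Report, Renderer, ScreenRenderer and PrinterRenderer are hypothetical, and the "printer" just produces a string.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Report:
    """Step 1: a document class that holds only logical structure."""
    author: str = ""
    paragraphs: list[str] = field(default_factory=list)

    def add_paragraph(self, text: str) -> None:
        self.paragraphs.append(text)


class Renderer(ABC):
    """Step 2: a standardized rendering interface."""

    @abstractmethod
    def render(self, doc: Report) -> str:
        ...


class ScreenRenderer(Renderer):
    """Step 3: one (document-class, interface) pairing: reports on screen."""

    def render(self, doc: Report) -> str:
        body = "\n\n".join(doc.paragraphs)
        return f"by {doc.author}\n\n{body}"


class PrinterRenderer(Renderer):
    """Step 3 again: the same report, laid out for a pretend printer."""

    def render(self, doc: Report) -> str:
        lines = [f"Author: {doc.author}", "-" * 20]
        lines += [f"    {p}" for p in doc.paragraphs]  # indent each paragraph
        return "\n".join(lines)


report = Report(author="A. Writer")
report.add_paragraph("First paragraph.")
report.add_paragraph("Second paragraph.")

screen_output = ScreenRenderer().render(report)
printer_output = PrinterRenderer().render(report)
print(screen_output)
print(printer_output)
```

    The document object never learns how it is displayed; each new output device only needs another Renderer subclass.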

  54. HTML is Evil! by fm6 · · Score: 1

    I actually did publish a book that I authored in HTML. More precisely, we used HTML run through a really ugly preprocessor that one of the original authors of the book created while she was teaching herself Perl.

    Fortunately, our publisher found an SGML/XML wizard who did a very good job of converting the HTML to XML, which then got converted to PDF using an off-the-shelf XSL-FO processor. I was very impressed with his work, without which the conversion would have been a total nightmare. It was still very tedious, though, because HTML is not a true structured format, and you cannot completely automate its conversion.

    It would, of course, have been much more efficient to have authored the document in XML in the first place. I remember this actually being proposed back in 1998 for an earlier version of the book. (I was not a co-author back then, but I was working for the department that owned the content.) The manager who was responsible for this had zero interest: HTML got the job done, she didn't have the resources to do a big XML conversion. Never mind the huge inefficiency of authoring that book, and a lot of other content related to the Java SE platform, in plain HTML. Only now, after this manager has left the company and it has become painfully obvious that they can no longer afford to hack such a huge mountain of HTML code, is the company getting round to making the conversion.

    HTML is just not a good format. It's barely adequate for creating web pages, and totally useless for anything else.

    1. Re:HTML is Evil! by Steve001 · · Score: 1

      fm6 wrote as part of a post:

      HTML is just not a good format. It's barely adequate for creating web pages, and totally useless for anything else.

      I respectfully disagree. I think HTML is a good format for its purpose. The problem is that people have come to expect more from it than it can deliver. It is now expected to deliver types of data and display it in ways that were not conceived of when HTML was first established.

      I think the biggest problem in the design of word processing formats is trying to make a single format that is suitable for all users for all uses in all situations. For example, I find basic HTML is a good format for making e-books, but it is not suitable for high-level word processing. What is most likely needed is a group of common word processing formats, each designed with a different use in mind (such as web pages, text only, newsletters, etc.) with each format able to easily import and use the data from the other formats.

      I normally use RTF for my word processing because I find it a good format for text-only documents. It also has the advantage that just about every word processor can read the files, and it also tends to produce smaller files compared to other formats. But if I was writing more complicated documents I would choose a format that supports the features that I need.

      The main problem with multiple formats has not been that there are multiple formats. The problem has been that it is extremely difficult to accurately translate a file from one format to another (one of the reasons that RTF was created: to allow data to be passed from one word processor to another). This is why lock-in has been such an issue over the years.

      Returning to the issue of HTML, one of its biggest advantages is one it shares with WordPerfect: you are able to easily view the format codes within the document. This is also an advantage of the OpenDocument format and RTF: with minimal effort you can view the actual files that make up the document.
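
      For the OpenDocument case, "viewing the actual files" amounts to opening a ZIP archive, since an ODF document is a ZIP package of XML files. A minimal sketch in Python follows. To stay self-contained it builds a tiny stand-in package in memory rather than reading a real .odt file, and the XML inside is a placeholder, not a valid ODF document.

```python
import io
import zipfile

# Build a tiny stand-in for an .odt package in memory (placeholder contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    package.writestr("content.xml", "<placeholder>Hello</placeholder>")
    package.writestr("styles.xml", "<placeholder/>")

# Reading it back takes nothing but a ZIP reader.
buffer.seek(0)
with zipfile.ZipFile(buffer) as package:
    member_names = package.namelist()
    content = package.read("content.xml").decode("utf-8")

print(member_names)
print(content)
```

      With a real file, passing its path to zipfile.ZipFile would replace the in-memory buffer.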

    2. Re:HTML is Evil! by fm6 · · Score: 1

      I think HTML is a good format for its purpose. The problem is that people have come to expect more from it than it can deliver. It is now expected to deliver types of data and display it in ways that were not conceived of when HTML was first established.

      You're talking about HTML 1.0. Nowadays, "HTML" means HTML 4.0 or XHTML 1.0, or something very close to those specifications, usually used in conjunction with CSS. For these, people have had 10 years to polish the spec and make it more flexible. The result is, as I said, barely adequate for doing web pages, and useless for anything else.

      I find basic HTML is a good format for making e-books, but it is not suitable for high-level word processing.

      You're making the standard mistake: you're judging formats entirely on how expressive they are. And if documents never got re-written, you'd be right. But documents do get rewritten, unless the subject matter never changes. (And even then, the document gets reviewed and revised, unless the author is a total hack.) If you write documents that are going to be revised even once (and as a technical writer, I write documents that are going to be revised dozens of times), you want a document format that's maintainable. That means a useful semantic structure. The semantic structure of HTML is very basic. The semantic structure of RTF is essentially nonexistent.

      I normally use RTF for my word processing because I find it a good format for text-only documents. It also has the advantage that just about every word processor can read the files, and it also tends to produce smaller files compared to other formats.

      You have got to be kidding. Yes, every word processor reads RTF — but they all do different things with it. Try copying a complicated table or a multi-level list between word processors using RTF.

      There's nothing wrong with using a convenient application (a word processor, a spread sheet, even a text editor) to write a one-off document with no future. But don't pretend that any of these create "universal" documents. That's wishful thinking.

      Returning to the issue of HTML, one of its biggest advantages is one it shares with WordPerfect: you are able to easily view the format codes within the document. This is also an advantage of the OpenDocument format and RTF: with minimal effort you can view the actual files that make up the document.
      And if you want to spend all your time hacking the fine details of a document, that's fine. But if you've got a lot of documentation to get written, you don't have time for that. Besides, not every writer is (like you and me) a format weenie. And they should not have to be.
  55. Opera vs XML by Anonymous Coward · · Score: 0

    The history, in case you were wondering, is that Opera consistently disapprove of XML. They don't like XSLT, XSL-FO, or even XPath. They would prefer we use a CSS syntax with poorly implemented selectors and lack of code-generation. Opera always hate XML, they always like LISP-looking syntaxes despite being slower (try finding a stream-based CSS parser). This is a continuation of their backwards ideas about the web.

    1. Re:Opera vs XML by Anonymous Coward · · Score: 0

      Bullsh!t. Opera has XML, XSLT and XPath support. The only thing they don't support is XSL-FO, but I really don't know how it could mean that they "hate XML".

    2. Re:Opera vs XML by Anonymous Coward · · Score: 0
      They added XSLT belatedly (v.9), and only once everyone else had supported it for years. They write long essays about why XML is complicated, why XSLT sucks, why XSL-FO sucks, and why CSS syntax is best for everything.

      Of course they have SGML/XML support -- and XSLT means supporting XPath. But they weren't happy about it.

      Google, "Formatting Objects considered harmful"
      http://www.biglist.com/lists/xsl-list/archives/199905/msg00495.html
      http://www.xml.com/pub/a/2005/01/19/print.html

  56. I agree...to disagree by Gazzonyx · · Score: 1
    I think you have a very valid point concerning the HTML/CSS thing, but I disagree about your assessment of 700 pages being too large. You compare it to "C programming language", but the fact of the matter is that Bjarne (sp?) was building from the ground up; he didn't have the problems that we now face, being 3 or 4 layers removed from the 'ground'. We are, for better or worse, stuck building on top of existing technology. Even though it affords us the power to leverage the abilities of underlying technologies, it also leaves us dealing with the inherent shortcomings, collectively, of them. Sure, it's just a format specification, but it's going to be built on top of another format, which will be very good at solving the problems faced when it was drafted, but probably not so good at handling the problems we now face.

    Ergo, when drafting out the spec, it will have to be somewhat larger to accommodate all the 'what ifs' and "(insert underlying spec. here) doesn't natively have an ability to express that - we'll have to extend it"

    I've specifically not used HTML/CSS/XML/ETC for examples because I believe that we will be facing this scenario regardless of what standard is used. The only way to avoid it, IMHO, is to do as they did for C and build from the ground up.

    Let me know what you think.

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  57. WP51 by Nicolay77 · · Score: 1

    You can't create documents in Word with good styling.

    But if you did it in WP51, and then converted it to Word, it worked perfectly fine.

    So I guess the Word engine can handle them just right, but the interface is incapable of generating them the right way.

    --
    We are Turing O-Machines. The Oracle is out there.
  58. imagine the conformance suites by asky · · Score: 1

    If the point is for businesses and governments to adopt a standard, then at some point, a credible third party (standards body, government agency) needs to produce a conformance suite, and vendors need to show that they pass with flying colors.

    Given the 6000 pages for OOXML and 700 for ODF, it will be interesting to see if either will be done. Just imagine the test cases and the explanation of what they do.

    I start to understand why Håkon Lie doesn't much care for either.

    1. Re:imagine the conformance suites by Anonymous Coward · · Score: 0

      I wish to god someone would produce a PDF conformance suite.

      About 99.99999999999% of all PDFs in this world aren't even tainted by a whiff of conformance.

      As with the Adobe Reader/Acrobat guys, any implementers of this spec who are forced to render are going to have their hands full making their software work with dodgy instance documents generated by a raft of poorly written software.