Opera CTO Hits Back at Microsoft's Standards Push
Michael writes "Opera CTO Håkon Wium Lie hit back today at Microsoft's push to fast-track Office Open XML into an ISO standard, in a blistering article on CNET. He also took a swipe at OpenDocument Format: 'I'm no fan of either specification. Both are basically memory dumps with angle brackets around them. If forced to choose one, I'd pick the 700-page specification (ODF) over the 6,000-page specification (OOXML). But I think there is a better way.' The better way being the existing, universally understood standards of HTML and CSS. Putting this to the test, Håkon has published a book using HTML and CSS."
Ok... Cheese, anyone?
If you want news from today, you have to come back tomorrow.
Yeah, but that "book" is fsck'n ugly. It doesn't even compare to a professionally typeset book, or something produced in LaTeX. I hope that isn't the "solution" to this standards "problem". Let's face it, the average Joe is going to use whatever Microsoft pushes at them. Case closed.
"Both are basically memory dumps with angle brackets around them."
Table-ized A.I.
HTML and CSS are quite capable of rendering and displaying webpages. But what happens with a simple thing like a page header showing the page number and author name? Footers with footnotes? What about a table of contents? How would a document be broken into pages? Anyone who's tried to print HTML knows there are many layout issues. What's sad, though, is that even HTML and CSS aren't supported the same way in all browsers.
I'm a LaTeX junkie. LaTeX, though, is a PITA to create templates and styles for. Is anyone willing to take up the task of modernizing LaTeX or replacing it completely?
----
Go canucks, habs, and sens!
Putting this to the test, Håkon has published a book using HTML and CSS.
Uhm. I'm no expert, but isn't a book that uses HTML and CSS called a website?
The theory of relativity doesn't work right in Arkansas.
FTA: Kazakhstan recently joined the relevant ISO group.
OMG, Borat is teaming up with Steve Ballmer to spew out 6000 page docs!! Run for cover!
Having a word processor act more like a web browser would be awesome. Ever since I started using word processors (which for me was a long time after I started using web browsers), I've always wondered: why doesn't updating this style update all the text that uses that style? Why do I always have to change the same thing over and over again?
While turning word processors into web browsers would be stupid, things like CSS would be awesome to have in word processors.
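What's being asked for here is what CSS gives HTML: text points at a named style instead of carrying its own formatting, so changing the style once changes everything that uses it. A minimal Python sketch of that idea; the `Style` and `Paragraph` classes and the property names are hypothetical, not any real word processor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Style:
    """A named bundle of formatting properties (hypothetical model)."""
    name: str
    props: dict = field(default_factory=dict)


@dataclass
class Paragraph:
    """A paragraph holds a reference to its style, not a copy of the formatting."""
    text: str
    style: Style

    def effective_format(self) -> dict:
        # Looked up at render time, so edits to the style show up everywhere.
        return dict(self.style.props)


heading = Style("Heading", {"font": "Times New Roman", "size_pt": 16, "bold": True})
doc = [Paragraph("Chapter 1", heading), Paragraph("Chapter 2", heading)]

heading.props["size_pt"] = 18  # change the style once...

for p in doc:  # ...and every paragraph using it picks up the change
    print(p.text, p.effective_format()["size_pt"])
```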
While I do agree that the ISO doesn't need more than one standard for printable documents, I don't think that Håkon Wium Lie is on the right track with HTML/CSS for print.
Sure, it works, with enough tweaking, and CSS3, and a $350 download of a product to turn HTML/CSS3 into a PDF. This is better how? What about LyX, LaTeX, or even OpenOffice if you are just going to convert to PDF?
The whole HTML/CSS-to-print thing shoots the real argument in the foot.
Death looks every man in the face. All any man can do is look back and smile. - Marcus Aurelius
...and sig'd in tribute.
Even more classic perhaps, 'The "layer" element?!' Sure raised my eyebrow; a huge change from "Netscape engineers are weenies!" by any metric. :)
You can hold down the "B" button for continuous firing.
Steve squirts them out, remember?
Hmm... both of these standards suck. I know what, we need another choice!
Somehow I don't think that's going to fix the problem. Oh, and pointing out that the Microsoft letter doesn't validate. Isn't that a little petty?
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
Reveal Codes
WordPerfect on Linux
Only problem is, the Oasis page itself doesn't validate. However, it seems Wikipedia does...
But if the Oasis pages did validate, the basic argument goes like this: "How can they claim to care about standards if they can't even bother to support that most universal standard of standards, HTML?" And indeed, I could still make that argument -- just look at the sad, sad state of affairs that is Internet Explorer's CSS [mis]handling.
Don't thank God, thank a doctor!
Both standards follow XML, because XML helps documents become universally available whatever the device. HTML and CSS have limitations.
Visit http://www.kaizenlog.com
Prince is a commercial product. I have a minor need to produce PDFs from XHTML/CSS and I really don't want to deal with licensing. I would need to run it on a server where multiple people can access it, which means I would have to pay $3800 for Prince. Ouch! I don't need to do this that badly. Is there any way to do this with Free/Open Source software?
Been a long time since I typeset anything, but I used Adobe PageMaker when I typeset a couple of college magazines in the mid-'90s and FrameMaker when I was maintaining courseware in the late '90s for Nortel.
HTML + CSS vs. Word vs. OO.o seems to me to be an argument related to formatting documents, not a "book". It's not that you couldn't do it, but I'd consider using Quark or InDesign (what seems to be Adobe's successor to PageMaker) or even TeX and its variants (haven't used anything TeX-based, but I've heard wonderful things) for typesetting.
Arguments about standards aside, proofs of concept aside, I'd think that the real issue when it comes to any job is using the best tool for it. It's not a question of whether you can use these tools to typeset a book, but whether you should.
The point of the proof of concept is to prove that the system is flexible or capable enough to go beyond its original intended use. I get that. But proving a chainsaw can be used to spread butter doesn't mean it's inherently superior to a coping saw.
- Greg
Start a happiness pandemic
> Hmm... both of these standards suck. I know what, we need another choice!
>
> Somehow I don't think that's going to fix the problem.
Depends on what you define the problem as: that there are too many "standards", or that all of them suck. If the latter, defining a new standard that does not suck solves the problem.
HTML sucks for books. The reason is simple. HTML was designed for web pages. HTML does a fairly good job of covering the things you need when you create a web page (although, why is there no , and a bunch of other stuff that needs to be fudged by using elements that don't really fit). In HTML there is no , no , no , no . Also, with HTML, one file == one document. If you're writing a book, it would be nice to be able to, for example, have one file per chapter and include them all in a master file (assuming you're writing your HTML by hand, of course). That's not possible with HTML.
It would be possible to extend HTML to include such features, or to create an HTML-like format that is more suitable for books (cf. DocBook). I agree that "word processors" today are a horrible mess, and we definitely need something like a modernised LaTeX, but HTML isn't it.
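Plain HTML has no include mechanism, which is the complaint above; the usual workaround is a small preprocessing step that expands a master file into one document. A minimal Python sketch, assuming a made-up `<!-- include: file.html -->` comment convention (this is not an HTML feature) and chapter files that hold body fragments only. It does nothing about the missing structural elements, which is the bigger point of the comment.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical directive, NOT part of HTML: <!-- include: chapter1.html -->
INCLUDE = re.compile(r"<!--\s*include:\s*(\S+?)\s*-->")


def expand(master: Path) -> str:
    """Return `master` with every include directive replaced by the named file.

    Paths resolve relative to the including file, and nested includes are
    expanded recursively (there is no cycle detection in this sketch).
    """
    text = master.read_text(encoding="utf-8")
    return INCLUDE.sub(lambda m: expand(master.parent / m.group(1)), text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ch1.html").write_text("<h1>Chapter 1</h1>\n", encoding="utf-8")
        (root / "ch2.html").write_text("<h1>Chapter 2</h1>\n", encoding="utf-8")
        (root / "book.html").write_text(
            "<body>\n<!-- include: ch1.html -->\n<!-- include: ch2.html -->\n</body>\n",
            encoding="utf-8",
        )
        print(expand(root / "book.html"))
```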
Please alter my pants as fashion dictates.
I think it is important that a document format is human-readable and understandable, so that one can get at least some idea of the layout of a document by reading the content of the file. I understand very well when he says "memory dump in angle brackets". Besides, anything that is human-"parsable" can be parsed by software, while the other way around is not usually the case.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
I also liked: "Microsoft--please--if you think standards are so important, why not start using them?"
The problem with using HTML for publishing is that to this day there is no viable downloadable font system. So you are limited to a lowest-common-denominator list of 2-3 fonts like Verdana and Times New Roman. With Flash and PDF you can do a lot more, but obviously authoring becomes a problem.
So say you can absorb 30 pages a day; that's 200 days to read the 6,000-page spec.
Oh, and the spec is defined to fit an existing product, so that product fits the spec, and there are unspecified patent hurdles attached to it. Wow, which idiot would fall for that one?
700 pages is not understandable by anyone but the authors. The C Programming Language is a third of that size, has endured for 20 years, and was instrumental in solving many more problems than word processing. Also, creating an ODF document is a minor function in most applications, and it is not worth the effort to understand such a huge standard. Proponents of both standards should come up with a modular design instead. At the base level, stick with basic HTML: bold and italic tags, fonts and sizes, paragraph breaks. Define many extensions that can be implemented independently or in any combination, in a manner convenient for both computers and, in a pinch, humans. The Opera guy is biased as well: while basic HTML is great at its limited function, CSS is not very readable by humans. Nor does it solve pagination, collaborative editing, resolution independence, color profiles for printing...
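The modular layering proposed here could look roughly like this: a tiny core every reader must support, plus named extensions a reader may implement independently, with unknown extensions degrading to plain text. A minimal Python sketch; the element names, the core set, and the fallback rule are all assumptions for illustration, not part of ODF, OOXML, or HTML.

```python
import xml.etree.ElementTree as ET

# Core vocabulary every reader must support (hypothetical, loosely "basic HTML").
CORE = {"doc", "p", "b", "i"}


def render_text(elem: ET.Element, extensions: dict) -> str:
    """Render a tiny markup tree as plain text, marking bold as *x* and italic as _x_.

    Tags outside CORE go to an extension handler if this reader has one;
    otherwise the reader degrades gracefully to the element's text.
    """
    inner = (elem.text or "") + "".join(
        render_text(child, extensions) + (child.tail or "") for child in elem
    )
    if elem.tag == "b":
        return f"*{inner}*"
    if elem.tag == "i":
        return f"_{inner}_"
    if elem.tag == "p":
        return inner + "\n"
    if elem.tag in CORE:
        return inner
    handler = extensions.get(elem.tag)
    return handler(elem, inner) if handler else inner


tree = ET.fromstring(
    "<doc><p>Plain <b>bold</b> and <i>italic</i>.</p>"
    "<p>See <footnote>a note</footnote>.</p></doc>"
)
print(render_text(tree, {}))  # reader without the footnote extension
print(render_text(tree, {"footnote": lambda e, inner: f"[{inner}]"}))  # reader with it
```

The point of the design is that a reader implementing only the core still shows every word; extensions add presentation, never content.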
And it worked out great.
http://software-libre.rudd-o.com/
Used MediaWiki to write the chapters, wrote a small Python proggie (available there) to consolidate the wiki into a single HTML file (mostly conforming to the Boom! microformat), then used Prince and Håkon's book CSS to generate the PDF.
Great typesetting, collaborative book editing, screw LaTeX!
Håkon was right.
Rudd-O - http://rudd-o.com/
Anyone? http://www.scribus.net/
If HTML+CSS offered a better model than the box model (for example, a point-line model) and offered some way of doing basic data structures, I'd agree. The current box model is very limiting in its layout abilities.
Modern documents have so many binary data types inserted in them (images, fonts, etc.) that Html+Css isn't enough. It isn't even enough on the web and that's why Javascript and Flash are so prevalent. There needs to be another specification to support all the needs/wants of the users (who are not willing to go backwards for any ISO standard).
- I voted for Nintendo and against Bush
Okay, I may have read a tiny bit, but it's irrelevant.
The question is `what does it do and how well does it do it?'
Should we want screen-displayed content to be equivalent to printed or published content? No.
Do we desire for the three to have accessible translations between each other? Yes. And note that it goes both ways. I want to be able to throw an e-mail on a webpage and throw a doc on an e-mail and throw all three in a book and so forth. As more technology becomes available and therefore new mediums develop I want to be able to throw stuff in them and throw them in stuff.
Now sure, a book isn't that accessible to go back to digital (yet). Why not? Throw a barcode system in the books. They have an index and a table of contents and chapters and a bibliography. Why not a `printed information to digital information code' (PITDIC)?
But we're talking digital document formats.. which means that there should be what? What is the equivalent of having a barcode system in a book to allow a quick scan to give you all the data contained therein as a digital contents with relevant tagging/metadata? Uhh, dumbass. It's the metadata and the content, together!
Which is what XML and ODF (and whatever dreck Microsoft has tried to override ODF with) are for. Now, I have not studied the specific formats (so yes, you can call me an ass for being biased against MSFT), but if they are properly designed then they are a step in the right direction. What is properly designed? It's when they are simple and basic and elegant enough that we can change them down the road without breaking the old.
We're not children anymore, humanity. It's time we think about how to do that. And before you bitch at me, note there's a difference between being obsolete/technologically inferior and broken. Broken is when I can't access my old stuff. Obsolete/technologically inferior is when I don't want to.
I wonder if it's true; after all, there are two implementations of ODF, OOo and KOffice. It'd be interesting to hear from the KOffice developers on the subject.
Recently I heard a criticism of ODF from Miguel de Icaza: that ODF doesn't reuse standards like SVG as much as it should.
I use OpenOffice. I support Open Document Format over MS/XML and .doc.
That said, ODF kind of blows. Really.
I write novel-length "books" and it is FREAKING IMPOSSIBLE to do some very basic things in any/every ODF based word processor I have tried to date.
Exercise for the Interested:
Make a "Book" with an automatic table of contents, said table to contain an "Author's Note", "Prologue", auto-numbered chapters 1 to N with their associated chapter titles (where the actual chapter number is the chapter number internal variable), and finally "Epilogue", all at the same level of the index.
This simple task is essentially impossible. The flaw is caused by the fact that everything goes through the "styles" and the styles don't inherit their list membership properties. You should be able to make a style "TOC Entry" that is assigned to a particular table of contents level (e.g. level 1) then make a sub-style "Chapter Heading" based on "TOC Entry" but with the chapter numbering magic attached, and in so doing, create "different styles" that go to the same level/point in the list.
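What the comment describes is ordinary property inheritance: a child style keeps its parent's table-of-contents level unless it overrides it, and adds numbering on top. A minimal Python sketch of building the requested table of contents that way; the style names come from the comment, but the `Style` model and its `outline_level` and `numbered` properties are hypothetical and are not how ODF or OpenOffice actually represent styles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Style:
    """A style whose unset properties are inherited from its parent (hypothetical model)."""
    name: str
    parent: Optional["Style"] = None
    own: dict = field(default_factory=dict)

    def get(self, key, default=None):
        # Walk up the parent chain until some ancestor defines the property.
        style = self
        while style is not None:
            if key in style.own:
                return style.own[key]
            style = style.parent
        return default


toc_entry = Style("TOC Entry", own={"outline_level": 1})
# Inherits outline_level 1 from "TOC Entry"; only adds chapter numbering.
chapter_heading = Style("Chapter Heading", parent=toc_entry, own={"numbered": True})


def build_toc(headings):
    """Return one indented TOC line per heading whose style has an outline level."""
    toc, chapter = [], 0
    for text, style in headings:
        level = style.get("outline_level")
        if level is None:
            continue  # style is not assigned to the TOC
        if style.get("numbered", False):
            chapter += 1
            text = f"Chapter {chapter}: {text}"
        toc.append("  " * (level - 1) + text)
    return toc


book = [
    ("Author's Note", toc_entry),
    ("Prologue", toc_entry),
    ("The Beginning", chapter_heading),
    ("The End Approaches", chapter_heading),
    ("Epilogue", toc_entry),
]
print("\n".join(build_toc(book)))
```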
Exercise for the Interested:
Make a "Book" with each chapter, and the prolog, and the epilog in separate sub-documents. The linkage thing is a mess; it is hard to move "the pile of files" around, especially if you want to use subdirectories (etc.). If you have a custom style in the master document style list, you have to _USE_ it in the master document if you want it to be pushed into the created sub-documents. Once the sub-documents are created, it is a royal pain (read: effectively impossible, or "supremely hidden feature required") to update those styles in those sub-documents if you change that style.
Exercise for the Interested:
Put three separate "outlines" into one ODF document. In ODF the outline is a function of the style headers; outlines only exist as implications of structure instead of as first-class abstractions. This is largely the fault of Microsoft Word, since the Word folks totally messed this up when they supplanted WordPerfect (which did this inset outline/object sort of thing right).
ODF was, IMHO, poisoned by the slavish attempt by someone trying to make a Word killer instead of a "good word processor."
And there are stacks more of these issues.
And all that said, I *STILL* use ODF (Open Office etc) because I CATEGORICALLY REFUSE to _RENT_ the right to access my own work from a third party. Microsoft has plainly stated that such rental model is their intended business plan, which makes them a non-starter.
In my opinion, having used both Word and OpenOffice for years, and having used WordPerfect and WordStar before them, ODF is a "workmanlike effort" to create a document format suitable for "normal business purposes". There is a reason that the legal profession never moved over to Word, and they likewise will not move to ODF: when you need to get to a tightly prescribed document format, both Word and ODF have a "you can't get there from here" fundamental limitation. Both formats simply refuse to represent some things because the designers "know" that a different format is better. Neither ODF nor Word has any allowances for _art_, professional or poetical.
So, governments should use ODF because it is "no worse" than Word in terms of the ability to represent the documents it can represent, and given that congruence, the shorter, 100% open standard is, or should be, a hard minimum requirement.
In terms of ODF being the be-all and end-all of document representation, I'd have to say "hardly!" I looked into the OpenOffice code base a while back to see if adding to or changing the format to allow for "a book" would be reasonable. It didn't appear to be. Too many of the original StarOffice assumptions about document structure seemed pathologically uninspired. It was like looking at a big pile of Visual Basic. Everything in the standard is way too global; nothing "nests organically", it all nests pedagogically. (Every
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
HTML/CSS is at most only an output format, i.e. one uses it as the final presentation format, like PDF or ps. To actually write and edit something (even through some GUI), then save, reload and make significant changes is absolutely horrible with it. This was supposed to get better when CSS was introduced, but somehow the format specification failed miserably (IMHO, of course).
I'm not a fan of the two mentioned formats either: way too much bloat. Somehow that seems to be normal with XML-based formats (but that's probably just a pet peeve of mine).
He wants HTML/CSS documents? Isn't this what Google Docs does?
Anyways sounds like a good idea to me. I often have to share documents and I don't like to have to force people to install a specific application just to read them.
Håkon invented CSS, and is largely responsible for making most of the designs you see online possible.
If only someone had thought of such an implementation in the 1960s... oh wait, they did. Any more wheels need reinventing?
ODF is not about web pages or word processing. It's a standard for office documents, including spreadsheets, presentations and word processing. That's a big difference from what Opera's CTO is talking about. CSS/HTML might make a good format for one part of the suite (word processing) with a lot of work on the standard. The issue: that's not what is needed for a standard. It's about doing for office documents what HTML did for websites. ODF is actually an opportunity for Opera: extend the browser to support ODF so people can post ODF documents, make dynamic applications render to ODF and so on. It takes the web to the next level and further erodes the big monopoly.
-- $G
Build a word processor that uses html/CSS with options and flexibility comparable to OpenOffice.
Actually, I have every confidence that Opera can. . . I've been a happy Opera user since 1999.
Tech Public Policy stuff
How about Microsoft and OpenOffice just keep their own XML formats? One of the great things about XML is that you can use XSLT to transform one XML document into another one with different syntax. As long as both products can open, display, and convert the other format then I don't really see the need for a standard in this situation.
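XSLT is the tool named here. Python's standard library has no XSLT processor, so as a stand-in this sketch does the same kind of element-to-element mapping by hand with `xml.etree.ElementTree`, turning ODF-style paragraphs into WordprocessingML-style ones. The two namespace URIs are the real ones used by the formats, but the input is a hypothetical stripped-down fragment and the mapping covers plain paragraphs only, nowhere near a real converter.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the familiar w: prefix


def odf_paragraphs_to_ooxml_body(odf_fragment: str) -> ET.Element:
    """Map every <text:p> to <w:p><w:r><w:t>...</w:t></w:r></w:p>.

    Plain text only: styles, spans, lists, tables, etc. are dropped.
    """
    src = ET.fromstring(odf_fragment)
    body = ET.Element(f"{{{W_NS}}}body")
    for p in src.iter(f"{{{TEXT_NS}}}p"):
        run = ET.SubElement(ET.SubElement(body, f"{{{W_NS}}}p"), f"{{{W_NS}}}r")
        ET.SubElement(run, f"{{{W_NS}}}t").text = "".join(p.itertext())
    return body


odf = (
    '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    f'xmlns:text="{TEXT_NS}">'
    "<text:p>Hello World.</text:p><text:p>Hello Universe.</text:p>"
    "</office:text>"
)
print(ET.tostring(odf_paragraphs_to_ooxml_body(odf), encoding="unicode"))
```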
A standard is going to limit innovation in word processors unless you specifically allow extensions in the standard, which kind of defeats the purpose of a standard.
If the goal is to send out a document that anyone can read, then convert to PDF or a web page. "I shouldn't have to convert b/c I'm a stupid user" you say? Don't expect a 600-6000 page standard to solve this problem.
However, I think the parent post's main point was that LaTeX is not here and now usable across the globe. With MS Word or OpenOffice, I can type and mix Japanese, Korean, Russian and French in one and the same document, and I can share it with millions of users on different platforms.
With the default installations of LaTeX, that is impossible.
"2 standard formats is not a good idea"
"That's the reason I'd wish to add a third"
I wish he hadn't ruined the entire opinion with all those HTML/CSS pipe dreams; they are extremely unrealistic, quite apart from the mess it would actually take to adapt them usefully for office formats. Really, this was kind of a shame.
I could also write a book in .txt, but that doesn't make it any better a format for office software.
Copyright infringement is "piracy" in the same way DRM is "consumer rape"
Because only one of the specifications is really akin to a memory dump - the Microsoft one.
My idea for a document format is something like TeX, but with dynamic (a.k.a. just-in-time) compilation. This way it would be fast enough to be interactive.
Packages would be translated into binary VM code. When a document is opened, they would be dynamically compiled or interpreted and would be able to respond to mouse or keyboard actions.
http://mitpress.mit.edu/sicp/full-text/book/book.html
Is this a book or a website?
Actually, it's both, but to me it's more a book than a website. A book is defined by its contents: the sequential flow of text that you keep reading from start to end, with coherent writing and good spelling.
So I think that a book that uses HTML and CSS is still called a book. An online book, but still a book.
We are Turing O-Machines. The Oracle is out there.
It looks like you would enjoy working with LaTeX.
You can specify everything to the smallest detail.
I like the syntax, however some people can't get used to it. YMMV.
Using a word processor to write a book is like using stone tablets and an abacus for spreadsheets. You really ought to look at markup-based typesetters like LaTeX or DocBook, or software specifically designed for book production.
Uh, I have two additional lines in the preamble: the first specifies that I want Danish typesetting conventions, and the second specifies what character set is used in the text. Doesn't seem complicated to me, especially since it is the same in every document (and I use the "latin1" one in my English documents as well).
One more addition compared to US users: the rest of the world uses A4 paper.
'A standard is going to limit innovation in word processors'
.. pause to cogitate .... nope, still doesn't make any sense. How is a document format going to affect a display format? I always get suspicious about where the argument is coming from when someone has to invoke the 'i' word.
.. receive a Word doc, send it as a Word 2000 file to someone who has Word XP, who sends it to someone who has Word 2003 and saves it as DOCX, who sends it back to me, who can't open it, who saves it as RTF, which then loses all the lovely shaded boxes.
A novel argument if I ever heard one. For argument's sake, let's say innovation in GPS is being limited by a lack of multiple standards.
'If the goal is to send out a document that anyone can read, then convert to PDF or a web page'
No, the goal is to send a document that anyone can edit, display and print the same way in any word processor. The real reason the file formats change with each new version of MS Office is to force us back each time for some more 'innovation'. Your best bet is to send them an OpenDocument file and point them to the OpenOffice site.
Or else
http://www.openoffice.org/
was: Why is this an issue?
davecb5620@gmail.com
I'm a physicist so LaTeX is the only sensible way to write reports/papers, but it's starting to look a little long in the tooth and could do with a scrub.
Has the LaTeX3 project produced anything? I get the (perhaps completely wrong) impression that the project has sunk into navel-gazing, in a search for the perfect solution. I'd be happy with the next iteration, if it comes out soonish. Release early, release often?
XML is simple... It's like violence. If it didn't work, you didn't use enough of it.
Where's the Kaboom?
There's supposed to be an Earth-shattering Kaboom.
He's telling me that this is really difficult to understand and parse?
Sure it's not as clean as HTML for such a small bit of text, but it's not impossible to wield, unless you want pixel accuracy, in which case, CSS is difficult as well.
Office XML document:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
<pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512"><pkg:xmlData>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
</pkg:xmlData></pkg:part>
<pkg:part pkg:name="/word/_rels/document.xml.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="256"><pkg:xmlData>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
</Relationships>
</pkg:xmlData></pkg:part>
<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"><pkg:xmlData>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:o12="http://schemas.microsoft.com/office/2004/7/core"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
<w:body>
<w:p><w:pPr><w:ind w:left="720"/></w:pPr>
<w:r w:rsidR="00F4543A">
<w:t>Hello World.</w:t>
</w:r>
<w:r w:rsidR="00F4543A"><w:br/></w:r>
<w:r w:rsidR="00F4543A"><w:br/>
<w:t>Hello Universe.</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidR="0074581F" w:rsidSect="008A7339"><w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
<w:cols w:space="720"/>
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
office:version="1.0">
<office:scripts/>
<office:font-face-decls>
<style:font-face style:name="Times New Roman"
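Taking up the question of how hard this is to parse: with the package wrapper set aside, pulling the text out of the `w:body` in the Office XML sample above takes a few lines of standard-library Python. This sketch works on a trimmed copy of that body, closed so it is well-formed (the posted sample is cut off), and simply maps each `w:br` to a newline, which is a simplification of how Word actually lays out breaks.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Trimmed from the sample above and closed so it is well-formed.
BODY = """<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p><w:pPr><w:ind w:left="720"/></w:pPr>
<w:r w:rsidR="00F4543A"><w:t>Hello World.</w:t></w:r>
<w:r w:rsidR="00F4543A"><w:br/></w:r>
<w:r w:rsidR="00F4543A"><w:br/><w:t>Hello Universe.</w:t></w:r>
</w:p>
</w:body>"""


def paragraph_text(p: ET.Element) -> str:
    """Concatenate the runs of one w:p, turning each w:br into a newline."""
    parts = []
    for r in p.iter(f"{W}r"):
        for child in r:
            if child.tag == f"{W}t":
                parts.append(child.text or "")
            elif child.tag == f"{W}br":
                parts.append("\n")
    return "".join(parts)


body = ET.fromstring(BODY)
for p in body.iter(f"{W}p"):
    print(repr(paragraph_text(p)))
```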
Object-oriented programming concepts should have sorted out this mess years ago.
:)
1) Write a class for each document type, with methods to construct the logical document structure (e.g., add paragraphs to a report, define the author name, whatever).
2) Define a set of standardized rendering interfaces (screen, printer, audio, etc.).
3) Write some renderers for various (document-class, rendering-interface) permutations (e.g., one that renders articles to the screen, or books to the printer).
You're welcome.
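Taken at face value, the three steps map onto a small registry keyed by (document class, rendering interface). A minimal Python sketch, assuming hypothetical `Article` and `Book` classes and two hypothetical targets (a plain-text "screen" and a page-break-aware "printer"); real renderers would carry far more layout logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# 1) One class per document type, with methods that build its logical structure.
@dataclass
class Article:
    title: str
    author: str
    paragraphs: List[str] = field(default_factory=list)

    def add_paragraph(self, text: str) -> None:
        self.paragraphs.append(text)


@dataclass
class Book:
    title: str
    chapters: List[Tuple[str, List[str]]] = field(default_factory=list)

    def add_chapter(self, heading: str, paragraphs: List[str]) -> None:
        self.chapters.append((heading, paragraphs))


# 2) Standardized rendering interfaces, reduced here to named targets.
SCREEN, PRINTER = "screen", "printer"

# 3) Renderers registered per (document class, rendering interface) permutation.
RENDERERS: Dict[Tuple[type, str], Callable] = {}


def renderer(doc_type: type, target: str):
    """Decorator that registers a function as the renderer for one permutation."""
    def register(fn: Callable) -> Callable:
        RENDERERS[(doc_type, target)] = fn
        return fn
    return register


@renderer(Article, SCREEN)
def article_to_screen(article: Article) -> str:
    return "\n".join([f"{article.title} by {article.author}", *article.paragraphs])


@renderer(Book, PRINTER)
def book_to_printer(book: Book) -> str:
    pages = [heading + "\n" + "\n".join(paras) for heading, paras in book.chapters]
    return "\f".join(pages)  # form feed: each chapter starts on a new page


def render(doc, target: str) -> str:
    fn = RENDERERS.get((type(doc), target))
    if fn is None:
        raise NotImplementedError(f"no renderer for {type(doc).__name__} on {target}")
    return fn(doc)


a = Article("Standards", "A. Writer")
a.add_paragraph("Use the right tool for the job.")
print(render(a, SCREEN))
```

Adding a new output device means registering new functions, not touching the document classes, which is the separation the three steps are after.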
I actually did publish a book that I authored in HTML. More precisely, we used HTML run through a really ugly preprocessor that one of the original authors of the book created while she was teaching herself Perl.
Fortunately, our publisher found an SGML/XML wizard who did a very good job of converting the HTML to XML, which then got converted to PDF using an off-the-shelf XSL-FO processor. I was very impressed with his work, without which the conversion would have been a total nightmare. It was still very tedious, though, because HTML is not a true structured format, and you cannot completely automate its conversion.
It would, of course, have been much more efficient to have authored the document in XML in the first place. I remember this actually being proposed back in 1998 for an earlier version of the book. (I was not a co-author back then, but I was working for the department that owned the content.) The manager who was responsible for this had zero interest: HTML got the job done, and she didn't have the resources to do a big XML conversion. Never mind the huge inefficiency of authoring that book, and a lot of other content related to the Java SE platform, in plain HTML. Only now, after this manager has left the company and it has become painfully obvious that they can no longer afford to hack such a huge mountain of HTML code, is the company getting round to making the conversion.
HTML is just not a good format. It's barely adequate for creating web pages, and totally useless for anything else.
The history, in case you were wondering, is that Opera consistently disapproves of XML. They don't like XSLT, XSL-FO, or even XPath. They would prefer we use a CSS syntax with poorly implemented selectors and a lack of code generation. Opera has always hated XML; they have always liked LISP-looking syntaxes despite their being slower (try finding a stream-based CSS parser). This is a continuation of their backwards ideas about the web.
Ergo, when drafting out the spec, it will have to be somewhat larger to accommodate all the 'what ifs' and "(insert underlying spec. here) doesn't natively have an ability to express that - we'll have to extend it"
I've specifically not used HTML/CSS/XML/ETC for examples because I believe that we will be facing this scenario regardless of what standard is used. The only way to avoid it, IMHO, is to do as they did for C and build from the ground up.
Let me know what you think.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
You can't create documents in Word with good styling.
But if you did it in WP51, and then converted it to Word, it worked perfectly fine.
So I guess the Word engine can handle them just right, but the interface is incapable of generating them the right way.
If the point is for businesses and governments to adopt a standard, then at some point, a credible third party (standards body, government agency) needs to produce a conformance suite, and vendors need to show that they pass with flying colors.
Given the 6,000 pages for OOXML and 700 for ODF, it will be interesting to see whether either will be done. Just imagine the test cases and the explanations of what they do.
I'm starting to understand why Håkon Lie doesn't much care for either.