Universal Ebook Format Debated
Amy Hsieh writes "A well-known ebook industry expert, Jon Noring, recently wrote an interesting article for eBookWeb, formally calling upon the ebook industry to adopt a single universal ebook distribution format. Right now there's a plethora of essentially incompatible ebook formats, and this format 'babel' is hampering the growth of the ebook industry. In the article, Mr. Noring proposes a promising open-standards candidate which appears to meet a list of basic requirements: The Open eBook Forum's OEBPS Specification. Andy Oram, a Linux programming editor for O'Reilly, wrote an interesting reply to the article that should also be read." On the other hand, Noring's proposal has also met with some skepticism elsewhere.
What about .txt?
If the software pirating industry can all agree on plain text "NFO" files with ASCII-painted flames, dragons eating your group's logo, and pot leaves surrounding shout-outs to your boys on efnet, I think the slightly more professional and law-abiding ebook industry can agree on a standard format.
Small potatoes make the steak look bigger.
Is Project Gutenberg and a Palm Pilot.
I don't think any format will get Ebooks to catch on until we have reader hardware that makes reading those books at least as pleasant as reading a paper book.
Here's hoping that all those e-paper efforts will produce something usable soon.
Why should I choose a format that could have DRM added in the future, allowing only one read, and that requires electricity to read?
This is especially true for factbooks, which are often used as references rather than read just once.
So far, ebooks can't beat the paper version in portability, convenience, and ease of use.
Paper books still seem preferable to me.
Yup - just like there's a plethora of essentially incompatible word processing formats - hampering the growth of the office/word processing market.
But the industry doesn't matter to one player - only their market share does.
The only way to really win this sort of thing is to persuade all (or at least most) consumers to boycott products that deliberately break compatibility with standards.
But how likely is that to happen?
I think this was the mistake of the iTunes Music Store. While not terrible (actually slightly better quality), AAC is not as universal a standard as MP3 or even Ogg. There are WAYS to encrypt and secure those formats. Napster, just before its demise, had figured out how to secure MP3s that were downloaded from its system.
Yell & scream & rant & rave... it's no use... you need a shaaaave ~ Bugs Bunny
I had a taste of incompatible e-Book formats when I got my first colour Palm.
Sadly, there were better (open) formats using better compression and rendering, losing out to closed formats with big marketing push.
The format that ultimately prevails will not necessarily be the best. It'll be the format pushed by those with the greatest marketing skills/budget, and the one which gives them the greatest control over how their works are used.
It wouldn't surprise me if authors are already signing e-book distribution deals which forbid them from releasing in rival formats.
One of these days, the masses will choose software and data formats according to quality and freedom.
But something within me suspects that the Pope will convert to Islam, and the Jews will profess the divinity of Christ first.
-- In the beginning was the WORD, and the WORD was UNSIGNED, and the main(){} was without form and void...
There is also this e-book XML format:
http://haali.cs.msu.ru/pocketpc/FictionBook_description.html
I use his excellent HaaliReader as a text reader on my Pocket PC (fullscreen, landscape mode). There are also html2xml and word2xml tools on his site.
The Project Gutenberg Etexts should be so easily used that no one should ever have to care about how to use, read, quote and search them ...
.it is the only text mode that is easy on both the eyes and the computer.
.to IBM, to Mac, to TRS-80. . .
.not just those your access has allowed you to get from Project Gutenberg. The point is that a decade from now we probably won't have the same operating systems, or the same programs, and therefore all the various kinds of etexts that are not Plain Vanilla ASCII will be obsolete. We need to have etexts in files a Plain Vanilla search/reader program can deal with; this is not to say there should never be any markup. . .just those forms of markup should be easily convertible into regular, Plain Vanilla ASCII files so their utility does not expire when programs to use them are no longer with us. Remember all the trouble with CONVERT programs to get files changed from old word processor programs into Plain Vanilla ASCII?
.so is very much of the value of most of the various markup systems we have in the world. But until some real standards arrive-- we would be limiting our options a great deal if we do not keep copies of all etexts in Plain Vanilla ASCII as well.
.an operating system, a program, a markup system. . .will not.
.as the .Z compression format does in a similar manner today.
This has created a need to present these Project Gutenberg Etexts in "Plain Vanilla ASCII" as we have come to call it over the years.
The reason for this is simple. .
However, this encourages others to improve our etexts in a variety of ways and to distribute them in a variety of the available media, as follows:
Once an etext is created in Plain Vanilla ASCII, it is the foundation for as many editions as anyone could hope to do in the future. Anyone desiring an etext edition matching, or not matching, a particular paper edition can readily do the changes they like without having to prepare that whole book again. They can use the Project Gutenberg Etext as a foundation, and then build in any direction they like.
Thus any complaints about how we do italics, bold, and underscoring, or whether we should use this or that markup formula, are sent back with encouragement to do it any way anyone wants, with the basic work already done, with our compliments.
The same goes for media. We have had a long-standing work ethic of providing our etexts in any medium people wanted: Amiga, Apple, Atari. .
However, now that our etexts are carried in so many BBSs, networks and other locations, it is easier for you to download the file in a form that suits your format than for us to make and mail a disk, so we don't really do that much anymore.
The major point of all this is that years from now Project Gutenberg Etexts are still going to be viable, but program after program, and operating system after operating system are going to go the way of the dinosaur, as will all those pieces of hardware running them. Of course, this is valid for all Plain Vanilla ASCII etexts. .
Do you want to go through all that again with every book a whole world ever puts into etext?
The value of Plain Vanilla ASCII is obvious. .
We don't have anything against markup. Not vice versa.
Alice in Wonderland, the Bible, Shakespeare, the Koran and many others will be with us as long as civilization. .
This includes the many requests we have for compression in particular formats. There are only two formats we know of that are suitable for transfer to a wide general audience: Plain Vanilla ASCII (.txt files) and ZIPped files of them (.zip files). Requests for other compression formats must be ignored, as they are appropriate only for small portions of our target audience. However (programmers take note: we will need help), we are planning to put some compression links on our files so they can be transmitted in any of an assortment of compression formats on the fly; i.e., we should be able to generate any kind of file asked for, but keep only one copy of each etext on our servers. .
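The on-the-fly idea in the excerpt above (keep one .txt master per etext, build compressed downloads on request) can be sketched with Python's standard zipfile and gzip modules. This is a hypothetical illustration, not Project Gutenberg's actual server code; the function name package_etext and the set of formats it accepts are assumptions.

```python
import gzip
import io
import zipfile


def package_etext(name: str, text: bytes, fmt: str) -> bytes:
    """Build a download on the fly from the single stored .txt master.

    name is the file name to use inside a .zip archive (e.g. "alice.txt").
    The accepted formats ("txt", "zip", "gz") are an illustrative choice.
    """
    if fmt == "txt":
        return text  # serve the master copy unchanged
    if fmt == "zip":
        buf = io.BytesIO()  # build the archive in memory, nothing stored on disk
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(name, text)
        return buf.getvalue()
    if fmt == "gz":
        return gzip.compress(text)
    raise ValueError(f"unsupported format: {fmt}")
```

Only the .txt master is kept on the server; each archive is rebuilt per request, trading a little CPU time for storage, which is the trade-off the excerpt describes.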
Shows what I know.
A couple of side notes: how can you not know what Babel is? Babel, as in the Tower of Babel: a story from the Bible where King Nebekenezur (there is no correct spelling for that in English, just commonly accepted ones) wanted to build a tower to God, so God, being jealous, put a spell on everyone, and they all ended up speaking different languages. It's how Christians believe multiple languages came to be.
Now the website Babel Fish gets its name from 'The Hitchhiker's Guide to the Galaxy' by Douglas Adams, where the characters 'stick a Babel fish in their ear' to act as a universal translator.
Speak for yourself.
Different readers, different platforms, and different applications have different requirements!
Some uses want a format that is as compact as possible. Some focus on readability (switchable fonts, etc.). Others, such as facsimile-style releases, emphasize that the copy should mimic the original work as closely as possible. Formats can emphasize the syntactic structure of the text (sentences, paragraphs) or its physical layout (line breaks, pages).
Even in their paper forms, books have different formats for different uses. Libraries prefer hardcovers, with durable bindings. Travelers prefer paperbacks, with small, light pages. Collectors pay extra for special editions made with quality materials. Some readers prefer large-print copies, abridgements, or books on tape (in a choice of cassette tape or compact disc!)
Any format makes assumptions, and deletions. It's perfectly fine to have a multiplicity of formats. If it's usable and reasonably priced, people will buy it.
For me, the major hindrance to e-books is the price. Since there is no cost for materials (paper/cardboard), printing, physical transportation, stocking space, or delivery, e-books should be *cheaper* than physical books. But many of them are priced the same, or even higher (you can check this at Amazon). What's up with that?
Why not just use SVG?
My other account has a 3-digit UID.
... is paper. Seriously.
The nice thing about a book is that it doesn't have a power switch - it's actually relaxing to sit there and read it.
If it were possible to obtain a high-speed printer capable of printing out "e-books" in the same form factor as a normal book (i.e. double-sided pages, standard size, neatly bound), then I for one would pay for *lots* more books (and paper, and ink).
I'm still amazed that the whole of the business world is happy to accept MS Word .doc as the standard for storing virtually all of their documentation. I don't think the film industry would be happy standardizing on .avi or the music industry on .wav, so why doesn't the business world get its act together and accept a better format than the crappy .doc?
Take a look at this - 1dok.org - an open document format
I would like to put up a server to serve up Gutenberg, etc. a page or so at a time for low-end WAP phones, with simple indexing and searching capabilities. The simpler cell phone is what I really always have in hand, with good connectivity, when I would like to read. Palm Pilots never seem to have enough storage to hold whole books, or widespread connectivity.
Has anyone done this? It should be popular and not too resource-intensive.
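A rough sketch of the page-at-a-time idea: split an etext into small pages at word boundaries, so a phone only has to fetch one page per request. The function name paginate and the 500-character default page size are assumptions chosen for illustration, not taken from any existing WAP service.

```python
def paginate(text: str, page_chars: int = 500) -> list[str]:
    """Split text into pages of at most page_chars characters, breaking
    between words. A single word longer than page_chars gets its own page."""
    pages: list[str] = []
    current: list[str] = []  # words on the page being built
    length = 0  # characters on the current page, counting joining spaces
    for word in text.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and length + extra > page_chars:
            pages.append(" ".join(current))  # page is full: flush it
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        pages.append(" ".join(current))
    return pages
```

A server would then answer a request for page n with paginate(book_text)[n], ideally computing the page list once per book rather than on every request.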
Take 2 minutes and read this article from RMS
Right to Read (please proofread!)
You have to be careful. Half of you are saying "I won't use this until e-books are as pleasant as paper books" and half of you are saying "why not use the standards that are already there? Just make the device do everything."
Don't you see these are at odds?
To make e-books as pleasant as real books, you're going to want to make them thinner and thinner in profile. You're going to want to make them run on a single lithium cell battery or AAA. You're going to want to drop all of the interface but the forward, back, and bookmarking buttons. You're going to want the computing device to be as close to nothing as possible, so you can put weight into making the device indestructible like a real book. You want to go to the store, buy the title, and have it just work, or go to Amazon and *know* your desired title is published in that format. That's the ideal, in the near term. It isn't a device that will easily accommodate PDFs and HTML and a number of other standards.
The lack of an ebook standard has thus far kept me from buying any hardware ebook reader. I would be happy to shell out the cash for one if I knew I could use it with all the books out there.
Won't you be my neighbor?
Yes, of course some spiffy new format will have other advantages. But it's unlikely to gain quick acceptance. Plain text documents are everywhere, as are readers and other software. There are even online publishers selling text files. In fact, ASCII text is arguably the most successful electronic standard there is!
Ceterum censeo subscriptionem esse delendam. (Moreover, I hold that subscription must be destroyed.)
BZZZT. Wrong.
Nebuchadnezzar lived during the time of Daniel. The events of the Tower of Babel are chronicled in the book of Genesis.
I have developed a highly sophisticated format for storing books in a computer file.
Each character of the book is to be enciphered to a byte. I reserve the first 32 codes (0-31) for various system function characters. The next 32 codes (32-63) encipher the space character, various punctuation marks, and numerals. The next 32 codes encipher the capital alphabet and a few more punctuation characters. With the simple use of a 00011111 binary mask, 'A' maps to 1, 'B' maps to 2, and 'Z' maps to 26. Quite clever if I say so myself! Naturally, the next 32 codes encipher the lowercase letters in the same manner. Using the very same 00011111 bitmask, you find 'a' maps to 1, 'b' maps to 2, and 'z' maps to 26! Ingenious, isn't it?
To ensure compatibility with legacy computer systems, values above 127 shall not be used.
I call this encoding Advanced Storage Cypherment Input Ideal - or A.S.C.I.I. Any file utilizing this encipherment is a Tagged eXchange Template. These files may be identified by the use of a .TXT extension.
-- You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
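For anyone who wants to check the bitmask trick in the A.S.C.I.I. post above: in 7-bit ASCII, 'A'-'Z' are codes 65-90 and 'a'-'z' are 97-122, so it is the low five bits (mask 00011111) that give a letter's position in the alphabet for both cases. A six-bit mask (00111111) works for the capitals but leaves lowercase letters offset by 32. This is a plain Python sketch; alphabet_position is a hypothetical helper name.

```python
# Two candidate masks for the trick in the post: keep the low five bits,
# or keep the low six bits.
FIVE_BIT_MASK = 0b00011111
SIX_BIT_MASK = 0b00111111


def alphabet_position(ch: str) -> int:
    """Return 1..26 for a 7-bit ASCII letter of either case."""
    # 'A'..'Z' are codes 65..90 and 'a'..'z' are 97..122; reject anything else.
    if len(ch) != 1 or not ("A" <= ch <= "Z" or "a" <= ch <= "z"):
        raise ValueError(f"not an ASCII letter: {ch!r}")
    return ord(ch) & FIVE_BIT_MASK


for ch in "AZaz":
    # Compare the five-bit result with what the six-bit mask would give.
    print(ch, alphabet_position(ch), ord(ch) & SIX_BIT_MASK)
# A 1 1
# Z 26 26
# a 1 33
# z 26 58
```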
Everyone in academia uses LaTeX and PostScript, since PDF is silly and HTML doesn't have layout features.
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
Expro: I would like to put up a server to serve up Gutenberg, etc. a page or so at a time for low-end WAP phones
I got bored last Christmas and did this.
www.wapnovel.com (WAP or desktop)
There's also an as yet unused discussion group at:
http://groups.yahoo.com/group/wapnovel
Andrew Oakley - www.aoakley.com
That needs to be modded Funny.
By my reckoning, MS-Word has had more than 15 different formats in 9 years. I gave up MS-products for Lent a few years ago, but back in the day when my new laptop arrived with MS-Word95 (or whatever it was called), I had to go find MS-Word 6 and manually resave every last Word document + metadata in RTF format in order to be able to read them in the new program.
Too bad the data format is tied into specific applications. This is an old archival issue that is fortunately being dealt with by establishing open file formats and cross-platform applications (StarOffice, OpenOffice, WordPerfect, AbiWord).
HTML caused the WWW, it will be interesting to see what happens with file formats for productivity suites.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
Face it, the only reason that HTML/XML/LaTeX or whatever simple, suitable format hasn't been chosen as the standard (even though it's so bloody obvious that it should be) is because they don't have DRM.
:)
DRM Capability: Although end-users prefer not to purchase ebooks protected with DRM (Digital Rights Management), publishers are certainly interested in the DRM capability of the universal ebook format. Thus, the universal ebook format must allow inclusion of DRM protection technologies as needed.
It's obvious that the author of that article has no idea about DRM. Any Slashdotter will tell them that DRM is pointless: if you can read it, you can copy it, and if one copy gets made, a million get made.
Even if the publishing companies decide otherwise, everyone else will probably just rip books into HTML or something. I'm sure most companies such as Microsoft or Adobe would love to have invented an all-purpose, DRM-equipped publishing medium. Tough: HTML, plain text, etc. live on.
This comment does not represent the views or opinions of the user.
Adobe, can you hear me? Business Opportunity Knocking.
My wife has a small publishing/consulting company that has taken us 16 years -- and a lot of investment and pain -- to build. She works her butt off gathering the content which she then publishes as print products and CD-ROM "ebooks".
She is devastated when she hears from someone that they've copied one of her color newsletters; made a "backup" of the CD-ROM ebook and someone else "happens to be reading it so I thought I'd call with a question"; and otherwise copies illegally (no...we don't have the funds to pursue them). She had an opportunity to publish a digital product in Asia and another in Latin America but these markets are notorious for buying *one* and suddenly hundreds or thousands appear (I could digress with a personal story when I was at a software company and saw this first-hand...but it's too long).
PDF is the best standard right now. Platform support for everything out there virtually; security; but there is no meaningful method of DRM that would protect a small businessperson AND make it relatively easy to move ebooks from device-to-device (I know that I would hate to have to remember codes from dozens of publishers; be locked in to one machine for viewing; or other cumbersome methods).
However, no protection = no incentive. I don't care if you're a recording artist seeing your music ripped off or someone like my wife struggling to grow a business. Why should my bride travel in Europe and at home gathering content, pay correspondents and photographers, and publish a product in ebook format that is super-simple to copy and distribute?
This is why I'm struggling so hard with the whole discussion about ebooks, copyright, DRM, and fair use. Somehow, some way, we've got to come up with a solution that offers some sort of universal ebook format that content producers can agree on and users can live with.
My $.02....
It's all about the price.
Shadow Puppets (hardcover) by Orson Scott Card is priced on Amazon for USD$18.15
Electronic version USD$25.95 (M$Reader and Adobe)
With the e-version the publisher has no printing costs, no binding costs, virtually no shipping costs, no warehousing fees, no sales clerks.
Like most everyone, I prefer my books on paper, but there are times where an e-book version would be convenient. But I am not going to pay 4 times the paperback price for the experience.
If the competing "companies" would take a look at Europe and how far past the US they are with cellular adoption, it would seem obvious that having one standard in the end benefits everyone. The US market is fragmented and the costs are much higher... .02 deposited
wordtrip.com
"Right now there's a plethora of essentially incompatible ebook formats, and this format 'babel' is hampering the growth of the ebook industry."
Bullshit.
The problem is that the few people who actually still read books are not likely to be stupid people. On top of that, the people who are reading electronic formats of books are even less likely to be stupid people.
However, it would take rather dim consumers indeed not to see a problem with paying the exact same price for an eBook as one would pay in a brick-and-mortar bookstore for a paperback... and strangely, when I go to these eBook sellers online, I see exactly that. "Oh joy! Instead of paying $7.95 for that paperback over at Barnes & Noble, I can pay just $7.95 to download an electronic copy in a format that I probably won't be able to read again in 10 years because the format and its reader will have been declared obsolete!"
The unwillingness of eBook publishers to see eBooks as something other than a way to increase sales profits by cutting out the middlemen of printing and shipping expenses is what is hampering eBook adoption.
The ultimate in bloat! Imagine: an XML document describing the vector points and Bezier curves required to draw each and every character in an entire eBook! If nothing else, it would push the price of multi-TB drives down.
You don't know squat about the SVG format, do you?
If you took 2 minutes to check the SVG specification, you'd see that you are totally wrong.
Text in SVG is described as... wait for it... ordinary text. The characters are then mapped to glyphs in your SVG-viewer. Just like in MS Doc, or Adobe PDF.
Just to show you the "ultimate bloat" I'll include an example. If you just want an ordinary plain vanilla book this is all it takes.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="25cm" height="35cm" viewBox="0 0 2500 3500" xmlns="http://www.w3.org/2000/svg" version="1.1">
<text x="250" y="150" font-family="Verdana" font-size="55" fill="blue">
Insert book text here...
</text>
</svg>
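To illustrate the point that SVG text is ordinary character data, here is a sketch that reads the text back out of a minimal SVG document like the one above, using Python's xml.etree.ElementTree. The DOCTYPE is left out to keep it short, and svg_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags as {namespace-uri}localname.
SVG_NS = "{http://www.w3.org/2000/svg}"

svg_source = """<svg width="25cm" height="35cm" viewBox="0 0 2500 3500"
     xmlns="http://www.w3.org/2000/svg" version="1.1">
  <text x="250" y="150" font-family="Verdana" font-size="55" fill="blue">Insert book text here...</text>
</svg>"""


def svg_text(source: str) -> list[str]:
    """Return the character data of every <text> element, including any
    nested <tspan> content, with surrounding whitespace stripped."""
    root = ET.fromstring(source)
    return ["".join(el.itertext()).strip() for el in root.iter(f"{SVG_NS}text")]


print(svg_text(svg_source))  # ['Insert book text here...']
```

One caveat if you try this for a whole book: SVG 1.1 does not wrap text automatically, so each line has to be positioned with its own text or tspan element.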
"First lesson," Jon said. "Stick them with the pointy end."
While everyone may use LaTeX, PDF has become more and more popular for web distribution of papers. PS works fine when you're just sending it to the printer, but because Adobe didn't include PS support in Acrobat, Windows users don't bother.
But TeX/LaTeX has the advantage of being pretty much immutable, second only to plain TXT on that count. The standard hasn't changed since, what, 1982? Hopefully we'll be able to process the same documents with the same tools fifty years from now.
I think the important distinction between, say, Word format and TeX is that TeX is a piece of systems programming: it performs a well-defined task in a well-defined manner, much like lex or yacc do. An attempt to add 'features' is nonsensical. (Though functionality can be extended through the use of, say, pdftex.)
--grendel drago
Laws do not persuade just because they threaten. --Seneca
Why? Well, an ASCII text version of a printed book is really more like an analog facsimile than is a version in XML that has been tagged for structural features. Leaving aside issues of non-English characters, illustrations, and unusual typography, ASCII does a relatively poor job of capturing all of the structural conventions that exist in printed books. Books have copyright pages, tables of contents, chapter titles, subtitles, bylines, epigraphs, block quotations, footnotes, running headers and footers, citation lists, etc. ASCII can provide rough format equivalents of some of these, very poor equivalents of others. With an appropriate XML tagset, however, it's a relatively simple matter to tag most of the structural features of a book and then use stylesheets for presentational rendering. That's the whole assumption of the Open eBook specification.
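A sketch of the structural advantage described above: once chapters are tagged, a table of contents falls out of a simple query, whereas in plain ASCII a program has to guess chapter boundaries from layout. The element names (book, chapter, title, p) are a made-up minimal tagset for illustration, not the Open eBook or TEI vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal tagset, for illustration only.
tagged = """<book>
  <title>Adventures of Huckleberry Finn</title>
  <chapter n="1"><title>Chapter I</title><p>[chapter text]</p></chapter>
  <chapter n="2"><title>Chapter II</title><p>[chapter text]</p></chapter>
</book>"""

root = ET.fromstring(tagged)
# Structure is explicit, so the table of contents is a simple path query
# over the <chapter> children of <book>.
toc = [(ch.get("n"), ch.findtext("title")) for ch in root.findall("chapter")]
print(toc)  # [('1', 'Chapter I'), ('2', 'Chapter II')]
```

Presentation (fonts, spacing, headings) would then come from a stylesheet applied to those tags, which is the division of labour the Open eBook approach relies on.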
Suppose you're in a world where all printed copies of Huckleberry Finn have been lost. You have two CD-ROMs that somehow you've managed to decode so that you can read the files and interpret their character sets. One of them contains the Project Gutenberg etext of the novel, an ASCII transcription. The other contains an XML encoding tagged according to a DTD from the Text Encoding Initiative, the current best standard for encoding literary (and many other) texts. It has all of the textual content of the PG version, as well as some that's missing (like the table of contents and the copyright page from the transcribed edition, which the PG version unaccountably omits). XML tags mark all the line and page breaks of the original. In addition, there are tags to mark quoted speech, unusual typography, words in foreign languages, and other significant features of the original. The CD-ROM contains the DTD used along with documentation on the tagset.
In this imaginary scenario, even if all of the XML documentation were missing it would be pretty straightforward for 31st-century programmers to strip out the tags and recreate the ASCII transcription. But with the documentation, it's possible to reconstruct something much closer to the original than the plain-vanilla PG version allows. And suppose your 31st-century archaeologist found a trove of TEI-tagged books on CD: with all of the structural tagging and metadata about authorship, publication dates, etc., a 31st-century librarian will be able to plug all of the books into a cataloging system that allows sophisticated searching. If instead you had a trove of plain-ASCII books, the best you could do with the collection would be simple full-text searches.
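The "strip out the tags" step really is that simple. This sketch uses a TEI-flavoured fragment (TEI element names, but without the TEI namespace or header) and Python's xml.etree.ElementTree to recover plain text; strip_tags is a hypothetical helper. It collapses all whitespace, so a fuller converter would also emit blank lines between head and p elements.

```python
import xml.etree.ElementTree as ET

# TEI-flavoured fragment for illustration; real TEI adds a namespace,
# a teiHeader with metadata, and many more elements.
tei_like = """<text><body>
<div type="chapter"><head>CHAPTER I.</head>
<p>YOU don't know about me without you have read a book by the name of
<title>The Adventures of Tom Sawyer</title>; but that ain't no matter.</p>
</div></body></text>"""


def strip_tags(xml_source: str) -> str:
    """Discard all markup, keep the character data, collapse whitespace."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


print(strip_tags(tei_like))
```

The reverse direction is the hard part: nothing in the output records that "The Adventures of Tom Sawyer" was a title or where the chapter heading ended, which is the information the comment argues is worth keeping.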
Leaving aside the sci-fi scenario, the reality is that our documents, over the next few decades, will move from format to format and be used for purposes that we can only guess at right now. Of course plain ASCII, or even proprietary formats, will be better than no documents at all. But the work involved in converting them will be a lot higher than if they are tagged in a well-documented, structured markup language.
Incidentally, there's already at least one project underway to take Project Gutenberg texts and add minimal XHTML or XML markup to capture structure and make them more readable via stylesheets. The Open eBook specification is just a more sophisticated way of doing the same thing.
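A minimal sketch of adding that kind of light structural markup to a plain-text etext: blank-line-separated paragraphs become p elements, hard-wrapped lines are rejoined, and HTML special characters are escaped. The function name and the blank-line paragraph convention are assumptions; it expects Unix line endings and does not detect chapter headings or italics.

```python
import html


def text_to_xhtml(plain: str, title: str) -> str:
    """Wrap blank-line-separated paragraphs of plain text in minimal XHTML."""
    paragraphs = [
        " ".join(block.split())  # rejoin hard-wrapped lines into one line
        for block in plain.split("\n\n")
        if block.strip()  # skip empty blocks from extra blank lines
    ]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>"
    )
```

Because the output is plain XHTML, a reader can restyle it with a CSS stylesheet without touching the text, and stripping the tags again gets back the original words.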
Though it might help, a universal format isn't what's hampering ebooks. It's price. I refuse to pay full price (and sometimes more!) for an etext of something I can get on paper, especially when I only get the etext.
Half price, maybe even quarter price, compared to dead-tree is what ebook sellers should be going for... but if I still have to pay full price and I don't even get my dead tree, I'll pay the same for something slightly more tangible.
Now, I would pay a couple of bucks (i.e. $2) more for a dead-tree book that includes the etext.
-- Waht? Tehr's a preveiw buottn?
>Although end-users prefer not to purchase ebooks
>protected with DRM (Digital Rights Management),
>publishers are certainly interested in the DRM
>capability of the universal ebook format.
So although people would prefer not to buy books with this stuff, we're going to put it in there anyway. Whatever happened to listening to your customers?
With CSS there's not a lot HTML can't do with layouts.
No free, mature implementation of HTML and CSS can render a font not installed on the user's machine from outline data stored in the document. Mozilla has a bug on this open in bugzilla.mozilla.org (bug 52746), but it doesn't look like it's going anywhere. And no, "just replace with Helvetica, which is installed everywhere" is not an option because Helvetica for every non-Latin writing system is not installed on every reading device.
Will I retire or break 10K?
What I mean by that is that for many books with complicated layouts (including my own free books), it's simply not possible to reflow the text automatically. Consider an illustrated science textbook, which is the kind of work I do. There's a lot of hand-tweaking involved in getting everything laid out on the pages in the best possible way. And my books' layouts aren't even that complex compared to a lot of the big commercial textbooks out there. Some slashdotters may have used LaTeX to write academic papers, so they'll know how LaTeX tries hard to flow the text correctly, but ultimately it doesn't always do what you want, and either you or the publisher ends up doing more tweaking.
The solution isn't that complicated: if a publisher wants to make an electronic book available in both a typographically rich version and an adaptable version, they can create both a PDF version and an HTML version. Of course, this is really an answer to a question that the publishers never asked. Most publishers don't want open formats, because open formats won't allow them to continue to steal away the rights of end-users, such as the right of first sale.
Find free books.
That's because a lot of
The main places ebooks beat paperbacks are density (you can have many ebooks in a single reader), searchability, and bookmarking. Having many ebooks in a single reader doesn't help much if your battery is only good for 4 hours. Searchability doesn't help much either for pleasure reading, but it is very useful for reference books and school books. Bookmarking is good for starting where you left off, but that is already a solved problem with paperbacks; it isn't a new capability.
The problem is that, at this point in development, ebooks are a *worse* way to read just about anything people buy for pleasure reading.
When the technology develops sufficiently to solve at least most, if not all, of the issues I noted above, I'll look more at ebooks... but until then, I'll stick with paperbacks, because they are *better*.
I'm not so sure about this. Book publishers like the idea of ebooks, because they can build in more limitations than are legal with current physical book publishing. EBooks kill "right of first sale" because they license the work, they don't sell it. EBooks remove the publisher's major gripe about "more than one person reads each copy of a book/magazine". Publishers like ebooks because they can severely limit readers' rights with them, and think they can do it legally, or at least with fewer court hassles.
It's a bad deal for readers/consumers, but hey, that's part of why book publishers like it. It changes the power relationship in the publisher's favor.
Hmm... on reading through this... this has *got* to be -1, Redundant. At least, I hope it is. Karma to burn...
This is my sig. There are many like it but this one is... Oops. Frank, I've got your sig again! Where's mine?
Plain ASCII text isn't too bad if you only write in English and you don't want your text to look nice on tiny PDA screens.
But sensibly-done "plain" HTML is generally better. For an example, look at Baen.com. In the upper right is a "free" link that points to a bunch of sci-fi works that are online. You can get them in several formats. The HTML is a good choice in most cases, because it's not overly fancy, but produces good rendering in just about any HTML-capable window on any size screen.
So, even if you have a big screen, you can load the text into a narrow window along one side of your screen, and read it while you're waiting for a compile or a test run.
Of course, there's the inevitable problem of junk HTML produced by such things as Microsoft's various editors and word processors. But this isn't HTML's fault; it's the fault of the idiots who foisted such software on unsuspecting customers. And even then, most HTML renderers will display it sensibly, so the only real problem it causes is the long download time for all the spurious junk that clutters up the text.
(Baen.com also had a thoughtful and entertaining essay on why they give out a lot of their books for free. It's an interesting summary of the impact of the Internet from an author's and a publisher's perspective.)
Those who do study history are doomed to stand helplessly by while everyone else repeats it.