Universal Ebook Format Debated
Amy Hsieh writes "A well-known ebook industry expert, Jon Noring, recently wrote an interesting article for eBookWeb, formally calling upon the ebook industry to adopt a single universal ebook distribution format. Right now there's a plethora of essentially incompatible ebook formats, and this format 'babel' is hampering the growth of the ebook industry. In the article, Mr. Noring proposes a promising open-standards candidate which appears to meet a list of basic requirements: The Open eBook Forum's OEBPS Specification. Andy Oram, a Linux programming editor for O'Reilly, wrote an interesting reply to the article that should also be read." On the other hand, Noring's proposal has also met with some skepticism elsewhere.
Does this 'babel' format have any relationship with Babelfish? Please don't tell me it's used to translate books into different languages!
Harry's Potter take ups his sword to slain the evil Mould a Wart
Mother, do you think they'll like this sig?
What about .txt?
If the software pirating industry can all agree on plain text "NFO" files with ASCII-painted flames, dragons eating your group's logo, and pot leaves surrounding shout-outs to your boys on efnet, I think the slightly more professional and law-abiding ebook industry can agree on a standard format.
Small potatoes make the steak look bigger.
Is Project Gutenberg and a Palm Pilot.
I don't think any format will get Ebooks to catch on until we have reader hardware that makes reading those books at least as pleasant as reading a paper book.
Here's hoping that all those e-paper efforts will produce something usable soon.
Why should I choose a format that could have DRM added in the future, allowing only a single read, and that requires electricity to read?
This is especially true for factbooks, which are often used as references rather than read just once.
So far ebooks can't beat the paper version in portability, convenience and ease of use.
The paper book still seems more favorable to me.
Yup - just like there's a plethora of essentially incompatible word processing formats - hampering the growth of the office/word processing market.
But the industry doesn't matter to one player - only their market share does.
The only way to really win this sort of thing is to persuade all (or at least most) consumers to boycott products that deliberately break compatibility with standards.
But how likely is that to happen?
I think this was the mistake of the iTunes Music Store. While not terrible (actually slightly better quality), AAC is not as universal a standard as MP3 or even Ogg. There are WAYS to encrypt and secure those formats. Napster, just before its demise, had figured out how to secure MP3s that were downloaded from its system.
Yell & scream & rant & rave... it's no use... you need a shaaaave ~ Bugs Bunny
I had a taste of incompatible e-Book formats when I got my first colour Palm.
Sadly, there were better (open) formats using better compression and rendering, losing out to closed formats with big marketing push.
The format that ultimately prevails will not necessarily be the best. It'll be the format pushed by those with the greatest marketing skills/budget, and the one which gives them the greatest control over how their works are used.
It wouldn't surprise me if authors are already signing e-book distribution deals which forbid them from releasing in rival formats.
One of these days, the masses will choose software and data formats according to quality and freedom.
But something within me suspects that the Pope will convert to Islam, and the Jews will profess the divinity of Christ first.
-- In the beginning was the WORD, and the WORD was UNSIGNED, and the main(){} was without form and void...
There is also this e-book XML format:
http://haali.cs.msu.ru/pocketpc/FictionBook_description.html
I use his excellent HaaliReader as a text reader on my pocketpc (fullscreen, landscape mode). There are also html2xml and word2xml tools on his site.
The Project Gutenberg Etexts should be so easily used that no one should ever have to care about how to use, read, quote and search them ...
.it is the only text mode that is easy on both the eyes and the computer.
.to IBM, to Mac, to TRS-80. . .
.not just those your access has allowed you to get from Project Gutenberg. The point is that a decade from now we probably won't have the same operating systems, or the same programs and therefore all the various kinds of etexts that are not Plain Vanilla ASCII will be obsolete. We need to have etexts in files a Plain Vanilla search/reader program can deal with; this is not to say there should never be any markup. . .just those forms of markup should be easily convertible into regular, Plain Vanilla ASCII files so their utility does not expire when programs to use them are no longer with us. Remember all the trouble with CONVERT programs to get files changed from old word processor programs into Plain Vanilla ASCII?
.so is very much of the value of most of the various markup systems we have in the world. But until some real standards arrive-- we would be limiting our options a great deal if we do not keep copies of all etexts in Plain Vanilla ASCII as well.
.an operating system, a program, a markup system. . .will not.
.as the .Z compression format does in a similar manner today.
This has created a need to present these Project Gutenberg Etexts in "Plain Vanilla ASCII" as we have come to call it over the years.
The reason for this is simple. .
However, this encourages others to improve our etexts in a variety of ways and to distribute them in a variety of the available media, as follows:
Once an etext is created in Plain Vanilla ASCII, it is the foundation for as many editions as anyone could hope to do in the future. Anyone desiring an etext edition matching, or not matching, a particular paper edition can readily do the changes they like without having to prepare that whole book again. They can use the Project Gutenberg Etext as a foundation, and then build in any direction they like.
Thus any complaints about how we do italics, bold, and the underscoring, or whether we should use this or that markup formula are sent back with encouragement to do it any ways any person wants it, and with the basic work already done, with our compliments.
The same goes for media. We have had a long-standing work ethic of providing our etexts in any medium people wanted: Amiga, Apple, Atari. .
However, now that our etexts are carried in so many BBS's, networks and other locations, it is easier to download the file in a manner that puts them in your format than we can make and mail a disk, so we don't really do that too much.
The major point of all this is that years from now Project Gutenberg Etexts are still going to be viable, but program after program, and operating system after operating system are going to go the way of the dinosaur, as will all those pieces of hardware running them. Of course, this is valid for all Plain Vanilla ASCII etexts. .
Do you want to go through all that again with every book a whole world ever puts into etext?
The value of Plain Vanilla ASCII is obvious. .
We don't have anything against markup. Not vice versa.
Alice in Wonderland, the Bible, Shakespeare, the Koran and many others will be with us as long as civilization. .
This includes the many requests we have for compression in particular formats. There are only two formats we know of that are suitable for transfer to a wide general audience: Plain Vanilla ASCII (.txt files) and ZIPped files of them (.zip files). Requests for other compression formats must be ignored as they are appropriate only for small portions of our target audience. However, (programmers take note: we will need help) we are planning to put some compression links on our files so they can be transmitted in any of an assortment of compression formats on the fly, i.e. we should be able to generate any kind of file asked for, but we can keep only one copy of each etext on our servers. .
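A minimal sketch of the "one stored copy, any format on the fly" idea described above. The function name `compress_etext` and the format labels are hypothetical; only Python's standard `gzip`, `bz2`, and `zipfile` modules are used, and nothing here reflects how Project Gutenberg's servers actually work.

```python
import bz2
import gzip
import io
import zipfile


def compress_etext(text: str, fmt: str, name: str = "etext.txt") -> bytes:
    """Return the single stored plain-text copy, compressed in the requested format.

    The `fmt` labels ("txt", "gz", "bz2", "zip") are illustrative, not a real API.
    """
    # Plain Vanilla ASCII only: encoding raises UnicodeEncodeError on anything else.
    raw = text.encode("ascii")
    if fmt == "txt":
        return raw
    if fmt == "gz":
        return gzip.compress(raw)
    if fmt == "bz2":
        return bz2.compress(raw)
    if fmt == "zip":
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr(name, raw)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

The point of the design is that the server keeps only the `.txt` original; every other format is derived per request and can be dropped or added without touching the stored etext.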
Different readers, different platforms, and different applications have different requirements!
Some uses want a format which is as compact as possible. Some focus on readability (switchable fonts, etc.). Others -- facsimile-style releases -- emphasize that the copy should mimic the original work as closely as possible. Formats can emphasize the syntactic structure of the text (sentences, paragraphs), or the structural qualities (line breaks, pages).
Even in their paper forms, books have different formats for different uses. Libraries prefer hardcovers, with durable bindings. Travelers prefer paperbacks, with small and light pages. Collectors pay extra for special editions, with quality supplies. Some readers prefer large-print copies, abridgements, or books on tape (in a choice of cassette tape or compact disc!)
Any format makes assumptions, and deletions. It's perfectly fine to have a multiplicity of formats. If it's usable, and reasonably priced, people will buy it.
For me, the major hindrance to e-books is the price. Since there is no associated cost of the materials (paper/cardboard), printing, physical transportation, stocking space, and delivery, e-books should be *cheaper* than physical books. But many of them are priced the same, or even higher (you can check this at Amazon). What's up with that?
Why not just use SVG?
My other account has a 3-digit UID.
... is paper. Seriously.
The nice thing about a book is that it doesn't have a power switch - it's actually relaxing to sit there and read it.
If it were possible to obtain a high speed printer capable of printing out "e-books" in the same form-factor as a normal book (ie double sided pages, standard size, neatly bound) then I for one would pay for *lots* more books (and paper, and ink.)
I'm still amazed that the whole of the business world is happy to accept MSWord .doc as the standard to store virtually all of their documentation. I don't think the film industry would be happy standardizing on .avi or the music industry on .wav, so why doesn't the business world get its act together and accept a better format than the crappy .doc?
Take a look at this - 1dok.org - an open document format
Is Project Gutenberg and a Palm Pilot.
I would like to put up a server to serve up Gutenberg, etc. a page or so at a time for low-end WAP phones, with simple indexing and searching capabilities. The simpler cell phone is what I really always have in hand with good connectivity when I would like to read. Palm Pilots never seem to have enough storage to keep whole books or widespread connectivity.
Has anyone done this? It should be popular and not too resource-intensive.
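A rough sketch of what "a page or so at a time" with simple searching could look like. The function names `paginate` and `search` and the 500-character default are hypothetical choices for illustration; a real WAP service would also need markup generation and a transport layer, which are left out here.

```python
def paginate(text: str, page_chars: int = 500) -> list[str]:
    """Split text into pages of at most `page_chars` characters, breaking on whitespace.

    A single word longer than `page_chars` still gets a page of its own.
    """
    pages = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) > page_chars and current:
            # The next word would overflow the page, so close it and start a new one.
            pages.append(current)
            current = word
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages


def search(pages: list[str], term: str) -> list[int]:
    """Return the 1-based numbers of the pages that contain `term`, ignoring case."""
    needle = term.lower()
    return [number for number, page in enumerate(pages, start=1) if needle in page.lower()]
```

Because the pages are computed from the plain text, the server only needs to store the original etext and can pick a page size to suit each handset.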
Take 2 minutes and read this article from RMS
Right to Read
please proofread!
You have to be careful. Half of you are saying "I won't use this until e-books are as pleasant as paper books" and half of you are saying "why not use the standards that are already there? Just make the device do everything."
Don't you see these are at odds?
To make e-books as pleasant as real books, you're going to want to make them thinner and thinner in profile. You're going to want to make them run on a single lithium cell battery or AAA. You're going to want to drop all of the interface but the forward, back, and bookmarking buttons. You're going to want the computing device to be as close to nothing as possible, so you can put weight into making the device indestructible like a real book. You want to go to the store, buy the title, and have it just work, or go to Amazon and *know* your desired title is published in that format. That's the ideal, in the near term. It isn't a device that will easily accommodate PDFs and HTML and a number of other standards.
The lack of an ebook standard has thus far kept me from buying any hardware ebook reader. I would be happy to shell out the cash for one if I knew I could use it with all the books out there.
Won't you be my neighbor?
Why not HTML?
[FromTheMorning]
Yes, of course some spiffy new format will have other advantages. But it's unlikely to gain quick acceptance. Plain text documents are everywhere, as are readers and other software. There are even online publishers selling text files. In fact, ASCII text is arguably the most successful electronic standard there is!
Ceterum censeo subscriptionem esse delendam.
It's all about the killer app -- that's not out there -- to break the inertia of the market. Many of the responses so far have indicated little willingness to give up the print ... and why should they? Why should I? I work for a company on ebook related projects and even I just don't read ebooks. I prefer print and I have no reason to change.
Maybe it's the target market that's the problem. Maybe mass-market consumers are the wrong people to convert first. Maybe it has to be the school/library market or the business market first. Wish I had some answers!
There's a hell of a lot to be said for simplicity.
Which correspondence college offers that and how much does it cost?
/. I'm a Desktop Folder Management Professional!
Ebook expert....
Yo
I'm also a MSc in Network Ping Techniques. I can ping with one hand tied behind my back. My Masters thesis was whether gnip would work equally as well as a ping program. Turns out not. Stupid Command not found.
Blah. Karma Killaz!
Someday, I'll have a real sig.
Ogg, used as an encapsulating format, allows you to put ANYTHING (Divx, SVCD, MP3, MP4, WhatTheHell) in it and have it used as an Ogg file.
That's how many fansub groups make releases that include a choice of 4 subtitle tracks with a nice XVID compressed video stream.
Now, I'm not saying AAC doesn't do that, just that Ogg is quite a universal standard.
Also, the point is CHOICE. I want to choose the format I want and play it wherever I want, not wherever I can.
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
I thought SGML solved this problem years ago.. (except for the copyright-driven copy-protection schemes that seem to be in vogue with the profit-hounds) However, I don't think copy-protection is absolutely necessary. Publishers have made billions selling regular old paper books for years with no "copy-protection". Even the advent of easy copying with XeroX machines didn't kill the profit. What makes the same content in a new medium suddenly worthy of copy-protection ?
...Latex and Ghostscript revisited.
Can we keep inventing more readers for specific uses?
What's next? A new text reader for Man files that can only be read by one single reader and is hard to port to different text formats?
Dolemite
______________
Save the World! Use a Quote!
I have developed a highly sophisticated format for storing books in a computer file.
Each character of the book is to be enciphered to a byte. I reserve the first 32 codes (0-31) for various system function characters. The next 32 codes (32-63) encipher the space character, various punctuation marks, and numerals. The next 32 codes encipher the capital alphabet and a few more punctuation characters. With the simple use of a 00011111 binary mask 'A' maps to 1, 'B' maps to 2, and 'Z' maps to 26. Quite clever if I say so myself! Naturally the next 32 codes encipher the lowercase letters in the same manner. Using the very same 00011111 bitmask you find 'a' maps to 1, 'b' maps to 2, and 'z' maps to 26! Ingenious, isn't it?
To ensure compatibility with legacy computer systems values above 127 shall not be used.
I call this encoding Advanced Storage Cypherment Input Ideal - or A.S.C.I.I. Any file utilizing this encipherment is a Tagged eXchange Template. These files may be identified by the use of a .TXT extension.
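A quick sketch to check the bitmask arithmetic in this post. It assumes the low five bits (00011111) are what is kept; with a six-bit mask of 00111111, capital 'A' still comes out as 1 but lowercase 'a' comes out as 33. The function name `letter_index` is made up for the example.

```python
def letter_index(ch: str) -> int:
    """Keep the low five bits of a letter's ASCII code (mask 00011111)."""
    return ord(ch) & 0b00011111


# Capitals start at code 65 and lowercase letters at 97; both ranges end in the
# same five bits, so the mask folds them onto 1..26.
for upper, lower in (("A", "a"), ("B", "b"), ("Z", "z")):
    print(upper, letter_index(upper), lower, letter_index(lower))
```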
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
And how about images?.. Do you suggest ascii graphics ? ;)
Everyone in academia uses LaTeX and PostScript, since PDF is silly and HTML doesn't have layout features.
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
I nominate ROT-13!
How long did it take the music industry to realize this would not work with CDs? Not long at all, but they do seem to want to undo this universal compatibility. But people get greedy and look for ways to force it to work. It will never work.
:D
And yes, HTML is more than enough. This is a book, not a website. It's about reading words, nothing more, nothing less. If you start up with the pictures and sound, people will ignore you in favor of a movie or TV...
I bet someone will propose flash
Docbook PLEASE.
How the heck it's rendered I don't care. But DocBook must be the obvious choice?
This is ever so slightly off-topic, but why is it that whenever eBooks are mentioned, there's a clamor of people shouting "the paper based book is better because x y z"?
The arrival of almost every other new medium since the invention of the printing press has been heralded as marking the end of the printed word. This hasn't happened in the past and I expect the same will be true of the eBook when it matures.
Historically new media have complemented rather than replaced existing ones. eBooks and monograph literature both have strengths and weaknesses, and there's plenty of room for both to co-exist.
Just to bring myself back on topic a little, professionally speaking (as a librarian) it would be helpful if the eBook industry were standardised to a single open format. I expect it's more likely that we'll see progressive waves of competing formats develop as the technology improves. Perhaps the Open eBook initiative could better expend its energies ensuring that all eBook formats allow for data to be exported and reformatted in some way, so that materials aren't lost as formats become obsolete...
Can somebody tell me why eBooks are better than audio content? What can possibly be done with electronic text, that cannot be accomplished through audio content? I can understand that audio books are much more expensive to produce, but surely we are nearing the point where synthetic computer voices can "read" the original text, instead of having to employ human voice actors. That being the case, what's the use?
Expro: I would like to put up a server to serve up Gutenberg, etc. a page or so at a time for low-end WAP phones
I got bored last Christmas and did this.
www.wapnovel.com (WAP or desktop)
There's also an as yet unused discussion group at:
http://groups.yahoo.com/group/wapnovel
Andrew Oakley - www.aoakley.com
You need a little correction on your tower of babel story ....
The story came about in Genesis when all people spoke the same language. The people had decided to build a temple to heaven (as in a temple that reached all the way to heaven). They were trying to bypass the way that God had set for them to enter heaven. They were trying to "become like/equal to God". God knew that with one language they could accomplish anything so he decided to confuse their languages. At that point God caused people to speak different languages and the construction on the temple was abandoned because people couldn't work together. God did this because he had a better plan for mankind ... not because of jealousy.
As for the babelfish I can't comment as I haven't read "The Hitchhiker's Guide to the Galaxy" but it sounds like a reference back to the confusion of the languages.
We don't need no stinking sig!
That needs to be modded Funny.
By my reckoning, MS-Word has had more than 15 different formats in 9 years. I gave up MS-products for Lent a few years ago, but back in the day when my new laptop arrived with MS-Word95 (or whatever it was called), I had to go find MS-Word 6 and resave manually every last word document + metadata in RTF format in order to be able to read them in the new program.
Too bad the data format is tied into specific applications. This is an old archival issue that is fortunately being dealt with by establishing open file formats and cross-platform applications (staroffice, openoffice, wordperfect, abiword).
HTML caused the WWW, it will be interesting to see what happens with file formats for productivity suites.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
Can somebody tell me why eBooks are better than audio content? What can possibly be done with electronic text, that cannot be accomplished through audio content?
I can read a page of a standard paperback book in about 30 seconds for fiction, or between 45 sec. and a minute for non-fiction.
Having a voice read that to me instead would be slow and tiresome.
ASA
All employees must wash hands before seeking equitable relief.
A well-known ebook industry expert, Jon Noring
That sounds a little funky to me... If someone is a well known industry expert, would they really need the phrase "well-known industry expert" before their name?
I am a leaf on the wind. Watch how I soar.
Face it, the only reason that HTML/XML/LaTeX or whatever simple suitable format hasn't been chosen as standard (even though it's so bloody obvious that it should) is because they don't have DRM.
:)
DRM Capability: Although end-users prefer not to purchase ebooks protected with DRM (Digital Rights Management), publishers are certainly interested in the DRM capability of the universal ebook format. Thus, the universal ebook format must allow inclusion of DRM protection technologies as needed.
It's obvious that the author of that article has no idea about DRM. Any Slashdotter will tell them that DRM is pointless: if you can read it you can copy it, and if one copy gets made a million get made.
Even if the publishing companies decide otherwise, everyone else will probably just rip books into HTML or something. I'm sure most companies such as Microsoft or Adobe would love to have invented an all-purpose, DRM-equipped publishing medium - tough, HTML/plain-text etc. lives on.
This comment does not represent the views or opinions of the user.
Fiction - .txt or .rtf is fine. Many of the people who refuse to consider ebooks are referring to reading fiction titles. The technical aspects of an ebook interfere with their aesthetic enjoyment of the story. As my wife would say, "It's not what I'm used to." To make fiction viable as an ebook you'd have to get the device down to a bare minimum of size and complexity.
Textbooks - having only one physical ebook with all your textbooks loaded would be very handy. Pictures and graphics are required. Some net-content linking might be helpful if you include wireless support. Students would tolerate a larger device and more complex operation than fiction readers.
Reference books - sound and video would be appropriate, as well as linking to other content.
A Slashdot post is not the place to do an exhaustive discussion of the subject and I'm no expert, so you can take this idea wherever you wish.
You were 80% angel, 10% demon. The rest was hard to explain. - Over The Rhine
"Math in a song is good."-Linford
The last time I remember a big group of people coming together to create a standard for electronic media, it was called DVD.
Maybe if there is no standard, authors will pick the format that allows them the exact type of access control that they want instead of having a format thrust upon them.
Adobe, can you hear me? Business Opportunity Knocking.
My wife has a small publishing/consulting company that has taken us 16 years -- and a lot of investment and pain -- to build. She works her butt off gathering the content which she then publishes as print products and CD-ROM "ebooks".
She is devastated when she hears from someone that they've copied one of her color newsletters; made a "backup" of the CD-ROM ebook and someone else "happens to be reading it so I thought I'd call with a question"; and otherwise copies illegally (no...we don't have the funds to pursue them). She had an opportunity to publish a digital product in Asia and another in Latin America but these markets are notorious for buying *one* and suddenly hundreds or thousands appear (I could digress with a personal story when I was at a software company and saw this first-hand...but it's too long).
PDF is the best standard right now. Platform support for everything out there virtually; security; but there is no meaningful method of DRM that would protect a small businessperson AND make it relatively easy to move ebooks from device-to-device (I know that I would hate to have to remember codes from dozens of publishers; be locked in to one machine for viewing; or other cumbersome methods).
However, no protection = no incentive. I don't care if you're a recording artist seeing your music ripped off or someone like my wife struggling to grow a business. Why should my bride travel to Europe and domestically to gather content, pay correspondents and photographers, and publish a product in ebook format that is super-simple to copy and distribute?
This is why I'm struggling so hard with the whole discussion about ebooks; copyright; DRM and fair use. So some how, some way, we've got to come up with a solution that offers some sort of universal ebook format that content producers can agree on and users can live with.
My $.02....
I think the Adobe .PDF format is probably the best way to go for eBooks.
I mean think about it: even the relatively low-powered CPUs used on PDAs have enough computing oomph to process and display .PDF files. And we're talking reading .PDF files, not creating one, which takes a lot more CPU processing power.
It's that damn Mickey Mouse copyright rule, now like 96 years since the author's death. So you won't see many Gutenberg selections from after the 1920s, unless the author has given permission.
It's all about the price.
Shadow Puppets (hardcover) by Orson Scott Card is priced on Amazon for USD$18.15
Electronic version USD$25.95 (M$Reader and Adobe)
With the e-version the publisher has no printing costs, no binding costs, virtually no shipping costs, no warehousing fees, no sales clerks.
Like most everyone, I prefer my books on paper, but there are times where an e-book version would be convenient. But I am not going to pay 4 times the paperback price for the experience.
Best example is in a German English Dictionary - if you choose the word "GIFT" and try to look it up, just what should you find? You might not like what you find.
Then there are syntactical choices, e.g. "I record the record": which is a noun? Can you pretag, or not? What about the "People from the land of Er": does the text-to-speech engine say "Urbidum"? Or is this the biblical land?
These are *REAL* problems and no, they are not solved. A trade publisher (novels, serials, magazines) has a different view than a reference (dictionary) publisher.
If your goal is printed paper, PostScript or PDF is fine, but for syntactically correct search it is horrible.
If the competing "companies" would take a look at Europe and how far past the US they are with cellular adoption, it would seem obvious that having one standard in the end benefits everyone. The US market is fragmented and the costs are much higher... .02 deposited
wordtrip.com
"Right now there's a plethora of essentially incompatible ebook formats, and this format 'babel' is hampering the growth of the ebook industry."
Bullshit.
The problem is that the few people who actually still read books are not likely to be stupid people. On top of that, the people who are reading electronic formats of books are even less likely to be stupid people.
However, it would take rather dim consumers indeed to not see a problem with paying the exact same cost for an eBook as one would in a brick and mortar bookstore for a paperback... and strangely when I go to these eBook sellers online, I see exactly that. "Oh joy! Instead of paying $7.95 for that paperback over at Barnes & Noble, I can pay just $7.95 to download an electronic copy in a format that I probably won't be able to read again in 10 years because the format and its reader will have been declared obsolete!"
The unwillingness of eBook publishers to see eBooks as something other than a way to increase sales profits by cutting out the middlemen of printing and shipping expenses is what is hampering eBook adoption.
I later bought the book in plain analog bio-matter format and I will never trust an encrypted eBook that I can't back up ever again.
BTW, the book was crap.
I'm a big consumer of ebooks for my handheld. I've used several readers and several formats over the years, without becoming particularly attached (or annoyed) by any of them.
:)
What's keeping ebooks from taking off is cost. For contemporary titles, they charge as much for the e-version as the paper version. That's nuts. In a year or so, I'll probably have completely different hardware (which may recognize the ebook format, but possibly not my authorization to use it). I just won't pay the same for electrons as I do for ink and wood pulp.
So I get to read a lot of 19th Century novels
The ultimate in bloat! Imagine; an XML document describing the vector points and bezier curves required to draw each and every character in an entire eBook! If nothing else, it would push the price of multi Tb drives down.
You don't know squat about the SVG format, do you?
If you took 2 minutes to check the SVG specification you'd see that you are totally wrong.
Text in SVG is described as... wait for it... ordinary text. The characters are then mapped to glyphs in your SVG-viewer. Just like in MS Doc, or Adobe PDF.
Just to show you the "ultimate bloat" I'll include an example. If you just want an ordinary plain vanilla book this is all it takes.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="25cm" height="35cm" viewBox="0 0 2500 3500" xmlns="http://www.w3.org/2000/svg" version="1.1">
<text x="250" y="150" font-family="Verdana" font-size="55" fill="blue">
Insert book text here...
</text>
</svg>
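As a hedged illustration of the point that SVG stores text as ordinary characters, here is a small sketch that wraps a string in the same `<svg>` and `<text>` elements used in the example above and then parses it back. The function name `book_page_svg` is made up, the optional DOCTYPE is left out for brevity, and line wrapping, which SVG 1.1 does not do automatically, is ignored.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SVG_NS = "http://www.w3.org/2000/svg"


def book_page_svg(text: str) -> str:
    """Wrap plain text in a minimal SVG page; the text is kept as characters, not outlines."""
    return (
        '<?xml version="1.0"?>\n'
        f'<svg width="25cm" height="35cm" viewBox="0 0 2500 3500" xmlns="{SVG_NS}" version="1.1">\n'
        '<text x="250" y="150" font-family="Verdana" font-size="55" fill="blue">'
        f"{escape(text)}"  # escape &, < and > so the document stays well-formed
        "</text>\n"
        "</svg>\n"
    )


# Round trip: the original characters come back out of the parsed document.
page = book_page_svg("Tom & Huck <1884>")
root = ET.fromstring(page)
print(root.find(f"{{{SVG_NS}}}text").text)
```

The size of the file grows with the length of the text plus a fixed wrapper, not with the number of glyph outlines.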
"First lesson," Jon said. "Stick them with the pointy end."
Pah, Ascii. I want them in EBCDIC.
You wonderfully narrow minded, arrogant fool.
And I'm sure you are aware of the artwork capabilities if you browse Slashdot at -1.
Anything I type that I think may have some value in the future, I always save in plain text. I feel sorry for those who 100-years from now will be trying to figure out how to convert FrontPage or MSWord "HTML" documents into a human-readable format, not to mention the DOC format and other proprietary nightmares.
While everyone may use LaTeX, PDF has become more and more popular for web distribution of papers. PS works fine when you're just sending it to the printer, but because Adobe didn't include PS support in Acrobat, Windows users don't bother.
But TeX/LaTeX has the advantage of being pretty much immutable, second only to plain TXT on that count. The standard hasn't changed since, what, 1982? Hopefully we'll be able to process the same documents with the same tools fifty years from now.
I think the important distinction between, say, Word format and TeX is that TeX is a piece of systems programming---it performs a well-defined task in a well-defined manner, much like lex or yacc do. An attempt to add 'features' is nonsensical. (Though functionality can be extended through the use of, say, pdftex.)
--grendel drago
Laws do not persuade just because they threaten. --Seneca
As i sit here on my ipaq i have no books & no good format for them. duh i don't want drm on a $40 50k file - but i want a friendly program - w/ features like autoscroll - w/ the universal format.
-- Whee
Why? Well, an ASCII text version of a printed book is really more like an analog facsimile than is a version in XML that has been tagged for structural features. Leaving aside issues of non-English characters, illustrations, and unusual typography, ASCII does a relatively poor job of capturing all of the structural conventions that exist in printed books. Books have copyright pages, tables of contents, chapter titles, subtitles, bylines, epigraphs, block quotations, footnotes, running headers and footers, citation lists, etc. ASCII can provide rough format equivalents of some of these, very poor equivalents of others. With an appropriate XML tagset, however, it's a relatively simple matter to tag most of the structural features of a book and then use stylesheets for presentational rendering. That's the whole assumption of the Open eBook specification.
Suppose you're in a world where all printed copies of Huckleberry Finn have been lost. You have two CD-ROMs that somehow you've managed to decode so that you can read the files and interpret their character sets. One of them contains the Project Gutenberg etext of the novel, an ASCII transcription. The other contains an XML encoding tagged according to a DTD from the Text Encoding Initiative, the current best standard for encoding literary (and many other) texts. It has all of the textual content of the PG version, as well as some that's missing (like the table of contents and the copyright page from the transcribed edition, which the PG version unaccountably omits). XML tags mark all the line and page breaks of the original. In addition, there are tags to mark quoted speech, unusual typography, words in foreign languages, and other significant features of the original. The CD-ROM contains the DTD used along with documentation on the tagset.
In this imaginary scenario, even if all of the XML documentation were missing it would be pretty straightforward for 31st-century programmers to strip out the tags and recreate the ASCII transcription. But with the documentation, it's possible to reconstruct something much closer to the original than the plain-vanilla PG version allows. And suppose your 31st-century archaeologist found a trove of TEI-tagged books on CD: with all of the structural tagging and metadata about authorship, publication dates, etc., a 31st-century librarian will be able to plug all of the books into a cataloging system that allows sophisticated searching. If instead you had a trove of plain-ASCII books, the best you could do with the collection would be simple full-text searches.
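To make the "strip out the tags" step concrete, here is a minimal sketch in Python. The element names (chapter, head, p, q) are made up for illustration and are not the actual TEI tagset; the point is only that going from tagged text to plain text is mechanical, while the tags also allow queries that plain ASCII cannot answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: these element names are illustrative, not real TEI.
SAMPLE = """<chapter n="1">
  <head>Chapter I</head>
  <p>He said <q>hello</q> and left.</p>
</chapter>"""


def strip_markup(xml_source: str) -> str:
    """Recover a plain-text transcription by discarding every tag."""
    root = ET.fromstring(xml_source)
    # itertext() yields each text node in document order, ignoring the tags.
    raw = "".join(root.itertext())
    # Collapse the indentation and newlines left over from the markup.
    return " ".join(raw.split())


def quoted_speech(xml_source: str) -> list:
    """A query that only the tagged version can answer: list quoted speech."""
    root = ET.fromstring(xml_source)
    return [q.text for q in root.iter("q")]


print(strip_markup(SAMPLE))
print(quoted_speech(SAMPLE))
```

The reverse direction, reconstructing the tags from the plain text, has no such one-liner, which is the asymmetry the scenario above relies on.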
Leaving aside the sci-fi scenario, the reality is that our documents, over the next few decades, will move from format to format and be used for purposes that we can only guess at right now. Of course plain ASCII, or even proprietary formats, will be better than no documents at all. But the work involved in converting them will be a lot higher than if they are tagged in a well-documented, structured markup language.
Incidentally, there's already at least one project underway to take Project Gutenberg texts and add minimal XHTML or XML markup to capture structure and make them more readable via stylesheets. The Open eBook specification is just a more sophisticated way of doing the same thing.
You may not have noticed that libraries are a target for the industry. They have never been satisfied with the status quo on the public good of lending libraries. The DRM/DMCA Catch 22 is bespoke for this purpose and you can bet your ass as soon as they get comfortable with the uptake of TCA and pals, they will pull print publication in a heartbeat. The only remaining value-adds for the publishing industry are metadata, catalogs and editorial. Libraries do the first two better, both more openly and more broadly. And simple software agents can do taste-based referral and demographic aggregation better than either libraries or publishers.
love,
Bunky
illegitimii non ingravare
Documents written by researchers are available online mostly in both PDF and HTML formats. For instance, if you search around medical journals at www.pubmed.com, most full texts that are available are first shown in HTML with a link to download the PDF if you wish.
And I generally get the PDF if it is available - everything is embedded nicely into one file and it looks identical to the printed version.
Thinking about it now, I know some science journals that have the full current issue available in PDF format. I wonder why more magazines and newspapers don't try to do something like that?! I think I would subscribe to a nice PDF version, identical to the printed version, of some magazines I get.
I recently tried reading an eBook on the Palm. It was presented similarly to a PDF - fixed pages with fixed text in fixed positions. This totally ignores the sort of advantages an electronic device has over paper!
I can't shrink the font size to get more text on the screen, because I'm viewing one page at a time and it's always got the same text on it. Even worse, I loaded the same eBook in the desktop version of the reader, and I was still viewing the same amount of text at once, in a ludicrously large font.
Fixed layout is bad enough when shoehorning Letter size PDFs onto A4 paper. Taking the same approach to eBooks which will be viewed on a wide variety of devices is just retarded.
Either that or if Pictures Etc MUST be shown use HTML. PDF sux 'cause you can't cut and paste pics and text as easily and you need a special reader.
Eat at Joe's.
Though it might help, a universal format isn't what's hampering ebooks. It's price. I refuse to pay full price (and sometimes more!) for an etext of something I can get on paper; especially when I only get the etext.
Halfprice, maybe even quarter price, compared to deadtree is what ebooksellers should be going for...but if I still have to pay fullprice and I don't even get my dead tree, I'll pay the same for something slightly more tangible.
Now I would pay a couple of bucks (ie $2) more for a deadtree book which includes the etext.
-- Waht? Tehr's a preveiw buottn?
>Although end-users prefer not to purchase ebooks
>protected with DRM (Digital Rights Management),
>publishers are certainly interested in the DRM
>capability of the universal ebook format.
So although people would prefer not to buy books with this stuff, we're going to put it in there anyway. Whatever happened to listening to your customers?
Commoditizing* the hardware with format standards can help, but the hardware needs to become a lot more capable too.
I've yet to see an ebook viewer that can present (at a readable size) all at once even what would be one full page of text in a pbook; every pbook presents *two* at a time. You can do things with two full pages that would be ugly and unusable with only half a page in view.
A pbook will function acceptably from candlelight to full sunshine. It will operate indefinitely without power. It will operate *underwater* if you must.
I still haven't seen a bookmark scheme that works anywhere near as well as sticking my fingers between the pages.
I can afford to have five books open concurrently and spread out around my workspace, which is a frequent need. Who's going to own five ebook viewers at the same time?
-----------------
* Note that word. That's why it won't happen anytime soon.
If you read the entire story above and followed the links, especially the last link, and followed the links in that article to their final destination, [this is a lot easier if your browser supports tabbed browsing
BTW, the critics site has a lot of texts, many of which can be read to young children in the presence of their grandparents, available for downloading in various formats for free as in beer.
With CSS there's not a lot HTML can't do with layouts.
No free, mature implementation of HTML and CSS can render a font not installed on the user's machine from outline data stored in the document. Mozilla has a bug on this open in bugzilla.mozilla.org (bug 52746), but it doesn't look like it's going anywhere. And no, "just replace with Helvetica, which is installed everywhere" is not an option because Helvetica for every non-Latin writing system is not installed on every reading device.
Will I retire or break 10K?
Publishers of novels first published since 1923 are in general not willing to publish in a text format without a digital restrictions management wrapper.
Will I retire or break 10K?
What I mean by that is that for many books with complicated layouts (including my own free books), it's simply not possible to reflow the text automatically. Consider an illustrated science textbook, which is the kind of work I do. There's a lot of hand-tweaking involved in getting everything laid out on the pages in the best possible way. And my books' layouts aren't even that complex compared to a lot of the big commercial textbooks out there. Some slashdotters may have used LaTeX to write academic papers, so they'll know how LaTeX tries hard to flow the text correctly, but ultimately it doesn't always do what you want, and either you or the publisher ends up doing more tweaking.
The solution isn't that complicated: if a publisher wants to make an electronic book available in both a typographically rich version and an adaptable version, they can create both a PDF version and an HTML version. Of course, this is really an answer to a question that the publishers never asked. Most publishers don't want open formats, because open formats won't allow them to continue to steal away the rights of end-users, such as the right of first sale.
Find free books.
Double ROT-13 is much better.
That's because a lot of
The main places ebooks beat paperbacks are density (you can have many ebooks in a single reader), searchability, and bookmarking. Having many ebooks in a single reader doesn't help much if your battery is only good for 4 hours. Searchability doesn't help much either for pleasure reading, but it is very useful for reference books and school books. Bookmarking is good for starting up where you left off, but that is already a solved problem in paperbacks; it isn't a new capability.
The problem is that, at this point in development, ebooks are a *worse* way to read just about anything people buy for pleasure reading.
When the technology develops sufficiently to solve at least most, if not all, of the issues I noted above, I'll look more at ebooks... but until then, I'll stick with paperbacks, because they are *better*.
I'm not so sure about this. Book publishers like the idea of ebooks, because they can build in more limitations than are legal with current physical book publishing. EBooks kill "right of first sale" because they license the work, they don't sell it. EBooks remove the publisher's major gripe about "more than one person reads each copy of a book/magazine". Publishers like ebooks because they can severely limit readers' rights with them, and think they can do it legally, or at least with fewer court hassles.
It's a bad deal for readers/consumers, but hey, that's part of why book publishers like it. It changes the power relationship in the publisher's favor.
Hmm... on reading through this... this has *got* to be -1, Redundant. At least, I hope it is. Karma to burn...
This is my sig. There are many like it but this one is... Oops. Frank, I've got your sig again! Where's mine?
Can anyone on /. see that we need a standard format for ebooks other than txt? Hey... out there are people trying to make a living writing, and releasing their work as txt doesn't cut it (well, maybe if you want to be the bitch of Kazaa, eDonkey, or every P2P network out there)... Maybe there is an automatic hate for DRM here (MS style), but look at the success of DRM in opening markets for music that wouldn't otherwise have the chance to get known (Apple and its future arrangement with small labels)... DRM is not bad per se... it's the implementation of it (look at iTunes for an example...)
What's wrong with plain ASCII?
Stick Men
they both suck.
Why doesn't anyone use BZIP on windows?
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Text? You mean ASCII? Sure, that might work for English, but what about other languages? You'd need it to be Unicode.
Also, I'm sorry to point this out, but normal books don't just contain text. Many books contain illustrations or diagrams as well. At the very least, most books will contain italics somewhere.
1. As the article notes, we need a format that is typographically rich. That means Unicode and style instructions. Maybe CSS3 or XSL-FO could do this; I haven't looked at the details of either closely enough. But plain text is NOT sufficient.
2. As the article notes, we need a format that is adaptable. Adaptable means that the style information will adapt to the reading environment: both hardware (PDA, ebook reader, desktop monitor, projected on a large screen for an audience, audio, braille, etc.) and wetware (people who are blind, people who are colorblind, people who need large text, etc.). In order to be able to properly apply the styles, you would therefore need semantic markup (meaning an *ML, like DocBook or better TEI) and the ability to create multiple style sheets, or apply user style sheets.
3. As the article mentions, DRM is a necessity (we may not think so, but the authors and publishers do). It seems to me that an open PKC could be used: you exchange keys with the publisher, and you can decrypt the document on screen, but the software doesn't allow you to export unless the publisher's key permits it. But this would probably be pretty hackable, wouldn't it?
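Point 2 above (one semantically tagged source, several style sheets) can be sketched roughly like this in Python. Everything here is an assumption for illustration: the tiny book/title/para tagset, the STYLES table standing in for real style sheets, and the two device names. It is not DocBook, TEI, or OEBPS, just the shape of the idea.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical semantic source: tags say what things ARE, not how they look.
BOOK = """<book>
  <title>A Short Example</title>
  <para>Reading devices differ, so the same text must reflow.</para>
</book>"""

# Hypothetical "style sheets": presentation choices kept apart from content.
STYLES = {
    "desktop": {"width": 60, "title": str.upper},
    "pda": {"width": 24, "title": str.title},
}


def render(xml_source: str, style_name: str) -> str:
    """Lay out the same semantic source according to the chosen style."""
    style = STYLES[style_name]
    root = ET.fromstring(xml_source)
    lines = []
    for element in root:
        text = " ".join((element.text or "").split())
        if element.tag == "title":
            lines.extend(textwrap.wrap(style["title"](text), style["width"]))
            lines.append("")  # blank line after the title
        elif element.tag == "para":
            lines.extend(textwrap.wrap(text, style["width"]))
    return "\n".join(lines)


print(render(BOOK, "desktop"))
print("-" * 24)
print(render(BOOK, "pda"))
```

A user style sheet for large print or braille would just be another entry in the table; the source never changes. That separation is what plain text and fixed-layout formats cannot offer.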
Paper, printing, and binding are typically only a very small percentage of the retail price of a book. For black-and-white upper-division college science textbooks, the paper, printing, and binding cost is typically about $10, while the book retails for about $100-$150. Printing is very cheap once you get past the initial setup costs. A paperback bestseller at $7 probably cost less than a dollar to produce, because of economies of scale.
I think what's holding back the adoption of e-books isn't price, it's quality. A book printed on paper generally provides a much better value for the reader, especially because reading long texts off of a low-resolution computer screen is so unpleasant.
Find free books.
Why not use LaTeX? It's mostly plain text with commands to format it.
-- Quidquid latine dictum sit altum videtur
Well there are a number of reasons why eBooks have failed.
First and foremost is technology. Not just the software but hardware too. You simply cannot beat a stack of dead tree pulp squashed flat for durability and portability. People simply don't want to buy anything that disappears when their hard disk crashes!
Secondly, DRM. Apple are starting to get the DRM idea right by allowing any number of CD burns and copies on a certain number of machines and iPods. Unfortunately, no book seller wants to agree to such liberal terms (with the exception of O'Reilly). Again, if your hard disk goes west you have to reapply for a license, yuk! In general, people regard books as property once they're bought. The IT idea of licensing simply doesn't wash with the masses.
Thirdly, and back to technology again, hardware and standards. Without a single, simple, easy-to-use, and very low-energy platform, eBooks simply won't take over the world. TFT isn't good enough, and the sheer delicate nature of any electronic gadgetry makes it a no-no right from the start. The current standards and level of DRM handcuff-ness are just a joke at the moment. There is no single compelling standard, and all tie users too much to one copy, or perhaps two if they're lucky... forget it!
I worked for a big bookseller in the UK and we took a long hard look at eBooks (at great cost I may add!) and simply decided that the technology wasn't there yet and that the DRM implications make it prohibitive to implement in any sustainable manner. Perhaps Amazon and B&N can make it work for them, but I suspect that their sales are somewhat less than they would like.
Having said all this, I still think there is a chance.
Flexible displays that can take a pounding in a student's backpack would be a good start. Make them stupidly energy efficient, add high-capacity static memory (no hard disks!) and a single, simple, DRM-liberal format, and bring the price into the sub-$100 range. Then we might see something happen. I'm not holding my breath though!
I would love to see a universal format. It needs to be rich enough to fully express a book: pictures, maps, small caps, everything. It should be compressed so the book can fit better on a PDA. It should be able to read books without putting a huge load on the relatively weak CPU of a PDA (this probably rules out PDF). And it needs to degrade gracefully: if a PDA chooses not to implement part of the standard, or if a new version of the standard comes out, the document reader software should be able to skip over the unknown parts of the document.
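The "degrade gracefully" requirement is the sort of thing chunked formats handle well. Here is a minimal sketch in Python, assuming an invented layout (4-byte tag, 4-byte big-endian length, then the payload). The layout and the TEXT and MAPS tag names are hypothetical, not part of any existing ebook format. Because every chunk declares its own length, a reader can step over chunks it doesn't understand.

```python
import struct

# Hypothetical chunk header, invented for illustration:
# 4-byte ASCII tag followed by a 4-byte big-endian payload length.
HEADER = struct.Struct(">4sI")


def make_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialize one chunk as header plus payload."""
    return HEADER.pack(tag, len(payload)) + payload


def read_known_chunks(data: bytes, known_tags):
    """Return (chunks the reader understands, tags it skipped over)."""
    chunks, skipped = [], []
    offset = 0
    while offset + HEADER.size <= len(data):
        tag, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        payload = data[offset:offset + length]
        offset += length  # advance past the payload whether or not we use it
        if tag in known_tags:
            chunks.append((tag, payload))
        else:
            skipped.append(tag)
    return chunks, skipped


# A document containing a chunk type that a simple PDA reader does not implement.
document = (
    make_chunk(b"TEXT", b"Chapter one.")
    + make_chunk(b"MAPS", b"\x00\x01\x02")
    + make_chunk(b"TEXT", b"Chapter two.")
)
chunks, skipped = read_known_chunks(document, {b"TEXT"})
print(chunks)
print(skipped)
```

A reader for a newer version of the standard would just pass a larger set of known tags; old readers keep working on new documents, minus the parts they skip.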
The best part about a standard format is that you would be able to choose your reader, where today you may not be able to. I love reading Baen ebooks (I have bought many more Baen ebooks than actual paperbacks in the past year) but I hate the MobiPocket reader that I must use for Baen ebooks. But Baen has few choices: simple DOC format isn't rich enough for their purposes, and if they use something else (such as iSilo) then users are just locked in to a different standard.
(Baen does offer multiple formats, including RTF. I ought to see if iSilo offers a converter that can deal with RTF.)
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Why invent a new format at all? What's the matter with HTML?
Ogg, used as an encapsulating format, allows you to put ANYTHING (DivX, SVCD, MP3, MP4, WhatTheHell) in it and have it used as an Ogg file.
So does MooV (*.mov), the QuickTime container format that underlies MPEG-4.
That's how many fansub groups make releases that include a choice of 4 subtitle tracks with a nice XviD-compressed video stream.
MooV does that too.
Just that Ogg is quite a universal standard.
The Ogg container may be a "widely used format with a public specification", which is just fine by me, but some people reserve "standards" to refer to formats defined by a publication of ISO, IEC, IEEE, ECMA, ANSI, DIN, or some other organization recognized by national governments as a standard-setting body.
Has anybody worked with both Ogg and QuickTime? Which is more capable?
Will I retire or break 10K?
How does one represent pathnames such as /opt/kde/ if /foo/ emphasizes? (Use and you might as well use XHTML.)
How does one represent mathematical formulas such as a*b*c? (Use MathML and you might as well use XHTML.)
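The pathname problem is easy to demonstrate. Below is a minimal Python sketch of a naive plain-text convention where /word/ means emphasis; the regex and the choice of an em tag as output are my assumptions for illustration, not any established convention.

```python
import re

# Naive rule, assumed for illustration: /word/ marks emphasis.
EMPHASIS = re.compile(r"/(\w+)/")


def naive_emphasis(text: str) -> str:
    """Convert /word/ to an emphasis tag, with no idea what a pathname is."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


# Works as intended on ordinary prose...
print(naive_emphasis("This is /really/ important"))
# ...but mangles a pathname, because /opt/ looks like an emphasized word.
print(naive_emphasis("Install it under /opt/kde/ first"))
```

The second call turns the path into an emphasized "opt" followed by a dangling "kde/". Any fix (escapes, quoting rules, a separate way to mark literal text) amounts to inventing markup, which is the point of the two questions above.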
I feel sorry for those who 100-years from now will be trying to figure out how to convert FrontPage or MSWord "HTML" documents into a human-readable format
HTML Tidy works for me.
As long as English is readable, the XML, XSLT, and CSS specifications are readable. As long as relevant specifications are readable, documents written to those specifications are readable.
Will I retire or break 10K?
If worst comes to worst, just pick whatever the reader's default sans-serif font is.
Read what I wrote: "Helvetica for every non-Latin writing system is not installed on every reading device." What if the reader's default sans-serif font has no glyphs for a given language? What if the reader's operating system has no support for a given language?
Will I retire or break 10K?
I am sorry that your wife chose a business that had a limited lifespan. 16 years ago was 1987, so it was not obvious at the time that it was limited, but she needs to find a new focus if the business is to survive.
"Publishers" had 3 purposes:
1. Searching: Finding interesting material.
2. Editing: Fixing the material to conform to standards.
3. Distribution: Spreading the material geographically.
1. Searching for good material is still difficult, but only because most creators do not post their material on the web. Someone will make much money by creating a website where new material can be rated by users. Amazon and others have a good start, but they only rate materials after the materials have been selected by publishers. How can they rate my book when I have not made it available?
If authors release their works on the web, publishers could make money by providing rating systems and headlining a book per month. Can the publishers make money with this type of service? Will they pass some of the money to the authors? Can they make their service so much better than amateur websites that people will pay for it?
2. Editing is still a very useful function, but it is underrated by everyone. Spell-checkers and Grammar-checkers reduce much of the work. Style-checkers are more important.
A friend just self-published a book. He wrote it, then had it printed by a "publisher". I do not know what tools were used, but I asked about editing and he said that the publisher did none. My first criticism was that he used passive voice for the first two chapters and switched to active voice for the rest. Since most people decide whether a book is worth reading during the first few chapters, this could have a major impact on its popularity. A human editor should have caught the issue.
I recently read a book titled "The Mushroom Man". It is in the New Books section of my library, so I assume it followed the normal publication process for books. The entire story is told from the third-person perspective, which keeps the reader from getting involved with the characters. It also has the problem that the main plot is a children's story, but the main subplot is very adult-oriented, with concepts and language that will keep it out of the children's section. This is the author's first book. If the author had advice from an experienced editor, the entire book could have been rewritten from the perspective of one or two characters. The adult subplot should have been dropped. These two concepts would have made the book much more readable, and given it a clearly focused audience.
Another example was "Snow Falling on Cedar" or something like that. It had a good story, but the sections concerning sex and war were extremely boring. How did that get past an editor? I think the author is famous for his work in other areas, so maybe he was able to bully the editor. Usually war and sex add excitement to anything; here it was painful.
Publishers could assist authors in fixing books, but who pays? Will they work with an author for free, or will they charge the author? I doubt most beginning authors have the finances to pay for the service.
3. Distribution has been changed by the internet. The internet has reduced the cost of distribution for anything that can be transmitted as bits to almost nothing. In a few decades, publishers will not be able to make any money from distribution. Yes, this includes music and video as well as written words. Publishers may be able to provide central places for downloads, but they cannot prevent others from duplicating their work. I have not even found the questions that could make efforts at distribution profitable.
I am asking more questions than I am providing information. All industries based on providing content that can be reduced to bits are in danger, and need to transform to survive. Some have already disappeared.
A friend worked for AMEX business travel. Her job was to find the bes
I spend my life entertaining my brain.
Baen publishes e-books in various formats, even plain HTML. They're cheaper than paper books, and you even get them earlier than the bookstores do.
http://baen.com
http://www.webscription.net
Has there ever been a format that hasn't been quickly cracked and put on P2P? That it will appear on P2P isn't a good reason to use DRM, because it will anyway. Hint: fair use == less money, thus DRM.
Litigious bastards
It's called SGML. Been around a while. Whatever they come up with, it should be based on SGML.
Insert witty sig here.
Many publishers and self-published authors require their published content to be distributed with DRM protection. It is certainly possible to build into the native OEBPS Publication wrapper a DRM protection system. Microsoft LIT, as previously mentioned, is an excellent example proving this assertion since LIT is a DRM-protected wrapper of essentially an OEBPS Publication. No more need be said on this.
Hasn't anybody told him LIT has been cracked? Can't he do a web search?
I can't imagine his 'XML searchability' and multi-device interface (How would a program tell the difference between a serial braille printer and a 286 running a term program, for example?) are going to make the document text any more secure to being lifted, honestly.
Until he figures this out, it seems like the article can be basically summed up as 'You should publish everything in XML, and call it by my nifty new name.'
This is such a key point. EBooks are expensive gadgets. They have to do something that paper books don't. Reflowing the text for different screen sizes and font sizes isn't a killer app, but leave it out and you've just thrown away a major feature.
PDF is such a nineteenth century format. I find it infuriating.
1. Read replies to this post.
2. Think for a second.
3. Get used to Babel.
Any idea that depends on universal agreement is born doomed to failure.
It would be nice to have an eBook standard. It might happen, but I expected the DVD standards war to be over two years ago and I was foolish to do so.
I guess the best we can hope for is an active community writing conversion software. Which of course means ripoffs. I don't know that it's a winnable battle. I hope so.