Yahoo Competes with Google in Book Scanning
UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."
Someone start up an "Open Content Alliance" for music... then we can digitize and share it all we want.
I can't wait to read the whole book on one page.
Yahoo services are often slow, riddled with annoying ads and cluttered.
Oh yay. Competition. This might stop Google from becoming a monopoly, and make people less concerned about that.
Oh damnit. The human-confirm thing was fucked, so I missed first post >_
I find it interesting that in all the articles I've looked at today about this that only one has mentioned Project Gutenberg. Naturally, I can't recall which source it was...
I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.
The editors should talk to each other more. I mean, I don't mind seeing two different takes on the same story, but I'd be pissed if I had bought the rights to see a story early.... only to find out it was a dupe.
I liked the idea the first time I heard it - back when it was called Project Gutenberg. :P
Getting an author's permission before scanning and indexing their copyrighted works? What a novel concept.
...that we don't?
It seems to me that they're throwing money at an unnecessary application. Does Yahoo know something that we don't? I'd venture that they're starting with PD books to shake the bugs out of their platform so the app works well in round 2.
Round 2 (current commercial books) won't occur without a massive copyright law change or the support of the Authors Guild.
Hmm.
that Yahoo! picked up the pieces and will succeed where Google failed miserably.
16k ebooks to choose from today, more to come, no Google, no Yahoo.
http://www.gutenberg.org/
An OS solution would be better, would it not? 10 years down the road when everything is in PDF format, who's to stop them from charging us to view material in their format?
I almost panicked after seeing we had gone so long without a Google-related article.
The opt-in rather than opt-out strategy is really what Google probably should have done, but it'll be interesting to see who comes out as a winner, Yahoo or Google, in all of this.
In the US, books published after 1922 can still be public domain if the author was American, it was originally published in the US, and the copyright was not extended at the end of the original copyright period. Google Library does not seem to be making an exception for this, will OCA? Project Gutenberg does.
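For what it's worth, the rule of thumb in the comment above can be written down as a tiny check. This is only a sketch of the conditions as the commenter states them, not a full legal test; the function name and parameters are made up for illustration.

```python
def may_be_public_domain_us(pub_year, american_author,
                            first_published_in_us, copyright_renewed):
    """Encode the commenter's rule of thumb (hypothetical helper, not legal advice)."""
    # Works published in 1922 or earlier are treated as public domain in the thread.
    if pub_year <= 1922:
        return True
    # Later works qualify only if all three conditions from the comment hold.
    return american_author and first_published_in_us and not copyright_renewed


# A 1930 US book by an American author whose copyright was never renewed.
print(may_be_public_domain_us(1930, True, True, False))
```

A real clearance process needs renewal records and more cases than this, which is presumably why the comment singles out Project Gutenberg for doing that work.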
Actually this won't "Upstage" google in any way.
FTA:
all the content will be made available so it can be indexed by all the other major search engines, including Google's
Yahoo is just going to scan, scan and scan. We all already prefer Google's indexing and searching and cleaner interfaces, so the only thing Yahoo! will accomplish by this is to help Google Print along, shielding it from all (other) copyright lawsuits. Once the stuff is online, we all know that Google-bots will be all over it "like a fly on a pile of very seductive manure (Zapp)"
Excellent.
I just hope publishers realise that in this case neither Google nor Yahoo is trying to be their best friend.
Will Yahoo provide sorted or unsorted lists of books that China's Internet users view to the thugs that run China?
Project Gutenberg could sure use the help that Yahoo and Google are throwing at these projects.
Why do these companies, especially Google, have to go this route? Why couldn't they just help a pre-established project?
Seems like the crucial difference between Google's efforts and the OCA (Open Content Alliance) is that Google has an "opt-out" policy for copyrighted material, while the OCA specifically requires the copyright holder to contact them and essentially allow them to use the material.
The OCA likely won't be sued by the Authors Guild the way Google was. For searching, however, Google will likely be better, since its index will likely include a vast amount of copyrighted material, legal or not. It also seems that Google itself will be allowed to use all the material from the OCA in its own project.
Why can't companies come up with some cooler ideas? Why ape each other? First Google and then Yahoo. Surely MS will also want to play.
Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
There's a slight difference between an 'Internet-based library' and 'searching inside books'.
Man is a slave because freedom is difficult, whereas slavery is easy.
Google Print's goal is to allow people to search book content, WITHOUT giving them the content of the book.
For example, searching "Zoroastrianism" would return a list of book titles on the subject, and links to purchase the books in question. You CANNOT download the content of the book!
The OCA (The group Yahoo just joined) is an opt-in, full content hosting project.
Searching "Zoroastrianism" would return a (much smaller) list of books, with the *full* content of the book available for download with the explicit consent of the publisher/author!
You will be reading the content to Moby Dick on Yahoo and in the top right it will say, "content provided by Google."
I am getting tired of the big internet companies straight up copying each other. Yes, it means that products slowly get improved over time (eg. yahoo mail -> gmail -> yahoo mail) but it also means that the companies aren't innovating enough. Yahoo is spending time and money on providing a product that is already offered. We would probably be better off if they spent the effort on providing a unique service - like scanned magazines or something.
I think this is only good for short documents....
I think if I read Finnegans Wake or Hawaii on-screen, my eyes would bleed and tear themselves out of my skull. (not to mention downloading PDFs for days.) In that case, I'd much rather just go buy a paperback for $3. Then I don't have to read on-screen, the pages are conveniently sized and bound, and I can take my book to places I wouldn't bring a laptop. Like a bubble bath, bed, or my commute to work every day.
will it take to download that PDF of War and Peace?
Ignorance is not a crime; neither should it be a way of life
Congress control $ = inmates run the asylum
I hate to see a University pander to commercial interests, while at the same time, welcome commercial interests such as Yahoo. Money talks, and I'm sure UC is being paid a lot, but libraries are supposed to be public resources too, not exclusive profit-centers :-(.
Reading between the lines of this proposal, we seem to have another print.google.com, except it will not index a huge number of works whose copyright holders do not "opt in" to the program. The advantage to this is that it may make some copyright holders feel better about the whole thing and, hopefully, submit entire works to be viewed by the public. It is also possible that Yahoo is worried about the legal issues and wants to wait and see how Google weathers any legal challenges.
From a purely technical perspective, this system seems inferior in most ways. It only displays full text and does not give copyright authors the ability to show only an excerpt, or a set number of pages. Although, providing them as PDFs is nice. I wish Google would add that feature for works that are shown in their entirety. In general though, if I'm looking for particular data I don't see why I'd use yahoo which will have a much smaller index of work.
Does anyone else find there is no way to read a PDF with the scroll buttons (mouse wheel, etc.) without the viewer constantly breaking your flow by jumping to the next page?
This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc.
PS. This being flamebait does not make it false.
ASCII text and HTML can be edited by the user to correct typographical errors, add notes, insert pictures (in HTML versions), and so on. You can't do that with PDF. So I prefer Gutenberg's more versatile formats. Also, plain text and HTML are readable with a much wider variety of software, which may become important in the future when PDF ceases to be so popular.
Google maintains its scanning represents "fair use" allowed under the law because it only allows Web surfers to view excerpts from copyrighted books.
Soon after Google Mail was introduced, somebody created a SourceForge project that lets you use Google Mail as a database. How long until somebody releases a "Bookripper" app that assembles a whole book from search extracts? As I understand it Google displays two pages at a time (or wait, that's Amazon, but I bet they're similar). All you would need to know is a quote from a book's first page as a seed, and you should be able to grab the whole book by doing a series of searches using text from the second page returned by each search. The trick would be to knit the pieces together and eliminate the overlapping text. Seems almost trivial. Another possibility would be to search for random words and look for overlaps between the results, assembling them like a linear jigsaw puzzle until there are no gaps.
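The "knit the pieces together and eliminate the overlapping text" step is the only part of this that is really an algorithm, so here is a minimal sketch of it. It assumes the excerpts are already in reading order and that each one repeats the tail of the previous one; the function names and sample strings are invented for illustration, and nothing here talks to any search service.

```python
def overlap_len(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate first so we strip as much repeated text as possible.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(snippets, min_overlap=3):
    """Join ordered snippets, dropping the text each one repeats from the previous."""
    if not snippets:
        return ""
    text = snippets[0]
    for nxt in snippets[1:]:
        size = overlap_len(text, nxt, min_overlap)
        text += nxt[size:]
    return text


# Made-up excerpts that overlap the way consecutive search results might.
excerpts = [
    "the quick brown fox",
    "brown fox jumps over",
    "jumps over the lazy dog",
]
print(stitch(excerpts))
```

The "linear jigsaw puzzle" variant with unordered random excerpts is harder, since you first have to work out which piece follows which, and short repeated phrases can produce false matches.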
If there was ever anything we need competition in, it is search engines. Whether project Gutenberg needed any competition is another question.
I don't see a lot of similarity between this project and the one Google is doing. Open versus proprietary. Free (free as in speech) information versus non-free information.
In the case of other search engines Google has put out of business (Altavista, although the web site still exists, no longer exists as the more-advanced search engine it was using the facilities of others), the competition did not make them improve at all, beyond their insight to make searching a popularity contest instead of an accurate search.
Now this is a right step towards making book contents searchable online. I would hate to see one company like Google copying and caching all books in its massive cluster of servers. I know that the Google kool-aid that "we are about general good" is running deep in the veins of slashdot types.
Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"? This kind of stuff is done by pirates. Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by driving them out of obscurity.
The message the alliance is sending out to the authors is
- we are not for profit
- we will scan your book only if you want us to do so
- your book will be indexed based on your approval and copyright agreement with you and the publishers
Compare this to what Google is telling the authors:
- we will scan your book; fill out a form and tell us if you don't want us to do so
- we will take sales commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
- if we show ads, we will share the profits with you
- we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
- we will cache your book in our servers and only we will reserve the right to profit from your scanned book
So much for do no evil. Kudos to Yahoo for bringing the Open Content Alliance, Gutenberg, and other similar projects into the limelight - these are some really nice collections that were hidden by the noise created by 'google print'.
I've read through the first few posts, and people really don't have a clue about what this is all about. "Open Content Alliance"... It means what it says. Open f'ing content. Let there be content available to the masses... Is it more important that I can get a snippet from some copyrighted text, or that millions of children can read Alice in Wonderland with all its wonderful illustrations?
This is beyond PDF or anything like that. Some people want PDF, so Adobe will make them. Some people want decent OCR versions, perhaps to go into Distributed Proofreaders or into someone's text-only PDA. It's ALL possible. This is NOT an exclusive club, it's an INCLUSIVE community that is dedicated to Open f'ing Content.
Why don't you people get it? By allowing people to have full texts of some of humanity's greatest works we are doing more than a few snippets of the latest Ken Follett novel... a lot more.
It's bigger than Yahoo or Google. Yahoo is NOT an also-ran.... The Internet Archive has been scanning books and hosting Million Book Project texts as well as Project Gutenberg texts for a long time... long before Yahoo or even Google were in the picture. Ignorant comments made here suggest somehow Yahoo is following.
I say Yahoo is leading by embracing a project that by definition is bigger than themselves. Good for them.
With all this competition it won't be long until they start indexing comic books, then I'll finally be able to find that awesome Donald Duck comic I read back in 1996 and couldn't ever find again.
I don't understand this. My favorite book was published in 1956, and the author died just 7 years later. He had no offspring and he outlived his wife. Now would someone please explain to me why someone was allowed to extend the copyright and why the work isn't yet in the public domain?
...just my 2 gil.
Yahoo! Advanced Search at http://search.yahoo.com/web/advanced?ei=UTF-8 allows one to search for Creative Commons licensed content.
"Does anyone else find there is no way to read a PDF with the scroll buttons..."
No. I just set it to Continuous. See those four icons in the lower right corner? (assuming you've got a recent version) Play with those. You want the second button from the left.
"This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc."
Well, the whole purpose of PDF is to "preserve the look and integrity of your original documents ... regardless of the application and platform used to create it." Blame the creators of that particular pdf file if you don't like the headers, footers and margin size. When I make pdf books to read on the train...I just finished Dream Quest of Unknown Kadath by Lovecraft...I open the original ascii text file in Word, make the top & bottom margins tiny, change the font to something tolerable and export it.
That's what I thought too when I read the summary. The difference is that they also have an "opt in" program, wherein any publisher can have their works indexed upon request, without being redistributed in full.
Yahoo is worried that they are falling behind google, and now they are starting their own book scanning project to compete. They have to find some way to market as being better than google, so they are making it a "publisher friendly" project. It's great marketing in that it removes the focus from the fact that they are a year behind, and emphasizes that google is the evil one (although allying yourself with publishers is only slightly better on the evil-meter than allying yourself with the RIAA).
Personally, I think this is great news. The more digitized media we have the better. Now we will have (more or less) the same content from two different sources. Redundancy is a good thing. Hopefully Microsoft will get in on the effort too (Woah! That is a trip! If you've never said something good about Microsoft, you should try it sometime just to see what it feels like!).
Qxe4
duplicate of what? I can't find any other article on /. about this, and the press release was embargoed until this morning.
I am not specifically familiar with US copyright, but copyright in most jurisdictions extends for 50 or 70 years after the death of the author. In your example, the copyright naturally should extend to 2013, or 2033.
There are some exceptions to this. Perhaps the best known is Peter Pan, to which the UK has granted a perpetual copyright in favour of the Great Ormond Street Hospital.
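The 2013 and 2033 figures above are just the author's year of death plus the term. A quick sketch of that arithmetic, assuming plain "death year plus N" with no end-of-year rounding or special cases; the function name is made up:

```python
def copyright_expiry_year(death_year, term_years):
    """Year a life-plus-N copyright would run out, using simple addition only."""
    return death_year + term_years


# The example from the thread: published 1956, author died 7 years later.
death_year = 1956 + 7
for term in (50, 70):
    print(f"life+{term}: {copyright_expiry_year(death_year, term)}")
```

Which term applies depends on the jurisdiction, so this says nothing about the US-specific renewal rules discussed elsewhere in the thread.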
What will library books in Google look like?
If you are in the United States and you search for Books and Culture by Hamilton Wright Mabie, for instance, you'll be able to page through as much of it as you like, because its 1896 copyright means it's now in the public domain in the United States. These public domain books look very similar to publisher-submitted books except you will be able to click through all the pages of the book. (source).
Google Print does what you say, but it also indexes public domain books and offers these to a user for free.
I predict that the winner of this epic showdown will not be determined by the number of books, or the number of pages...but instead, by who gets sued the most!
If I could ask for one gift for my new school year... it would be that. Books are too expensive (my parents don't pay for my extracurricular activities). It is the one thing that keeps me from knowing more.
I worship Yahoo & Google for their efforts on this place.
--may the source be with you--
Will they come in opendocument format? Or proprietary PDF?
Just wondering.
When U.S. works pass into the Public Domain is a good summary of the U.S. issues.
Me, I just want 14+14 back.
Hey, wow, that is completely original. Nobody else could have possibly thought of this idea before.
duuuuuhhhhh....
-everphilski-
I don't have the attention span to read anything more than a newspaper article. When is someone going to work on getting every newspaper ever printed scanned and available to search for free online. I know ProQuest is working on it, but I doubt that they will make it available, even in libraries. It's easy enough to search the New York Times archive from any library, but say I want to search the Indianapolis Star archives. I can't even do it in the library. I still have to use interlibrary-loan and have them send the microfilm from a library in Indiana. I mean, seriously, MICROFILM?!?!? What are we living in the dark ages!!
Actually, I prefer plain txt to pdf if I'm reading from a computer (assuming the book is not illustrated), since I have more control over fonts and colors (and I have read quite a few gutenberg books that way). However, I think the best native format (despite its general user-unfriendliness) would be latex, from which txt, pdf, and html could be generated. On the other hand, I suppose it's much easier to generate txt or pdf from scanned pages than latex.
Noticed on boingboing.net that a Chinese company is marketing a DRM-free version of an ebook reader using an eInk screen.
Although I don't think it's on sale, it is the Holy EBook Reader Grail we've been seeking for ten years.
If we're gonna download ebooks, we should have a reader to read them with, no?
16,000 books is small compared to what Amazon started with, 120,000[1]. It's tiny compared to what they have now, which is probably over a million[2] -- and that's just what's live on their website. Who knows how many they have scanned and ready to go? Contrast this to Google Print, who eschew Amazon's "scan what we have" philosophy and instead go for "scan whatever we can lay our hands on." You'd only guess that they have about 200,000 from a simple search[3], but since their program is fairly young it's reasonable to assume that this represents a fraction of the material they haven't yet made live. Plus it was just a search for the number 1, so presumably there are several books that don't contain that particular numeral.
7 108/102-4368024-5588945
My point, for those of you who were asleep during the first paragraph, is that Project Gutenberg will never come anywhere near to the scope of Google Print, A9, or ultimately Yahoo!.
[1] http://www.amazon.com/exec/obidos/tg/feature/-/50
[2] http://a9.com/4 -- make it show books, not the web
[3] http://print.google.com/print?q=1
"The artists somehow has to be paid too."
Elvis agrees! So does John Lennon. And Buddy Holly.
These artists won't produce any more songs unless they get paid. Oh. Wait.
Yeah, but can't anyone just take the online library text and put it in Gutenberg? I mean, it's public domain content, no one can sue for anything there.
Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
How can you prohibit scanning of PD books?
You were mistaken. Which is odd, since memory shouldn't be a problem for you
do you have a source for this? do you mean that a UC library tried to stop someone from checking out books and scanning them? or do you mean that they didn't allow the gutenberg folks to setup a scanning shop inside a library? there's a huge difference between those two.
i work at a UC library, and i've certainly never heard of any policies about project gutenberg. i'm not sure what kind of arrangements yahoo made, where the scanning is going to happen, etc. but i would imagine that yahoo agreed to (at least) cover the expense and hassle of any library facilities they're going to be using. project gutenberg might not have that kind of funding.
this is all assuming that this was involving public domain books, where the only leverage that UC libraries would have would be their facilities and lending policies. if you're talking about stuff that UC owns the copyright to, then that would be another kettle of fish. it would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections holdings).
-esme
They can't even keep their message boards and finance services running. I doubt that they will be any more successful at this.
Google is safe.
I give 'em three years and they're toast.
What a waste of time. Now we have 3 projects doing the same thing; can they not think of something original to do? Now effectively we have 1 FOSS collection of books and 2 redundant proprietary collections because they could not work together. Oh no, then we might have all this scanning done 3 times faster. Someone needs to scan them a book to teach them cooperation.
No amount of computer resources can make up for human inefficiency.
they must be using rooms full of trained squirrels.
There is this great new invention called a printer! You can use it to turn on-screen documents into printed hard copy!
But wait, there's more! There are even ones that will print on both sides of the paper, and will automatically print two pages onto one side! So you can get 4 pages onto one A4 sheet, thus having text about the same size as a paperback! Put a couple of binding clips on one side and you have an instant book.
More seriously though... Besides the fact that it is both cheaper and nearly instant, you can easily replace the book if it is lost or damaged, and you can just give it away to someone who expresses interest and later print yourself another copy. Plus you still have the electronic version on your computer, so if you want to search for a certain passage you can do this instantly without having to flip pages.
Some people are like slinkies--basically useless but they bring a smile to your face when pushed down the stairs.
Instead of just providing the texts, why doesn't Yahoo or Google go in-depth with a collection of texts, and provide insights on them? Open Source Shakespeare is an example of what I mean. There's an automatically-generated concordance, you can look at all of a character's lines at once, and see statistics about the plays, etc. Those are actual research tools. Being able to search a text is useful, but you could do that in 1975. I would be more impressed if these big search companies figured out how to do something more useful.
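To give a rough idea of what an "automatically-generated concordance" involves, here is a minimal sketch that maps each word to the line numbers where it appears. It is not how Open Source Shakespeare does it (the thread doesn't say); the function name and the tokenizing regex are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_concordance(lines):
    """Map each lowercased word to the 1-based line numbers where it occurs."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # Treat runs of letters and apostrophes as words; punctuation is dropped.
        for word in re.findall(r"[a-z']+", line.lower()):
            index[word].append(line_no)
    return dict(index)


lines = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]
concordance = build_concordance(lines)
print(concordance["to"])
print(concordance["the"])
```

Per-character line listings and play statistics would be built the same way, just keyed on speaker or play instead of word.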
Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!
:)
I know about the problems that old file formats can cause. However, I doubt that formats like PDF or JPG will ever get "lost". There's just too much information stored in them, and various free libraries available with source code which read and write them.
And if I'm wrong I won't live to see it.
that's why I said "the pages are conveniently sized and bound."
What I thought was self-evident was that I meant: "and I don't have to print, fold, and somehow bind them together on my own, because it's time-consuming and 1/3 an ink cartridge and a ream of paper is more expensive than three dollars."
You can also just give books to people....sometimes they even like them better than hand-made pre-owned sheaves of inkjet prints.
BUT I do think Ctrl+F is a way better way to find your favorite passage - especially in a long document.
I do see your points as well, and definitely there will be demand for commercially produced books for some time to come.
However, what I described does not require any folding, and binding takes all of about 10 seconds. I've done this more than a few times and it does work out well.
I have a Brother laser printer that cost about US$300. I bought this printer for other reasons, but it is a great book printer too. (Has a duplexer, supports both PCL6 and PS3, built-in standard 10/100 LAN port. Basically it will work on any OS that supports PS or PCL6.)
Anyway, it prints duplexed pages at about 16ppm and the toner is cheap. The Windows driver also lets me easily (one click) print two pages onto one side of a sheet. The result of all this is that I can print a 300 page book perfectly in under 5 minutes using only 75 sheets of A4 paper. I then apply two of those triangular binding clips (the ones with the fold in handles), and it's done!
Total cost of around US$1 including the clips, and total time of about 5 minutes. It's not as pretty as a bound paperback but I'm willing to trade that off for the instant availability and the ability to reprint again any time if needed.
(The fact that I live in Japan definitely plays some role in my choice. English books here are very expensive and only available from major downtown bookstores -- and even then selection is pretty limited. Ordering from Amazon Japan (or US/UK) is possible, but the shipping increases the prices and takes time. A $1 five minute book is a dream!)
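The sheet count in the comment above is easy to check: printing two pages per side on both sides puts four book pages on each sheet. A small sketch of that calculation, with a made-up function name, looking only at paper and not at time or toner:

```python
import math


def sheets_needed(book_pages, pages_per_side=2, duplex=True):
    """Sheets of paper needed to print a book at the given layout."""
    sides_per_sheet = 2 if duplex else 1
    pages_per_sheet = pages_per_side * sides_per_sheet
    # Round up, since a partly filled last sheet still uses a whole sheet.
    return math.ceil(book_pages / pages_per_sheet)


print(sheets_needed(300))  # the 300-page example from the comment
print(sheets_needed(950))  # a longer book, as another sample page count
```

The same function covers single-sided or one-page-per-side printing by changing the two keyword arguments.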
for starters - I get your points. Secondly, please don't condescend. I know a lot about printers, book design, paper sizes and weights, and printing time/issues with downloads/OS compatibility/ripping a large doc through a small printer/ETC. However, books are still my preference.
I live in the U.S, and for what it's worth, it's still WAY cheaper to buy a $3 book. Better still to go to the library and get a pile of books for FREE.
Right now I'm reading "Lonesome Dove." 950 pages. Even at 2 pages per side, that's about 240 duplexed sheets.
so: downloading a 950-page book, sending a 950-page document through a printer, and clipping 1/2 a ream of paper together would be at your rate at least 20 minutes. and have you tried to carry a binder-clipped sheaf of A4 paper around? it's awkward. granted, you could grab 20 pages at a time, but I'd rather just have the book and be able to go back and look things up or re-read them.
so for my 30 minutes of time (minimum), lack of printer, not wanting to carry around a half ream of paper, availability of free or cheap paperbacks - I'll stick to commercially printed books. plus, that way if the author is still living, that means they are more likely to get a cut. If I spend $5 on a book, and the author gets $1 or 50¢, I feel it's better than just spending $5 on printing supplies and binder clips.
Here's something Michael Hart wrote about this today. He's
the founder of Project Gutenberg, and inventor of eBooks.
-- Greg
Yet another consortium of multi-billion dollar institutions
has thrown its hat into the eBook/eLibrary ring today, just
9 months before the 35th Anniversary of Project Gutenberg's
placement on the Internet of the first eLibrary element, on
July 4th, 1971.
Last December 14th Google used a multi-million dollar blitz
of television, radio and print media to announce the Google
Print revolution: "Today is the day the world changes," but
so far it has been difficult to get even a handful of books
from their project, some 10 months later.
I am wondering if the news media will give the same kind of
coverage to a second such announcement, which will also put
up an alliance of an Internet search engine giant with some
multi-billion dollar libraries. I will be watching all the
news programs tonight in eager anticipation, as I was doing
last December, but I fear that "once burned/twice cautious"
might take some of the wind out of their sails/sales.
However, this effort has one huge advantage: "The Internet
Archive," run by my friend Brewster Kahle. Brewster is one
person who has a proven ability to put an enormous resource
on the Internet for the whole wide world to use.
This difference is such that I am willing to bet that Yahoo!
gets off to a better start in the next 10 months than did a
rather completely false start by Google.
Of course, the real test will be to see how long it takes a
project such as this to reach a million eBooks, since there
are already well over 100,000 eBooks already available free
for the taking on various Internet sites, perhaps 50,000 of
them from the various Project Gutenberg sites.
Here's a hope that a few years from now anyone can have the
advantage of a million book home library, and in even a few
years more to ten million books sitting on one inch of your
own bookshelf next to your computer.
Michael S. Hart
Founder
Project Gutenberg
Huh? Where are you from? I worked at a research library at a large state university, and I have no idea what you're talking about. True, libraries pay extortionate rates for journal subscriptions, but when they purchase monographs, they frequently get them off the used book market, just like you or I would. It costs them extra to get it bound in a durable fashion, and to enter it into their Byzantine catalog system, but I've never, ever heard of libraries having to pay extra for books simply because they were libraries.
Also, ongoing royalties? What country does that happen in? I've never heard of such a thing.
Laws do not persuade just because they threaten. --Seneca
Project Gutenberg doesn't just scan books. Actually, they take a lot of their scans from outside sources, like the Internet Archive's Million Book Project. The work that PG does is largely in proofreading and essentially re-typesetting the book. The output of Yahoo!'s work here will be scads of page images, maybe with dicey OCR. The output of PG's work is plaintext (and sometimes HTML) ebooks.
As a fan of Project Gutenberg, I look forward to more page images being made available, since it means more high-quality eBook scans to choose from when picking a project.
I wonder if they'll be hosting anything in Canada, where the copyrights are only Life+50, instead of Life+70 (Europe) or worse (USA). Last I knew, someone was trying to start up PG Canada...
Indeed. It's bothered me for some time now that it takes a good deal of doing to make a nice LaTeX edition of the book, so that it's nontrivial to go from the eBook to a really high-quality printed page.
Luckily, someone's decided to do something about it. See PGTEI, a very verbose and flexible method for marking up literary works. The full TEI spec is gargantuan, so PGTEI is actually a dialect of a subset called TEI Lite. It's an XML markup scheme which has output filters (it uses XSLT, it seems) for plain vanilla TXT (for longevity, and on general principle), HTML and PDF. (Probably some others as well.)
You can try it out yourself. Grab some examples, and run them through the online tools.
Post-processors are very set in their ways, but as I've recently joined their ranks, I hope to use PGTEI for my first post-production job. It certainly seems more elegant than generating and tweaking multiple formats by hand.
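To make the "one source, several outputs" idea concrete, here is a minimal Python sketch. It does not use PGTEI's actual tools or XSLT stylesheets. The sample markup and the `to_text` and `to_html` helpers are made up for illustration, and the sample uses only a few TEI Lite-style elements (`head`, `p`, `hi`) with no namespaces.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI-like sample; real PGTEI files are far richer.
SAMPLE = """<TEI>
  <text>
    <body>
      <div>
        <head>Chapter I</head>
        <p>It was a <hi>dark</hi> night.</p>
        <p>Nothing stirred.</p>
      </div>
    </body>
  </text>
</TEI>"""


def flatten(elem):
    """Join an element's text, including text inside nested tags, on one line."""
    return " ".join("".join(elem.itertext()).split())


def to_text(root):
    """Plain-text filter: headings in upper case, blank line between blocks."""
    blocks = []
    for elem in root.iter():
        if elem.tag == "head":
            blocks.append(flatten(elem).upper())
        elif elem.tag == "p":
            blocks.append(flatten(elem))
    return "\n\n".join(blocks)


def to_html(root):
    """HTML filter: same source tree, different rendering of each element."""
    parts = []
    for elem in root.iter():
        if elem.tag == "head":
            parts.append("<h2>%s</h2>" % html.escape(flatten(elem)))
        elif elem.tag == "p":
            parts.append("<p>%s</p>" % html.escape(flatten(elem)))
    return "\n".join(parts)


if __name__ == "__main__":
    tree = ET.fromstring(SAMPLE)
    print(to_text(tree))
    print(to_html(tree))
```

The point is only that the marked-up source is written once and each output format is a separate filter over the same tree, so a correction to the source shows up in every format.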
Laws do not persuade just because they threaten. --Seneca
Project Gutenberg does a lot more than scan books. (Actually, they frequently don't scan the books themselves; projects like the Million Book Project do that.) The value that PG provides is in the proofreading and formatting of their eBooks. That said, any massive scanning project which provides page images for PG to pick up is quite a good thing.
Laws do not persuade just because they threaten. --Seneca
Project Gutenberg does proofreading and postproduction, which requires a lot more human eyeballs than scanning a lot of pages. While this archive may be tremendously useful to PG by providing raw material, it's not a duplication of effort.
Laws do not persuade just because they threaten. --Seneca
My first reply poked some fun, but I don't think I have been condescending.
I'm not sure where you are getting the $3 books from unless they are used, "stripped", review copies, or unauthorized print runs. In any of those cases the author is not getting a cut. "Lonesome Dove" is $7.99 on Amazon + shipping.
I support authors as well -- I certainly buy (more than) my share of books. I print some too though, mostly because of the need to get the information immediately. If the book is particularly good and I feel I will need it again I will usually pass the printed copy along to someone else and buy a copy for myself. I rarely read fiction these days -- non-fiction is far more important to consume during my limited available time.
To each their own. As I said before, my geographic location plays a part in my choices too. There are many excellent points about life in Japan (especially if you are not a teacher here), but the easy availability of non-fiction English books is not one of them.
Some people are like slinkies--basically useless but they bring a smile to your face when pushed down the stairs.
Assuming the work was written only by American citizens:
Actually, if no one renewed the copyright (renewal became automatic for works published in 1964 or later), it may be public domain. Read the new and improved Rule 6 HOWTO that the fine folks at Project Gutenberg have put together. You can put together a reasonable case that copyright was not renewed, and heck, maybe you could get PG to pick up the book.
Or you could move to Canada and wait until January 1, 2014, when the author's work will enter the public domain there. (Life+50, and all.)
Out of curiosity, what was the book and the author?
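The renewal rule mentioned above can be written as a rough first-pass check in Python. This is a deliberate simplification, not the Rule 6 procedure itself: the function names are made up, it only encodes the "renewal became automatic for works published in 1964 or later" cutoff, and it ignores everything else (foreign authors, notice requirements, very old works). Whether a renewal exists still has to be looked up by hand.

```python
AUTOMATIC_RENEWAL_FROM = 1964  # renewal was automatic for works published in this year or later


def renewal_was_required(publication_year):
    """True if the copyright owner had to file a renewal to keep the copyright."""
    return publication_year < AUTOMATIC_RENEWAL_FROM


def may_be_public_domain(publication_year, renewal_found):
    """First-pass screen only: a work needing renewal, with no renewal on record."""
    return renewal_was_required(publication_year) and not renewal_found


if __name__ == "__main__":
    # A 1956 book with no renewal on record is a candidate worth researching.
    print(may_be_public_domain(1956, renewal_found=False))
```

A `True` result only means the book is worth a proper renewal search; it is not a clearance.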
Laws do not persuade just because they threaten. --Seneca
I remember seeing some of Dudeney's puzzles referred to before, but I couldn't remember where. Then the book popped up on my RSS feed (it was released within the last month, I think), and indeed, it was full of fun math puzzles. Man, that was nice.
But they don't just have HTML; see various examples of files released with filetype "TEI", including PDF (through LaTeX), TXT (in a variety of encodings, i.e. Latin-1, US-ASCII and UTF-8) and HTML.
Laws do not persuade just because they threaten. --Seneca
It's more like the Million Books project. Project Gutenberg does a lot more than just scan the books; they proofread and post-produce them.
Laws do not persuade just because they threaten. --Seneca
When Google starts getting real legal problems, they will probably change that. Google may think they can do everything they want because of their name, but they are going to be in for a surprise once they get popular enough and the publishers start legal action. Google will have no choice but to remove copyrighted material to avoid this. They're not stupid. The result will be a stripped down service.
This is possible if the book is rare and the owner has physical custody. For libraries, this is usually through a controlled-access "special collection" area. They can and do prohibit scanning or transcribing of books, even if PD. They can require signing a legal agreement (license) with any terms they like, such as requiring royalties or restricting further distribution.
The library can require a legal agreement to view or scan the book, and that is where a lawsuit can occur. Of course, the legal agreement doesn't apply to 3rd parties that haven't signed. It's another example of the erosion of the public domain. It's not just Disney and the music industry that are doing it, folks; it's the University of California and other libraries.
Google is leading, Yahoo is following, they will never be number one. :)
But it is good that Yahoo is doing this: Google was criticised for it, and now Yahoo does it too.
Now it is easier to get your hands on information, and digital formats don't suffer wear and tear the way books do, so data can be preserved longer (hopefully).
Wikibooks also has free content. http://wikibooks.org/
"Think free. Learn free."
Also http://wikisource.org/
Hope this really sparks Free/Open Content!
would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections holdings)
So in other words, the public domain is locked away. The PD consists of OLD books, which are largely in special collections.
Here are some policies I dug up. In practice it's worse than the policy, though. They say to write a letter explaining your needs, and then they ignore you.
It is rather obvious that Google is not trying to rip off authors.
Rather, it is trying to send the publishers to the trash bin of
history. Two years from now, if I have a book to publish, I can make a deal
with Google to put it online and they will do on demand printing and shipping for me.
Then I sit back and cash my royalties check every 6 months.
Out of print books will generate revenue this way also. As it
turns out, 99.8% of titles ever printed are currently out of print and
unprofitable for another print run. The Authors Guild Backinprint.com
has only 8,000 authors and is profitable; imagine the money to be made with
millions of authors.
here
the UCSD policy you cite says:
which sounds to me like a non-commercial project like gutenberg would probably not have to pay the access fees. the other UCSD policy mostly talks about limiting duplication because it stresses the rare and original materials that the special collection is set up to preserve.
it doesn't surprise me that gutenberg didn't fall into their ordinary categories (commercial or academic). so any request to copy an entire volume would probably result in an internal discussion about how the special collections materials are preserved, accessed, possibilities for digitization, etc. in particular, librarians are very wary of digitizing things piecemeal without sustainable plans for organizing and maintaining them afterwards. (because it usually leads to doing the same work over again later).
if you gave me a list of materials you wanted to scan, i could talk to some people about what plans are being developed to digitize things. especially if the materials were things that only UCSD had, we could probably give priority to digitizing the materials where people had expressed actual interest in the content.
-esme
I thought you were... but to make this entirely argumentative is more than i care to do.
cover price on lonesome dove is 5.95 - there's a bookstore around here that sells new books at 20% off cover, and used books for 1/2 cover price. 950 awesome pages = $3.
Sorry, I wasn't clear. Say they let Yahoo scan the books, because Yahoo decides to pay for the OCA. Can't anyone just copy or OCR the Yahoo PDF (or whatever) into Gutenberg, since the text is public domain?
Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties... libraries are supposed to be public resources...
University library != public library.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
University library != public library
Library on Public State University that receives state and federal money to be a public institution == ?
Being funded by tax dollars does not make an institution 100% open. Try bringing some friends to your local IRS office for a picnic in the lobby. Or maybe your local police station--hell, it says "to protect AND SERVE" right on the cars, right? Walk in there and ask one of them to bring you a drink. Or hell, see how far you can get at a nearby military base. They've got lots of cool stuff there, and you're paying for it, right?
Uni libraries are usually restricted to those who pay $$$ to attend said uni. Some will let anyone in, but most won't let anyone but students and faculty check out materials. Those materials need to be there for the students. It wouldn't do to let someone come along and check out 10,000 volumes. And even if it were, who's gonna pay the attendant to check out all those books? And keep track of which are where? And get them back after a certain time? The world doesn't run on nothing. There are administrative costs and lots of other things involved. Just because you can check out a book for free doesn't mean you can check out a thousand at once. Things like that don't scale. Walk into a McDonalds, buy an ice tea, and put some sugar in it. Works, right? Now ask the lady at the counter for all the sugar in the place. Different result, right?
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Thanks to a strong possibility that Australia will adopt Life+70, PG Australia's works may have to be moved to New Zealand or Canada. Hence PG Canada. (Though the site design is currently horrible. What's wrong with adapting gutenberg.org or pge.rastko.net? gutenberg.org.au made this same mistake and I'll never understand it.)
Of course, you know that whoever inherited A.A. Milne's estate will be fighting tooth and nail to get Canada to up their terms prior to January 1, 2007, when Winnie the Pooh will enter the public domain. (Same for C.S. Lewis and January 1, 2014, and J.R.R. Tolkien and January 1, 2024.)
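For anyone who wants to check Life+N dates themselves, here is a small Python sketch of the arithmetic. It assumes the term runs to the end of the calendar year that falls N years after the author's death, so the work becomes free on the following January 1. The function name and its `term_years` parameter are made up for illustration, and it ignores special cases such as joint authorship or posthumous publication.

```python
from datetime import date


def public_domain_date(death_year, term_years=50):
    """Date a work enters the public domain under a Life+N rule.

    Assumes protection lasts through December 31 of (death_year + term_years),
    so the work is free on January 1 of the next year.
    """
    return date(death_year + term_years + 1, 1, 1)


if __name__ == "__main__":
    # A.A. Milne died in 1956, so Life+50 gives January 1, 2007.
    print(public_domain_date(1956))
    # The same author under Life+70 would not be free until 2027.
    print(public_domain_date(1956, term_years=70))
```

The twenty-year gap between the two results is the whole reason hosting location matters for projects like these.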
Laws do not persuade just because they threaten. --Seneca
Err, to clarify a bit, TEI is a source format which can automatically generate PDF, TXT (in various encodings), HTML and so forth. If a text has been released as TEI, then it will almost certainly be available in all of those formats.
Laws do not persuade just because they threaten. --Seneca
1. s/the different is/the difference is/
2. Disclaimer: a G. Newby works for the Project Gutenberg Literary Archive Foundation.
3. It is not too hard to predict this consortium will fare better, as one of its members, the Internet Archive, has been collecting scans of books in its Million Books Project and Canadian Libraries archive for months, and is thus able to make a running start.
(Disclaimer, I am a PG volunteer.)
"My favorite book was published in 1956, and the author died just 7 years later. He had no offspring and he outlived his wife. Now would someone please explain to me why someone was allowed to extend the copyright and why the work isn't yet in the public domain?"
Without knowing which book you are talking about, it is difficult to give an answer.
Generally though, when somebody dies, there is an heir. That heir would then hold any copyrights the deceased may have owned.
If you are talking about Heaven and Hell by Aldous Huxley or either of The Last Battle and Till We Have Faces by C.S. Lewis, a complicating matter may be that these authors were not American. Unless they published these books also in the US, a different set of rules applies that basically comes down to Life+75.
I doubt that the estates of Huxley and Lewis would not have renewed these works, although of course you never know. I seem to have read that Project Gutenberg is producing one of Marion Zimmer Bradley's works that wasn't registered or renewed. Science Fiction may yield good finds anyway, because SF stories were often published in magazines, and the author may have transferred rights, or may have forgotten to register a copyright.
Wow! Yes, I was talking about "Till We Have Faces" by Lewis. I checked the book and the copyright has been renewed. So that means the copyright expires in 2038 (well, for now...) That's a long time.
...just my 2 gil.