Amazon Plan Would Allow Text Search Of Books
emmastory writes "The New York Times is running a story (free registration required) about a new development at Amazon - they plan to assemble "a searchable online archive with the texts of tens of thousands of books of nonfiction." Users would only be able to read a certain portion of the text from any one book, but it sounds promising nonetheless. The Times article suggests that this is part of a larger strategy to compete with Google and Yahoo by making Amazon an authoritative source of information on everything book-related."
If this happens, maybe we'll finally be able to find books based on their actual content instead of the (usually pretty crappy) write-ups that Amazon does on them.
Shouldn't somebody patent this process before Bezos does??
Have you noticed that they now offer web searching as well, and are also generating third-party ads based upon what you're looking for?
This development may come back to bite them: when I look for something on Amazon now, I often find in their ads that other people have the item cheaper. Amazon may get a nickel or a quarter for the referral, but they lose the dollars from the markup.
Get off my launchpad!
... someone writes a distributed bot that queries a specific book, section by section, until it has retrieved the entire book. If it's a distributed app, it would be tougher for Amazon to block. You could even have it go after only certain parts of the book at different times to make it tougher still. Not that this is a good use of effort, but that never stopped anyone from doing such a thing before :)
I remember when doing a search on Amazon for "Database Admin" returned as its number 1 result "The fine art of vaginal fisting", and the reviews that it prompted pushed this book up into the top 100 bestsellers. Now what would the ability to read some text from books do? ;-)
I always find it annoying when reading a paper boo when I can't Ctrl-F to find a certain segment.
Now I can just hop online to amazon, do the search, it will tell me what page it's on, and I can go read it!
no comment
And minimum wage laborers in 3rd world countries find themselves scanning books into computers and correcting the text using crappy OCR technology for 12 hours a day. This is one job I'd be happy to export to India.
This is my sig. There are many like it but this one is mine.
Although this might (in fact, it almost certainly will) allow people to copy books far more easily than current scanning (which still needs a lot of spellchecking and proofreading, as text recognition is still unreliable even when scanning uniform text from printed books) and, even worse, manual typing (or screenshotting, in the case of ebooks)...
Nevermind the argument, this is a bad idea.
Would this be like O'Reilly's Safari online books on steroids? Safari has been my favorite bookstore for a while now.
---- join dshield.org Distributed Intrusion Detec
Looks like they'll be going with a proprietary solution. Even though the article seems to indicate that Amazon is launching this new service as a response to Google's "Froogle" shopping search product, wouldn't partnering with Google make more sense for them?
See... I would pay up to about 50 dollars a month for unlimited access to reading those books online... I guess the problem would be people printing them out and redistributing them. Perhaps just manuals... I am so sick of shelling out 50 bucks so I can read 5 pages about some topic, knowing I will never read the rest of the book. Love the web... information is free... hate the web... information is unreliable and all over the place. :(
I really need to learn to proof read :) it's a good thing I'm not a boo(k) author!
and if you look for "TEH", will you be redirected to Salshdot ?
Trolling using another account since 2005.
Any returns of C or C++ code might get SCO's law team on your ass..
Trolling is a art,
Doesn't this infringe on basically every copyright that the publishing industry has?
I write code.
Isn't this a violation of the privacy of all the people who have biographies for sale at Amazon? John Ashcroft could search the text and find out anything he wants about Abraham Lincoln! This article should be listed under "Your Rights Online".
No registration, print friendly.
I predict the most used keyword will be:
sex
This would be awesome for students. I've always wished I could just execute a search function through a book to find what I was looking for. It can be a p.i.t.a. to use indexes and thumb around until you find what you need.
The real issue is that Amazon's system doesn't do moderation very well, and as a result the reviews get spammed with people who really really like something.
Or, you get situations where teachers apparently tell their classes to submit reviews on Amazon for a book, and you have 30 reviews that say nothing.
And, of course, being a bookseller, there is a strong motivation for them to bias things so that positive reviews outweigh negative ones.
Jon Acheson
All opinions expressed herein are my own, and not those of my employers, who are appalled.
Seriously, though, I think Google would still be king of search...
Amazon would put snippets of book contents online and Google would then rank said snippets according to the number of times they have been linked to by Amazon aficionados.
If I want to know the 'net opinion on, say, The Lucifer Principle, I'll simply go to Google, which will link to the relevant snippets as ranked by the Internet. Interesting, no?
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
Remember when MP3.com cached a whole tonne of MP3 files on their servers? Even though they weren't selling them, and you could only access them if you had at some point provided the original CD (or an exact copy), it was still ruled illegal.
Caching the entire contents of books sounds a little beyond fair use. The concept is cool, but they're going to need some publishers behind them. Maybe they think the name 'Amazon' will keep lawsuits away, but it won't.
Ok, one book in raw text mode = (like) 100Kbytes? 200K?
Alright. Now imagine a DVD burner. OK. Now imagine 100,000 books on a DVD. It won't be long before you can have *all* the books ever written on a couple of DVDs (or on whatever the next generation of 100GB optical disks from Sony turns out to be). And what about DRM? Shouldn't books have DRM?
Seriously though, the problem is that you need a clerk to sit down and manually scan all those books.
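The storage numbers above are easy to check with a little arithmetic. This sketch assumes a single-layer DVD holds 4.7 GB in decimal units (4,700,000,000 bytes) and takes the 100 KB and 200 KB per-book guesses at face value; the constant and function names are illustrative. At 200 KB per book, 100,000 books come to 20 GB, so they need about five DVDs rather than one, while a 100 GB disc would hold roughly 500,000 such books.

```python
# Back-of-the-envelope check of "100,000 books on a DVD".
# Assumptions: decimal units, single-layer DVD, plain-text books of a fixed size.
DVD_BYTES = 4_700_000_000          # single-layer DVD (assumed 4.7 GB)
NEXT_GEN_BYTES = 100_000_000_000   # the hypothetical 100 GB disc mentioned above


def books_per_disc(disc_bytes: int, book_bytes: int) -> int:
    """Number of whole books of book_bytes each that fit on one disc."""
    return disc_bytes // book_bytes


def discs_needed(num_books: int, book_bytes: int, disc_bytes: int) -> int:
    """Discs required for num_books books (ceiling division)."""
    return -(-num_books * book_bytes // disc_bytes)


if __name__ == "__main__":
    for book_bytes in (100_000, 200_000):
        print(
            f"{book_bytes // 1000} KB/book: "
            f"{books_per_disc(DVD_BYTES, book_bytes):,} per DVD, "
            f"{discs_needed(100_000, book_bytes, DVD_BYTES)} DVDs for 100,000 books, "
            f"{books_per_disc(NEXT_GEN_BYTES, book_bytes):,} per 100 GB disc"
        )
```

Under these assumptions a DVD holds 47,000 books at 100 KB each or 23,500 at 200 KB each, so the point about a couple of next-generation discs holding a huge library still stands; it just takes a few more of today's DVDs.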
As in "TEH GHEY"?
This is sure to be the next Amazon.com patent: US-Patent 20030722.47blahblahblah "Ability to search bodies of published texts using RFC 2549".
Powell's World of Books has an EXCELLENT technical bookstore, and their catalog is on-line. Powells.com
Some wealthy do-gooder could pay amazon to use this feature to the public's benefit, linking words such as "porn" to self-help books about sex-addiction and "bomb-making" to a similar book about dealing with pent-up anger...
Sure, your honour, I only OCR'd and put my entire book collection up on Kazaa so that people could search for passages before buying them from me. Same with my mp3s and DVDs, now that I think of it.
Let's look at the fair use provisions in the 1976 copyright act:
the fair use of a copyrighted work [...] for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.
Purposes such as selling aren't covered, but let's read on, because as with most things written by lawyers for the benefit of lawyers, it's not that clear-cut.
In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
Well, you work it out. It's a copy of the entire work. That it's offered one piece at a time can't be a defence by itself, otherwise those fragments I upload and download to and from various people over eDonkey would be fine by the same argument. The duplication is clearly of a commercial nature (for Amazon's benefit), but on the other hand, it's arguably increasing the potential market for the copyrighted work.
That last one is a very, very interesting provision. If Amazon can argue that making entire copies and distributing parts of them - potentially all of them - for their profit is just increasing the market for the original work by way of advertising and promoting it, why can't I argue that for my eDonkey use?
If you think this argument is trite, have a look at www.sharereactor.com, which indexes content on eDonkey. You see the "Buy this at Amazon.com" links right there? What is eDonkey doing that's significantly different from Amazon? Are Amazon obtaining each and every rights owner's permission to perform this duplication? I doubt it, so the differences seem to be these:
It's easier to obtain all the fragments from eDonkey (but not much easier, it can take upwards of a week to completely download a large file). And sharereactor is not for profit, whereas Amazon is primarily interested in their own profit.
You work out where the morality and legality lies.
If you were blocking sigs, you wouldn't have to read this.
So if someone wrote a script that sequentially searches for the most popular words, you could end up with the whole text?
I wonder how much Amazon is going to charge publishers (or how big a discount it will demand from them) in order to be in this index?
What about small publishers and self-published authors?
What about exclusivity... if other booksellers want to do the same and Amazon demands it?
This will then prompt publishers to include several pages at the beginning of every book with nothing but "sex sex sex sex sex sex..."
Use Google to do searches of Amazon to do searches of the text of books?
That's gotta fit into your schema somewhere
The NIH has a good start with something of this nature. The NCBI (part of the National Library of Medicine) has a fully searchable set of about 20 books. The books generally cover biology topics, but represent some of the standard texts used in college courses. They call the project Bookshelf and it is entirely free. Several books contain direct links to gene sequences, etc.
but then again that's because I'm writing it. :)
..Jeff Keegan
seven syllables explain TiVo: kee gan dot org slash ti vo
"Of course this *could* be great for college paper researchers, looking for a quote or two to stick in a research paper. Depends on how much meat you can really get at."
College is great in this respect. No matter how crazy, ill-conceived, or outlandish your premise is, there are a thousand nut-jobs out there with nice quotations to support it. This would make it even easier to back that drivel up. Especially late on the night before it's due, when you need to support that last flimsy claim in order for your paper to make sense.
GeekNights!
Late Night Radio for Geeks!
This sounds like a good project that they could get some gov't funding for.
Besides the obvious copyright problems: if the gov't were to get involved, and Amazon (or whoever) were allowed to permit searching an entire book for concepts/keywords without letting users view the entire book unless they pay for it, this would increase both sales and usefulness.
If this had been the original model for online music, think of all the problems that would have been avoided. Perhaps a second look at this type of archiving will help the movie industry as bandwidth increases.
rejected (19) accepted (0)
Is there a psychological term related to getting your stories rejected on slashdot?
The book does exist and has rave reviews!!!
3 of 4 people found the following review helpful:
slim in size but big on info, April 20, 2003 Reviewer: Magdalene Meretrix from Idaho
This book is very slim -- there are only about 100 pages in it and much space is taken up with line drawings. It's understandable that the book is so slender since there really aren't volumes of information to impart on the subject, but I really wish the book had been longer. The book does cover the information very well and thoroughly. There is no way to make the information sections of the book longer without artificially stretching it out. As it is, vaginal fisting is a topic best suited to an article, not a full-length book. There is a section near the end of the book with poems and stories about fisting, written by people other than the author. I would have enjoyed it if that section had been expanded, even if it went so far as to take up half the total volume of the book. I was hungry for more information about fisting and would have liked to have seen more on the table at my feast of information.
But even if the reader is disappointed by the quantity of written material, they will not be disappointed by the quality. The author quite obviously knows what she is talking about and has produced a very clear and concise guide to an exotic activity that is one of the less understood forms of pleasure sharing available to adventurous and exploratory couples. Addington discusses safety information, hand positions, necessary and desirable supplies to have on hand, and even more obscure topics such as fisting after a hysterectomy. There is one personal account by a woman who tried fisting but did not enjoy it. I would have liked to have seen more varied accounts, especially stories about difficulties (whether overcome or not) and problems with fisting. I was also surprised that no one, not even the author, mentioned the cathartic or healing experiences that some people can have during a fisting experience. Most of the descriptions of what it feels like to be fisted focused on the sensation of being very full and on a very spiritual level of trust and intimacy. In my experience, this is just one hue of the spectrum of sensations and emotions that can accompany fisting.
Having personal experience with this subject, I can say that Addington has covered the physical territory very well and produced a book that is a good information source for beginning explorations in this intense, cathartic, orgasmic activity. I feel comfortable recommending this book to anyone who is curious about adding this activity to their sex life. Those already participating in fisting will probably not gain anything new from this book (other than the few poems, line drawings and one-page personal accounts) but those who have never been introduced to fisting by a friend or lover will learn quite a bit in the pages of Addington's book.
9 of 11 people found the following review helpful:
Fully Illustrated Sexual Teaching, November 25, 2002 Reviewer: cousinpaco from Cincinnati, OH United States
As an up-and-coming investment banker, I work under extreme pressure. My partner also makes her living in a stressful position. Together, we like to relieve our tensions by exploring ways to spice up our sex-life. It's difficult to find guidance and inspiration for such private matters, but Amazon.com has offered a wealth of options. I was looking for a new idea to try in the bedroom, but I couldn't put my finger on it.
Then I found a wonderful book.
When I first came across the cover, I thought it was the abandoned idea for Spinal Tap's latest album "Smell the Glove." To my pleasant surprise, it was a wonderfully written, illustrated manual on the art of vaginal fisting.
Ms. Addington presents sensitive, tender explanations and answers some delicate questions, which went a long way toward making us both feel much more at ease about our predilections.
And how does indexing tens of thousands of books make Amazon any sort of "authoritative" source for book-related information? Perhaps this would make them a source of "current" book info.
Sorry, but indexing just a tiny fraction of the millions of extant manuscripts doesn't make one an authority.
Now, put the rest of the book online, pay the author directly, and ya got something!
I'm surprised nobody's mentioned Project Gutenberg. I mean, they've been OCRing public domain books for a long time now, and there are thousands of texts available... not in some crappy interface like the one Amazon will use, but in wonderful, sweet, ASCII text format. Couple this with some good regular expressions and you're in business... want to see how many times Sherlock Holmes talked about using cocaine? It's elementary!
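A minimal sketch of the regular-expression approach, assuming you have already downloaded a Gutenberg plain-text file of the Holmes stories (no particular filename is implied). Note that it counts every whole-word occurrence of "cocaine", not only the times Holmes himself talks about it, so it gives an upper bound on the question as asked. The function names are hypothetical.

```python
import re

# Whole-word "cocaine", any capitalisation; "cocaines" would not match.
COCAINE = re.compile(r"\bcocaine\b", re.IGNORECASE)


def count_mentions(text: str, pattern: re.Pattern = COCAINE) -> int:
    """Number of non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))


def mentions_with_context(text: str, pattern: re.Pattern = COCAINE, width: int = 40):
    """Yield each match with up to `width` characters of context on either side."""
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        # Flatten line breaks so each snippet prints on one line.
        yield text[start:match.end() + width].replace("\n", " ")
```

For example, `count_mentions(open("holmes.txt", encoding="utf-8").read())` would give the count for a local file named `holmes.txt`; iterating over `mentions_with_context` on the same text shows each hit in context, which is how you would weed out the ones that aren't Holmes talking.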
Free books, search by title and page number and you have your book.
Only 900 searches for the new Harry Potter book, sounds like a good deal!
How authors will react is another question.
Isn't this what happens in the RealWorld? You walk into a bookstore, open it up, read a few pages and make a decision on whether or not you want to buy it?
I think publishers and authors would be rather short-sighted not to let potential customers shop online the same way they shop in brick-and-mortar stores.
Ryan O'Rourke
search a little, store a little. Search a little store a little more.
Pretty soon you'll have the entire book.
They'll have an app out to search the pieces out and stitch them together into one complete book.
Yeah, this will work, thanks for the free ebooks, Amazon.
Most real searches for info just lead to sites that are trying to sell whatever. Real research is inside the books. This may help to find information not just sales pitches.
*Accessing http://www.amazon.com/search*
Enter your search criteria:______________
*Enter search "Moby Dick"*
Search Complete:
Moby Dick
by: Herman Melville
Call me...
Would You Like to Read More? This title can be purchased for $14.95 through our...
*Back Button*
Enter your search criteria:______________
*Enter search "Tale of Two Cities"*
Search Complete:
A Tale of Two Cities
by: Charles Dickens
It was the best of times, it was the...
Would You Like to Read More? This title can be purchased for $29.95 through our...
*Back Button-Back Button-Back Button-Close*
Amazon plans book-text search
The Times article suggests that this is part of a larger strategy to compete with Google and Yahoo by making Amazon an authoritative source of information on everything book-related.
Like I'd really go to Google's or Yahoo's website to buy a book/DVD/something online
Ave Molech Setting
While you're doing that, I'll be at Borders with my cell phone.
Remember the days when Republicans were the party of fiscal responsibility?
...no one here is EVER a lawyer. IANAL should go without saying.
Just imagine if Amazon did some deal with the Library of Congress that allowed them to scan in nearly every book published in the United States. Once the information is digitally stored, it could be utilized in other ways as well:
- Libraries around the country could offer consoles on which you could read any book through some kind of secure connection, preventing the unauthorized copying that would otherwise stop book publishers from agreeing to this. You could essentially read any book, even if the library doesn't have it.
- Bookstores, schools and other organizations might get in on this network and offer the same service.
This service doesn't even have to be free. I'd pay a subscription fee to have access to this information, as would the bookstores and whatnot.
Boo hoo hoo. Call the whaambulance.
want to see how many times Sherlock Holmes talked about using cocaine? It's elementary!
;)
Now I'm actually curious... How many?
Waaaaaaaaaaaaaitaminit. POEMS?!?! WTF?
Fisting poetry?
Goatse links now on topic?
What the fucking fuck?
Amazon will patent (and pay for the relevant legislation) the "illegal to remove cookie" that stays on your hard drive and tells Amazon's search engine how many times you've searched within a given book. Once you reach, say, 10 pages (or a percentage of the book), it won't let you search through that book anymore. And if you dare remove the cookie from your computer or block it in any way, the DMCA police will be at your door. This seems like the most logical, simple solution :P Ah, and book publishers are allowed to hack into people's computers to 1) make sure the cookie is there and 2) destroy the person's computer if it isn't.
god bless 'merikuh
Stupid people make stupid things profitable.
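Jokes about DMCA police aside, a per-book page limit is simple to enforce on the server, keyed to the customer's account instead of to a cookie the customer can delete. The sketch below is hypothetical: the class name, the 10-page default (borrowed from the comment's own example) and the string identifiers are illustrative assumptions, not anything Amazon has described.

```python
from collections import defaultdict


class PageQuota:
    """Hypothetical per-account, per-book page limit kept on the server."""

    def __init__(self, max_pages_per_book: int = 10):
        self.max_pages = max_pages_per_book
        # (account_id, book_id) -> set of page numbers already shown
        self._seen = defaultdict(set)

    def request_page(self, account_id: str, book_id: str, page: int) -> bool:
        """Return True if the page may be shown, recording it against the quota."""
        seen = self._seen[(account_id, book_id)]
        if page in seen:
            return True  # re-viewing a page already shown costs nothing
        if len(seen) >= self.max_pages:
            return False  # quota for this book is used up
        seen.add(page)
        return True

    def pages_left(self, account_id: str, book_id: str) -> int:
        """How many new pages of this book the account may still view."""
        return self.max_pages - len(self._seen[(account_id, book_id)])
```

Keying on accounts rather than cookies still leaves the multiple-accounts loophole raised elsewhere in this thread; this sketch makes no attempt to close it.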
Yes. And the President also neglected to mention that various US agencies, including the CIA, had reason to believe that the British intelligence was inaccurate. If the President didn't know that, he should have.
We're left with two choices: either we get to believe that the President and his team are really incompetent, or that he's surpassed Clinton in his ability to use legalistic shenanigans to avoid telling the truth, yet not technically lying.
Sean
Welcome to the big, real, world.
If you want your cheese sandwich on whitebread, it's over there in the corner.
I have close ties with some of the world's largest publishers, and some of my friends are execs in that industry.
Every single major publisher of importance has been adamant about never going along with this, despite the pleadings of certain factions within Amazon (read: pet project). For the publishers, this would be pure downside with no upside and therefore this is DOA. Amazon requires the permission of the publishers and the publishers are committed to making sure it never happens.
It is curious that Amazon is still trying to market this idea to the public when the major publishers have already firmly rejected the concept without the slightest hint that they'll ever change their stance. This was shot down months ago.
if, one considers the tragic event at alexandria. and if, one projects to the day when one can purchase the 'complete library of congress data block', (with monthly updates...).
then, this amazon sales pitch DOES have some public good
else, i'm going to the book store, because i need answers, indexes, and toc's, not hype; good luck amazon.com, i hope the best.
p.s.
is it only me, or has anyone noticed the close pricing of books online with shipping, and what you can buy at a bookstore? Mu-ha-ha-ha-ha-ha-ha-ha-ha-ha-ha-ha-ha
It wasn't the fisting (hey, I get ads for that in my spam every day!), it was the idea of writing poetry about it. Eeeew! :)
I work at a public library. A year or so ago, I was at a conference
held by our library catalog software vendor, and at one point they
asked the open-ended question, what feature did we most want. I
raised my hand without hesitation and said, "The ability to search
the full text of every item in the library." They laughed, which
I pretty much expected, because I realise the difficulty of making
such a thing happen, but it's true: that's the feature I want.
If Amazon helps get the ball rolling toward that end, then I say,
Go Amazon. They can even patent it, I don't care, as long as they'll
license it to library software vendors and other interested parties.
Seriously, we've been waiting for this feature for a LONG TIME, and
it hasn't been happening. Star Trek made us drool over this feature
a long time ago, but nobody stepped up to implement it in the real
world. It's about time! My opinion of Amazon just shot up a couple
of notches because they're even thinking seriously about finally
really doing this; when/if they actually roll it out, they'll be
the closest corporate thing to my personal hero.
And yeah, just nonfiction for now, but once the proof-of-concept
is done, I suspect it'll prove so useful that lots of genres of
book will be added. Though, I have to admit, it would be _most_
useful for nonfiction.
Cut that out, or I will ship you to Norilsk in a box.
.. if they could just plug in a student's paper and have it analyzed for plagiarism.
Not just profs, either: there's a net.kook on the rec.pets.dogs newsgroups scamming people with a plagiarized dog-training book, but nobody has the time to track down which bits came from where to refute him.
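One common way to do this kind of matching is word n-gram "shingling": break both texts into overlapping runs of n words and measure what fraction of the suspect text's runs also appear in a candidate source. The sketch below assumes n = 5 and plain-text inputs; the function names are hypothetical, and a real checker would need a large indexed corpus (which is what a full-text book archive would provide) rather than a small dictionary of strings.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """All overlapping n-word sequences in text, lowercased, punctuation stripped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspect: str, source: str, n: int = 5) -> float:
    """Fraction of the suspect's n-grams that also occur in the source (0.0 to 1.0)."""
    suspect_shingles = shingles(suspect, n)
    if not suspect_shingles:
        return 0.0
    return len(suspect_shingles & shingles(source, n)) / len(suspect_shingles)


def best_source(suspect: str, sources: dict[str, str], n: int = 5) -> tuple[str, float]:
    """Name and score of the source sharing the largest fraction of the suspect's n-grams."""
    scores = {name: containment(suspect, text, n) for name, text in sources.items()}
    return max(scores.items(), key=lambda item: item[1])
```

Containment is used here instead of a symmetric measure such as Jaccard similarity because a short paper copied from a long book should still score high; Jaccard would be dragged down by all the book text the paper didn't use.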
Amazon, of course, kicked it back, claiming that I was being negative about the artist. I wasn't. I was actually trying to push sales towards his other work, that people might enjoy. This would actually make more sales for them in the long run, of course, but somehow they thought I was rude:
Was I criticizing the artist, or the content?
Was Amazon protecting the artist, or protecting sales?
Example: my linear algebra book this quarter.
(2/5 stars, and boy do I agree with that.)
hooray! it's a sex wiki
Probably this will be mainly for "teaser" purposes (think movie teasers) rather than something that actually allows researching. Like their "Look Inside" feature, which only shows the first few pages of a book. Still cool, though.
I'd *love* to be able to have a searchable index of all my books and magazines, even if it didn't include context like Amazon does, and just returned title, volume/issue, and page numbers.
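What's being described is essentially an inverted index: a map from each word to the places where it occurs. A minimal in-memory sketch follows, assuming you can already get the text of each page (by OCR or otherwise). The class and method names are hypothetical, and a magazine's volume/issue could simply be folded into the title string.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


class BookIndex:
    """Hypothetical personal index: word -> set of (title, page) locations."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_page(self, title: str, page: int, text: str) -> None:
        """Record every word on one page of one title."""
        for word in WORD.findall(text.lower()):
            self._postings[word].add((title, page))

    def search(self, query: str) -> list:
        """Sorted (title, page) pairs containing every word of the query."""
        words = WORD.findall(query.lower())
        if not words:
            return []
        # .get() avoids adding empty entries for unknown words.
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)
```

Returning only title and page number keeps the index small, since it never stores the text itself, which matches the "no context needed" wish above.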
I had a friend who had a Honorverse book, and I borrowed the CD in it [it contains lots of full books by participating authors]. That led me to buy Weber's books, as well as Ringo's [haven't made it all the way through the CD], to stock my library. I now have a nice collection on my laptop and in hard copy. Baen's policy has made me a loyal customer of theirs. After pulling my hair out in frustration [heh] at certain authors, I usually buy from them, as I know I'm getting a good read.
-- Some days you're the dog; some days you're the hydrant.
The Questia online library at www.questia.com already has 45,000 non-fiction books digitized and indexed for full-text searching, along with more than 350,000 journal, magazine, and newspaper articles. While there aren't any computer tech books in the collection, there is plenty in the liberal arts and social science areas.
This idea has already been done for Stephen King books here.
Excellent! I can't wait until Amazon does this, and then Google caches the entire archive!
- no way will i allow amazon to offer text searches of my titles...
- f*ck amazon...
"Users would only be able to read a certain portion of the text from any one book"
So, if it is just pre-cut snippets from the book that can be searched, big deal. But if it only shows you the portion in which the text was contained, it will be a matter of hours before someone writes a program that deliberately searches for every 40th word (or whatever the threshold is for Amazon to 'snip' the text) and copies that text, on and on, until it has compiled the whole book in electronic format. Then it's distributed.
"Artificial Intelligence usually beats real stupidity."
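From the operator's side, this scenario can be modelled as a leakage question: if each search reveals a window of words around a hit, how much of the book does a given set of queries expose? The sketch below assumes a window of `radius` words on either side of the first occurrence of a single-word query, with whitespace tokenisation; both are guesses at how such a feature might behave, and the function names are hypothetical.

```python
from __future__ import annotations


def snippet_positions(words: list[str], query: str, radius: int) -> set[int]:
    """Word positions revealed by a search showing +/- radius words around the first hit."""
    target = query.lower()
    for i, word in enumerate(words):
        if word == target:
            return set(range(max(0, i - radius), min(len(words), i + radius + 1)))
    return set()  # no hit, nothing revealed


def exposed_fraction(text: str, queries: list[str], radius: int = 20) -> float:
    """Fraction of the text's words revealed by running every query once."""
    words = text.lower().split()
    if not words:
        return 0.0
    revealed: set[int] = set()
    for query in queries:
        revealed |= snippet_positions(words, query, radius)
    return len(revealed) / len(words)
```

With every word unique, querying every 40th word with a 20-word radius exposes the whole text. In real prose common words repeat, so first-hit-only snippets for them keep landing on the same passage; that is presumably one reason a real service would also cap views per account rather than rely on snippet size alone.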
Personally, I thought about this passage:
//H
but those who have never been introduced to fisting by a friend or lover will learn quite a bit
'No, I did not have intercourse with that woman, I only fisted her'.
I'm too stupid to preview.
I did read the article and got the same idea. As someone mentioned above, you would have to use a number of different Amazon accounts to retrieve different portions of the book, but I don't see why it couldn't be done piecemeal.
With his recent allegations of plagiarism from a Japanese book, Mr. Dylan won't ever have writer's block again with this service.
If I hadn't posted already, oh, man, my kingdom for a mod point :)
Going rate is about Rs 14/- per 1000 words or something. Not exactly enough to support a family of four even in India, but fantastic as a side-income in college.
More than mere navel gazing.
Agreed. Some moderators need to look up the word "redundant" because it's been obvious to me that some of them don't know what it means. Those who do know what it means need to be a little less hard-arsed about applying it. It's not fair to label a similar post made 2 minutes after the first one "Redundant" -- not all of us are sitting atop T-1's or type 300 words a minute, after all! If it's the 3rd or 4th such post, or it's made an hour or two after the first one, that might qualify as "Redundant".
Moderators, look at the posting times and please try to be more reasonable about this, won't you? Thanks.
There is no Planet B.