Slashdot Mirror


Amazon Launches Full Text Book Search

m00nun1t writes "Amazon have launched a new service that allows you to search the full text of books. This sounds like an incredibly useful function as well as technically impressive at this scale. I wonder if a patent is in the works." Or if a patent is already owned.

28 of 241 comments

  1. Yeah, but.. by michaelhood · · Score: 4, Funny

    can you do it with one click?

    1. Re:Yeah, but.. by KDan · · Score: 4, Interesting

      Some web sites have 100s of A4 pages, but Google still returns in a jiffy. I'm pretty sure their book collection is well indexed, if they're offering this service. Probably with the Google engine, too.

      Daniel

      --
      Carpe Diem
  2. abuse by technix4beos · · Score: 4, Interesting

    I can almost hear the screams of joy from the underground book pirates.

    How easily can this service be abused, with automatic webbots doing the searching?

    I can imagine there might be filters, time limits, and max searches/day limits for something of this scale, no?

    --
    user@host$ diff /dev/urandom /dev/uspto
    1. Re:abuse by Maskirovka · · Score: 4, Interesting
      How easily can this service be abused, with automatic webbots doing the searching?
      You can only browse two pages in either direction per search. You also have to be logged in. I suppose someone could script a system to create thousands of accounts, then use an army of zombie machines to OCR the pages from a variety of different IPs. That is assuming that Amazon has EVERY page of every book available to the service, which I doubt.

      It would probably be easier to cobble together a robot built around a laptop with an OCR-equipped camera and book manipulation software and set it loose in a big library at night. For 50 years.

    2. Re:abuse by Enoch+Root · · Score: 5, Informative

      You 'almost', but not quite, hear the book pirates, most probably because they don't formally exist. Ebooks are widely available in unencrypted format, and the latest releases, while in secure formats such as Secure MS Reader or Adobe, are probably much easier to crack than it would be to build a bot to collect a book online page by page.

      ebooks are a pretty healthy alternative to normal books, but I don't see the publishers worrying too much about piracy. Perhaps it's because the average script kiddie who will spend 2 days downloading Matrix Reloaded from Usenet is just not the type to try and crack open a book, much less crack an ebook.

    3. Re:abuse by wfberg · · Score: 3, Insightful

      How easily can this service be abused, with automatic webbots doing the searching?

      Not so easily. It's easy to see why. The books will be scanned in using OCR. These days a fast and convenient and almost error-free process. But not entirely error-free. Good enough to find documents that are highly relevant to a particular keyword (if "hydraulics" occurs 9 times, what are the odds of OCR getting it wrong all 9 times?) but not good enough for entirely automated book-to-text.
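
      To put a rough number on that "what are the odds" question, here is a minimal sketch. It assumes each occurrence of the word is misread independently with the same probability; the 2% error rate used below is a made-up figure for illustration, not a measured OCR statistic.

```python
def p_all_missed(per_word_error: float, occurrences: int) -> float:
    """Chance that OCR garbles every single occurrence of a keyword,
    assuming independent errors with the same per-word probability."""
    return per_word_error ** occurrences


def p_found(per_word_error: float, occurrences: int) -> float:
    """Chance that at least one occurrence survives OCR and can be indexed."""
    return 1.0 - p_all_missed(per_word_error, occurrences)


if __name__ == "__main__":
    # Hypothetical 2% per-word error rate, keyword appearing 9 times.
    print(p_all_missed(0.02, 9))
    print(p_found(0.02, 9))
```

      Under those assumptions the keyword is almost certain to be indexed at least once, while a full-page transcript would still contain scattered errors.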

      If Amazon displayed highlighted portions of a book's contents, it would probably not exceed a few lines (just like Google doesn't present entire webpages in its result screen). If they did want to show more, they'd have to show an image of the scanned-in page anyway, since OCR errors would not be very pretty. (A lot of digital archiving products use a similar approach; they index PDF files that contain the OCR'ed text, invisible to the end-user, and the scanned pages as content which the end-user looks at).
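
      The "index the hidden OCR text, show the scanned image" approach could look roughly like the sketch below. This is only an illustration of the general idea, not how Amazon or any archiving product actually works; the PageIndex class, its method names, and the image file names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageIndex:
    """Inverted index over OCR text; hits point at scanned page images."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> {(book, page), ...}
        self._images = {}                  # (book, page) -> image reference

    def add_page(self, book, page, ocr_text, image_ref):
        """Index the (possibly error-ridden) OCR text of one scanned page."""
        self._images[(book, page)] = image_ref
        for word in tokenize(ocr_text):
            self._postings[word].add((book, page))

    def search(self, query):
        """Return (book, page, image_ref) for pages containing every query word."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted((book, page, self._images[(book, page)]) for book, page in hits)


if __name__ == "__main__":
    index = PageIndex()
    # One word is deliberately misread ("hydrau1ics") to mimic an OCR error.
    index.add_page("Fluid Power", 12, "Hydraulics and hydrau1ics basics", "fp-012.png")
    print(index.search("hydraulics"))
```

      The user only ever sees the image reference, so OCR mistakes affect which pages are found, not what is displayed.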

      Besides, to search for each page of a book, you'd have to search for a keyword on each page of that book. Such keywords would most easily be extracted by scanning in the book via OCR anyway!

      --
      SCO employee? Check out the bounty
  3. It works!!! by jabbadabbadoo · · Score: 5, Funny
    1) I typed 'porn'
    2) It returned a lot of results

    Conclusion: It works!!!

  4. Fine grain searches take the adventure away by Dancin_Santa · · Score: 4, Insightful

    Back in the early days of the web, when Yahoo was still a catalog of links and not the super news/search/auction/ebusiness/do-it-all website that it is now, searches were much more fun.

    You really never knew what would turn up as you traversed the Yahoo directory structure. You start searching for blues music and you'd end up with a list of 15 or so good links with .wav samples and more than likely an artist you'd never heard of before. That was the best part, getting introduced to things you hadn't even thought to look for.

    As search techniques are becoming more refined, we are now able to do specific word searches on websites and now books. That's fine if you know exactly what you are looking for. For example if you want to get that book about 'replicants' you'll find Blade Runner, but you won't find anything else. You won't get any information except exactly the thing you are looking for.

    And I think that that is where the problem with this kind of search lies for books/music/etc. If you want to find a song or a book, it most likely isn't going to be a specific word you remember, it will be the tune or the plot, both of which are not searchable.

    I don't see this improvement in Amazon's search system as that much of an improvement. A better improvement could be made to the 'We thought you'd like' feature. Instead of finding only what I'm looking for, I'd like to find other things I might also be interested in.

    1. Re:Fine grain searches take the adventure away by Zardoz44 · · Score: 3, Informative
      Have you actually ever been to Amazon? What you say it lacks is what I like best about it.

      1. After a search, it gives you a list of "Customers who bought this also bought:". For instance, see this.

      2. They have the concept of "Listmania" which allows every user to create a list of their own recommended products. If your search aligns with their list, Amazon will suggest that you look at it. Search for something you want and keep an eye open for the listmania section.

      Doesn't this meet your criteria for "I'd like to find other things I might also be interested in"? And on top of that, I suppose the "browse" option is too complicated?

      This new feature of searching the full text only allows you to find related items in a different way. If you have a better idea on how to search their site that they don't provide, send them a suggestion. It is in their best interest to let you find things you want.

  5. Potential tool for discovering plagiarism? by Anonymous Coward · · Score: 4, Insightful

    I remember a teacher once telling a class I was in that our essays may be compared to other essays published online to check for plagiarism.

    Granted, Amazon.com's feature will only (for now) include 150,000 books, but this may very well be another way to catch plagiarists. Just type in a suspicious phrase and see if there are any 'hits'.
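
    If you already have the text of a suspected source, the "suspicious phrase" check can be automated. The sketch below is one simple way to do it (overlapping word n-grams, sometimes called shingles); it is not Amazon's search or any plagiarism product, and the function names and the 5-word window are arbitrary choices for illustration.

```python
import re


def shingles(text, n=5):
    """Return the set of all n-word phrases in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(essay, source, n=5):
    """List the n-word phrases that appear in both texts, sorted alphabetically."""
    return sorted(shingles(essay, n) & shingles(source, n))


if __name__ == "__main__":
    source = "It was the best of times, it was the worst of times"
    essay = "As they say, it was the best of times for search engines"
    for phrase in shared_phrases(essay, source):
        print(phrase)
```

    Each shared phrase is a candidate to paste into a full-text search; a hit is only a lead to check by hand, since common expressions will match too.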

  6. No Searching Inside O'Reilly Books by theodp · · Score: 4, Interesting

    Even though he said he was 'blown away' by Amazon's new Search Inside the Book feature, Tim O'Reilly has decided not to participate in the program for now. 'If they end up being a Google for published content...we need to think better about what publishers get out of it,' he said.

    1. Re:No Searching Inside O'Reilly Books by Zeddicus_Z · · Score: 4, Interesting

      As a Safari subscriber, I'd say it's probably because Full Text Search of online book content is also present at O'Reilly's own Safari online tech book site. You've been able to do the same thing Amazon is now crowing about, on every book Safari has, since launch quite some time ago (year or two perhaps?)

      Safari is more of a "service" (i.e. renting access to book content) than a "feature" of a retail website, which is all Amazon's "innovation" seems to be.

      Basically the only real difference between the two (aside from what is cited above) is that Amazon just lets you know the content is mentioned, and shows you a page or two. Safari gives you the entire book. That and that Amazon has a much wider range of books in non-tech genres.

      --
      Janie took my gun...
  7. Here's a quote relevant to the parent post by Anonymous Coward · · Score: 3, Interesting

    There's books about everything:

    Encyclopedia of New Media : An Essential Reference to Communication and Technology -- Steve Jones (Editor); Hardcover

    Excerpt from page 0: ". . . post-ranking system used by members the of Web message board Slashdot.org, began as a result of community self- restraint in the face of unrelenting trolls (pointlessly hostile posters). In addition, some cyberspace forums now require . . ."

    See more references to slashdot troll in this book.

  8. Re:Amazon... by will_die · · Score: 4, Interesting

    It is really nice, I was using amazon right as they switched it on.
    I was searching for books on Object Role Modeling (ORM). I had first done a search for ORM and did not find anything of interest. They then switched it on while I did a search for 'Object Role Modeling', which popped up a few books with the text where it was being used.

  9. No more out-of-print books by Bushcat · · Score: 3, Insightful
    As the digital index builds up, we will rapidly come across the situation where the electronic book is searchable, but the printed form is out of print. If this service ultimately allows single copies to be printed for delivery, it will be an outstanding demonstration of print-on-demand technology as advocated by the Print On Demand Initiative and others.

    I'd love to be able to browse a giant back catalog, knowing that an original or facsimile copy could definitely be delivered to me.

  10. One click search. by burtonator · · Score: 3, Funny

    In other news... Amazon announced that the USPTO has granted them a patent on their proprietary "one click search" technology.

    When questioned for comment Google CEO Eric Schmidt said "ug".

  11. New age youth by Anonymous Coward · · Score: 3, Funny

    Youth in the old days: look up 'vagina' in a dictionary.
    Youth nowadays: look up 'vagina' in all books on this planet.

  12. Wow! by plasticmillion · · Score: 4, Informative
    I'm impressed. A couple of days ago I went onto Amazon to find books about Singular Value Decompositions (a mathematical technique that can be used for efficient statistical analysis of large groups of documents, among other things). I wasn't particularly surprised when it returned 0 results, since anyone who puts the term "Singular Value Decomposition" in their book's title obviously doesn't know much about marketing. Of course I don't actually give a damn if the term is in the title or not; I just want to know if the book talks about this technique.

    I tried the search again today and got nearly 5,000 results, with the capability to actually look inside the book and see if the reference is useful to me. Very impressive indeed, patent or no patent.
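
    For anyone wondering what the technique actually is, here is the standard definition, with the document-analysis reading stated as an assumption: take A to be an m-by-n matrix whose rows are terms and whose columns are documents.

```latex
% Singular value decomposition of a rank-r matrix A (m x n):
A = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0,

% where the columns u_i of U and v_i of V are orthonormal.
% Keeping only the k largest singular values gives the best rank-k
% approximation of A in the least-squares (Frobenius) sense:
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad k \le r .
```

    Working with the much smaller A_k instead of A is what makes the analysis of large document collections efficient.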

  13. Various worthwhile uses by emcron · · Score: 5, Informative


    Bash Amazon all you want, but this is a very useful technology.

    In five minutes I was able to find three books that talked about findings first listed in two of my own published scientific papers, yet these books did not cite me, or anyone else, as the source of that information. My lawyer is currently preparing three letters.

    I also found two other books in which the author used verbatim quotes and original theories from various interviews I have given, yet both authors passed off the statements as their own. My lawyer is now preparing five letters.

    Aside from being used to protect my own research rights, I have found the search system useful for finding topics of interest discussed in certain books which are not referenced in any of the descriptions about the books. I just ordered three books I would not otherwise have ever purchased.

    While I don't think highly of all of Amazon's practices, I must hand it to them for whatever technical undertaking created this search feature.

  14. You can see whole pages by AlecC · · Score: 4, Informative

    You can read the page it is on and +/- two pages.
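
    As a sketch of that rule, assuming pages are numbered from 1 and the window is simply clipped at the front and back covers (the function name and the clipping behaviour are guesses for illustration, not anything Amazon has documented):

```python
def visible_pages(hit_page, total_pages, radius=2):
    """Pages a reader may view around a search hit: the hit itself
    plus `radius` pages on either side, clipped to the book's bounds."""
    first = max(1, hit_page - radius)
    last = min(total_pages, hit_page + radius)
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(visible_pages(10, 300))
    print(visible_pages(1, 300))
```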

    This is the equivalent of the facility you have in a physical bookstore to open a book and browse a few pages before purchasing. I can see it might be very useful, if they get the majority of books in a field accessible like this.

    I wanted a PHP book the other day, and it is very difficult to decide which one of the plethora available I wanted. So I went to my physical bookstore. Smaller choice, but I could open each and get an impression of whether they were slow, detail-by-detail dummies books or the sort of high-speed summary I wanted.

    --
    Consciousness is an illusion caused by an excess of self consciousness.
  15. Anyone else notice this? by mike_lynn · · Score: 3, Insightful

    You have to have an account to view the pages. Fine, great. But then it brought up this screen:

    By publishers' agreement, we are pleased to offer Amazon.com customers with a valid credit card the ability to view copyrighted pages.
    Your account will not be charged.
    This one-time process enables you to view limited copyrighted material through our Search Inside the Book feature.


    So they'll let you browse the search pages, if you can prove your identity on record and provide them with financial information. No thanks.

  16. Scanner problems by thrill12 · · Score: 4, Interesting

    Neat idea, but some excerpts come out all wrong:
    See this for example...
    Mass-OCR'ing has its drawbacks..

    --
    Slashdot: stuff for news, nerds that matter, matter for news, stuff that nerd
  17. Wired article: "The Great Library of Amazonia" by Enigmia+Man · · Score: 5, Informative

    Article in December Wired talks about Amazon's book scanning, how they legally do it, who does it, how many books so far, and protections.

  18. Now we just need... by s88 · · Score: 4, Funny

    A full text search of slashdot, so the editors can search for duplicate articles before they post.

    Scott

  19. Re:Indexing mechanism by real+bio · · Score: 3, Interesting

    Yes, but searching pages scanned/OCR'ed and highlighting the keywords has been a feature of Google search for a long time:

    Google Catalogs (Beta)

    It's very probable that they licensed the Catalog Search technology from Google.

    --
    Support Mozilla. Buy the CD.
  20. Why..This would be like searching through the LOC! by op00to · · Score: 3, Funny

    What a feat of computing genius! Using computers to search through large bodies of text!!!! Has ANYONE ever done this before?!

  21. Re:Those crazy Brits by fiannaFailMan · · Score: 3, Funny

    Well at least they don't refer to a liquid as 'gas' like the Americans do when talking about petrol.

    --
    Drill baby drill - on Mars
  22. Re:abuse - I've abused it. Sort of. by dnquark137 · · Score: 3, Informative

    I was stuck when working on a problem set; I Googled for a while and found out that there's a bunch of helpful info in one particular problems and solutions book. Curious about the book, I went on Amazon, and lo and behold, I can actually read the book. So, I look at the table of contents, find the relevant section, and search for the heading of that section. I can now read two pages from it. Not a problem; just pick a phrase on the second page and use it as a search query. Lather, rinse, repeat.

    That, of course, would be impractical to do for more than ~4 pages (which was what I needed), but you get the point.

    In a couple of hours I joined a few other guys working on the set, and it turned out they had just bought the book. There was a big "Doh!" when I showed them my printouts.

    Now, if I actually found the book genuinely useful as a result of this experience, I'd buy a hardcopy. But for now I think I'll stick with the current method. And I suspect many people might do just that: oftentimes there are references that aren't crucial to have, but convenient to turn to on a few occasions. The book search feature is perfect for those.