How Google Is Solving Its Book Problem

Posted by samzenpus on Wednesday November 3, 2010 @11:57PM from the beyond-the-card-catalog dept.

Pickens writes "Alexis Madrigal writes in the Atlantic that Google's famous PageRank algorithm can't be deployed to search through the 15 million books that Google has already scanned because books don't link to each other in the way that webpages do. Instead Google's new book search algorithm called 'Rich Results' looks at word frequency, how closely your query matches the title of a book, web search frequency, recent book sales, the number of libraries that hold the title, how often an older book has been reprinted, and 100 other signals. 'There is less data about books than web pages, but there is more structure to it, and there's less spam to contend with,' writes Madrigal. Yet the focus on optimizing an experience from vast amounts of data remains. 'You want it to have the standard Google quality as much as possible,' says Matthew Gray, lead software engineer for Google Books. '[You want it to be] a merger of relevance and utility based on all these things.'"
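The summary describes ranking books by combining many signals instead of by links. Below is a minimal sketch of one way to do that, assuming a weighted sum in which count-like signals are log-damped. The signal names, the weights, and the `BookSignals`, `score`, and `rank` helpers are all hypothetical. Neither the article nor Google gives the actual formula.

```python
import math
from dataclasses import dataclass


@dataclass
class BookSignals:
    # Hypothetical per-book signals, loosely modeled on those listed in the summary.
    query_term_frequency: float  # occurrences of query terms in the full text
    title_match: float           # fraction of query terms found in the title (0..1)
    web_search_frequency: float  # how often the title is searched on the web
    recent_sales: float          # recent copies sold
    library_holdings: int        # number of libraries holding the title
    reprint_count: int           # times an older book has been reprinted


# Hypothetical weights; the real ones are not public.
WEIGHTS = {
    "query_term_frequency": 1.0,
    "title_match": 3.0,
    "web_search_frequency": 0.5,
    "recent_sales": 0.5,
    "library_holdings": 0.8,
    "reprint_count": 0.6,
}


def score(book: BookSignals) -> float:
    """Combine the signals into one relevance score with a weighted sum.

    Count-like signals go through log1p so that one huge number (e.g. sales)
    cannot swamp all the others; title_match is already a 0..1 fraction.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(book, name)
        if name != "title_match":
            value = math.log1p(value)
        total += weight * value
    return total


def rank(books: dict[str, BookSignals]) -> list[str]:
    """Return titles ordered from highest to lowest score."""
    return sorted(books, key=lambda title: score(books[title]), reverse=True)
```

A real system would presumably learn such weights from user behavior rather than hand-pick them. The fixed table here only shows the shape of the idea.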

11 of 58 comments

Re:Scientific books by jank1887 · 2010-11-04 00:30 · Score: 2, Informative

They already do that via Google Scholar. Scientific paper searches often (maybe not often enough) bring up textbook references. I know searching through regular Google does quite frequently.
Re:Why can't the text of these books be clearer? by AdmiralXyz · 2010-11-04 00:37 · Score: 4, Informative

It's because the book-scanning process is completely automated. I can't find a link to it, but I remember a Slashdot or Wired article about Google's automatic book-scanning machine. Basically, it's too difficult to adjust for perfect focus for every book.

I wouldn't worry about it though: Google is doing OCR on all these books, and they'll presumably replace the images with plain-text equivalents at some point (more searchable, portable, etc.) That's my hope, anyway.

--
Dislike the Electoral College? Lobby your state to join the National Popular Vote Interstate Compact.
Re:How does one write ... by Anonymous Coward · 2010-11-04 00:41 · Score: 1, Informative

With a fountain pen.
Re:Books Contribute to Global Warming by WillAdams · 2010-11-04 00:43 · Score: 4, Informative

You're not taking into consideration the energy required to make the book, or to transport it to the marketplace. The amount of carbon sequestered in the physical pages of a book is insignificant in comparison.
The production of a book releases 8.85 lbs. of CO_2:
http://latimesblogs.latimes.com/emeraldcity/2008/06/paper-vs-paperl.html
Here's a page which indicates most CO_2 production is for energy:
http://www.eia.doe.gov/oiaf/1605/ggrpt/carbon.html
And here's a page which indicates that CO_2 production is a much larger problem for the manufacturing of electronics:
http://www.energybulletin.net/node/49730
With a ratio of 12 to 1 for energy usage to weight, and my PRS-505 weighing roughly 9 oz., the reader presumably required 108 ounces of fuel to manufacture (ongoing energy usage is trivial and not considered).
This EPA page:
http://www.epa.gov/oms/climate/420f05001.htm
gives a figure of 19.4 pounds of CO_2 per gallon of gasoline. Since 108 ounces is 108/128 ≈ 0.84 gallons, that works out to roughly 16.37 pounds of CO_2 to make the ebook reader.
So getting two books for the Sony should make it roughly break even, and each printed book beyond that which is not purchased should result in a net reduction of CO_2 emissions, since the energybulletin.net page indicates that the embodied energy usage for electronics is much greater than the lifetime usage.
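The arithmetic above can be checked with a short script. It uses only the figures quoted in the comment. Like the comment, it assumes that ounces of reader weight convert directly to fluid ounces of fuel, at 128 fluid ounces per US gallon. The constant and function names are illustrative.

```python
import math

CO2_PER_PRINTED_BOOK_LB = 8.85  # LA Times figure quoted in the comment
ENERGY_TO_WEIGHT_RATIO = 12     # fuel-to-device-weight ratio (energybulletin.net)
READER_WEIGHT_OZ = 9            # Sony PRS-505, as stated in the comment
CO2_PER_GALLON_LB = 19.4        # EPA figure for a gallon of gasoline
FL_OZ_PER_GALLON = 128          # US gallon


def reader_manufacturing_co2_lb() -> float:
    # Assumption carried over from the comment: weight ounces == fluid ounces of fuel.
    fuel_oz = ENERGY_TO_WEIGHT_RATIO * READER_WEIGHT_OZ  # 108 oz
    fuel_gallons = fuel_oz / FL_OZ_PER_GALLON            # 0.84375 gal
    return fuel_gallons * CO2_PER_GALLON_LB


def break_even_books() -> int:
    # Smallest number of printed books whose combined CO2 is at least the reader's.
    return math.ceil(reader_manufacturing_co2_lb() / CO2_PER_PRINTED_BOOK_LB)


print(f"{reader_manufacturing_co2_lb():.2f} lb CO2, "
      f"break-even after {break_even_books()} books")
```

This gives about 16.37 lb of CO_2 for the reader. Two printed books come to 17.7 lb, which matches the "roughly break even" after two books.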

--
Sphinx of black quartz, judge my vow.
Re:Rainbows End by Anonymous Coward · 2010-11-04 00:49 · Score: 1, Informative

No. Speculation on Google's process based on a patent filing.
I seem to recall an article that was more than speculation, but I couldn't find it while searching. The 2003 entry for the Google books history also points toward it being a non-destructive process.
Re:Rainbows End by Samantha+Wright · 2010-11-04 01:06 · Score: 4, Informative

Wait! I'm undoing all my mod points because I just realised that no, you're quite wrong. The printing process wouldn't be the same for the older books, and some of them have survived hundreds of years before we came along and scanned them.

However, the story about books being cut up for scanning was about microfilm. I think it was an institution in Texas that was cutting up its library's books. This was mentioned as an aside in a submission about how they were converting their library into a lounge and computer lab.

--
Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
Re:Rainbows End by Merpy · 2010-11-04 01:49 · Score: 5, Informative

Google doesn't destroy the books, they've got a patent on "unbending" the pages. http://news.cnet.com/8301-11386_3-10232931-76.html
" 'There is less data about books than web pages.. by unitron · 2010-11-04 01:59 · Score: 2, Informative

Shouldn't that be "are fewer data"?

--
I see even classic Slashdot is now pretty much unusable on dial-up.
Re:Why can't the text of these books be clearer? by icebraining · 2010-11-04 02:01 · Score: 2, Informative

One way they do it is through reCaptcha. When you're typing them, you're also helping the OCR process.
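As reCAPTCHA is usually described, each challenge pairs a control word, whose answer is already known, with a scanned word that OCR could not read. If the user types the control word correctly, their guess for the unknown word counts as one vote. The sketch below shows that voting. The `WordDigitizer` class and the three-vote acceptance threshold are hypothetical and are not reCAPTCHA's actual code or rules.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: accept a transcription once this many verified users agree.
VOTES_NEEDED = 3


class WordDigitizer:
    """Sketch of control-word/unknown-word voting (hypothetical, for illustration)."""

    def __init__(self, control_answers: dict[str, str]):
        self.control_answers = control_answers  # image id -> known text
        self.votes: dict[str, Counter] = defaultdict(Counter)
        self.resolved: dict[str, str] = {}      # image id -> accepted text

    def submit(self, control_id: str, control_text: str,
               unknown_id: str, unknown_text: str) -> bool:
        """Return True if the user passes, i.e. typed the control word correctly."""
        expected = self.control_answers[control_id]
        if control_text.strip().lower() != expected.lower():
            return False  # failed challenge: the guess for the unknown word is ignored
        if unknown_id not in self.resolved:
            guess = unknown_text.strip().lower()
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= VOTES_NEEDED:
                self.resolved[unknown_id] = guess  # enough agreement: accept it
        return True
```

The control word serves two purposes. It tells humans apart from bots, and it filters out careless answers before they can count as votes on the unknown word.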

--
Dilbert RSS feed
Re:Rainbows End by ElizabethGreene · 2010-11-04 04:27 · Score: 2, Informative

But do they really have to shred all the books just to scan them?
No. A book scanning machine is capable of scanning a book non-destructively. My unsubstantiated guess is that they are less harmful to the book than your average reader.
You can build one if you'd like; there's an Instructable for it. The automated page turners on the commercial models are awesome; there's a YouTube video of them.
Re:Link == citation by Anonymous Coward · 2010-11-04 06:23 · Score: 1, Informative

CS Lewis and Tolkien weren't really known for their citations.
(Now Tolkien might well be known for his appendices, but that is totally different.)
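If citations are treated as links, as the subject line suggests, PageRank can be run over a citation graph. The minimal power-iteration sketch below uses hypothetical titles. It shows the catch this comment raises: books that cite nothing are dangling nodes whose rank has to be spread across everything, and books nobody cites only get the baseline score. The `pagerank` function and the 0.85 damping factor follow the textbook formulation. Google has not said it uses anything like this for books.

```python
def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank where an edge A -> B means 'A cites B'.

    Books that cite nothing (dangling nodes) spread their rank evenly over
    all books, the usual fix that keeps the total rank equal to 1.
    """
    nodes = set(cites) | {t for targets in cites.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by books with no outgoing citations.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in cites.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank


# Hypothetical graph: a survey and an essay cite fiction that cites nothing.
ranks = pagerank({"Survey": ["Hobbit", "Narnia"], "Essay": ["Hobbit"]})
```

In this toy graph "Hobbit" ranks highest because two works cite it. "Survey" and "Essay" tie at the baseline, because nothing cites them.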