How Google Is Solving Its Book Problem
Pickens writes "Alexis Madrigal writes in the Atlantic that Google's famous PageRank algorithm can't be deployed to search through the 15 million books that Google has already scanned because books don't link to each other in the way that webpages do. Instead Google's new book search algorithm called 'Rich Results' looks at word frequency, how closely your query matches the title of a book, web search frequency, recent book sales, the number of libraries that hold the title, how often an older book has been reprinted, and 100 other signals. 'There is less data about books than web pages, but there is more structure to it, and there's less spam to contend with,' writes Madrigal. Yet the focus on optimizing an experience from vast amounts of data remains. 'You want it to have the standard Google quality as much as possible,' says Matthew Gray, lead software engineer for Google Books. '[You want it to be] a merger of relevance and utility based on all these things.'"
But do they really have to shred all the books just to scan them?
in the Atlantic?
For a second I thought they were merely using VSM: http://en.wikipedia.org/wiki/Vector_space_model . As I read further, I was happily proven wrong. :)
I rarely respond to comments. Also, don't ask for clarifications: a brain and Google are faster, believe me!
I think it should work well for scientific monographs as they contain a lot of references to each other, but don't usually get reprinted. [citation needed]
I have always wondered why the text in these books is not clear. The blurry fonts make my eyes hurt and surely, Google can create a better interface for the main page. Just 1 million dollars can do so much if some expert were hired to revamp the site. Come on Google!
I'm not sure Google can correlate the kinds of data they are talking about because their book metadata (author, title, edition, etc.) is so inaccurate. I often find Google books based on text search that can't be located in author or title searches.
Because paper books sequester carbon.
I hope they aren't trying to get experts-exchange as 8 of my top 10 book results.
You're not taking into consideration the energy required to make the book, or to transport it to the marketplace. The amount of carbon sequestered in the physical pages of a book is insignificant in comparison.
The production of a book releases 8.85 lbs. of CO_2:
http://latimesblogs.latimes.com/emeraldcity/2008/06/paper-vs-paperl.html
Here's a page which indicates most CO_2 production is for energy:
http://www.eia.doe.gov/oiaf/1605/ggrpt/carbon.html
And here's a page which indicates that CO_2 production is a much larger problem for the manufacturing of electronics:
http://www.energybulletin.net/node/49730
w/ a ratio of 12 to 1 for fuel usage to weight. My PRS-505 weighs roughly 9 oz., so it presumably required 108 ounces of fuel to manufacture (ongoing energy usage is trivial and not considered)
http://www.epa.gov/oms/climate/420f05001.htm
gives us a figure of 19.4 pounds of CO_2 per gallon of gasoline. Treating the 108 ounces as fluid ounces (108/128 of a gallon), that comes to roughly 16.4 pounds of CO_2 to make the ebook reader.
So getting two books for the Sony should make it roughly break even, and each printed book beyond that which is not purchased should result in a net reduction of CO_2 emissions, since the energybulletin.net page indicates that the embodied energy usage for electronics is much greater than the lifetime usage.
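For anyone who wants to check the arithmetic, here is a rough sketch in Python. The numbers are the ones quoted from the links above; the function and constant names are made up for illustration, and the big assumption (flagged in the code) is that ounces of device weight can be treated as fluid ounces of gasoline.

```python
# Back-of-the-envelope check of the figures quoted in this post.
# Names are illustrative only; the inputs come from the linked pages.
BOOK_CO2_LB = 8.85           # lb of CO2 per printed book (LA Times link)
FUEL_TO_WEIGHT_RATIO = 12    # fuel used per unit of device weight (energybulletin.net link)
READER_WEIGHT_OZ = 9         # approximate weight of the PRS-505, in ounces
CO2_LB_PER_GALLON = 19.4     # lb of CO2 per gallon of gasoline (EPA link)
FL_OZ_PER_GALLON = 128       # US fluid ounces in a US gallon


def reader_co2_lb():
    """Estimated lb of CO2 emitted to manufacture the reader."""
    fuel_oz = FUEL_TO_WEIGHT_RATIO * READER_WEIGHT_OZ  # 108 ounces of fuel
    # ASSUMPTION: weight ounces are treated as fluid ounces of gasoline.
    gallons = fuel_oz / FL_OZ_PER_GALLON
    return gallons * CO2_LB_PER_GALLON


def break_even_books():
    """Number of printed books whose production CO2 equals the reader's."""
    return reader_co2_lb() / BOOK_CO2_LB


if __name__ == "__main__":
    print(f"Reader: {reader_co2_lb():.2f} lb CO2")
    print(f"Break-even: {break_even_books():.2f} printed books")
```

Under those assumptions the break-even point lands a bit under two printed books, which is where the "two books" figure comes from. A different fuel density or a different per-book estimate would shift the result.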
Sphinx of black quartz, judge my vow.
My printing press is fueled by the frantic posts of trolled know-it-alls.
Shouldn't that be "are fewer data"?
I see even classic Slashdot is pretty much unusable on dial-up now.
Did you include the energy cost of the manufacturing and disposal of the batteries that will power your e-reader? How many batteries and e-readers do you expect to consume during the lifespan of a typical book?
Books don't link to each other?
What are citations and footnotes?
I'm not a lawyer, but I play one on the Internet.
Batteries are included in the initial production weight, and the battery is a small fraction of that weight. An e-ink screen reader uses so little power that one only needs to recharge every week or so, so batteries last for _years_. If one does replace the battery, the old one contains materials which are valuable enough to warrant recycling, so the environmental impact is minimal, as stated in my post.
An ebook reader which used typical batteries would be a really bad idea and if there are any such, I hope they get loaded w/ rechargeable batteries.
William
The Tegra2 kit I messed with was 1GHz with 1GB of RAM, and it wasn't optimized but ran Ubuntu great. Can't wait.
LoB
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
To be pedantic, yes, it should be.
However, in the common lexicon data has become more of an indefinite noun; few would actually use the singular datum at any point. Thus it becomes natural to talk about it in indefinite terms (is less data) rather than the correct definite terms (are fewer data).