How Google Is Solving Its Book Problem
Pickens writes "Alexis Madrigal writes in the Atlantic that Google's famous PageRank algorithm can't be deployed to search through the 15 million books that Google has already scanned because books don't link to each other in the way that webpages do. Instead Google's new book search algorithm called 'Rich Results' looks at word frequency, how closely your query matches the title of a book, web search frequency, recent book sales, the number of libraries that hold the title, how often an older book has been reprinted, and 100 other signals. 'There is less data about books than web pages, but there is more structure to it, and there's less spam to contend with,' writes Madrigal. Yet the focus on optimizing an experience from vast amounts of data remains. 'You want it to have the standard Google quality as much as possible,' says Matthew Gray, lead software engineer for Google Books. '[You want it to be] a merger of relevance and utility based on all these things.'"
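For anyone wondering what "a merger of relevance and utility" built from signals like these might look like in practice, here is a minimal sketch. It is not Google's algorithm: the weights, the log damping, and every name in it (BookSignals, WEIGHTS, title_match, score, rank) are assumptions made up for illustration, and it uses only six of the signals named in the summary.

```python
import math
from dataclasses import dataclass


@dataclass
class BookSignals:
    """Per-book signals named in the summary (field names are hypothetical)."""
    title: str
    term_frequency: float   # share of the book's words that match the query, 0..1
    web_searches: int       # how often the title is searched for on the web
    recent_sales: int       # recent copies sold
    library_holdings: int   # number of libraries holding the title
    reprint_count: int      # how often an older book has been reprinted


# Made-up weights, purely illustrative. The real weighting is not public.
WEIGHTS = {
    "title": 3.0,
    "term_frequency": 2.0,
    "web_searches": 0.5,
    "recent_sales": 0.5,
    "library_holdings": 0.7,
    "reprint_count": 0.4,
}


def title_match(query: str, title: str) -> float:
    """Fraction of query words that also appear in the title (0..1)."""
    query_words = set(query.lower().split())
    title_words = set(title.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & title_words) / len(query_words)


def damp(count: int) -> float:
    """Log-damp raw counts so one huge number cannot swamp the other signals."""
    return math.log1p(count)


def score(query: str, book: BookSignals) -> float:
    """Weighted sum of the signals: a simple linear stand-in for the real ranker."""
    return (
        WEIGHTS["title"] * title_match(query, book.title)
        + WEIGHTS["term_frequency"] * book.term_frequency
        + WEIGHTS["web_searches"] * damp(book.web_searches)
        + WEIGHTS["recent_sales"] * damp(book.recent_sales)
        + WEIGHTS["library_holdings"] * damp(book.library_holdings)
        + WEIGHTS["reprint_count"] * damp(book.reprint_count)
    )


def rank(query: str, books: list[BookSignals]) -> list[BookSignals]:
    """Return the books sorted from highest to lowest score."""
    return sorted(books, key=lambda book: score(query, book), reverse=True)
```

Calling rank("moby dick", books) on a list of BookSignals returns the list ordered best match first. A real system would presumably learn the weights from data rather than hard-code them, and would draw on far more signals than these six.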
It's because the book-scanning process is completely automated.
I doubt it. It is not exactly hard to get a book that sits at a more or less fixed distance into focus. Anyway, focus isn't the reason the fonts are blurry to begin with: the images that Google shows are simply extremely low resolution. Why they are at such a low resolution, I have no idea.
I would guess (read: hope) that while commonly available books might be handled in the quick yet destructive manner, books that are rarer or have historic significance beyond the data would be treated much more carefully (at the lower end of the scale, someone with a hand scanner maybe; at the upper end, perhaps even people manually transcribing). Ultimately, though, while I think it's a crime for a book to be destroyed, if the choice is between it mouldering away in a basement somewhere until it falls apart and Google destroying it early in the interest of preserving the data, surely it's better that the ideas are preserved rather than the physical object (I appreciate that in reality it's not a black-and-white either/or choice).
I suspect this has as much to do with the uptake of ebook readers as with any change to the search indexing. Previously, if you were searching for this book you probably had a very specific interest in it and often wanted to buy a copy. Now the people searching are more likely to be looking for free reading material, so the ranks have adjusted to accommodate that: "people looking for free stuff" is a much wider market than "people with an interest in a particular book", so it's easy to swing the ranking in favour of the former.