Google Books As "Train Wreck" For Scholars
Following up on our earlier discussion, here's more detail on Geoffrey Nunberg's argument that Google Books could prove detrimental to academics and other scholars. Recently Nunberg gave a talk at a conference claiming that the metadata in Google Books is riddled with errors and is classified in a scheme unfit for scholarly use. This blog post was fleshed out somewhat a few days later in the Chronicle of Higher Education. Quoting from the latter: "Start with publication dates. To take Google's word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, [and] Stephen King's Christine... A search on 'internet' in books written before 1950 turns up 527 hits. ... [Google blames some errors on the originating libraries.] ...the libraries can't be responsible for books mislabeled as Health and Fitness and Antiques and Collectibles, for the simple reason that those categories are drawn from the Book Industry Standards and Communications codes, which are used by the publishers to tell booksellers where to put books on the shelves. ... In short, Google has taken a group of the world's great research collections and returned them in the form of a suburban-mall bookstore." The head of metadata for Google Books, Jon Orwant, has responded in detail to Nunberg's complaints in a comment on the original blog post, and says his team has already fixed the errors that Nunberg so helpfully pointed out.
...when you have Search? Pick your own keywords.
Do not mock my vision of impractical footwear
And this is no exception. Before google books you had access to books from various libraries, books you owned, books you could borrow from friends (*shock* *gasp* copyright infringement), books you could buy and books from non-google online sources. Now you have access to all of those and additionally google books. Even if google books is 99% "piece of shit" (which in my experience is simply not true, but nevertheless) you still have the 1% potentially useful material available that wasn't available before, so you win.
like shelving 'Life of an Iceberg' under biographies, but by and large they strive to be and are correct. If they mess up, some other library will fix the error. Libraries' cataloging data is usually centralized by OCLC so that the data is uniform throughout the country as other libraries pull from this central source for their own catalogs. Libraries also use a recognized and standardized subject scheme with a controlled vocabulary, not just a bunch of meta tags. Cataloging librarians are a rare and little-recognized breed of people who spend their entire professional lives trying to make it easier to gain access to material. The result is an organized body of knowledge--not just a heap of books on the floor in no particular order, like the Internet--and Google. For Google to blame libraries for their troubles is like blaming the Machinist Mates on the Titanic for crashing the ship into an iceberg. There, full circle. How did that happen?
How about a moderation of -1 pedantic.
The inline replies are written with a smug sense of self-entitlement as though he and other "scholars" are the only legitimate users of Google Books. It's NOT about you - you are not going to create enough adsense hits to make this whole thing worthwhile (or turn a profit).
This is much like Google itself.
Google's brilliance, and woe, is its sloppy imprecision.
You type in a query. It returns a bunch of stuff. Quite a lot of it is irrelevant and doesn't really meet the requirements of the search, but you don't mind because all you care about is that it finds what you want, not that it finds other stuff. Unfortunately, Google is so good that it tricks you into believing that it always finds everything that matches your query. But, of course, there's no way to find out what it _missed_.
I've personally noticed and been puzzled by the publication dates. I'd noticed it particularly with periodicals. What seems to be the case here is that Google is very prone to give the date that a journal began publication as the publication date of every article that has ever appeared in that journal.
Wikipedia editors are well aware of the dangers of using Google hit counts as data. It's amusing to see that there are 1,930,000 hits on "Ghandi" compared to 22,900,000 for "Gandhi" and conclude that Gandhi's name is misspelled 10% of the time... or to notice, as I have, that that percentage is increasing and project the year in which "Ghandi" must inevitably become the accepted spelling... but it is, as they say, "for amusement purposes only."
"How to Do Nothing," kids activities, back in print!
Yes, having all of the world's literature available for instant full text search sounds disastrous for scholars.
Where are we going and why are we in a handbasket?
They pushed the copyright law to over a hundred years (just to make sure they will make money off writers even after they are dead), now comes our big brother Google to the ring to resurrect all the OUT OF COPYRIGHT books -- meaning those dead books that publishers no longer exclusively distribute. What an offense against the poor publishers. Google is creating a real e-Library of enormous proportions of virtually free books, what a threat. I bet I am not the only one who wants to see Newton's books on physics e-published again and searchable.
Sorry if I sound bitter, but I spent a lot of time reading this crap, and very little of it was as insightful or interesting as even my classmates' comments.
That sounds like more of a you problem than an academia problem. If you don't enjoy using a work's minutiae to accuse perfectly innocent authors of misogyny, innuendo, (to add a couple you forgot) blatant colonialism or latent homosexuality, what the fuck were you doing in an English Lit program? The rest of us live for that shit.
As someone who should not have majored in English Literature in college
There. I fixed it for you.
Mod my comments down. It'll be fun.
Which is incredibly helpful for anybody interested in printed materials before 1966...
The concern is really the Faustian bargain that Google has been willing to strike with trade groups (like the Authors Guild settlement). Google has conceded the point that these groups should be facilitated in their great land grab of out-of-print books, in return for Google's right to index them.
It is reasonable to question whether those bargains are fair, especially since we have projects like the Internet Archive, which wouldn't make such a concession. It's also reasonable to question whether Google and a trade group even have the legal standing to strike that sort of deal.
I don't read him as saying "any book that can be found in the holdings of a major research library is only of interest to scholars" at all. Rather, I read him as saying that the systems that libraries use to organize books, be they Dewey Decimal, Library of Congress, or some other system, were created to help organize books so that users can find and use them. The BISAC classifications were developed to help companies sell books. Why use that rather than what the libraries -- the source of these books -- use?