Google Books As "Train Wreck" For Scholars

Posted by kdawson on Monday September 7, 2009 @11:52AM from the mishmash-wrapped-in-a-muddle dept.

Following up on our earlier discussion, here's more detail on Geoffrey Nunberg's argument that Google Books could prove detrimental to academics and other scholars. Recently Nunberg gave a talk at a conference claiming that the metadata in Google Books is riddled with errors and is classified in a scheme unfit for scholarly use. This blog post was fleshed out somewhat a few days later in the Chronicle of Higher Education. Quoting from the latter: "Start with publication dates. To take Google's word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, [and] Stephen King's Christine... A search on 'internet' in books written before 1950 turns up 527 hits. ... [Google blames some errors on the originating libraries.] ...the libraries can't be responsible for books mislabeled as Health and Fitness and Antiques and Collectibles, for the simple reason that those categories are drawn from the Book Industry Standards and Communications codes, which are used by the publishers to tell booksellers where to put books on the shelves. ... In short, Google has taken a group of the world's great research collections and returned them in the form of a suburban-mall bookstore." The head of metadata for Google Books, Jon Orwant, has responded in detail to Nunberg's complaints in a comment on the original blog post, and says his team has already fixed the errors that Nunberg so helpfully pointed out.

Who needs metadata any more by Nefarious+Wheel · 2009-09-07 11:54 · Score: 3, Insightful

...when you have Search? Pick your own keywords.

--
Do not mock my vision of impractical footwear
1. Re:Who needs metadata any more by timeOday · 2009-09-07 12:03 · Score: 5, Interesting
  
  To read the article, it is mostly a problem for people who are essentially studying trends in metadata itself, such as the emergence of some particular word over time. As for the "oddball" categorizations, I agree, but why would anybody browse the "technology" section of a collection with millions of titles?
  The odd thing about complaining about this is, what are they comparing to? A hypothetical perfect online database that doesn't exist anyways? The article says google got it wrong in some cases where, e.g. the Harvard Library got it right. OK, that's an issue for all of us deciding whether to search on our nearest computer, or at the Harvard library.
  To me, google's project was a long time coming - somebody had to scan the world's back catalog. Maybe it would be better if governments had done it, but (and this is the point) they didn't. Google is.
2. Re:Who needs metadata any more by Artraze · 2009-09-07 12:25 · Score: 4, Insightful
  
  > The odd thing about complaining about this is, what are they comparing
  > to? A hypothetical perfect online database that doesn't exist anyways?
  That's exactly why this article is little more than some long-winded trolling. So the metadata is wrong... As long as the books themselves are perfectly fine (which they seem to be), you can always check the metadata yourself. I must think that as far as Google is concerned (and 99+% of its users) the metadata isn't nearly as important as the data itself. Once the data is collected you can always fix the rest.
  Expect a new "tagging game" in the next year or two to manually correct these errors.
3. Re:Who needs metadata any more by Potor · 2009-09-07 12:49 · Score: 5, Interesting
  
  Exactly. And the whole argument totally ignores the fact that these books are now easily available.
  Shock horror: I am a liberal arts scholar. And Google Books has helped me incredibly in a project I am doing on an 18th-century scholar. I have original texts in various editions at my fingertips, wonderful reference books (including a dozen 18th- and 19th-century Latin grammars), and serious secondary literature. Not all of these are fully posted on Google Books, but now I know what books to check out of the library, or even buy.
  As an arts scholar, I love Google books.
4. Re:Who needs metadata any more by martin-boundary · 2009-09-07 13:31 · Score: 4, Insightful
  
  > The odd thing about complaining about this is, what are they comparing to?
  
  How about good old fashioned legwork? It *is* possible to make sure that the metadata is consistent with the facts, but that involves doing actual research and verification such as academics have been doing for hundreds of years.
  
  > To me, google's project was a long time coming - somebody had to scan the world's back catalog.
  
  Then you have very low standards indeed. There's absolutely no reason why a single entity had to / has to scan all the world's back catalog on their own as fast as they can. It's pure commercial greed, and leads to the garbage we have on the net today.
  What is needed is an open standard for scanned works, with minimum resolution, minimum quality, and minimum verified metadata such as subject, author, publisher, year etc. All those are trivially listed on the title page of every book. All one has to do is open the damn book and flip a few pages, but that appears to be too hard for some people.
  This is a long term project for humanity. There's absolutely no point in having crappy scans with garbage metadata available quickly today, when it could be available correctly with good quality in say five years. It's also a perfect case for crowdsourcing, with some real standards to ensure quality.
  The current dreck that's online only causes duplication and waste. Take a look someday at archive.org (for example), and see how many copies of the same book are available, if it's a popular book. You'll typically find 5-10 scanned versions, by Google, Microsoft, and various local library projects, in black and white or colour, none of which is truly good quality: broken characters, pages with dark margins, missing pages, typos or incorrect titles, wrong authors etc.
  Why did they bother?
5. Re:Who needs metadata any more by Anonymous Coward · 2009-09-07 18:23 · Score: 2, Interesting
  
  > Why did they bother?
  1. I call absolute BS on the poor scanning quality. I have looked at 50+ books on Google Books, and not once noticed a problem with the scanning. Certainly a hell of a lot better than *I* would have done.
  2. The cost and time and legal battles required to do the scanning pretty much make it impossible unless a private corporation is leading the charge. What good does it do to try to rely on random-ass people to scan every book in existence, and every book as it comes into existence as fast as it comes? Good luck with that. And what makes you think they'd do a better job than a company that's devoted huge amounts of work to mastering the single repetitive task required to do it practically, and that can apply that practice to every single book?
  3. If you're worried about Google being evil / being too powerful / blah blah, fine, but since you don't mention that, I think you have to honestly believe they just suck. Perhaps you'd rather the US government spend 10s or 100s of millions of dollars to do it instead, because they really need to spend more money right now, and we can trust THEM to do it well.
  4. What does poor metadata have to do with anything? The task of scanning is completely separate from the OCR that goes into metadata. As Google improves their OCR, the metadata will fix itself. Or, you know, since this is a manageable task, maybe people can contribute on their own. Like the authors of this article did, and which Google gladly accepted.
  Since there are ACTUAL problems with Google Books (you know, like the ethical ones), maybe you should complain about those instead of this nonsense.
6. Re:Who needs metadata any more by RandomUsername99 · 2009-09-07 18:29 · Score: 3, Informative
  
  I worked for the Harvard Law School Library and saw such a work in progress for the documents used in the Nazi war crimes tribunal at Nuremberg. The process of putting this together was extraordinarily expensive, and even with the HLSL donating the server, traffic, labor to maintain the back end code (which it still does), etc., the project ran out of funding 13,904 scans in and is currently seeking funding.
  Although the metadata surrounding the scans of these books would not have to be nearly as detailed, it's worth noting that google is not a non-profit organization with a set of gigantic grants for book preservation. They needed to put together something that would make enough money to at least fund its own existence immediately.
  Why did they bother? Is it enough that it's useful to many people even if it's not useful to everyone?
  One could certainly put together the electronic preservation project of everyone's dreams... I wouldn't be surprised if some very smart people somewhere in academia have already designed it. Sooo if you would be so kind as to cut them a check so it doesn't have to be up to a company who's worried about it being a financially solvent program from a business perspective, I bet they'd start tomorrow.
7. Re:Who needs metadata any more by introspekt.i · 2009-09-07 18:37 · Score: 3, Interesting
  
  You act like the technology and processes used to generate this catalog are going to remain deficient indefinitely. You ignore the fact that consumer demand for better (metadata|accuracy|whathaveyou) will drive improvements in the technology. In the meantime, we get access to the early iterations of the technology and the benefits it can provide today.
  
  > What is needed is an open standard for scanned works, with minimum resolution, minimum quality, and minimum verified metadata such as subject, author, publisher, year etc.
  Necessity is the mother of invention. Wait for one to pop up, or go make one up. Nobody's stopping you.
  
  > All those are trivially listed on the title page of every book. All one has to do is open the damn book and flip a few pages, but that appears to be too hard for some people.
  Opening the covers of every possible resource you use is quite easy when you have a discrete, present set of resources to thumb through. What if your resources aren't present, are high in number, or (lo!) are undefined...because you don't even know what exactly it is you're looking for?
  
  > This is a long term project for humanity. There's absolutely no point in having crappy scans with garbage metadata available quickly today, when it could be available correctly with good quality in say five years.
  I think you're absolutely wrong. It's naive to assume we can just have an instant rubber-meets-the-road system available in x years without rigorous testing and input on the part of users. No point? Hah! This is absolutely the best way to go about things! Let the system work itself out with angry users pushing technicians to improve archives to have the best working system in the end. The Google system is hardly "done" and it's only going to get better with time.
  
  > The current dreck that's online only causes duplication and waste. Take a look someday at archive.org (for example), and see how many copies of the same book are available, if it's a popular book.
  God forbid we have multiple copies of popular books in different archives.
  
  > black and white or colour none of which is truly good quality: broken characters, pages with dark margins, missing pages, typos or incorrect titles, wrong authors etc.
  Quality is relative. Why prohibit use because we lack perfection?
  
  > Why did they bother?
  Why did you bother? Why did I bother? Why does anybody bother? Probably because we all feel like it.
8. Re:Who needs metadata any more by natehoy · 2009-09-08 01:55 · Score: 2, Interesting
  
  Given a project of this magnitude, there are inevitably going to be bad scans, and bad data, and other issues.
  And, just as inevitably, the problem areas are going to be updated and replaced with good ones when they become available.
  "There's no point in having crappy scans with garbage metadata today" would be indisputably true if every book out there was a crappy scan with garbage metadata. Instead, what we have is a starting point with some good scans and some bad ones, and there's no point holding back the entire project just because some of the books have bad scans or metadata. You go live with what you have, then add/correct as needed.
  Remember, too, that none of these books replace what is available in your local library, they supplement it. If your local library has a copy of a book you want, it's still there. If they don't, Google Books will probably have it. Chances are, their scan will be good, but let's assume it's not. Isn't a barely readable version better than no version whatsoever?
  This isn't a NASA mission. If a book ends up being a crappy scan, it won't explode on re-entry killing its reader.
  This is, however, a for-profit venture. As such, it cannot wait until every page of every tome is pristine before it goes live.
  Sometimes, you go live with what you've got, even if it's not perfect, because it's not only in the best interests of profit, but because there's a benefit to having the product out there. Google Books will start as a supplemental database, and where there are good scans of books with good metadata, this will make books more available and accessible to all. Books will be missing from its catalog, and books will be unreadable at times, and books will be misfiled, but the same is true of any library.
  Google Earth went live long before detailed imagery was readily available for a lot of the world, so those who lived in an area of the world that lacked detailed imagery saw low-res imagery (green fuzzies, with a vague idea of where really big things might be) where the pictures should be. As the imagery became available, they added it to the basemap. But Google Earth made detailed cartography available to the masses in a way that it had never been available before. And, hopefully, Google Books will be able to do the same with the written word.
  
  --
  "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
Error free system? by Bacon+Bits · 2009-09-07 12:01 · Score: 2, Informative

So, the argument is that the new system is bad because it may have errors or bad data?
Were card catalogs immune to this? It's a database. It's only as good as what you put into it. A bad database is not useful. It just means someone needs to do it better. Honestly, if anything this seems like an argument that the database shouldn't be proprietary. It should be open to everyone so that someone can always make a better version of the metadata with the same base data.
"It's a piece of shit" shouldn't be the same argument as "nobody should even try it". The Wright brothers didn't exactly start out with a 747 or an F-35.

--
The road to tyranny has always been paved with claims of necessity.
1. Re:Error free system? by Bacon+Bits · 2009-09-07 17:39 · Score: 2, Interesting
  
  The Wrights didn't start out building toy birds, true. They first tried to use the data from some Russian or European who had modeled wings after birds. They found that the lift his data predicted was so far off from what they observed in their gliders that they could no longer assume that the data hadn't just been made up. Then they went and built a small scale wind tunnel and designed small model wings which could be reformed and shaped and angled easily and a scale which could be used to measure lift from the wing model. So, no, they didn't start out building toy birds. They effectively ended up doing that when they discovered how little data there was on the subject of a wing. They took a step back to toy bird models.
  http://www.hulu.com/watch/23333/nova-wright-brothers%E2%80%99-flying-machine
  
  --
  The road to tyranny has always been paved with claims of necessity.
Re:all the books in the world by Bacon+Bits · 2009-09-07 12:16 · Score: 4, Funny

They haven't finished counting Stephen King's books yet.

--
The road to tyranny has always been paved with claims of necessity.
"scholarly" information by Aurisor · 2009-09-07 12:22 · Score: 2, Interesting

As someone who majored in English Literature in college, I can tell you that academics love getting their panties in a bunch over what is Scholarly Publication and what is not. Some teachers will actually have special assignments that have to be written entirely using Scholarly sources, or in response to a Scholarly article.
Before the advent of the internet, I can see how it might have been useful to have an in-group comprised of people who had some sort of qualifications to write about something, but it seems antiquated in light of the ease with which we can independently verify claims.
Usually, if someone's going to write something that's actually useful, they'll write an actual book. Soon thereafter, a bunch of "Scholars" will come along and write a bunch of journal articles and tell us all about how the useful work was one of three things: misogynistic, code for a religious statement, or arcane, carefully-hidden innuendo.
Sorry if I sound bitter, but I spent a lot of time reading this crap, and very little of it was as insightful or interesting as even my classmates' comments.
1. Re:"scholarly" information by ahoehn · 2009-09-07 13:08 · Score: 4, Insightful
  
  > Sorry if I sound bitter, but I spent a lot of time reading this crap, and very little of it was as insightful or interesting as even my classmates' comments.
  That sounds like more of a you problem than an academia problem. If you don't enjoy using a work's minutiae to accuse perfectly innocent authors of misogyny, innuendo, (to add a couple you forgot) blatant colonialism or latent homosexuality, what the fuck were you doing in an English Lit program? The rest of us live for that shit.
  
  > As someone who should not have majored in English Literature in college
  There. I fixed it for you.
  
  --
  Mod my comments down. It'll be fun.
2. Re:"scholarly" information by moosesocks · 2009-09-07 14:12 · Score: 3, Interesting
  
  Actually, the GP's got a good point. Back in college, I took a number of humanities courses whenever I could squeeze them into my schedule.
  I can say from firsthand experience that there are a lot of "scholarly" articles that are complete and total crap. When writing papers, I'd frequently peruse JSTOR for pertinent articles about my topic, keeping an eye out for particularly good articles, as well as the heinously bad ones. Picking apart and systematically disproving a bad paper published in a "good" journal was an easy ticket to an 'A' on the paper.
  These papers, of course, were certainly the exception. Most scholarly papers I encounter are humbling in their brilliance. However, I've seen more than a few bad journal articles, as well as quite a few blog entries that would be worthy of scholarly publication. It's hard to make any generalizations about the validity of certain sources of information.
  Unfortunately, Physics wasn't quite as easy to bullshit. (Random aside: The physical sciences certainly have their fair share of bad journal articles, especially in light of the fact that printed media is a terrible means by which to communicate scientific results. It's a cruel irony that the www was invented to enable collaboration and information exchange between scientists, but is rarely (if ever) used for that purpose. Also, any use of the word 'trivial,' or its synonyms needs to be punishable by death.)
  PS. Don't judge our writing abilities based upon our Slashdot comments. I'm sure the GP had his own reasons for majoring in English, even though literary discourse is often trite and contrived.
  
  --
  -- If you try to fail and succeed, which have you done? - Uli's moose
Anonymous Coward by Anonymous Coward · 2009-09-07 12:28 · Score: 5, Interesting

Google has scanned many volumes of the Laws of Indiana, which go back to 1816. These are the session laws of the Indiana General Assembly and have never been copyrighted. However, Google has arbitrarily decided not to make most post-1922 volumes it has digitized, and even some pre-1922 volumes (e.g. 1877, 1893, 1895, 1909, 1917 and 1918), available, using the claim of copyright.
Google has done all the decision-making here. Anyone who might object to the classification of one of these volumes as copyrighted and thus available in "snippet-view only" presumably would have the burden of proving the contrary. (And where would you even start? Who would you contact? I have seen nothing on this.)
Once (or if) the settlement is approved early this fall, Google's "rights" attach to these volumes. If I understand correctly, at that point any individual who wishes to access one of these volumes of Indiana's session laws not already in "full view" will have to pay for it, and for the money will obtain only individual rights, NOT the right to make it freely available to others.
Broader implications: Finally, this analysis has been limited to volumes of Indiana session laws, but surely similar situations exist more broadly.
For more on this, see this Aug. 2, 2009 Indiana Law Blog entry: http://indianalawblog.com/archives/2009/08/courts_my_probl.html
Something is usually better than nothing by Anonymous Coward · 2009-09-07 12:30 · Score: 5, Insightful

And this is no exception. Before google books you had access to books from various libraries, books you owned, books you could loan from friends (*shock* *gasp* copyright infringement), books you could buy and books from non-google online sources. Now you have access to all of those and additionally google books. Even if google books is 99% "piece of shit" (which in my experience is simply not true, but nevertheless) you still have the 1% potentially useful material available that wasn't available before, so you win.
1. Re:Something is usually better than nothing by julesh · 2009-09-07 19:40 · Score: 2, Insightful
  
  The problem is that the existence of google books makes it harder for others working on similar systems (and there are others, this isn't just a pipedream) to become established. A Google Books court-approved class-action copyright settlement would make it harder for somebody else to reach a similar agreement (because the public interest argument will be harder to make). Essentially, this is a field where the first person to do it is likely to end up with a monopoly, and Google have done it badly, thus precluding other people from doing it properly.
Sure, libraries make mistakes by mschuyler · 2009-09-07 12:30 · Score: 2, Insightful

like shelving 'Life of an Iceberg' under biographies, but by and large they strive to be and are correct. If they mess up, some other library will fix the error. Libraries' cataloging data is usually centralized by OCLC so that the data is uniform throughout the country as other libraries pull from this central source for their own catalogs. Libraries also use a recognized and standardized subject scheme with a controlled vocabulary, not just a bunch of meta tags. Cataloging librarians are a rare and little-recognized breed of people who spend their entire professional lives trying to make it easier to gain access to material. The result is an organized body of knowledge--not just a heap of books on the floor in no particular order, like the Internet--and Google. For Google to blame libraries for their troubles is like blaming the Machinist Mates on the Titanic for crashing the ship into an iceberg. There, full circle. How did that happen?

--
How about a moderation of -1 pedantic.
Why Isn't Google Books A Library? by LifesABeach · 2009-09-07 12:31 · Score: 4, Interesting

With all the class-act talent that Google hires right out of college, why can't Google create its own Public Library on the Internet? Chrome could be the entryway to any book that is in the Public Domain, or available with the author's written permission. Turning the page of a book could be as simple as the [Back] or [Next] button. The "Card Catalog" would be a No-Brainer. No Library goes through this many hops. There's even translation to other languages, Braille, and Audio; from my viewpoint, this SHOULD be the challenge, not what word category is or isn't. If it's a case of "buy the book", then buy 10 copies of "Gone with the Wind", and ONLY allow up to 10 readers to read "Gone with the Wind". Google could even have a "Google Online Library Card"; this is where the company hums "Ka-Ching".
1. Re:Why Isn't Google Books A Library? by QuantumG · 2009-09-07 12:41 · Score: 2, Funny
  
  So you haven't read any of the stories that have appeared on Slashdot in regards to Google's plans for their Books service eh?
  
  --
  How we know is more important than what we know.
2. Re:Why Isn't Google Books A Library? by riffzifnab · 2009-09-07 13:21 · Score: 3, Funny
  
  > With all the class-act talent that Google hires right out of college, why can't Google create its own Public Library on the Internet? Chrome could be the entryway to any book that is in the Public Domain, or available with the author's written permission. Turning the page of a book could be as simple as the [Back] or [Next] button. The "Card Catalog" would be a No-Brainer. No Library goes through this many hops. There's even translation to other languages, Braille, and Audio; from my viewpoint, this SHOULD be the challenge, not what word category is or isn't. If it's a case of "buy the book", then buy 10 copies of "Gone with the Wind", and ONLY allow up to 10 readers to read "Gone with the Wind". Google could even have a "Google Online Library Card"; this is where the company hums "Ka-Ching".
  I think that's the idea, perhaps you should go check it out: http://books.google.com
Obnoxious by burgundysizzle · 2009-09-07 12:33 · Score: 3, Insightful

The inline replies are written with a smug sense of self-entitlement as though he and other "scholars" are the only legitimate users of Google Books. It's NOT about you - you are not going to create enough adsense hits to make this whole thing worthwhile (or turn a profit).
1. Re:Obnoxious by Volante3192 · 2009-09-07 12:45 · Score: 5, Insightful
  
  Definitely. It's like, "Oh, look, I found an error. If I had done this, that error wouldn't be there!!" And to that I respond, then do it yourself. YOU go tack metadata onto the 100 million books they have, you smug egocentric bastard.
  And, of course, he completely ignores the 999,999 proper entries compared to the 1 error. Google seems to know there are lots of problems here, and they're not going to get it right on the first pass. But having a first pass at all is better than nothing.
2. Re:Obnoxious by fuzzyfuzzyfungus · 2009-09-07 14:18 · Score: 2, Insightful
  
  If you were a scholar, writing for an audience of other scholars, why wouldn't you write about the concerns of scholars and from their perspective? I'm sure he knows exactly why Google is doing what it's doing; but that doesn't mean that he can't point out the downsides.
  
  It's like saying that Slashdot is obnoxious because it is "written with a smug sense of self-entitlement as though he and other 'geeks' are the only legitimate users of the Internet". This is true; but that is because it is a geek website where geeks write about geek stuff. Obviously we know why Comcast is capping and packet shaping; but that doesn't mean we can't whine about the downsides for us.
The argument that should have been made here... by Looce · 2009-09-07 12:36 · Score: 2, Interesting

... is that academics can't rely on Google Books to make their bibliographies, because the publication date and authorship information, which are used in all citation styles (MLA, Harvard, etc.), are incorrect on Google Books for an apparently large number of books. Categories aren't used in citations, they're used by searchers.
Jon Orwant of Google said that 1899 was a placeholder year for unknown publication dates, as provided by some of their metadata providers... which leads me to ask if they sanitise their data or do any research into publication dates themselves!
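A minimal sketch of the kind of sanitising asked about here, assuming (per Orwant's explanation) that 1899 is a placeholder for an unknown publication date. The function name, the `author_birth_year` parameter, and the sample values are hypothetical illustrations; nothing in it reflects Google's actual pipeline.

```python
from typing import Optional

# Assumption: metadata providers use these years as "date unknown" sentinels.
PLACEHOLDER_YEARS = {1899}


def clean_publication_year(
    raw_year: Optional[int],
    author_birth_year: Optional[int] = None,
) -> Optional[int]:
    """Return a usable publication year, or None if the date looks unknown.

    Note the trade-off: books genuinely published in 1899 are also
    reported as unknown and would need a manual check.
    """
    if raw_year is None or raw_year in PLACEHOLDER_YEARS:
        return None
    # A work cannot be published before its author was born.
    if author_birth_year is not None and raw_year < author_birth_year:
        return None
    return raw_year


# Hypothetical records: (title, raw year, author's birth year)
records = [
    ("Christine", 1899, 1947),
    ("Christine", 1983, 1947),
]
for title, year, born in records:
    print(title, clean_publication_year(year, born))
```

Returning None rather than guessing keeps a bad date out of a bibliography, at the cost of sending the reader to the title page.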
1. Re:The argument that should have been made here... by Anonymous Coward · 2009-09-07 13:06 · Score: 2, Informative
  
  WorldCat.org
  Find it on Google Books, look it up on there; Google Scholar if it is an article. I am a historian, and when I check citations (for journals or my own work), that is how I get it done.
Re:all the books in the world by Kalriath · 2009-09-07 12:38 · Score: 2

They forgot to count the Wheel of Time.

--
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
Google's brilliant vagueness by dpbsmith · 2009-09-07 12:44 · Score: 4, Insightful

This is much like Google itself.
Google's brilliance, and woe, is its sloppy imprecision.
You type in a query. It returns a bunch of stuff. Quite a lot of it is irrelevant and is perceived as not meeting the requirements of the search, but you don't mind because all you care about is that it finds what you want, not that it finds other stuff. Unfortunately, Google is so good that it tricks you into believing that it always finds everything that matches your query. But, of course, there's no way to find out what it _missed_.
I've personally noticed and been puzzled by the publication dates. I'd noticed it particularly with periodicals. What seems to be the case here is that Google is very prone to give the date that a journal began publication as the publication date of every article that has ever appeared in that journal.
Wikipedia editors are well aware of the dangers of using Google hit counts as data. It's amusing to see that there are 1,930,000 hits on "Ghandi" compared to 22,900,000 for "Gandhi" and conclude that Gandhi's name is misspelled 10% of the time... or to notice, as I have, that that percentage is increasing and project the year in which "Ghandi" must inevitably become the accepted spelling... but it is, as they say, "for amusement purposes only."
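For what it's worth, the arithmetic behind that "10%" can be checked in a couple of lines. This is only a sketch using the two hit counts quoted above; the variable and function names are made up for illustration, and the caveat about hit counts being unreliable data still applies.

```python
# Hit counts as quoted in the comment above.
ghandi_hits = 1_930_000
gandhi_hits = 22_900_000


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Misspelled hits relative to correctly spelled hits.
ratio_to_correct = percent(ghandi_hits, gandhi_hits)
# Misspelled hits as a share of both spellings combined.
share_of_all = percent(ghandi_hits, ghandi_hits + gandhi_hits)

print(f"relative to 'Gandhi': {ratio_to_correct:.1f}%")
print(f"share of all hits:    {share_of_all:.1f}%")
```

Either way the figure comes out a bit under the rounded 10%, and which of the two you quote depends on what you take "misspelled X% of the time" to mean.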

--
"How to Do Nothing," kids activities, back in print!
Too much information? by presidenteloco · 2009-09-07 12:47 · Score: 5, Insightful

Yes, having all of the world's literature available for instant full text search sounds disastrous for scholars.

--

Where are we going and why are we in a handbasket?
Card catalogs by dpbsmith · 2009-09-07 12:49 · Score: 5, Interesting

Tangential, but "card catalogs." Ha! I once had a compelling need to look up an article in the Occasional Papers of the Bingham Oceanographic Collection. So I went to the card catalog.
It wasn't under O. It wasn't under P. It wasn't under B. It wasn't under C.
It was under N.
Why? Because, naturally, as of course everybody knows, the Bingham Oceanographic Collection is part of the Peabody Museum. Which is part of Yale. Which (drum roll...)... ...is in New Haven.
The great thing here is that you can't even say there was an error in the card catalog, unless filing something under a heading that is perfectly correct, but under which nobody would dream of looking for it, is considered an error.

--
"How to Do Nothing," kids activities, back in print!
1. Re:Card catalogs by Peter+H.S. · 2009-09-07 15:03 · Score: 4, Informative
  
  Well, organizing books by the city they are from (where they were printed) is among the oldest ways of cataloging printed books. The practice goes back to Gutenberg and the so-called "incunabula" period, when book dealers/printers/publishers (often the same persons) would make book catalogs out of a certain city. So if you needed a certain edition of a title, you would have to track it through such book catalogs, since the Leipzig edition would be different from the Mainz edition.
  It is of course sad that what was once common knowledge among scholars now seems forgotten. That is probably not a hindrance when working with modern sources, but it is still necessary to know when working with old stuff, just like knowing that words/names starting with J were filed under I etc.
  Many academics still put the printing city in their sources, though many seem to have forgotten why they do so.
  You just happened to stumble into a book /journal catalog organized by a centuries old and previously very well known method. The error wasn't in the card catalog or the way it was organized, but in that no one ever told you about these ancient methods in your library course.
  --
  Regards
Book publishers endangered, cry me a river by moon3 · 2009-09-07 12:49 · Score: 4, Insightful

They pushed the copyright law to over a hundred years (just to make sure they will make money off writers even after they are dead), and now our big brother Google comes into the ring to resurrect all the OUT OF COPYRIGHT books -- meaning those dead books that publishers no longer exclusively distribute. What an offense against the poor publishers. Google is creating a real e-Library of enormous proportions of virtually free books; what a threat. I bet I am not the only one who wants to see Newton's books on physics e-published again and searchable.
Spurious Argument by mikethicke · 2009-09-07 13:25 · Score: 2, Interesting

As an aspiring academic halfway through a philosophy Ph.D., I find Nunberg's argument pretty absurd. Google Books is a godsend for academics, and would be much more so if there were full access to their entire catalog rather than "limited previews" for most books. I have used Google Books countless times to quickly check whether a book is relevant to my research, or to get the gist of an author's argument without having to trudge down to the library. I know many others who do this as well. In all this time I've never even looked at Google's metadata. No decent academic would rely on such information, as there are far more reliable methods, such as actually checking what's written in the book, which, yes, Google scans in.
Scholars have lawns too you know by syousef · 2009-09-07 14:12 · Score: 2, Interesting

This could be the stupidest and most disingenuous argument I've encountered all year. I guess I'll never know since the metadata is not at my fingertips. This might be a good argument for getting the metadata right. It isn't a good argument for tossing the virtual books out with the bathwater.
So no I won't get off your lawn. We're better off without scholars who'd rather hoard information. Begone!

--
These posts express my own personal views, not those of my employer
Re:The ISBN are still the same by fuzzyfuzzyfungus · 2009-09-07 14:23 · Score: 2, Insightful

Which is incredibly helpful for anybody interested in printed materials before 1966...
Re:Incredible arrogance of the "scholar" by bigbigbison · 2009-09-07 15:46 · Score: 2, Insightful

I don't read him as saying "any book that can be found in the holdings of a major research library is only of interest to scholars" at all. Rather, I read him as saying that the systems that libraries use to organize books, be they Dewey Decimal, Library of Congress, or some other system, were created to help organize books for users to use them. The BISAC classifications were developed to help companies sell books. Why use that rather than what the libraries -- the source of these books -- use?

--
http://www.popularculturegaming.com -- my blog about the culture of videogame players
Re:Incredible arrogance of the "scholar" by grcumb · 2009-09-07 16:26 · Score: 3, Interesting

And I think he's entirely off-base. Nose-in-the-air "Scholars" like this gentleman fail to recognize that Google's efforts are about making material available to "the rest of us" who don't have access to those major research libraries. And categorical indexing of material makes complete and total sense if you expect to have non-PhD sorts searching for it.
You're fighting the wrong battle here. It's easy to find any number of legitimately nasty things about 'Scholars' and 'Academics' and elitism in general. But arguing for proper classification in Google Books is not one of them.
For several years I was an avid amateur of Information Retrieval. Classification (and other useful organisational models) of information into related collections is essential when you don't know what keywords you're looking for. This is especially important with historical works, where the use of 21st Century names, terms and other common keywords is next to useless.
Google search is useful when you know what you're searching for. But knowing what to look for in Google Books is an entirely different matter. Categorisation matters here.
By using a classification system that is designed for book sellers, Google's chosen a very poor set of criteria. Not only will most of the titles be poorly characterised (and thus harder to find), the effort required to find them increases with their rarity or uniqueness. These aren't always a measure of importance or interest, but often enough, they are.
Asking Google to consider a proven, effective and well-understood categorisation system is not being snooty; it's an effort to suggest - as we geeks often do - that there might actually be a correct way to perform this task.
Sometimes what looks like 'arrogance' is actually the state of being right about something when no one else will listen.

--
Crumb's Corollary: Never bring a knife to a bun fight.
if you don't like it... by caitsith01 · 2009-09-07 18:52 · Score: 2, Informative

Why did they bother?
Why did you bother to comment on it? If you don't like it - don't use it.
You are clearly ignorant of the key problem with the Google books settlement (as it currently stands), which is that Google and only Google will be given the right to reproduce orphaned works. I assume the morons tagging this "caveat emptor" are also ignorant of this.
So your glib remark should more correctly read, "if you don't like it, never have access to millions of pages of orphaned copyright works again because Google has an exclusive licence to reproduce them electronically". Which doesn't quite work as well, really, does it?

--
Read Pynchon.
There is no reason by ikkonoishi · 2009-09-08 00:14 · Score: 2, Funny

There is no reason for you to post this comment here when you could have put together a properly formed and documented essay in a couple of months. There was no reason for Newton to come up with his theory of gravity when in a few centuries Einstein would come up with a more complete theory.
This is a long term project for humanity. We damn well better start now rather than waiting to do it right. Bad data can be cross compared and corrected. Data which has not been digitized at all is completely useless (towards the purpose of having digitized data). In the time it took you to complain about it you could have pulled up a few scans, and done some good old fashioned legwork in the form of copying it out in ASCII and redrawing the illustrations like clerks of old.
"as long as the books are fine.." but many are not by waterbear · 2009-09-08 01:19 · Score: 2, Interesting

As long as the books themselves are perfectly fine (which they seem to be),
Well, some are really good and well scanned, but others are a mess. From some organizations that do the scanning, you get missing pages and mangled pages. You get pages where the person doing the scanning sometimes put their hand between the page and the glass, so you can read the rings on their fingers but not the text on the page. (Books scanned at NY Public Library for example.) If ever there is a fold-out, you get at max half of it.
The Google Books organization doesn't seem to want to know. There is a mechanism for reporting single page defects, but when 50 defects occur in a book it gets hard to work through them all using the button-clicks. I tried it for two books and also sent a message to Google Books; there was an automated reply and no action after several months.
So much for 'As long as the books themselves are perfectly fine ....', I'm afraid.
-wb-
Another Day of Microsoft Trolling? by ajs · 2009-09-08 06:02 · Score: 2, Interesting

I hate to be so cynical, but there was a huge uptick in negative articles on Slashdot about Google as soon as Microsoft started their anti-Google PR effort in DC. Now I see at least one anti-Google article on Slashdot every day. Is Slashdot falling for an extensive trolling effort from MS?
More info available from previous Slashdot article...