Slashdot Mirror


Google Books As "Train Wreck" For Scholars

Following up on our earlier discussion, here's more detail on Geoffrey Nunberg's argument that Google Books could prove detrimental to academics and other scholars. Nunberg recently gave a talk at a conference claiming that the metadata in Google Books is riddled with errors and is classified in a scheme unfit for scholarly use. His blog post on the talk was fleshed out somewhat a few days later in the Chronicle of Higher Education. Quoting from the latter: "Start with publication dates. To take Google's word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, [and] Stephen King's Christine... A search on 'internet' in books written before 1950 turns up 527 hits. ... [Google blames some errors on the originating libraries.] ...the libraries can't be responsible for books mislabeled as Health and Fitness and Antiques and Collectibles, for the simple reason that those categories are drawn from the Book Industry Standards and Communications codes, which are used by the publishers to tell booksellers where to put books on the shelves. ... In short, Google has taken a group of the world's great research collections and returned them in the form of a suburban-mall bookstore." The head of metadata for Google Books, Jon Orwant, has responded in detail to Nunberg's complaints in a comment on the original blog post, and says his team has already fixed the errors that Nunberg so helpfully pointed out.

160 comments

  1. Who needs metadata any more by Nefarious+Wheel · · Score: 3, Insightful

    ...when you have Search? Pick your own keywords.

    --
    Do not mock my vision of impractical footwear
    1. Re:Who needs metadata any more by timeOday · · Score: 5, Interesting
      From reading the article, it seems mostly a problem for people who are essentially studying trends in metadata itself, such as the emergence of some particular word over time. As for the "oddball" categorizations, I agree they're odd, but why would anybody browse the "technology" section of a collection with millions of titles?

      The odd thing about complaining about this is, what are they comparing to? A hypothetical perfect online database that doesn't exist anyways? The article says google got it wrong in some cases where, e.g. the Harvard Library got it right. OK, that's an issue for all of us deciding whether to search on our nearest computer, or at the Harvard library.

      To me, google's project was a long time coming - somebody had to scan the world's back catalog. Maybe it would be better if governments had done it, but (and this is the point) they didn't. Google is.

    2. Re:Who needs metadata any more by Artraze · · Score: 4, Insightful

      > The odd thing about complaining about this is, what are they comparing
      > to? A hypothetical perfect online database that doesn't exist anyways?

      That's exactly why this article is little more than some long-winded trolling. So the metadata is wrong... As long as the books themselves are perfectly fine (which they seem to be), you can always check the metadata yourself. I would think that as far as Google is concerned (and 99+% of its users), the metadata isn't nearly as important as the data itself. Once the data is collected you can always fix the rest.

      Expect a new "tagging game" in the next year or two to manually correct these errors.

    3. Re:Who needs metadata any more by Anonymous Coward · · Score: 0

      Given who this affects, would a restricted wiki, where librarians can make corrections while citing their sources, be a reasonable way to address this shortcoming? I realize manually editing 100+ million pieces of metadata is a very intensive project, but if only around 5% really have issues, and the issues are obvious to those familiar with the material, it shouldn't be too bad. Maybe make a for-pay book available for free to anyone who makes such a correction?

    4. Re:Who needs metadata any more by Potor · · Score: 5, Interesting

      Exactly. And the whole argument totally ignores the fact that these books are now easily available.

      Shock horror: I am a liberal arts scholar. And Google Books has helped me incredibly in a project I am doing on a 18th century scholar. I have original texts in various editions at my fingertips, wonderful reference books (including a dozen 18th and 19th century Latin grammars), and serious secondary literature. Not all of these are fully posted on Google Books, but now I know what books to check out of the library, or even buy.

      As an arts scholar, I love Google books.

    5. Re:Who needs metadata any more by lobiusmoop · · Score: 1

      "somebody had to scan the world's back catalog."

      Interestingly, Vannevar Bush proposed doing this in 1945. Shame it's taken so long to come to fruition.

      --
      "I bless every day that I continue to live, for every day is pure profit."
    6. Re:Who needs metadata any more by martin-boundary · · Score: 4, Insightful

      The odd thing about complaining about this is, what are they comparing to?

      How about good old fashioned legwork? It *is* possible to make sure that the metadata is consistent with the facts, but that involves doing actual research and verification such as academics have been doing for hundreds of years.

      To me, google's project was a long time coming - somebody had to scan the world's back catalog.

      Then you have very low standards indeed. There's absolutely no reason why a single entity had to / has to scan all the world's back catalog on their own as fast as they can. It's pure commercial greed, and leads to the garbage we have on the net today.

      What is needed is an open standard for scanned works, with minimum resolution, minimum quality, and minimum verified metadata such as subject, author, publisher, year etc. All those are trivially listed on the title page of every book. All one has to do is open the damn book and flip a few pages, but that appears to be too hard for some people.
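      The minimum standard proposed above can be made concrete as a record plus a validation check. This is a minimal sketch in Python under stated assumptions: the field names follow the comment (subject, author, publisher, year), but the `ScannedWork` class, the `problems` function, and the 300 dpi threshold are hypothetical, since the comment names no actual standard or numbers.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds: the comment asks for a "minimum resolution" and
# "minimum verified metadata" but gives no numbers; 300 dpi is an assumption.
MIN_DPI = 300
REQUIRED_FIELDS = ("subject", "author", "publisher", "year")


@dataclass
class ScannedWork:
    """Hypothetical record for one scanned book under the proposed standard."""
    title: str
    dpi: int
    subject: Optional[str] = None
    author: Optional[str] = None
    publisher: Optional[str] = None
    year: Optional[int] = None
    verified: bool = False  # True once someone has checked the title page


def problems(work: ScannedWork) -> List[str]:
    """Return the reasons a record fails the minimum standard (empty if none)."""
    issues = []
    if work.dpi < MIN_DPI:
        issues.append(f"resolution {work.dpi} dpi below minimum {MIN_DPI}")
    for field in REQUIRED_FIELDS:
        if getattr(work, field) in (None, ""):
            issues.append(f"missing {field}")
    if not work.verified:
        issues.append("metadata not verified against title page")
    return issues


book = ScannedWork(title="Example", dpi=200, author="A. Writer", year=1899)
for issue in problems(book):
    print(issue)
```

      A record passes only when the returned list is empty, so a crowdsourced pipeline could hold back or flag any scan that still has outstanding issues.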

      This is a long term project for humanity. There's absolutely no point in having crappy scans with garbage metadata available quickly today, when it could be available correctly with good quality in say five years. It's also a perfect case for crowdsourcing, with some real standards to ensure quality.

      The current dreck that's online only causes duplication and waste. Take a look someday at archive.org (for example), and see how many copies of the same book are available, if it's a popular book. You'll typically find 5-10 scanned versions, by Google, Microsoft, and various local library projects, in black and white or colour none of which is truly good quality: broken characters, pages with dark margins, missing pages, typos or incorrect titles, wrong authors etc.

      Why did they bother?

    7. Re:Who needs metadata any more by riffzifnab · · Score: 1

      I wouldn't say the whole article is a troll (the "omg Google book monopoly" stuff sure). It did bring to light some errors and even got them fixed, that's worth something.

    8. Re:Who needs metadata any more by martin-boundary · · Score: 1

      Who needs metadata any more ...when you have Search? Pick your own keywords.

      This is missing the point. The metadata is being used by search engines for indexing, so when the metadata is incorrect, you'll get incorrect results filling up your keyword search.

      In a typical search (on Google or any other search engine), you input a few keywords. Those keywords tend to match a very large number of documents, so there needs to be a method of ranking them so that you see the most likely ones first. The ranking function always uses metadata of one sort or another, so if the metadata is incorrect, you'll get a suboptimal ranking function.

      You can see this effect with spam in websearches. The spammers identify the metadata that Google uses for ranking (links, page title, content structure...), and deliberately fill the metadata with incorrect information.

      The same is true for books when the number of books is large. If the metadata is incorrect, the exact book you're looking for won't appear at the top; you'll have to page through a long list to get there.
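      To see how a single wrong label can push the right book down a result list, here is a toy ranking sketch in Python. The scoring rule (keyword hits plus a fixed boost when the subject metadata matches the requested category), the boost weight of 5, and all titles and texts are assumptions for illustration, not a description of any real search engine's ranking function.

```python
from typing import Dict, List

# Toy corpus: full text plus catalog metadata. All data here is invented.
books = [
    {"title": "Steam Engines Explained", "subject": "Technology",
     "text": "engine boiler piston engine valve"},
    {"title": "Radio for Beginners", "subject": "Technology",
     "text": "radio valve engine"},
    {"title": "Garden Pests", "subject": "Gardening",
     "text": "engine of decay in the garden"},
]


def rank(query: str, subject: str, corpus: List[Dict]) -> List[str]:
    """Order titles by keyword hits in the text plus a subject-metadata boost."""
    def score(book: Dict) -> float:
        hits = book["text"].split().count(query)
        boost = 5.0 if book["subject"] == subject else 0.0  # assumed weight
        return hits + boost
    return [b["title"] for b in sorted(corpus, key=score, reverse=True)]


# Same corpus, but the most relevant book carries a wrong subject label.
mislabeled = [
    dict(b, subject="Antiques and Collectibles")
    if b["title"] == "Steam Engines Explained" else b
    for b in books
]

print(rank("engine", "Technology", books))
print(rank("engine", "Technology", mislabeled))
```

      With correct metadata the steam-engine book scores highest (2 hits + 5); once it is relabeled as "Antiques and Collectibles" (one of the misfilings quoted in the summary), it drops below a less relevant book that happens to carry the expected label.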

    9. Re:Who needs metadata any more by The_Quinn · · Score: 0, Troll

      Why did they bother?

      Why did you bother to comment on it? If you don't like it - don't use it.

    10. Re:Who needs metadata any more by martin-boundary · · Score: 0

      Why did you bother to comment on it?

      For the same reason you bothered to reply to me. I felt that the parent comment does not give a full point of view, and this inspired me to rant a bit to redress the balance.

    11. Re:Who needs metadata any more by Anonymous Coward · · Score: 1, Interesting

      I'm an architect who often works on buildings from the 19th century and I cannot sufficiently express the joy, wonder, and happiness I feel browsing the material Google has made available.

      I am bewildered (but not surprised that it comes from an academic) that someone would suggest that this information, formerly molding away in some special collections department (or being shredded into fodder), should continue to be sealed off from the world because a quickly-obsolescing categorization scheme has not been applied to it with sufficient care: as other posters have noted, since the entire text can be searched for keywords, the meta-tags are largely irrelevant, and since we're talking about complete scans of original sources, the desired data is there, embedded in the source!

      Anyone who calls this new, incredibly rich (yet free!) database "dreck" and "garbage" is an idiot; one need only look at the current state of academia to see where that scow truly sails.

    12. Re:Who needs metadata any more by pdabbadabba · · Score: 1

      You ask "Why did they bother?" as though their archive is of no use whatsoever in its current state and we should all just wait for the completion of the "long term project" (that nobody, to my knowledge and as you define it, is working on). On the contrary, Google's project is extremely useful if you are interested in something other than the borked metadata like, I don't know, the words actually written in the books.

    13. Re:Who needs metadata any more by Anonymous Coward · · Score: 1, Funny

      Exactly. And the whole argument totally ignores the fact that these books are now easily available.

      Shock horror: I am a liberal arts scholar. And Google Books has helped me incredibly in a project I am doing on a 18th century scholar. I have original texts in various editions at my fingertips, wonderful reference books (including a dozen 18th and 19th century Latin grammars), and serious secondary literature. Not all of these are fully posted on Google Books, but now I know what books to check out of the library, or even buy.

      As an arts scholar, I love Google books.

      Yes! That's the true value of Google Books! What would the world do without another liberal arts scholar being lazier than the prior generation? [Note: I didn't say smarter. I said, lazier.] The parent poster talking about standards is being smart and hits it out of the park for even lazy art scholars to get value out of a truly valuable research tool.

      Google Books is not that tool.

    14. Re:Who needs metadata any more by martin-boundary · · Score: 1
      No, I'm asking why they bothered to do it if they weren't going to do it right.(*) I think that question is appropriate whenever the work will have to be substantially reprocessed or even scanned again by someone else in the future. That criticism applies to a whole lot of books published before the 1950s.

      The fact that for any one book, you can read and quote those parts that are scanned well, and you can search those words that don't happen to be OCR'd incorrectly, and you can research for yourself what the title and author and year is anyway, that's great. But it's not super duper great, and (imho) should not be construed as such.

      (*) Yes, we know why. The books are usually scanned by students who are paid a bare minimum, and who don't bother to check their work. The scanning machines are not calibrated correctly. The scans are processed automatically by scripts that take care of 80% of cases and fail the remaining 20%. The metadata goes through systems that fail to be 8-bit clean. Etc. Etc. Pick any two of fast, cheap, or good.

    15. Re:Who needs metadata any more by chthonicdaemon · · Score: 1

      Worse is better. I would rather have a barely-legible scan of a book right now than a perfect copy in five years when my research is already old. There's a time value to the availability of data. I would like to think that the standards you speak of could be achieved, but all the evidence we have shows us it's the opposite. How many web sites comply with standards? How many well-ripped MP3s have you downloaded? Heck, how many well-written books (complying with all the language and grammar standards) are there as a fraction of all books?

      Now, I can get behind the idea that one company shouldn't have a monopoly on the ability to put these books online. In fact, I don't believe copyright is a particularly good idea, so I can get behind the idea that we should all be able to scan the books we have and put them online (and I've contributed to Project Gutenberg), but that doesn't mean Google's efforts are completely worthless.

      --
      Languages aren't inherently fast -- implementations are efficient
    16. Re:Who needs metadata any more by pdabbadabba · · Score: 1

      Well, it sounds like you're just saying that it's a shame that they weren't able to do a better job. And surely that's true. So, if that is indeed what you're saying, we have no disagreement.

      It's just that it sounded to me like you were saying "Why do something if you can't do it perfectly?" and that seemed to me like an obvious mistake. I'm glad to hear, then, that I misunderstood.

    17. Re:Who needs metadata any more by zubiaur · · Score: 1
      As you said, it is possible to make metadata consistent; in fact, it is not all that hard. The hardest part (digitizing), however, is already done, or at least in the process of getting done.

      The Google method of digitizing is at least adequate: it doesn't require much human intervention, it's a fast, easy process, and it renders copies of adequate quality. I have been using GB for quite some time and I have yet to find an illegible or missing page.

      An open standard for scanned works is nice and everything, but is it in a company's interest? How could Google, Microsoft or whoever is doing the scanning benefit from it? Wouldn't it be better for THEM to define the standard? Why should they bother if they can still make a buck and keep the customer/user happy? In many cases good enough is enough.

      Going back to the metadata issue, can you see yourself correcting some metadata, adding the year a book was published into a simple text box? I can clearly see myself doing that. However, can you see yourself scanning a book or two, or a hundred, with consistent quality? The hardest part is getting done; our literate robotic overlords are doing it for us. If Google provided a way for users to add that metadata it would be awesome, and someone coming close to what Google is doing wouldn't hurt either.

    18. Re:Who needs metadata any more by Anonymous Coward · · Score: 2, Interesting

      Why did they bother?

      1. I call absolute BS on the poor scanning quality. I have looked at 50+ books on Google Books, and not once noticed a problem with the scanning. Certainly a hell of a lot better than *I* would have done.

      2. The cost and time and legal battles required to do the scanning pretty much make it impossible unless a private corporation is leading the charge. What good does it do to try to rely on random-ass people to scan every book in existence, and every book as it comes into existence as fast as it comes? Good luck with that. And what makes you think they'd do a better job than a company that's devoted huge amounts of work to mastering the single repetitive task required to do it practically, and that can apply that practice to every single book?

      3. If you're worried about Google being evil / being too powerful / blah blah, fine, but since you don't mention that, I think you have to honestly believe they just suck. Perhaps you'd rather the US government spend 10s or 100s of millions of dollars to do it instead, because they really need to spend more money right now, and we can trust THEM to do it well.

      4. What does poor metadata have to do with anything? The task of scanning is completely separate from the OCR that goes into metadata. As Google improves their OCR, the metadata will fix itself. Or, you know, since this is a manageable task, maybe people can contribute on their own. Like the authors of this article did, and which Google gladly accepted.

      Since there are ACTUAL problems with Google Books (you know, like the ethical ones), maybe you should complain about those instead of this nonsense.

    19. Re:Who needs metadata any more by RandomUsername99 · · Score: 3, Informative

      I worked for the Harvard Law School Library and saw such a work in progress for the documents used in the Nazi war crimes tribunal at Nuremberg. The process of putting this together was extraordinarily expensive, and even with the HLSL donating the server, traffic, labor to maintain the back-end code (which it still does), etc., the project ran out of funding 13,904 scans in and is currently seeking funding.

      Although the metadata surrounding the scans of these books would not have to be nearly as detailed, it's worth noting that google is not a non-profit organization with a set of gigantic grants for book preservation. They needed to put together something that would make enough money to at least fund its own existence immediately.

      Why did they bother? Is it enough that it's useful to many people even if it's not useful to everyone?

      One could certainly put together the electronic preservation project of everyone's dreams... I wouldn't be surprised if some very smart people somewhere in academia have already designed it. Sooo if you would be so kind as to cut them a check so it doesn't have to be up to a company that's worried about it being a financially solvent program from a business perspective, I bet they'd start tomorrow.

    20. Re:Who needs metadata any more by introspekt.i · · Score: 3, Interesting
      You act like the technology and processes used to generate this catalog are going to remain deficient indefinitely. You ignore the fact that consumer demand for better (metadata|accuracy|whathaveyou) will drive improvements in the technology. In the meantime, we get access to the early iterations of the technology and the benefits it can provide today.

      What is needed is an open standard for scanned works, with minimum resolution, minimum quality, and minimum verified metadata such as subject, author, publisher, year etc.

      Necessity is the mother of invention. Wait for one to pop up, or go make one up. Nobody's stopping you.

      All those are trivially listed on the title page of every book. All one has to do is open the damn book and flip a few pages, but that appears to be too hard for some people.

      Opening the covers of every possible resource you use is quite easy when you have a discrete, present set of resources to thumb through. What if your resources aren't present, are high in number, or (lo!) are undefined...because you don't even know what exactly it is you're looking for?

      This is a long term project for humanity. There's absolutely no point in having crappy scans with garbage metadata available quickly today, when it could be available correctly with good quality in say five years.

      I think you're absolutely wrong. It's naive to assume we can just have an instant rubber-meets-the-road system available in x years without rigorous testing and input on the part of users. No point? Hah! This is absolutely the best way to go about things! Let the system work itself out with angry users pushing technicians to improve archives to have the best working system in the end. The Google system is hardly "done" and it's only going to get better with time.

      The current dreck that's online only causes duplication and waste. Take a look someday at archive.org (for example), and see how many copies of the same book are available, if it's a popular book.

      God forbid we have multiple copies of popular books in different archives.

      black and white or colour none of which is truly good quality: broken characters, pages with dark margins, missing pages, typos or incorrect titles, wrong authors etc.

      Quality is relative. Why prohibit use because we lack perfection?

      Why did they bother?

      Why did you bother? Why did I bother? Why does anybody bother? Probably because we all feel like it.

    21. Re:Who needs metadata any more by Anonymous Coward · · Score: 0

      You, sir, are living in an ideal world. Utopianism will never get anyone anywhere; better to get started and have something than to hold on to empty ideals that translate into nothing. Now at least a whole load of folks have access to these works online, and that's useful, which is the purpose of the project.

    22. Re:Who needs metadata any more by dbcad7 · · Score: 1

      Well, it would seem that if it is a matter of scan quality, then it should be somewhat easy from here to throw some computing power into OCR and clean things up.. It would take up a lot less space as well, I imagine. Of course pictures and illustrations are going to be difficult and probably never to anyone's satisfaction... Perhaps, just perhaps, having done the first step of capturing the scanned data makes the 5 year job of converting them to the way you want, possible.

      --
      waiting for ad.doubleclick.net
    23. Re:Who needs metadata any more by WWWWolf · · Score: 1

      ...when you have Search? Pick your own keywords.

      Why do they build nuclear reactors, when you can get electricity from the wall socket these days?

      Seriously, though: Correct, systematic and well-defined metadata makes searching more effective. Lack of metadata means you're going to comb through the results yourself looking for the stuff that matches the criteria that the search engine doesn't let you enter.

    24. Re:Who needs metadata any more by Lincolnshire+Poacher · · Score: 1

      > ...when you have Search? Pick your own keywords.

      Unfortunately there are some major problems with searching, such as "A OR B" returning fewer results than searching for A or for B separately.

      http://www.gale.cengage.com/reference/peter/googlebooks.htm
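      The anomaly is easy to state precisely. In Boolean retrieval an OR query is a set union, so with exact counts its hit count must lie between the larger single-term count and the sum of the two. A minimal Python sketch over a toy, invented index follows; it only checks that invariant and says nothing about why Google Books' reported counts violated it.

```python
from typing import Dict, Set

# Toy inverted index: term -> set of document IDs (invented data).
index: Dict[str, Set[int]] = {
    "whale": {1, 2, 3},
    "ship": {3, 4},
}


def search(term: str) -> Set[int]:
    """Documents containing a single term."""
    return index.get(term, set())


def search_or(a: str, b: str) -> Set[int]:
    """Boolean OR is set union."""
    return search(a) | search(b)


def or_count_is_plausible(n_a: int, n_b: int, n_or: int) -> bool:
    """With exact counts, max(n_a, n_b) <= n_or <= n_a + n_b must hold."""
    return max(n_a, n_b) <= n_or <= n_a + n_b


n_a, n_b = len(search("whale")), len(search("ship"))
n_or = len(search_or("whale", "ship"))
print(n_a, n_b, n_or, or_count_is_plausible(n_a, n_b, n_or))  # 3 2 4 True
print(or_count_is_plausible(3, 2, 2))  # False: fewer OR hits than "whale" alone
```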

    25. Re:Who needs metadata any more by Anonymous Coward · · Score: 0

      Yeah, but I find this trolling positive, in the sense that it led to discussion and fixes.
      And as you said, a tagging game might even come out of it and could get people more interested in certain books.

      Trolling for discussion is a fine art.

    26. Re:Who needs metadata any more by smallfries · · Score: 1

      Are you aware that OCR is not perfect? You ask why Google did not do it right as if they had chosen a cheaper / faster option. The software that they used has an error rate of one in a million characters. This is the best that is currently available. The problem is that with the sheer amount of text Google has scanned they expect about a million errors.
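      Taking the comment's own figures at face value (a per-character error rate of p = 10^{-6} and about E = 10^6 expected errors), the expected error count grows linearly with the amount of text N, which implies roughly a trillion characters scanned:

$$
E = p \cdot N \quad\Longrightarrow\quad N = \frac{E}{p} = \frac{10^{6}}{10^{-6}} = 10^{12}\ \text{characters}
$$

      Because E is proportional to N, even a very accurate OCR engine leaves a large absolute number of errors at this scale.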

      You seem quite insistent that they've messed up somewhere. How would you have done it better?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    27. Re:Who needs metadata any more by hoooocheymomma · · Score: 1

      a project I am doing on a 18th century scholar

      Hm... I'm not so sure I'd call you a scholar...

    28. Re:Who needs metadata any more by hxnwix · · Score: 1

      All those are trivially listed on the title page of every book. All one has to do is open the damn book and flip a few pages, but that appears to be too hard for some people.

      Exactly! Just flip a few pages in the scanned book, and...

      Or is that too hard for you?

      Why did they bother?

      Why not? Because the results are not perfect? Jesus, man... Point out errors as you find them to Google or Microsoft if you truly want them fixed.

    29. Re:Who needs metadata any more by Anonymous Coward · · Score: 0

      As an arts scholar, I love Google books.

      That's super. Now finish making my latte, I'm late for work.

    30. Re:Who needs metadata any more by Attila+Dimedici · · Score: 1

      There's absolutely no reason why a single entity had to / has to scan all the world's back catalog on their own as fast as they can.

      First, you make a good point. The danger of Google doing this is that once they have done it (no matter how poorly), if it is comprehensive, it significantly reduces the incentive for another organization to do it. This is compounded by the agreement that Google reached with the Authors' Guild, which makes it legally problematic for another organization to do it.
      It doesn't mean that Google should not have done it, but it does mean that it is important for people to point out the shortcomings of Google's effort. By loudly complaining about the shortcomings of what Google has done here, the author(s) push Google to fix the problem and/or make it easier for someone to gain the funding to create an online collection that addresses their concerns.

      --
      The truth is that all men having power ought to be mistrusted. James Madison
    31. Re:Who needs metadata any more by natehoy · · Score: 2, Interesting

      Given a project of this magnitude, there are inevitably going to be bad scans, and bad data, and other issues.

      And, just as inevitably, the problem areas are going to be updated and replaced with good ones when they become available.

      "There's no point in having crappy scans with garbage metadata today" would be indisputably true if every book out there was a crappy scan with garbage metadata. Instead, what we have is a starting point with some good scans and some bad ones, but there's no point holding back the entire project just because some of the books have bad scans or metadata. You go live with what you have, then add/correct as needed.

      Remember, too, that none of these books replace what is available in your local library, they supplement it. If your local library has a copy of a book you want, it's still there. If they don't, Google Books will probably have it. Chances are, their scan will be good, but let's assume it's not. Isn't a barely readable version better than no version whatsoever?

      This isn't a NASA mission. If a book ends up being a crappy scan, it won't explode on re-entry killing its reader.

      This is, however, a for-profit venture. As such, it cannot wait until every page of every tome is pristine before it goes live.

      Sometimes, you go live with what you've got, even if it's not perfect, because it's not only in the best interests of profit, but because there's a benefit to having the product out there. Google Books will start as a supplemental database, and where there are good scans of books with good metadata, this will make books more available and accessible to all. Books will be missing from its catalog, and books will be unreadable at times, and books will be misfiled, but the same is true of any library.

      Google Earth went live long before detailed imagery was readily available for a lot of the world, so those who lived in an area of the world that lacked detailed imagery saw low-res imagery (green fuzzies, with a vague idea of where really big things might be) where the pictures should be. As the imagery became available, they added it to the basemap. But Google Earth made detailed cartography available to the masses in a way that it had never been available before. And, hopefully, Google Books will be able to do the same with the written word.

      --
      "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
    32. Re:Who needs metadata any more by InsaneGeek · · Score: 1

      Actually I'd say that one of two things should happen... Google is allowed to do this but they have to hand over all the end-result data to the US government for its free use by any other individual/organization in the US after a 2-3 year exclusive embargo; or the US government should fund doing this and again allow anybody in the US to use the results.

    33. Re:Who needs metadata any more by AndersOSU · · Score: 1

      Wow, contradictions.

      There's absolutely no reason why a single entity had to / has to scan all the world's back catalog on their own as fast as they can

      The reason one entity is doing this is the orphaned copyright problem. Google was sued and settled, and now seems to have the right to distribute these copyrighted works. Anyone could do the same, but they'd have to be prepared for the legal battle. Perhaps this is a question best settled through the legislative process, but that's not the way things stand today. Besides, if there wasn't a single entity doing the scans, wouldn't we end up with, "5-10 scanned versions, by Google, Microsoft, and various local library projects, in black and white or colour none of which is truly good quality: broken characters, pages with dark margins, missing pages, typos or incorrect titles, wrong authors etc."?

      As for your other complaints, they boil down to - someone making data available, which is helpful, but it's not good enough for my (specialized) purpose... To which, I'll let you answer yourself:

      How about good old fashioned legwork? It *is* possible to make sure that the metadata is consistent with the facts, but that involves doing actual research and verification such as academics have been doing for hundreds of years.

    34. Re:Who needs metadata any more by foobarbaz · · Score: 1

      > it could be available correctly with good quality in say five years

      Computers will be better in five years, too. Please turn yours off until then.

    35. Re:Who needs metadata any more by ajs · · Score: 1

      How about good old fashioned legwork? It *is* possible to make sure that the metadata is consistent with the facts, but that involves doing actual research and verification such as academics have been doing for hundreds of years.

      Read the text from the last link in the Slashdot blurb. That's Google's response (and the original complaint's author's responses inline). In it, Google clearly lays out each of the errors cited (some as batches) and what sorts of errors they stem from. However, the really telling part is the numbers. They have over a trillion metadata records for hundreds of millions of books. In those trillions of records, they claim to have millions of errors. Think about that for a second....
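      The "think about that" can be made concrete with the comment's own numbers. Writing R for the number of metadata records (over a trillion) and E for the number of errors ("millions", taken here as k × 10^6 for some small k, since the comment gives no exact figure), the error rate per record is bounded by:

$$
r = \frac{E}{R} \le \frac{k \times 10^{6}}{10^{12}} = k \times 10^{-6}
$$

      That is, on the order of a few errors per million records, if the quoted figures hold.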

      For a database that hasn't even been officially launched, that's an astounding thing. If true, it's far, far better than anyone would have guessed it would be at this stage.

      Google's also pointed out that they're working hard on this, and that they're mostly succeeding as a result of filtering out bogus input from a number of external sources. We'll be finding errors in Google's metadata forever. There's no way around that (it's simply a problem of diminishing returns), but on the whole, there is no other database like this, and I would expect that the majority of those that do exist and are smaller have similar error rates.

    36. Re:Who needs metadata any more by lennier · · Score: 1

      "This isn't a NASA mission. If a book ends up being a crappy scan, it won't explode on re-entry killing its reader."

      That would probably make reading cool again though.

      --
      You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
    37. Re:Who needs metadata any more by natehoy · · Score: 1

      ...unless you end up with the book with the bad O-rings.

      --
      "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
    38. Re:Who needs metadata any more by URL+Scruggs · · Score: 1

      Glad to see you're thinking big... Get all of the world's knowledge in a database and keep it to yourselves. Buckminster Fuller would be rolling in his grave if he heard this.

    39. Re:Who needs metadata any more by jc42 · · Score: 1

      not? Because the results are not perfect? Jesus, man... Point out errors as you find them to Google or Microsoft if you truly want them fixed.

      Yup, just as the authors of the article did. And google seems to have fixed those errors now.

      Of course, we could also complain about the people who are so publicly pointing out all the errors in google's data. But the public comments here and in various other "scholarly" forums have served quite nicely to bring the problem to the attention of a lot of people. Many of the readers will just criticize google and/or the authors of such articles. But many other people will dig into the data, find more errors, and tell google about them.

      The main problem is that, as a few people have pointed out, when you've found 50 or 500 errors, google's reporting mechanism is overly complex and takes far too many clicks. But maybe if some of us do the work and complain about the time it takes to send in the error reports, google's programmers will come up with a more time-efficient way to mark errors on our screens and send in a batch of corrections in one click.

      This whole thing is obviously just a Good Start. It can obviously be improved. That will happen mostly because people cooperate with google.

      But publicizing the problem is also a good idea. Otherwise, the few of us who are willing to help fix the problems would likely never know about them.

      One obvious suggestion is to do the error correction via a wiki-like mechanism. I'd guess that some folks at google are looking into it. And, as anyone who has contributed to wikipedia knows, this approach has its own obvious and well-known problems. But with proper controls, it could help to cut down the error rate by a few orders of magnitude.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  2. Redundant article. by commodore64_love · · Score: 0

    I already read that quote about "inaccurate metadata" and "1899 was a literary annus mirabilis" half an hour ago when the first article was posted.

    --
    "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    1. Re:Redundant article. by Volante3192 · · Score: 1

      Yes. This is a follow-up that now includes a link to Google's response to the metadata issues, with the original links kept there for reference.

      Did you not read the last line of the summary?

  3. Detrimental? by 2.7182 · · Score: 1

    How? If you don't like it just ignore it.

  4. Error free system? by Bacon+Bits · · Score: 2, Informative

    So, the argument is that the new system is bad because it may have errors or bad data?

    Were card catalogs immune to this? It's a database. It's only as good as what you put into it. A bad database is not useful. It just means someone needs to do it better. Honestly, if anything this seems like an argument that the database shouldn't be proprietary. It should be open to everyone so that someone can always make a better version of the metadata with the same base data.

    "It's a piece of shit" shouldn't be the same argument as "nobody should even try it". The Wright brothers didn't exactly start out with a 747 or an F-35.

    --
    The road to tyranny has always been paved with claims of necessity.
    1. Re:Error free system? by jellomizer · · Score: 0, Flamebait

      These are educated scholars. They are not expected to be able to think for themselves or to find new and interesting things to take advantage of. As long as it is old and tried-and-true, and someone else said it in the past, then it is insightful.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    2. Re:Error free system? by fuzzyfuzzyfungus · · Score: 1

      I think the argument that TFA is making is not merely "it's a piece of shit", but "it's a piece of shit, and a regression (in terms of metadata), because their method is designed to meet very different objectives". They may or may not be correct (I certainly suspect that scholarly use was not Google's #1 priority; when they hope to get to that, and how much they are willing to spend to achieve it, I don't know), but it is a much more serious charge.

      The Wright brothers didn't start out with a 747 or an F-35, but they didn't start out by building toy bird models, either. TFA's thesis seems to be not that Google Books is merely inaccurate, but that its methods are unsuitable for scholarly use and aimed at quite a different end.

    3. Re:Error free system? by hot+soldering+iron · · Score: 1

      Then I guess he is free to start his own collection aimed at scholarly use. Then he can be on the receiving end of criticisms that he didn't design his system for normal humans instead of academics.

      --
      When you want something built, come see me. If you want correct grammar and spelling, get a F*ing liberal arts student.
    4. Re:Error free system? by Bacon+Bits · · Score: 2, Interesting

      The Wrights didn't start out building toy birds, true. They first tried to use the data from some Russian or European who had modeled wings after birds. They found that the lift his data predicted was so far off from what they observed in their gliders that they could no longer assume that the data hadn't just been made up. Then they went and built a small scale wind tunnel and designed small model wings which could be reformed and shaped and angled easily and a scale which could be used to measure lift from the wing model. So, no, they didn't start out building toy birds. They effectively ended up doing that when they discovered how little data there was on the subject of a wing. They took a step back to toy bird models.

      http://www.hulu.com/watch/23333/nova-wright-brothers%E2%80%99-flying-machine

      --
      The road to tyranny has always been paved with claims of necessity.
  5. all the books in the world by QuantumG · · Score: 1, Interesting

    We are trying to correctly amalgamate information about all the books in the world. (Which numbered precisely 168,178,719 when we counted them last Friday.)
          - Jon Orwant (Google)

    why does that number seem incredibly low to me?

    --
    How we know is more important than what we know.
    1. Re:all the books in the world by RichardDeVries · · Score: 1

      Because it's a joke?

      --
      Error 001
      Security Scan and Virus Detection do not work with your operating system.
    2. Re:all the books in the world by Bacon+Bits · · Score: 4, Funny

      They haven't finished counting Stephen King's books yet.

      --
      The road to tyranny has always been paved with claims of necessity.
    3. Re:all the books in the world by Anonymous Coward · · Score: 1, Interesting

      Harvard's library has about 16 million books and the Library of Congress about 32 million, so that total seems reasonable. If this is just a merge of the card catalogs of the world, I'm actually surprised that the number is not smaller. I admit there are probably some books that are in no database or card catalog, maybe sitting in a cave in Tibet or somewhere in the Middle East, but those that we can find and identify? This seems about right; I would have guessed 100-200 million.

    4. Re:all the books in the world by Anonymous Coward · · Score: 0

      And Dean Koontz novels too.

    5. Re:all the books in the world by Kalriath · · Score: 2

      They forgot to count the Wheel of Time.

      --
      For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
    6. Re:all the books in the world by maino82 · · Score: 1

      They started in the A's and quickly got to Asimov's books... only about half way done with them at this point.

    7. Re:all the books in the world by Anonymous Coward · · Score: 0

      And Barbara Cartland books. Gee I hope they scan the covers well.

    8. Re:all the books in the world by mathx314 · · Score: 1

      It's a bit hard to judge, naturally, but it doesn't seem that far-fetched. There's what, 6.5 billion people in the world? This means that, on average, there's ~35.68 people/book. That doesn't include books written by dead people, but there's more people alive than dead so it doesn't seem unreasonable to guess that that's about 70 people/book. Remember that's books published and cataloged, not books written. In fact, the idea of 1.4% of all people being published authors sounds a bit high to me.

    9. Re:all the books in the world by j-beda · · Score: 1

      ...there's more people alive than dead ...

      Google Answers (http://answers.google.com/answers/threadview/id/764806.html), citing several sources, puts the number of dead people at about 100 billion; even the lower estimates put the dead population well above the living population.

      Oh, and shouldn't that be "...there are more people..." ?

    10. Re:all the books in the world by Jedi+Alec · · Score: 1

      why does that number seem incredibly low to me?

      Robert Jordan died.

      --

      People replying to my sig annoy me. That's why I change it all the time.
  6. Perhaps... by Anonymous Coward · · Score: 0

    Folks are afraid their citations might actually be checked in context? Or that equal access to public domain content gives the professional little more than a buttload of competition?

    I have found Google Books invaluable for genealogy research, though I admit that their metadata and the file names are messed up. If you find several different volumes of a set, you have to rename them when you save them and be careful not to overwrite.

    One big gripe is that the PDFs do not include the OCR'd text, so one can't search within them. This is a huge oversight. I hope they will correct it someday.

    Still, Google Books is the best solution that has come along. I hope they continue to improve it.

    1. Re:Perhaps... by Anonymous Coward · · Score: 0

      Still, Google Books is the best solution that has come along.

      Seriously. Google has put up a ginormous crapload of Free Stuff. My fairly narrow area of interest is 14th-15th century central European history, and I have downloaded GIGABYTES of relevant books, including highly useful pre-World War I maps and such that are long long long since out of print and otherwise unobtainable. Of course it can be improved. Of course Google should do a better job of sharing raw data with potential competitors, particularly in the cases of books that are still under copyright. But goddamn, it's one of the best things Google has done.

  7. "scholarly" information by Aurisor · · Score: 2, Interesting

    As someone who majored in English Literature in college, I can tell you that academics love getting their panties in a bunch over what is Scholarly Publication and what is not. Some teachers will actually have special assignments that have to be written entirely using Scholarly sources, or in response to a Scholarly article.

    Before the advent of the internet, I can see how it might have been useful to have an in-group comprised of people who had some sort of qualifications to write about something, but it seems antiquated in light of the ease with which we can independently verify claims.

    Usually, if someone's going to write something that's actually useful, they'll write an actual book. Soon thereafter, a bunch of "Scholars" will come along and write a bunch of journal articles and tell us all about how the useful work was one of three things: misogynistic, code for a religious statement, or arcane, carefully-hidden innuendo.

    Sorry if I sound bitter, but I spent a lot of time reading this crap, and very little of it was as insightful or interesting as even my classmates' comments.

    1. Re:"scholarly" information by ahoehn · · Score: 4, Insightful

      Sorry if I sound bitter, but I spent a lot of time reading this crap, and very little of it was as insightful or interesting as even my classmates' comments.

      That sounds like more of a you problem than an academia problem. If you don't enjoy using a work's minutiae to accuse perfectly innocent authors of misogyny, innuendo, (to add a couple you forgot) blatant colonialism or latent homosexuality, what the fuck were you doing in an English Lit program? The rest of us live for that shit.

      As someone who should not have majored in English Literature in college

      There. I fixed it for you.

      --
      Mod my comments down. It'll be fun.
    2. Re:"scholarly" information by moosesocks · · Score: 3, Interesting

      Actually, the GP's got a good point. Back in college, I took a number of humanities courses whenever I could squeeze them into my schedule.

      I can say from firsthand experience that there are a lot of "scholarly" articles that are complete and total crap. When writing papers, I'd frequently peruse JSTOR for pertinent articles about my topic, keeping an eye out for particularly good articles, as well as the heinously bad ones. Picking apart and systematically disproving a bad paper published in a "good" journal was an easy ticket to an 'A' on the paper.

      These papers, of course, were certainly the exception. Most scholarly papers I encounter are humbling in their brilliance. However, I've seen more than a few bad journal articles, as well as quite a few blog entries that would be worthy of scholarly publication. It's hard to make any generalizations about the validity of certain sources of information.

      Unfortunately, Physics wasn't quite as easy to bullshit (Random aside: The physical sciences certainly have their fair share of bad journal articles, especially in light of the fact that printed media is a terrible means by which to communicate scientific results. It's a cruel irony that the www was invented to enable collaboration and information exchange between scientists, but is rarely (if ever) used for that purpose. Also, any use of the word 'trivial,' or its synonyms needs to be punishable by death.)

      PS. Don't judge our writing abilities based upon our Slashdot comments. I'm sure the GP had his own reasons for majoring in English, even though literary discourse is often trite and contrived.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    3. Re:"scholarly" information by Petrushka · · Score: 1

      I can tell you that academics love getting their panties in a bunch over what is Scholarly Publication and what is not. Some teachers will actually have special assignments that have to be written entirely using Scholarly sources, or in response to a Scholarly article.

      There are two separate things going on there, and you've mixed them up slightly.

      First: in the kind of scenario you raise, "scholarly publication" acts as a mechanism for filtering information. There's a lot of information in the world; stuff that appears in "scholarly publications" should, if that criterion is well-designed, have a better average quality. As filtering mechanisms go it's imperfect: sometimes stuff that has passed peer review is still fishy, and sometimes good stuff gets excluded, as you yourself have observed. Still, tools for picking and choosing are valuable. I'd add that an assignment that specifies that you're not allowed to use anything that doesn't pass the criterion is an assignment that won't teach you anything about how to exercise this kind of judgment yourself. (Usually I'd say that's a bad thing.)

      Second: academics, even when they have powerful search tools and the competence to tell for themselves which stuff is good and which stuff is bad, still have to think about the criteria for what counts as "scholarly publication", because they get hired or fired on the basis of it. In some countries (e.g. the UK, Australia) there are very specific government-enforced criteria and processes for grading academics; in Australia, they use a whitelist of which journals you'll get credit for publishing in, and an article in an "A" rated journal will get you X amount of credit, while an article in a journal that doesn't appear in the whitelist will get you precisely no credit (and will actually count against you, if you are stupid enough to list it in your CV). This is obviously terrible, inexcusably lazy, and guaranteed to cause long-term harm, but it's an environment that academics still have to work in or else not work at all. Academics get graded in a similar way in the US, except that the criteria are less standardised.

    4. Re:"scholarly" information by Aurisor · · Score: 1

      If you don't enjoy using a work's minutiae to accuse perfectly innocent authors of misogyny, innuendo, (to add a couple you forgot) blatant colonialism or latent homosexuality, what the fuck were you doing in an English Lit program?

      Umm...racking up easy A's for Law School?

    5. Re:"scholarly" information by Kirijini · · Score: 1

      ...academics love getting their panties in a bunch over...

      It doesn't matter how you end that statement. It's true. But, that's their job - academics overthink and overanalyze everything they can.

      ...over what is Scholarly Publication and what is not.

      There's a very good reason for that. Scholarship involves putting your reputation on the line. "Scholarly" works are those in which the author says: "This is a contribution to human knowledge and understanding of the world around us." In contrast, popular literature is produced for a very different reason - to make money, or because the author is passionate about the subject, or for fun (and, in all cases of course, that work is published by a publisher to make money).

      There's a big difference there, when you're using a work as source material for your own scholarly work. You want to rely on people who are staking their own reputations (and usually, professional/academic careers) on the validity of their work.

      ...it seems antiquated in light of the ease with which we can independently verify claims.

      How do you independently verify a comparative analysis of two different research methods? Or an attempt to harmonize two competing theories in a certain field of study? Or even just the simple application of an established theory to a certain set of facts?

      You can verify the facts (maybe, if you know where to look), and you can verify the theory is as they say it is (if you have the foundation to understand it), but you can't verify their analysis. The strength of that analysis may be self-evident, or it may rely on the author's reputation.

      ...I spent a lot of time reading this crap, and very little of it was as insightful or interesting as even my classmates' comments.

      No shit! You were studying humanities. You shoulda studied a social science if you were looking for insightful or interesting academic work. Woulda prepped you better for law school too.

    6. Re:"scholarly" information by ahoehn · · Score: 1

      Umm...racking up easy A's for Law School?

      Touché, good sir. I would also have accepted "picking the major with the greatest percentage of sexually curious coeds" and "picking a major where facts are far less important than the way in which they are presented."

      --
      Mod my comments down. It'll be fun.
    7. Re:"scholarly" information by arethuza · · Score: 1
      "This is a contribution to human knowledge and understanding of the world around us."

      More like "I need to publish stuff to get promoted to get more status & money". And yes, I have worked in academia and played that game until I was thoroughly sick of it, and then left to found a tech company.

  8. Anonymous Coward by Anonymous Coward · · Score: 5, Interesting

    Google has scanned many volumes of the Laws of Indiana, which go back to 1816. These are the session laws of the Indiana General Assembly and have never been copyrighted. However, Google has arbitrarily decided, citing copyright, not to make available most of the post-1922 volumes it has digitized, and even some pre-1922 volumes (e.g. 1877, 1893, 1895, 1909, 1917 and 1918).

    Google has done all the decision-making here. Anyone who might object to the classification of one of these volumes as copyrighted and thus available in "snippet-view only" presumably would have the burden of proving the contrary. (And where would you even start? Who would you contact? I have seen nothing on this.)

    Once (or if) the settlement is approved early this fall, Google's "rights" attach to these volumes. If I understand correctly, at that point any individual who wishes to access one of these volumes of Indiana's session laws not already in "full view" will have to pay for it, and for the money will obtain only individual rights, NOT the right to make it freely available to others.

    Broader implications: Finally, this analysis has been limited to volumes of Indiana session laws, but surely similar situations exist more broadly.

    For more on this, see this Aug. 2, 2009 Indiana Law Blog entry: http://indianalawblog.com/archives/2009/08/courts_my_probl.html

    1. Re:Anonymous Coward by Darkness404 · · Score: 1

      Google is dealing with a ton of books as fast as they can. There's no doubt that not everything is perfect, but the books are scanned and available. With time things will improve, but for now they are simply in "scan things and get them out there" mode, not "make everything perfect" mode.

      --
      Taxation is legalized theft, no more, no less.
    2. Re:Anonymous Coward by Anonymous Coward · · Score: 0

      How is this +5?

      Don't visit google.com/books

      There, easy. Just go to your library or buy them as always.

      If I took some of these law books that are not copyrighted and scanned only "some" of them, how does that concern you, exactly?

      They haven't scanned them all in? Big fucking deal.

      Some pre-1922 ones they are saying they can't scan in because they are copyrighted!?!? Holy shit! People would be pissing and moaning here 10x as much if Google was illegally scanning in and offering copyrighted material. They are erring on the side of caution. HOLY SHIT!

      Get a fucking life.

    3. Re:Anonymous Coward by Chris+Mattern · · Score: 1

      Google is dealing with a ton of books as fast as they can.

      And that may be precisely the problem. "There's never time to do it right, but there's always time to do it over."

  9. Something is usually better than nothing by Anonymous Coward · · Score: 5, Insightful

    And this is no exception. Before Google Books you had access to books from various libraries, books you owned, books you could borrow from friends (*shock* *gasp* copyright infringement), books you could buy and books from non-Google online sources. Now you have access to all of those and additionally Google Books. Even if Google Books is 99% "piece of shit" (which in my experience is simply not true, but nevertheless) you still have the 1% of potentially useful material that wasn't available before, so you win.

    1. Re:Something is usually better than nothing by chthonicdaemon · · Score: 1

      What about signal-to-noise? If I have a nicely organised library and you donate a truck full of books, many of which are filled with drawings by your toddler, it may not be worth my time to sift through them to find the gems. It would be a very bad idea to add them to my library without going through them, because I would be increasing my odds of getting a bum book, even though the number of good books has gone up.

      --
      Languages aren't inherently fast -- implementations are efficient
    2. Re:Something is usually better than nothing by julesh · · Score: 2, Insightful

      The problem is that the existence of google books makes it harder for others working on similar systems (and there are others, this isn't just a pipedream) to become established. A Google Books court-approved class-action copyright settlement would make it harder for somebody else to reach a similar agreement (because the public interest argument will be harder to make). Essentially, this is a field where the first person to do it is likely to end up with a monopoly, and Google have done it badly, thus precluding other people from doing it properly.

  10. Sure, libraries make mistakes by mschuyler · · Score: 2, Insightful

    like shelving 'Life of an Iceberg' under biographies, but by and large they strive to be, and are, correct. If they mess up, some other library will fix the error. Libraries' cataloging data is usually centralized by OCLC so that the data is uniform throughout the country, as other libraries pull from this central source for their own catalogs. Libraries also use a recognized and standardized subject scheme with a controlled vocabulary, not just a bunch of meta tags. Cataloging librarians are a rare and little-recognized breed of people who spend their entire professional lives trying to make it easier to gain access to material. The result is an organized body of knowledge--not just a heap of books on the floor in no particular order, like the Internet--and Google. For Google to blame libraries for their troubles is like blaming the Machinist's Mates on the Titanic for crashing the ship into an iceberg. There, full circle. How did that happen?

    --
    How about a moderation of -1 pedantic.
  11. Why Isn't Google Books A Library? by LifesABeach · · Score: 4, Interesting

    With all the class-act talent that Google hires right out of college, why can't Google create its own Public Library on the Internet? Chrome could be the entry way to any book that is in the Public Domain, or available with the author's written permission. Turning the page of a book could be as simple as the [Back] or [Next] button. The "Card Catalog" would be a No-Brainer. No Library goes through these many hops. There's even translation to other languages, Braille, and Audio; from my viewpoint, this SHOULD be the challenge, not what word category is or isn't. If it's a case of "buy the book", then buy 10 copies of "Gone with the Wind", and ONLY allow up to 10 readers to ONLY read "Gone with the Wind". Google could even have a "Google Online Library Card"; this is where the company hums "Ka-Ching".

    1. Re:Why Isn't Google Books A Library? by QuantumG · · Score: 2, Funny

      So you haven't read any of the stories that have appeared on Slashdot in regards to Google's plans for their Books service eh?

      --
      How we know is more important than what we know.
    2. Re:Why Isn't Google Books A Library? by Anonymous Coward · · Score: 0

      No Library goes through these many hops.

      I know. Google's peering agreements are just awful. The downtime and the slowness.... Sheesh.

      If Google can't get it together, then this search engine thing will NEVER take off.

    3. Re:Why Isn't Google Books A Library? by riffzifnab · · Score: 3, Funny

      With all the class-act talent that Google hires right out of college, why can't Google create its own Public Library on the Internet? Chrome could be the entry way to any book that is in the Public Domain, or available with the author's written permission. Turning the page of a book could be as simple as the [Back] or [Next] button. The "Card Catalog" would be a No-Brainer. No Library goes through these many hops. There's even translation to other languages, Braille, and Audio; from my viewpoint, this SHOULD be the challenge, not what word category is or isn't. If it's a case of "buy the book", then buy 10 copies of "Gone with the Wind", and ONLY allow up to 10 readers to ONLY read "Gone with the Wind". Google could even have a "Google Online Library Card"; this is where the company hums "Ka-Ching".

      I think that's the idea, perhaps you should go check it out: http://books.google.com

    4. Re:Why Isn't Google Books A Library? by Anonymous Coward · · Score: 0

      I think the rub for publishers is when the book is converted to digital text. The problem for publishers is that their business model, which has lasted for thousands of years, is about to become vastly different. And like Buggy Whip Makers, Travel Agents, and Real Estate Buying Agents, they will be less needed. Publishers will need to reinvent their business model in order to survive.

  12. Obnoxious by burgundysizzle · · Score: 3, Insightful

    The inline replies are written with a smug sense of self-entitlement, as though he and other "scholars" are the only legitimate users of Google Books. It's NOT about you; you are not going to create enough AdSense hits to make this whole thing worthwhile (or turn a profit).

    1. Re:Obnoxious by Volante3192 · · Score: 5, Insightful

      Definitely. It's like, "Oh, look, I found an error. If I had done this, that error wouldn't be there!!" And to that I respond, then do it yourself. YOU go tack metadata onto the 100 million books they have, you smug egocentric bastard.

      And, of course, he completely ignores the 999,999 proper entries compared to the 1 error. Google seems to know there's lots of problems here, and they're not going to get it right the first pass. But having a first pass at all is better than nothing.

    2. Re:Obnoxious by fuzzyfuzzyfungus · · Score: 2, Insightful

      If you were a scholar, writing for an audience of other scholars, why wouldn't you write about the concerns of scholars and from their perspective? I'm sure he knows exactly why Google is doing what it's doing; but that doesn't mean that he can't point out the downsides.

      It's like saying that Slashdot is obnoxious because it is "written with a smug sense of self-entitlement as though he and other 'geeks' are the only legitimate users of the Internet". This is true; but that is because it is a geek website where geeks write about geek stuff. Obviously we know why Comcast is capping and packet shaping; but that doesn't mean we can't whine about the downsides for us.

    3. Re:Obnoxious by jefu · · Score: 1

      Indeed. He seems to think that his sole goal as a scholar is to grab information from wherever and make publications (in fairness, that is the job of most university professors and they have often forgotten the real point of scholarship), instead of trying to improve the state of knowledge of the world (in which case he should be finding the best metadata for his sources and helping google - or other sources - to incorporate that). He also seems to believe that google is there primarily to support his (rather narrow) viewpoint on scholarship in general and that mistakes on their part are somehow personally betraying him. He is wrong in several ways.

    4. Re:Obnoxious by AthanasiusKircher · · Score: 1

      Try some searches yourself; the error rate is far above one in a million. As I've started using Google Books in my research, I've encountered these sorts of errors frequently enough that I don't really trust search results in Google Books to turn up things I expect, if those searches depend on metadata.

      For example, take the criticism of incorrect dating given in the article. Try picking a well-known author of classic literature not mentioned in the article. (Google appears to be working on the ones that are mentioned in that article, so you can't get good stats.) Then restrict dates to the years before the author was born. Just trying a few searches like that seems to indicate that this particular error occurs about 1 in 1000 times for major authors of classic literature in the Google Books database.

      That's only one type of error, and that's not even including all the other possible errors in dating that aren't as blatant as having the publication date before the author was born. Factor in the other kinds of metadata errors mentioned, and I would bet that more than 1% of records have some significant error that would cause them to be left out of reasonable searches requiring metadata.

      That sort of error rate significantly decreases the usefulness of Google Books. Is Google Books a great thing? Of course. But its usefulness lies not only in getting access to texts, but in being able to search for and find those books. If a telephone directory had an error giving the wrong name, number, or address in more than 1 out of 100 entries, it would be considered a major problem. If you can't find something in a database, it significantly decreases the usefulness of having all the materials there in the first place.
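
      The "published before the author was born" check described above is easy to automate. What follows is a minimal sketch, assuming a hypothetical BookRecord that already carries the author's birth year and the publication year; it is not how Google Books actually stores metadata, and it only catches the blatant case, not the subtler dating errors the comment mentions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BookRecord:
    # Hypothetical record shape for illustration; not Google Books' actual schema.
    title: str
    author_birth_year: Optional[int]
    publication_year: Optional[int]


def impossible_dates(records: List[BookRecord]) -> List[BookRecord]:
    """Return records whose publication year is earlier than the author's birth year."""
    return [
        r
        for r in records
        if r.author_birth_year is not None
        and r.publication_year is not None
        and r.publication_year < r.author_birth_year
    ]


def error_rate(records: List[BookRecord]) -> float:
    """Fraction of records failing the check; 0.0 for an empty list."""
    if not records:
        return 0.0
    return len(impossible_dates(records)) / len(records)


# Usage: Stephen King was born in 1947, so an 1899 date for Christine is impossible.
records = [
    BookRecord("Christine", 1947, 1899),
    BookRecord("Christine", 1947, 1983),
]
print([r.publication_year for r in impossible_dates(records)])  # [1899]
print(error_rate(records))  # 0.5
```

      Records with a missing birth or publication year are skipped rather than flagged, so a run like this gives a lower bound on the dating errors, which is consistent with the comment's point that the real error rate is higher.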

    5. Re:Obnoxious by Archimboldo · · Score: 1

      I don't recall anyone on Slashdot saying they were the only legitimate users of the internet. If they did, I would condemn it as much as I condemn smug academics' arrogant attitudes.

    6. Re:Obnoxious by burgundysizzle · · Score: 1

      It's like saying that Slashdot is obnoxious because it is "written with a smug sense of self-entitlement as though he and other 'geeks' are the only legitimate users of the Internet".

      They're really not the same. Personally, I'd mod someone down (if I had mod points) for saying that someone else didn't have the right to use the internet. My problem mainly rested with the smug, it's-all-about-"me" tone of the comments inlined into Google's response to him.

    7. Re:Obnoxious by burgundysizzle · · Score: 1

      Think of it more as currently in the shape of a data warehouse (which it really is). Most data feeds into data warehouses aren't that great - they need to be massaged into something useful and Google are finding out over time what the problems are.

      If you can't find something in a database, it significantly decreases the usefulness of having all the materials there in the first place.

      That's true, but not to the point where you'd consider not making any of it available until it was perfect (or most of the way there). It's not perfect and it probably never will be; given time, as they find and address more systemic problems in their metadata feeds, things will get better. They may be at 90%, but for most people that's going to be good enough, as it's more than likely going to be used like a library, browsing sections for interesting things.

  13. The argument that should have been made here... by Looce · · Score: 2, Interesting

    ... is that academics can't rely on Google Books to make their bibliographies, because the publication date and authorship information, which are used in all citation styles (MLA, Harvard, etc.) are incorrect on Google Books for an apparently large amount of books. Categories aren't used in citations, they're used by searchers.

    Jon Orwant of Google said that 1899 was a placeholder year for unknown publication dates, as provided by some of their metadata providers... which leads me to ask if they sanitise their data or do any research into publication dates themselves!

    1. Re:The argument that should have been made here... by Anonymous Coward · · Score: 0

      1899 was a placeholder year for unknown publication dates

      Not 17-NOV-1858?

    2. Re:The argument that should have been made here... by Anonymous Coward · · Score: 2, Informative

      WorldCat.org

      Find it on Google Books, look it up on there; Google Scholar if it is an article. I am a historian, and when I check citations (for journals or my own work), that is how I get it done.

    3. Re:The argument that should have been made here... by timeOday · · Score: 1

      academics can't rely on Google Books to make their bibliographies, because the publication date and authorship information, which are used in all citation styles (MLA, Harvard, etc.) are incorrect on Google Books for an apparently large amount of books.

      What you mean is, they have to bother to pull up the book's title page for that information, rather than simply pulling it from Google's metadata. Boo hoo.

  14. Google's brilliant vagueness by dpbsmith · · Score: 4, Insightful

    This is much like Google itself.

    Google's brilliance, and woe, is its sloppy imprecision.

    You type in a query. It returns a bunch of stuff. Quite a lot of it is irrelevant and is perceived as not meeting the requirements of the search, but you don't mind, because all you care about is that it finds what you want, not whether it also finds other stuff. Unfortunately, Google is so good that it tricks you into believing that it always finds everything that matches your query. But, of course, there's no way to find out what it _missed_.

    I've personally noticed and been puzzled by the publication dates. I'd noticed it particularly with periodicals. What seems to be the case here is that Google is very prone to give the date that a journal began publication as the publication date of every article that has ever appeared in that journal.

    Wikipedia editors are well aware of the dangers of using Google hit counts as data. It's amusing to see that there are 1,930,000 hits on "Ghandi" compared to 22,900,000 for "Gandhi" and conclude that Gandhi's name is misspelled 10% of the time... or to notice, as I have, that that percentage is increasing and project the year in which "Ghandi" must inevitably become the accepted spelling... but it is, as they say, "for amusement purposes only."

    1. Re:Google's brilliant vagueness by Anonymous Coward · · Score: 0

      Ghandi's name isn't English, so your spelling is a bit off there.

    2. Re:Google's brilliant vagueness by Marcika · · Score: 1

      There is a straightforward transcription of Hindi Devanagari into Latin script, so "Gandhi" is obviously right, while "Ghandi" is obviously wrong. (G and Gh represent distinct sounds and different Devanagari letters, as do D and Dh.)

  15. Too much information? by presidenteloco · · Score: 5, Insightful

    Yes, having all of the world's literature available for instant full text search sounds disastrous for scholars.

    --
    Where are we going and why are we in a handbasket?
    1. Re:Too much information? by martin-boundary · · Score: 1

      Yes, having all of the world's literature available for instant full text search sounds disastrous for scholars.

      It certainly is, if the text is sometimes right, sometimes wrong...

    2. Re:Too much information? by swillden · · Score: 1

      Yes, having all of the world's literature available for instant full text search sounds disastrous for scholars.

      It certainly is, if the text is sometimes right, sometimes wrong...

      I see no hint, in any of the linked discussion, that any of the text is wrong. Some metadata is wrong, but that can be checked against the scanned frontmatter quite easily.

      And the metadata will get fixed. This is a massive undertaking and it will take time to get it right.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    3. Re:Too much information? by wibald · · Score: 1

      The availability isn't the problem, it's the execution. Google could quite easily license the already accurate information from the participating libraries or from OCLC but didn't. And while the article is from the perspective of a scholar, the majority of users are likely to be students, a user base that may not be aware of or understand the implications of the errors until after their paper is finished. And remember that these books are mostly coming from research libraries. These will mostly be scholarly books of primary interest to students and faculty. The fulltext search of millions of books is certainly useful and, as a reference librarian at a research institution, I often encourage students to use it to discover if "there is a book out there on that subject", but the inaccuracies really do matter! A small example of why is the case of books with only a snippet or less available in fulltext. Our students, and anyone with a public library nearby in the US for that matter, can request the book through interlibrary loan (almost always for free). But to make a successful request, you need accurate information about the book you want to see. See where Google's lack of metadata accuracy could be a problem?

  16. Card catalogs by dpbsmith · · Score: 5, Interesting

    Tangential, but "card catalogs." Ha! I once had a compelling need to look up an article in the Occasional Papers of the Bingham Oceanographic Collection. So I went to the card catalog.

    It wasn't under O. It wasn't under P. It wasn't under B. It wasn't under C.

    It was under N.

    Why? Because, naturally, as of course everybody knows, the Bingham Oceanographic Collection is part of the Peabody Museum. Which is part of Yale. Which (drum roll...)... ...is in New Haven.

    The great thing here is that you can't even say there was an error in the card catalog, unless filing something under a heading that is perfectly correct, but under which nobody would dream of looking for it, is considered an error.

    1. Re:Card catalogs by mmortal03 · · Score: 1

      So, how did you end up finding it?

    2. Re:Card catalogs by Two99Point80 · · Score: 1

      ...part of the Peabody Museum. Which is part of Yale. Which (drum roll...)... ...is in New Haven.

      Well, at least they got the city name right. When I was in Data Systems at Southern New England Telephone (also in New Haven), I got a look at a cleaned-up list of city names in the Customer Records and Billing master database. According to it, we had at least one customer in East Fartford (rather than East Hartford), which might've shown up in "F" rather than "H" in the example you gave...

    3. Re:Card catalogs by Peter+H.S. · · Score: 4, Informative

      Well, organizing books by the city they were printed in is among the oldest ways of cataloging printed books. The practice goes back to Gutenberg and the so-called "incunabula" period, when book dealers/printers/publishers (often the same persons) would produce book catalogs for a certain city. So if you needed a certain edition of a title, you would have to track it down through such book catalogs, since the Leipzig edition would be different from the Mainz edition.

      It is of course sad that what was once common knowledge among scholars now seems forgotten. It's probably not a hindrance when working with modern sources, but it's still necessary to know when working with old material, just like knowing that words/names starting with J were filed under I, etc.
      Many academics still put the printing city in their references, though many seem to have forgotten why they do so.

      You just happened to stumble into a book/journal catalog organized by a centuries-old and previously very well-known method. The error wasn't in the card catalog or the way it was organized, but in the fact that no one ever told you about these ancient methods in your library course.

      --
      Regards

    4. Re:Card catalogs by dkf · · Score: 1

      You just happened to stumble into a book/journal catalog organized by a centuries-old and previously very well-known method. The error wasn't in the card catalog or the way it was organized, but in the fact that no one ever told you about these ancient methods in your library course.

      The real issue to note here is that one thing computers are much better at than what went before is maintaining indices of cataloged data and searching them. Sure, GIGO still rules, but it's now practical to search for a work on any facet of its metadata, or even on information automatically extracted from the work itself. That's massively beyond what libraries used to offer. (I've had occasion to use card catalogs, and the biggest problems with them are the restricted number of axes on which you can search, and the fixed order in which you have to perform the search.)

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    5. Re:Card catalogs by Peter+H.S. · · Score: 1

      I don't think the issue is that computers are much better than card catalogs; that fact is a given. The issue here is that once-common knowledge is being forgotten, so that when the OP used an old catalog system that baffled him, he thought that organizing journals by printing city was an error. But that system is centuries old and so common that even today scholarly sources often include a book's printing city, even though it doesn't make much sense nowadays. Within a generation, this more than 500-year-old system seems to have been forgotten. I don't yearn for the days when card catalogs ruled; I find them extremely limited compared to querying a DB. But I do think that some knowledge of historical ways of organizing books should be taught to students in university library courses.

      --
      Regards

    6. Re:Card catalogs by dpbsmith · · Score: 1

      I asked a librarian. And there's probably a lesson there, because the librarian found it right away.

      My recollection is that she didn't actually know it off the top of her head, but knew that in this card catalog the city of publication was the primary entry--most journals also were alphabetized under their, you know, names, but that was just for lagniappe. And she had some volume at hand--or maybe it was five or six volumes on a nearby shelf--at which you could look up a journal title and find the city of publication.

      In reality, if you'd clicked a stopwatch it probably took less than fifteen minutes to find it, including the time I spent looking in the card catalog and not finding it.

      Nostalgia: those card catalogs were veritable museums of typewriter type fonts, the cards having been typed over periods of many decades.

  17. Intellectual Pissing Contest by Anonymous Coward · · Score: 1, Funny

    *grabs popcorn*

  18. Book publishers endangered, cry me a river by moon3 · · Score: 4, Insightful

    They pushed copyright terms past a hundred years (just to make sure they would make money off writers even after they are dead), and now our big brother Google steps into the ring to resurrect all the OUT OF COPYRIGHT books -- meaning those dead books that publishers no longer exclusively distribute. What an offense against the poor publishers. Google is creating a real e-Library of enormous proportions of virtually free books; what a threat. I bet I am not the only one who wants to see Newton's books on physics e-published again and searchable.

    1. Re:Book publishers endangered, cry me a river by Anonymous Coward · · Score: 0
    2. Re:Book publishers endangered, cry me a river by Anonymous Coward · · Score: 0

      I bet I am not the only one who wants to see Newton's books on physics e-published again and searchable.

      They have been

  19. Does Google destroy the books after scanning? by pembo13 · · Score: 1

    The impression I get from these stories is that once Google scans them, no one else can. Is that somehow the case?

    --
    "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
    1. Re:Does Google destroy the books after scanning? by riffzifnab · · Score: 1

      The impression I get from these stories is that once Google scans them, no one else can. Is that somehow the case?

      Yes, once Google scans them they gather up all the copies and burn them. Just kidding; anyone is free to scan them and put them online too. Microsoft used to scan books, and the Internet Archive has its own scanning project that is still ongoing (but they might be restricting themselves to out-of-copyright works, I don't know).

    2. Re:Does Google destroy the books after scanning? by Truth+is+life · · Score: 1

      Yeah, the IA only does public domain (pre-1923) books. But since this also applies to translations (!) and the majority of the world's books have been published since 1923...

  20. Cue the "OMG Google book monopoly" slashbots by riffzifnab · · Score: 1

    Please give it a rest; anyone can scan all the books they want and post them online. The only problem is that the law hasn't established an efficient way to get the right to post books online. If Google had tried to do this under current law, they would have had to figure out who owned the rights to every book. Imagine how much the internet would suck if search engines had to do the same thing.

    Also, to get back to the topic at hand: it looks like they are trying to fix this as best they can, and libraries have errors in them too. It happens. zomg.

    1. Re:Cue the "OMG Google book monopoly" slashbots by overbaud · · Score: 1

      True that. I've been scanning things for years and putting them online.

      --
      Users... the only thing keeping 1st level support from being the bottom feeders.
  21. Yes - it's all Googles fault... by Anonymous Coward · · Score: 0

    Not the fault of the publishers who "mislabel" their books. Google should be ashamed. Bad Google!

    Of course, perhaps the "Academics" should get off their arses and actually do some real research instead of taking everything Google Books provides as fact. But of course, Google made them do whatever it is that they do wrong.
     
    And we haven't even gotten into what Microsoft's involvement is!
     
    If Google Books were done on an iPhone, then everything would be OK...

  22. Quit Your Whining! by Nom+du+Keyboard · · Score: 1

    I wish these people would just quit their whining about Google and its book scanning. If you don't like what Google is doing, go scan them yourselves. Google is creating something that never existed before - a large repository of the history of books in digital, searchable, available form - and all I hear is complaining. I don't believe that Google has an exclusive on this. I don't believe that their agreements to scan books preclude anyone else from undertaking the same project. And with technology improving in capability and cost every year, it might even be cheaper for a latecomer to duplicate this feat. BUT SHUT UP ABOUT IT! If you don't like it, then just go away and pretend it never existed, and absolutely nothing else in your life will have changed.

    Personally, I'm glad Google is doing it. Think of the screams of pain if it was Microsoft doing this.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:Quit Your Whining! by Anonymous Coward · · Score: 0

      "I don't believe that Google has an exclusive on this."

      Minor issue there. Until someone else gets the clout and money to shadily buy the rights to all the works in one swoop, it's pretty exclusive.

    2. Re:Quit Your Whining! by Anonymous Coward · · Score: 1, Insightful

      The concern is really the Faustian bargain that Google has been willing to strike with trade groups (like the Authors Guild settlement). Google has conceded the point that these groups should be facilitated in their great land grab of out-of-print books, in return for Google's right to index them.

      It is reasonable to question whether those bargains are fair, especially since we have projects like the Internet Archive, which wouldn't make such a concession. It's also reasonable to question whether Google and a trade group even have the legal standing to strike that sort of deal.

  23. Spurious Argument by mikethicke · · Score: 2, Interesting

    As an aspiring academic halfway through a philosophy Ph.D., I find Nunberg's argument pretty absurd. Google Books is a godsend for academics, and would be much more so if there were full access to their entire catalog rather than "limited previews" for most books. I have used Google Books countless times to quickly check whether a book is relevant to my research, or to get the gist of an author's argument without having to trudge down to the library. I know many others who do this as well. In all this time I've never even looked at Google's metadata. No decent academic would rely on such information, as there are far more reliable methods, such as actually checking what's written in the book, which, yes, Google scans in.

  24. I expect a "download BibTeX" button by week's end by Anonymous Coward · · Score: 0

    Get on it, Google.

  25. Freud's take on this would be... by overbaud · · Score: 0, Troll

    ... that Nunberg needs to get laid more. I can just imagine the man there banging away on his keyboard about this outrage all backed up.

    --
    Users... the only thing keeping 1st level support from being the bottom feeders.
  26. Incredible arrogance of the "scholar" by Shag · · Score: 1

    In inline comments to the Google head guy's reply to the original blog entry, I find:

    Google: Geoff asks why we decided to infer BISAC subjects in the first place. There is only one reason: we thought our end users would find it useful.

    Scholar: The question is, why did you think end-users would find this useful? Which end-users did you talk to about this? I don't think you'd find a whole a lot of scholars who would embrace the idea of using the BISAC classifications in place of other library classification schemes. In fact, why would anybody think that a scheme designed for organizing the shelves of a Barnes & Noble outlet would be appropriate for a collection assembled out of the holdings of major research libraries?

    I read this as "any book that can be found in the holdings of a major research library is only of interest to scholars." And I think he's entirely off-base. Nose-in-the-air "Scholars" like this gentleman fail to recognize that Google's efforts are about making material available to "the rest of us" who don't have access to those major research libraries. And categorical indexing of material makes complete and total sense if you expect to have non-PhD sorts searching for it.

    I happen to be a scholar, in some sense, of one particular science. If I want to read some classic literature that has absolutely nothing to do with my science, should I be denied access because I'm not a scholar of that?

    --
    Village idiot in some extremely smart villages.
    1. Re:Incredible arrogance of the "scholar" by bigbigbison · · Score: 2, Insightful

      I don't read him as saying "any book that can be found in the holdings of a major research library is only of interest to scholars" at all. Rather, I read him as saying that the systems libraries use to organize books, be they Dewey Decimal, Library of Congress, or some other system, were created to help users find and use books. The BISAC classifications were developed to help companies sell books. Why use that rather than what the libraries -- the source of these books -- use?

      --
      http://www.popularculturegaming.com -- my blog about the culture of videogame players
    2. Re:Incredible arrogance of the "scholar" by grcumb · · Score: 3, Interesting

      And I think he's entirely off-base. Nose-in-the-air "Scholars" like this gentleman fail to recognize that Google's efforts are about making material available to "the rest of us" who don't have access to those major research libraries. And categorical indexing of material makes complete and total sense if you expect to have non-PhD sorts searching for it.

      You're fighting the wrong battle here. It's easy to find any number of legitimately nasty things about 'Scholars' and 'Academics' and elitism in general. But arguing for proper classification in Google Books is not one of them.

      For several years I was an avid amateur of Information Retrieval. Classification (and other useful organisational models) of information into related collections is essential when you don't know what keywords you're looking for. This is especially important with historical works, where the use of 21st Century names, terms and other common keywords is next to useless.

      Google search is useful when you know what you're searching for. But knowing what to look for in Google Books is an entirely different matter. Categorisation matters here.

      By using a classification system that is designed for book sellers, Google's chosen a very poor set of criteria. Not only will most of the titles be poorly characterised (and thus harder to find), the effort required to find them increases with their rarity or uniqueness. These aren't always a measure of importance or interest, but often enough, they are.

      Asking Google to consider a proven, effective and well-understood categorisation system is not being snooty; it's an effort to suggest - as we geeks often do - that there might actually be a correct way to perform this task.

      Sometimes what looks like 'arrogance' is actually the state of being right about something when no one else will listen.

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
  27. OMG, someone found an error at Google! by Dr.+Spork · · Score: 1

    This may be a trite point, but yes, Google does err. Google also does a better job than most companies at going back and fixing their errors. This, being an online database, is pretty easy to correct. If by some principle the scholarship potential of this otherwise unavailable information was irredeemably corrupted, then yes, I'd worry. Instead, it sounds like a pretty amazing project which happens to be in beta.

  28. Scholars have lawns too you know by syousef · · Score: 2, Interesting

    This could be the stupidest and most disingenuous argument I've encountered all year. I guess I'll never know, since the metadata is not at my fingertips. This might be a good argument for getting the metadata right. It isn't a good argument for tossing the virtual books out with the bathwater.

    So no I won't get off your lawn. We're better off without scholars who'd rather hoard information. Begone!

    --
    These posts express my own personal views, not those of my employer
  29. Re:The ISBN are still the same by fuzzyfuzzyfungus · · Score: 2, Insightful

    Which is incredibly helpful for anybody interested in printed materials before 1966...

  30. Best guesses by countach · · Score: 1

    Sounds like Google are doing their best to fix the problems. What I couldn't quite figure out is why bad data is overriding usually good data like Harvard. Maybe they need to give reliability rankings or something. We are 84% sure this date is right (because it came from Harvard), but there is a 10% chance this one is right (because some other place said that), and a 6% chance of this one (because some guys in Korea said it). Have the option to search only best guesses or all guesses.
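
    The reliability-ranking idea above can be sketched in a few lines. This is a minimal illustration, not how Google Books resolves conflicting metadata: the source names and weights are hypothetical, borrowed from the comment's 84% / 10% / 6% example, and real weights would have to be measured against records checked by hand.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical reliability weights per metadata source, echoing the comment's
# illustrative split; these are assumptions, not real measurements.
SOURCE_WEIGHTS: Dict[str, float] = {
    "harvard": 0.84,
    "other_library": 0.10,
    "korean_vendor": 0.06,
}


def rank_guesses(
    claims: List[Tuple[str, int]], weights: Dict[str, float] = SOURCE_WEIGHTS
) -> List[Tuple[int, float]]:
    """Turn (source, value) claims into (value, confidence) pairs, best first.

    A value's confidence is the summed weight of the sources asserting it,
    normalized so all confidences add up to 1. Unknown sources get weight 0.
    """
    scores: Dict[int, float] = defaultdict(float)
    for source, value in claims:
        scores[value] += weights.get(source, 0.0)
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(value, score / total) for value, score in scores.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def best_guess(
    claims: List[Tuple[str, int]], weights: Dict[str, float] = SOURCE_WEIGHTS
) -> Optional[int]:
    """The single most likely value ("search best guesses only"), or None."""
    ranked = rank_guesses(claims, weights)
    return ranked[0][0] if ranked else None


claims = [("harvard", 1983), ("other_library", 1899), ("korean_vendor", 1901)]
for year, confidence in rank_guesses(claims):  # "search all guesses"
    print(year, round(confidence, 2))
print(best_guess(claims))  # 1983
```

    Because agreeing sources add their weights together, two mediocre sources that concur can outvote a single good one, which is one plausible way to let a bad feed stop overriding Harvard without discarding it entirely.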

  31. Google "Scholar" by Anonymous Coward · · Score: 0

    Google Scholar has big problems of its own, as far as being "scholarly." Citations that Google's non-expert review staff believe represent a given technology get promoted, while citations Google's nonexperts don't seem to want to recognize but which actually were the origin of the technology are suppressed.

  32. The Internet Public Library already exists by osssmkatz · · Score: 1

    It's called the IPL (www.ipl.org). It has public domain works in the categories you'd expect to find them in (i.e., Gutenberg content).

    www.refdesk.org is similar but for reference.

  33. if you don't like it... by caitsith01 · · Score: 2, Informative

    Why did they bother?

    Why did you bother to comment on it? If you don't like it - don't use it.

    You are clearly ignorant of the key problem with the Google books settlement (as it currently stands), which is that Google and only Google will be given the right to reproduce orphaned works. I assume the morons tagging this "caveat emptor" are also ignorant of this.

    So your glib remark should more correctly read, "if you don't like it, never have access to millions of pages of orphaned copyright works again because Google has an exclusive licence to reproduce them electronically". Which doesn't quite work as well, really, does it?

    --
    Read Pynchon.
    1. Re:if you don't like it... by The_Quinn · · Score: 1

      You are clearly ignorant of the key problem with the Google books settlement (as it currently stands), which is that Google and only Google will be given the right to reproduce orphaned works.

      This should be up to whoever holds the legal rights to decide. Their rights, their decision. If the rights-holder will only allow Google to reproduce the works, then that is their decision to make.

    2. Re:if you don't like it... by caitsith01 · · Score: 1

      You are clearly ignorant of the key problem with the Google books settlement (as it currently stands), which is that Google and only Google will be given the right to reproduce orphaned works.

      This should be up to whoever holds the legal rights to decide. Their rights, their decision. If the rights-holder will only allow Google to reproduce the works, then that is their decision to make.

      This is the problem - no-one can be found who legitimately holds the legal rights to orphaned works, that's why they're orphaned. But instead of anyone who wants to being allowed to use them, apparently the writers' guild gets to flog them to Google.

      --
      Read Pynchon.
  34. Reminds me of Vinge's "Rainbow's End". by argent · · Score: 1

    While it's unlikely that Google's scanning technology is as dramatic as the one in Vinge's novel, there appear to be striking similarities. I wonder if Larry Page or Sergey Brin have read it.

  35. A demented Academic Tone. by omb · · Score: 1

    Having read the original blog post, I'd say this is clearly the vituperative rant of an academic who imagines himself wronged, a type with which I am all too familiar.

    Google is doing the hard work of scanning and attaching some meta-data. Once that is done, (a) more meta-data can be added and (b) errors fixed. Additional meta-data will be needed, as there are TWO academic classifications for English, and many more for non-English languages.

    This is just stupid carping by those who would rather retain control of their bailiwick.

    He does not seem to realize that Google will be able to search the content, making the meta-data somewhat less important.

  36. There is no reason by ikkonoishi · · Score: 2, Funny

    There is no reason for you to post this comment here when you could have put together a properly formed and documented essay in a couple of months. There was no reason for Newton to come up with his theory of gravity when in a few centuries Einstein would come up with a more complete theory.

    This is a long-term project for humanity. We damn well better start now rather than waiting to do it right. Bad data can be cross-compared and corrected. Data which has not been digitized at all is completely useless (toward the purpose of having digitized data). In the time it took you to complain about it, you could have pulled up a few scans and done some good old-fashioned legwork in the form of copying it out in ASCII and redrawing the illustrations like clerks of old.

  37. Blame copyright for that by Anonymous Coward · · Score: 0

    Because only licensed entities can create a copy of these works and at this moment in time, only Google has PAID for the license to do this.

    You can do it yourself if you wish: just stump up the $125 million and buy a license.

    If you don't like Google having the sole rights to commercial exploitation of this work, why aren't you complaining about Marvel having sole right to the graphic novels of Stan Lee etc? Or Warner Bros having sole rights to "The Matrix"?

    the problem can ONLY be fixed by

    a) forcing the copyright owners to give up the licensing for their works and make it PD
    b) forcing Google to pay for everyone else to have the rights

    (b) isn't going to happen.

    (a) isn't going to happen unless you kill off copyrights.

  38. Huh? by argStyopa · · Score: 1

    Perhaps someone should point out to Mr. Nunberg (if one can get past his ceaseless caterwauling) that the books digitized come from LIBRARIES, and if scholars find their digitization, cataloging, or other minutiae somehow insufficient, they can always go back to said LIBRARIES and do their research the old fashioned way?

    Some complaints just ring with irrelevance and immaturity. Complaining when someone has gone to great effort and expense to GIVE you something where you had nothing before, simply because they didn't organize it the way you might have, or because of some errors in the process, seems...weak. Very weak.

    --
    -Styopa
  39. "as long as the books are fine.." but many are not by waterbear · · Score: 2, Interesting

    > As long as the books themselves are perfectly fine (which they seem to be),

    Well, some are really good and well scanned, but others are a mess. From some organizations that do the scanning, you get missing pages and mangled pages. You get pages where the person doing the scanning put their hand between the page and the glass, so you can read the rings on their fingers but not the text on the page. (Books scanned at the NY Public Library, for example.) If there is a fold-out, you get at most half of it.

    The Google Books organization doesn't seem to want to know. There is a mechanism for reporting single-page defects, but when 50 defects occur in one book it gets hard to work through them all with button clicks. I tried it for two books and also sent a message to Google Books; there was an automated reply and no action after several months.

    So much for 'As long as the books themselves are perfectly fine ....', I'm afraid.

    -wb-

  40. this is absolute horse shit, just ask JSTOR by Anonymous Coward · · Score: 0

    librarians make mistakes too. if you went to any given library database and did funky searches, like 'show me all the maps you have', you will get all sorts of crap back.

    JSTOR had people bitching about them too...guess what? turns out the wonderful paper libraries had fucked up shitty catalogs, and not only that, their collections were themselves missing issues, missing pages, missing all sorts of stuff.

    JSTOR actually has a paper 'backup' of everything they scanned... they get it from universities who throw out their old junk to make way for new junk. and JSTOR's collection is more 'pristine' than anything the libraries had to begin with.

    1. Re:this is absolute horse shit, just ask JSTOR by Maury+Markowitz · · Score: 1

      > JSTOR actually has a paper 'backup' of everything they scanned...

      And then charge you outrageous amounts of money to see it.

      I mean really. I'm more than capable of dealing with "bad metadata", especially considering that I never look at it. I use search, like anyone else in the 21st century. And when it comes to that, Google Books is an absolute godsend.

      Maury

  41. uhh doesnt slashdot use tags? by Anonymous Coward · · Score: 0

    oh wait... i'm sorry, nobody can criticize the obvious stupid hypocrisy of the giant linux penis club.

    oh wait.. don't > 99.99% of the websites on the planet have a title? isn't that meta data....??

    oh wait.. again, i apologize, oh great open source lords of azeroth.

  42. thats from 1920s/30s... not 'ancient' by Anonymous Coward · · Score: 0

    the cataloger sucked, or the card for the title was missing.

    go look up any book/journal from that time. they are by title and author, not 'city it came from' unless it is a special geographic card catalog.

  43. Scholars by jim_v2000 · · Score: 1

    If Google's service isn't sufficient for your research needs, THEN DON'T FUCKING USE IT. Dear god....

    --
    Don't take life so seriously. No one makes it out alive.
  44. Computers are better at indexing? by dpbsmith · · Score: 1

    Oh, there, I think, I disagree. I once read a book entitled "Indexing, The Art Of," about how book indexes are created, and it was an eye-opener.

    Conversely, there's nothing more useless than a completely computer-generated book index. You're looking for a topic that's discussed in three substantial sections and mentioned in passing fifty times, and the index lists fifty-three page numbers because the computer doesn't know which are the important ones.

    The same principle probably applies to card catalogs and other indexes. Indexing is a deeply human activity; the person doing the indexing has to have a feeling for importance and organization and be able to guess how the user probably thinks about things.

    P.S. Our local public library's card catalog used to have all of the First World War material listed under "Great European War 1914-18", though fortunately someone had stuck in "SEE" cards under "First World War" and "World War I."

    1. Re:Computers are better at indexing? by emj · · Score: 1

      Indexes are insanely expensive to create and maintain. I've spent 40 hours working on an index, trying to simplify it, and still wasn't done with it after that, but I had to stop somewhere. Three books were to be combined into one big 380-page collection, and the index entries got too big.

  45. And why would we care about 'scholars'? by mcc99 · · Score: 1

    "Scholars" make up less than 1% of the 'net-using population. So sorry if it inconveniences you "scholars" that Google Books, et al., is organized for the convenience of the great unwashed masses. Of course, all things should be organized specifically for the convenience of under 1% of the user base, esp. if that 1% has "PhD" after their names. And wear funny clothes, too. The day Google or any other public info service goes out of its way to organize information for your convenience vs. mine is the day I find a different service. And guess what? 99%+ of the user base is like me. So we have your ivory tower a$$es out-voted. ("O Democracy, what a terrible toll thou takest upon our tweed jackets and curiously high socks...") Suck it up. Ppppfffffftttt..... :-P

  46. science history book written using Google Books by peter303 · · Score: 1

    I heard an author talk about The Discovery of Air at the local bookstore. The book is about the correspondence between Priestley and Thomas Jefferson about Priestley's scientific ideas. This talk was the first time I heard an author say that Google Books was an important reference source for him. This is a sweet spot for Google Books: 19th and early 20th century books that are out of copyright but were captured by Google's university library digitization effort.

  47. Another Day of Microsoft Trolling? by ajs · · Score: 2, Interesting

    I hate to be so cynical, but there was a huge uptick in negative articles on Slashdot about Google as soon as Microsoft started their anti-Google PR effort in DC. Now I see at least one anti-Google article on Slashdot every day. Is Slashdot falling for an extensive trolling effort from MS?

    More info available from previous Slashdot article...

  48. Good analogy by dugeen · · Score: 1

    Nevertheless, there are many parallels between drafting laws and writing programs, most of all the 'unforeseen effects at run-time' one. It's such a good analogy that I'm surprised it's not used more often.

  49. I hear about this drivel everyday ... booo by Anonymous Coward · · Score: 0

    As someone who works in higher ed with many librarians and academics, I can tell you that librarians are taking any stand they can to justify their existence in this "new-fangled" Internet world. I have some advice: get on the boat, or find a new career, because your world is evaporating quickly.

  50. Book train wreck?? by Anonymous Coward · · Score: 0

    When we left the Dewey Decimal System in the dirt, it was a mistake. It is almost always a mistake to let vendors, any of them, define standards. Sadly, we are (and have been) heading in that direction for a long time... mostly due to convenience, the balance to laziness.