Slashdot Mirror


How Badly is Google Books Search Broken, and Why? (blogspot.com)

An anonymous reader shares a blog post: It appears that when you use a year constraint on book search, the search index has dramatically constricted to the point of being, essentially, broken. Here's an example. While writing something, I became interested in the etymology of the phrase 'set in stone.' Online essays seem to generally give the phrase an absurd antiquity -- they talk about Hammurabi and Moses, as if it had been translated from language to language for decades. I thought that it must be more recent -- possibly dating from printers working with lithography in the 19th century.

So I put it into Google Ngrams. As it often is, the results were quite surprising; about 8,700 total uses in about 8,000 different books before 2002, the majority of which are after 1985. Hammurabi is out, but lithography doesn't look like a likely origin for widespread popularity either. That's much more modern than I would have thought -- this was not a pat phrase until the 1990s. That's interesting, so I turned to Google Books to find the results. Of those 8,000 books published before 2002, how many show up in the Google Books search result with a date filter before 2002? Just five. Two books that have "set in stone" in their titles (and thus wouldn't need a working full-text index), one book from 2001, and two volumes of the Congressional Record. 99.95% of the books that should be returned in this search -- many of which, in my experience, were generally returned four years ago or so -- have vanished.
Further reading: How Google Book Search Got Lost; Whatever Happened To Google Books?; and Google's New Book Search Deals in Ideas, Not Keywords.

18 of 106 comments

  1. Set in Stone by AlanObject · · Score: 4, Insightful

    I always thought that "set in stone" refers to the condition where you have carved words into stone and they can't (easily) be undone.

    Is there any other possible origin of that phrase?

  2. I am sorry for your pain using Google. by jellomizer · · Score: 4, Interesting

    However, shouldn't you have gone to a library, and perhaps worked with a librarian to help guide you in your research?
    Google is a good search tool, but it isn't a research tool.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    1. Re:I am sorry for your pain using Google. by Anonymous Coward · · Score: 4, Insightful

      Google WAS a good search tool. Nowadays it's damn bad. My most hated anti-feature right now is the impossibility of forcing a word to appear in the results. Back in the day you only needed to add a '+' in front of it, but it no longer honors that, so they can brag about how many millions of "results" they give you, even if you don't want them.

    2. Re:I am sorry for your pain using Google. by jellomizer · · Score: 3, Interesting

      Well, Google's marketing would say almost anything to keep the company in good graces.

      However, Google and similar services are part of the solution, but not the full solution.

      Searching is an important part of research, and Google is a good tool for researchers, but it only helps them search. A modern librarian can help you use Google to get more context out of your searches, direct you to non-Google tools, and often the library will have access to data that is behind a paywall.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    3. Re:I am sorry for your pain using Google. by Anonymice · · Score: 2

      Wrap the word in quotes.

      There's a basic list of search syntax at the link below; however, you can find far more if you search "google search syntax".
      https://support.google.com/web...

    4. Re:I am sorry for your pain using Google. by del_diablo · · Score: 2

      Using google to search for fantasy books you have read as a child is amusing.
      Half of the stuff it manages to bring up in the search results is 'you must read 10 best books' with no relevance to the keywords. The rest is randomly high-ranked search results. What is hilarious is that it also generates links to inactive forums where people did collective mindwork to do the same thing.

      Some keywords like 'rat' or 'mouse' also draw in extremely weird search results that have nothing to do with the query.

  3. It's not broken by nospam007 · · Score: 3, Funny

    It just now thinks you didn't _mean_ what you entered.
    Join the line.

  4. Prior to the 1940's? by HiThere · · Score: 4, Informative

    I'm rather certain that I've read the phrase in something that was written either in the 1940's or the early 1950's, and it didn't seem a unique turn of phrase in the place where I read it.

    FWIW, James Joyce says, in "Portrait of the Artist"

    It is peopled by the images of fabulous kings, set in stone. Their

    I don't know why Google didn't find that for you. OTOH, I haven't enough google-fu to use Google search to search for a range of dates.

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  5. Wrong tool? by Flexagon · · Score: 2

    Maybe, just maybe, Google Books is a poor choice for a tool. As big as it is, it's going to be spotty, and weighted toward more recent, digital, texts, and ones that are sufficiently available for scanning.

    Better to use something that represents actual research. If English is your focus, which it seems to be given your current line of attack, it might be better to look in the full Oxford English Dictionary, readily available in and through your local library, even digitally.

  6. Google Books Has Been Deteriorating For Years by careysub · · Score: 4, Interesting

    As with most of its projects, Google has lost interest in Google Books and has not bothered to maintain it, much less continue developing it. This has been going on for more than a decade now. NGram search, for example, stopped adding new texts to the index in 2008.

    Google fought and won a court case to put 25 million more orphan books, which it had already scanned and which are out of print and largely unavailable, into Google Books, but then decided it wouldn't bother. Out-of-print books cannot be monetized, it would seem, and thus are of no interest to Alphabet, which has over $100 billion in cash on hand. Spending a few million to support Books would shave a small fraction of a percentage off the growth of its investment wealth, which is unacceptable to the company that has officially retired the "Don't be evil" slogan.

    At least they haven't pulled the plug on it entirely. I guess there is still some monetization to be had from in-print books.

    --
    Starships were meant to fly, Hands up and touch the sky - Nicki Minaj
    1. Re:Google Books Has Been Deteriorating For Years by H3lldr0p · · Score: 3, Interesting

      At least they haven't pulled the plug on it entirely.

      AFAIK Alphabet has put the "good" version in universities where the library admin does all the heavy lifting of scanning in books and such. That was part of the suit settlement. The public doesn't get the access researchers have.

      If I were the Fine Author, I'd head over to one of the unis that signed up with Google and use it there before declaring any sort of hard result.

    2. Re:Google Books Has Been Deteriorating For Years by Aighearach · · Score: 2

      My local university library even has banks of automatic book scanners in case somebody wants to add a book from the shelves to the digital collection, instead of the old "photocopy the whole text" strategy that was in common use in the past.

      Unfortunately, it is only available to staff and students; I have a library card that lets me check out books, but I don't have access to the digital copies or the book scanners.

  7. There's irony for you by Solandri · · Score: 2

    Maybe master writing intelligible sentences before worrying about entomology

    Um...

    • entomology is the study of insects
    • etymology is the study of word origins
    1. Re:There's irony for you by 93+Escort+Wagon · · Score: 3, Funny

      • entomology is the study of insects
      • etymology is the study of word origins

      Well, except it’s obvious the write-up was really bugging him.

      --
      #DeleteChrome
  8. Re:the future of research is scary by 93+Escort+Wagon · · Score: 5, Interesting

    Wasn’t that one of the early signs of civilization’s decay in Asimov’s Foundation universe? Scholars no longer did original research on their own; they’d just study what previous researchers had already written on a subject, and re-summarize it?

    Sometimes it’s scary how prescient that dude was.

    --
    #DeleteChrome
  9. Google- not broken if you ask the right question. by az-saguaro · · Score: 2

    I assume you searched in Google Ngram.
    https://books.google.com/ngram...

    If you search "set in stone", it appears that the usage is a latter-day idiom.
    But, here is the secret to this conundrum. Language and idioms change - shift, migrate, morph - similar but slightly evolved words to express the same idea.

    The inherent idea is that something is immutable, indelible, unerasable, uneditable, irrevocable. It is predicated on the idea that you can write, sketch, mockup, proof all you want and still make corrections, like hitting the preview and edit buttons on a Slashdot post, but once you hit submit, your words are eternal, just like when the stone carver finally etches the words into a stele or tombstone.

    Writers write. Typographers set. Artists etch. Stone carvers carve. Through history, all such variations have been used. But since carvers carve, one might think that the classical idiom is carved in stone, with the other variations being corrupted forms based on more modern communication paradigms.

    So, do what I did. Too bad I cannot post a screen capture, but you can do this yourself.
    Go to Google Ngram Viewer.
    Enter (copy-paste) the following line in the search box:

    carved in stone,written in stone,set in stone,etched in stone

    "Carved in stone" is abundant, going back well before 1800.
    The other three have arisen just in recent decades.
    So, prior generations used the idiom correctly. Recent generations have used analogous but technically incorrect variants.
    Collectively, "written, etched, set" were originally just a tiny fraction of the whole, but recently their usage has been rising. This means that current generations have either forgotten the true idiom, have gotten sloppy, or have fallen into a wave of rhetorical monkey-see monkey-do copycat-ism or faddism.

    Something else interesting.
    The "written, etched, set" curves are quite congruent, all showing a rapid rise starting around 1970, then an inflection circa 1990, and now topping out, with "set in stone" becoming asymptotic with or equal to "carved in stone", and thus the dominant modern transmigration of the idiom. The "written, etched, set" curves are the classical sigmoidal curves of the Verhulst equation of population dynamics. These curves imply that usage of these variant terms is reaching population saturation, each term in its own camp, with non-traditional verbiage having overtaken classical verbiage.
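
    For anyone who wants to see what such a curve looks like, here is a minimal sketch of the closed-form solution to the Verhulst (logistic) equation dN/dt = r * N * (1 - N / K). The parameter values are hypothetical, picked only to mimic the shape described above (takeoff around 1970, inflection around 1990); they are not fitted to any Ngram data.

```python
import math


def logistic(t, capacity, rate, midpoint):
    """Closed-form solution of dN/dt = rate * N * (1 - N / capacity).

    `midpoint` is the time of the inflection point, where N = capacity / 2.
    """
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameters, chosen for illustration only (not fitted to data).
CAPACITY = 1.0    # saturation level, normalised to 1
RATE = 0.2        # growth rate per year
MIDPOINT = 1990   # inflection year

if __name__ == "__main__":
    for year in range(1960, 2021, 10):
        print(year, round(logistic(year, CAPACITY, RATE, MIDPOINT), 3))
```

    The curve starts near zero, rises fastest at the midpoint, and flattens out as it approaches the saturation level, which is the "topping out" pattern mentioned above.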

    So, Google is not broken, in fact could be a rather clever historical research tool.

    https://books.google.com/ngram...

  10. Re:the future of research is scary by TuringTest · · Score: 3, Interesting

    He was not prescient. Like Orwell with 1984, he was largely documenting contemporary trends of his time, just with enough insight to extrapolate their consequences.

    The literary device of Science Fiction was used to strip the narration of its emotional attachment to real-world politics, which is how SF usually works.

    --
    Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
  11. "Set in stone" is probably a mash-up. by Rambo+Tribble · · Score: 2

    What is likely the original phrase, which I remember hearing as early as my childhood in the 1950s, was "carved in stone". An equivalent phrase, most likely newer in origin, is "set in concrete". You can see the evolution. It's kinda like "irregardless".