Slashdot Mirror


Trending Low-Volume Google Searches with Gootrude

michaelrash writes "The Google Trends project provides some visibility into how popular search terms like 'Myspace' or '2008 Election' change over time and points out relevant news articles that create jumps in search volume. This is a handy tool, but there are many search terms that Google Trends does not display any results for. Such terms (such as 'Linux Firewalls' — with the quotes) have insufficient search volumes to display graphs according to the error message that Google Trends generates. Fair enough. Google sets an internal threshold on search volume, and this threshold could be set for reasons that range anywhere from Google Trends is still experimental to Google not wanting to provide data on how it builds its massive search index for emerging search terms. Either way, I would like a way to see search term trends that Google doesn't currently make available to me. So, I've released an open source project called 'Gootrude' to do just this. For the past year Gootrude has collected a set of low-volume search terms and interfaced with Gnuplot to visualize them."
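A minimal sketch of the bookkeeping the summary describes. It is not Gootrude's actual code: it assumes the result counts have already been collected by some permitted means, and only shows how dated counts could be laid out for Gnuplot's time-series plotting. The function names, the file name `counts.dat`, and the sample numbers are all hypothetical.

```python
from datetime import date


def to_gnuplot_rows(samples):
    """Turn (date, count) pairs into 'YYYY-MM-DD count' lines, oldest first."""
    return [f"{day.isoformat()} {count}" for day, count in sorted(samples)]


def gnuplot_script(data_file, term):
    """Return Gnuplot commands that plot column 2 against the dates in column 1."""
    return "\n".join([
        "set xdata time",
        'set timefmt "%Y-%m-%d"',
        f'set title "Result counts for {term}"',
        f'plot "{data_file}" using 1:2 with lines notitle',
    ])


# Placeholder numbers for illustration only, not real measurements.
samples = [
    (date(2008, 1, 3), 9100),
    (date(2008, 1, 1), 9000),
    (date(2008, 1, 2), 9250),
]

rows = to_gnuplot_rows(samples)
script = gnuplot_script("counts.dat", "Linux Firewalls")
print("\n".join(rows))
print(script)
```

Writing `rows` to `counts.dat` and feeding `script` to Gnuplot would draw one line per term; keeping the data as plain dated rows means the same file can be appended to daily.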

23 of 37 comments

  1. wow by Gewalt · · Score: 2, Insightful

    wow, um...congrats I think? I mean, after you get over your pat on the back, can anyone explain why this matters?

    --
    Modding Trolls +1 inciteful since 1999
  2. Is it only me.... by vidarh · · Score: 4, Insightful

    ... or does the author of this tool seemingly not realize that Google Trends reports the volume of searches, while what he's tracking is the number of documents indexed for a search term, and that there's no basis for assuming the two are correlated in a meaningful way?

    1. Re:Is it only me.... by Gewalt · · Score: 5, Interesting

      I find it highly unlikely that someone who can make the page in question would not be smart enough to also understand what it is that google/trend is really doing, and as such, I choose to believe instead that the author is being intentionally deceptive.

      --
      Modding Trolls +1 inciteful since 1999
    2. Re:Is it only me.... by aleph42 · · Score: 3, Insightful

      Agreed, the summary is misleading, as is the comparison (from TFA) to Google Trends.

      This aside, the interest of "gootrude" is that it's not provided by Google, and so it's part of the many efforts to reverse engineer how Google comes up with its numbers.

      Specifically, it appears from TFA that the "number of results" stated by Google is a wild guess for low numbers (1,000-10,000), with very sharp variations which hint at an iterative process.

      So as I get it, it's not a tool for you and me, rather for google specialists.

      --
      Don't take my posts literally; it's just code to control my botnet.
    3. Re:Is it only me.... by Idimmu+Xul · · Score: 1

      The perspective he seems to be taking is not so much 'what users search for' but more 'what users post about or publish' with a view to studying the correlation of a large site publishing something and then the number of other websites or pages picking it up and running with it.

      I'm pretty sure he understands what he's doing, the article summary is just a bit twisted.

      --
      Free Playstation 3, XBox 360 and Nintendo Wii

      --
      The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
    4. Re:Is it only me.... by kestasjk · · Score: 1

      I find it highly unlikely that someone who can make the page in question would not be smart enough to also understand what it is that google/trend is really doing, and as such, I choose to believe instead that the author is being intentionally deceptive.
      It's a trap!
      --
      // MD_Update(&m,buf,j);
  3. Different data by UnHolier+than+ever · · Score: 2, Informative

    Google Trends plots the frequency of queries, i.e. the number of times information is asked about a subject. Gootrude plots the number of pages found, or the quantity of information google can retrieve on this subject. These are completely different.

    1. Re:Different data by alnicodon · · Score: 1

      Many thanks for making this clear: this is also what I had fathomed from the very clear summary, but wasn't too sure.

      Well.. we might actually be the two wrong ones :)

      Al.

  4. Singular works okay. by palegray.net · · Score: 1, Informative

    Such terms (such as "Linux Firewalls" - with the quotes) have insufficient search volumes to display graphs according to the error message that Google Trends generates.
    Try Linux Firewall in quotes as the search term for some results.
  5. Not allowed by google by swarsron · · Score: 3, Informative

    Besides not being the same as Google Trends, this tool is not allowed by Google's TOS. Automatic querying of their services without prior permission is forbidden by Google. But since it probably won't put any noticeable load on their network, they most likely won't care.

    1. Re:Not allowed by google by Vectronic · · Score: 4, Insightful

      Until there was an article posted on Slashdot that is.

    2. Re:Not allowed by google by icyslush · · Score: 1

      Google has a relatively simple API you can apply for to allow for a fixed number of automated queries of their system. It doesn't actually give you new functionality but does make automated queries of their databases "authorized". Without the API license key, you run the risk of getting noticed by them and ban-hammered if they think you're just a bot scraping their data, something they do NOT like. I think this article just got in because it had both Google and Open Source as subjects. If they have figured out a clean way to find SEARCH volume (which is hard) as opposed to RESULTS volume (which is stupidly easy), get back to me. :)

    3. Re:Not allowed by google by swarsron · · Score: 2, Informative

      Google doesn't give out any more keys for this API; only old keys continue to work. So if you don't already have a key, you're out of luck.

    4. Re:Not allowed by google by icyslush · · Score: 1

      Really? Whoops! [hides google key in lead lined safe]

    5. Re:Not allowed by google by bobbozzo · · Score: 1

      That sounds really useful; got the code posted anywhere?

      thanks

      --
      Nothing to see here; Move along.
    6. Re:Not allowed by google by vrmlguy · · Score: 1

      I'll try to post it when I get home tonight. Ironically, I'll probably post it on my Googlepage.

      --
      Nothing for 6-digit uids?
  6. Time for me... by jalet · · Score: 1

    to do something similar with my parody of google where search terms can be looked at in real time (empty or spammy search terms are replaced with fake words on display, but not in the history).

    --
    Votez ecolo : Chiez dans l'urne !
  7. Over 2 hours by TheCycoONE · · Score: 1

    This article has been on /. for almost 3 hours and "Linux Firewalls" still isn't a significant enough search query for Google Trends? Well THAT is surprising.

    1. Re:Over 2 hours by El_Oscuro · · Score: 1

      Just did it a few times. If everyone on /. does it, maybe we can hose their statistics...

      --
      "Be grateful for what you have. You may never know when you may lose it."
  8. Re:a few different results... by lpq · · Score: 2, Informative

    Just did searches on all of the terms the author mentions and got a few different numbers:

    1. "iptables attack visualization" -- 19 results (~35) (close)
    2. "single packet authentication" -- 93 (1,300) -- off by more than an order of magnitude
    3. "linux firewalls attack detection" -- 9290
    3a. "Linux Firewalls Attack Detection" -- 9240 (~9000) (close)
    4. cipherdyne -- 85,200 (~70,000) -- off a bit
    4a. Cipherdyne -- 84,500 (~70,000)
    5. gpgdir (same)
    6. fwsnort (same)
    -------
    Note...caps vs. no caps made no difference on 1, 2 and 5. But for terms 3 & 4, caps made a slight difference ... anyone know why? I thought caps were supposed to be ignored?

    Most were close. cipherdyne differed by roughly 20%, but the worst was "single packet authentication": that one was off by more than 10x! Wonder what's up with that.

    Interesting curiosities...
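    The comparison above can be made explicit with a little arithmetic. This is only a sketch: it assumes each line reads as "count seen in this comment (approximate count from the article)", and the helper name `compare` is made up for illustration.

```python
def compare(mine, reported):
    """Return (ratio of larger to smaller, difference relative to the reported value)."""
    ratio = max(mine, reported) / min(mine, reported)
    relative = abs(mine - reported) / reported
    return ratio, relative


# (count from this comment, approximate count attributed to the article)
pairs = {
    "iptables attack visualization": (19, 35),
    "single packet authentication": (93, 1300),
    "Linux Firewalls Attack Detection": (9240, 9000),
    "cipherdyne": (85200, 70000),
}

for term, (mine, reported) in pairs.items():
    ratio, relative = compare(mine, reported)
    print(f"{term}: {ratio:.2f}x apart, {relative:.0%} relative to the reported value")
```

    A ratio is the more useful figure for the small counts, where a difference of a few dozen results is a large percentage but still the same order of magnitude.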

  9. OT : Moving average and graphs by 4D6963 · · Score: 1

    Every time I see graphs with a moving average, be it in TFA or some stock market graph, it makes me cringe. OK, the moving average isn't the best filter out there; there's a whole range of finite impulse response filters that have a more desirable frequency response than a moving average (which is convolution with a rectangle, which means its frequency response is essentially a sinc function, which means a shitload of ripples), but why on Earth don't they compensate for the delay induced by the convolution?

    Why do they let it have half the rectangle's width in delay when they could just compensate for it so that the curve wouldn't look offset compared to the original data? And most mind-bogglingly, why on Earth do the same sort of people add another curve that is the difference between the original data and the delayed moving average?? Why oh why? It's senseless: if the moving average were compensated, you could call the difference a high-pass filter and directly look at the high-frequency components of the original data without adding any parasitic low-frequency component that doesn't correspond to anything desirable.

    Someone enlighten me please.
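    The delay being complained about is easy to see on a toy signal. The sketch below is an illustration only (plain Python, made-up triangular pulse, hypothetical function names): it computes the same window averages twice, once stamped at the end of each window and once stamped at its middle, then takes the difference between the signal and the centered average.

```python
def trailing_moving_average(xs, width):
    """Average of each sample and the (width - 1) samples before it, stamped at the window's end."""
    out = []
    for i in range(width - 1, len(xs)):
        window = xs[i - width + 1 : i + 1]
        out.append((i, sum(window) / width))
    return out


def centered_moving_average(xs, width):
    """Same averages, but stamped at the middle of each window (odd widths only)."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a middle sample")
    half = width // 2
    return [(i - half, avg) for i, avg in trailing_moving_average(xs, width)]


def peak_index(points):
    """Index at which a list of (index, value) points is largest."""
    return max(points, key=lambda p: p[1])[0]


# Toy signal: a triangular pulse that peaks at index 10.
signal = [max(0, 5 - abs(i - 10)) for i in range(21)]
width = 5

trailing = trailing_moving_average(signal, width)
centered = centered_moving_average(signal, width)

# Signal minus the delay-compensated average: a crude high-pass view.
residual = [(i, signal[i] - avg) for i, avg in centered]

print("trailing peak at", peak_index(trailing))
print("centered peak at", peak_index(centered))
```

    With a window of 5 samples the trailing curve peaks (width - 1) / 2 = 2 samples late, while the centered one lines up with the pulse. The catch is that centering needs future samples, so the most recent (width - 1) / 2 points have no value yet, which may be why live charts tend to use the trailing form.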

    --
    You just got troll'd!
  10. Privacy? by Temporal · · Score: 2, Insightful

    Google sets an internal threshold on search volume, and this threshold could be set for reasons that range anywhere from Google Trends is still experimental to Google not wanting to provide data on how it builds its massive search index for emerging search terms.
    Or maybe for privacy reasons? Some search queries implicitly reveal the identity of the person making them. Such queries are naturally low-volume, so refusing to show low-volume queries is an effective way to protect the privacy of the searchers.
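    As a sketch of that idea, under the assumption that the cutoff is a simple minimum count (nothing here reflects how Google actually does it; the function name, the counts, and the threshold are invented for illustration):

```python
def publishable_terms(query_counts, min_volume):
    """Keep only the terms that were searched at least min_volume times."""
    return {term: n for term, n in query_counts.items() if n >= min_volume}


# Made-up counts and threshold for illustration only.
query_counts = {
    "myspace": 120000,
    "linux firewalls": 40,
    "jane doe 12 elm street": 2,
}

visible = publishable_terms(query_counts, min_volume=50)
print(sorted(visible))
```

    The same cutoff that hides the identifying query also hides the harmless niche one, which matches what the summary observed for "Linux Firewalls".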
  11. michaelrash by michaelrash · · Score: 1

    I have updated my original post to address some of the comments made here on Slashdot. Peer review is always good, and thank you all for the insights.