Slashdot Mirror


Google Previews New Search Infrastructure

Google has announced a "developer preview" of a new search infrastructure, though one wouldn't have to be a developer to try it out. Google is asking for feedback on how the search results in the new regime stack up against the old. Matt Cutts has posted a mini FAQ. Some early testing indicates that the new search may be faster in some cases, and return more relevant results, than the old one. Those who attempt to game Google search for a living will be scrambling henceforth. Has anyone identified the new crawler bot in log files?

36 of 129 comments

  1. Re:Major Disapppointment by religious+freak · · Score: 2, Insightful

    Yeah, I kinda feel you there. I'm kind of itching for some real leap in progress; I think it's due. Semantic queries à la Wolfram Alpha (well, not LIKE Wolfram Alpha, but what Wolfram is trying to do) are where I'd expect things to go. Seems like the old guard are running out of ideas.

    --
    If you can read this... 01110101 01110010 00100000 01100001 00100000 01100111 01100101 01100101 01101011
  2. New crawler bot... by Gavin+Scott · · Score: 5, Insightful

    Why would there be a new crawler?? How many more copies of the Interwebs does Google need?

    G.

    1. Re:New crawler bot... by Thanshin · · Score: 2, Funny

      Why would there be a new crawler?? How many more copies of the Interwebs does Google need?

      The answer to your question is: "Yes. Yes indeed."

      Thank you for beta-testing our new rhetoric responder.

    2. Re:New crawler bot... by libcrypto · · Score: 2, Interesting

      My thoughts exactly. They probably developed a new algorithm for finding the best results. There is no need for a new crawler. Found this link on search engine architecture which is helpful. http://infolab.stanford.edu/~backrub/google.html

    3. Re:New crawler bot... by Will.Woodhull · · Score: 2, Interesting

      New crawlers are needed because the web is changing.

      1. The automated cross referencing system on some blogs requires new logic to identify which article is the true search target, and which ones are simply referencing that article.
      2. The increasing use of ajax techniques to update portions of a web page requires a new approach to crawling.
      3. Other new ways of delivering content are also forcing changes, but these two are sufficient to make the point. Teh intarwebs is changing, and teh spiders need to be redesigned to crawl through all them new types of tubes.

      Some of these problems will be mitigated by HTML5 (assuming that web developers adopt the new standard-- which is likely for those not married to the Microsoft ecosystem). But even when HTML5 becomes fully mature, there will need to be some big changes in crawler and indexing technology.

      --
      Will
  3. New algorithm = more relevant results by maxwell+demon · · Score: 5, Insightful

    The results may be more relevant simply because the algorithm is new, so the SEOs haven't yet had a chance to optimize for it. Whether it really gives more relevant results will only be seen after it has been the main search algorithm for some time.

    Remember, in the beginning the old algorithm used to be very good at finding relevant results.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    1. Re:New algorithm = more relevant results by CarpetShark · · Score: 5, Interesting

      Remember, in the beginning the old algorithm used to be very good at finding relevant results.

      I'm not convinced that the degradation is entirely due to SEO. Google used to be a much more technical search -- when you used specific terms, you got specific matches. It seemed to be very much like Altavista with AND between each term. Now, you get a mix of things, as if it was OR between each term. Granted, *that* could be just SEO.

      Secondly though, if you search for X, you're asked if you meant Y, and your search results already seem to be for the popular Y result they think you meant.

      Likewise, you used to be able to search for hyphenated-terms (I hyphenated all the time because it's usually one character fewer, and requires less editing after the fact than putting quotes around words), but now it seems to split them into two terms.

      I think Google has dumbed down its search for people who don't know how to use search engines.
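      The AND-versus-OR difference described above can be illustrated with a toy inverted index. This is only a sketch: the documents and the function names (`build_index`, `search_and`, `search_or`) are made up for illustration, and nothing here reflects how Google actually combines terms.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, terms):
    """Return only documents that contain every term (AND semantics)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, terms):
    """Return documents that contain at least one term (OR semantics)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.union(*sets) if sets else set()


# Hypothetical three-document corpus.
docs = {
    1: "gentoo kernel bug",
    2: "kernel tuning guide",
    3: "gentoo install guide",
}
index = build_index(docs)
print(search_and(index, ["gentoo", "kernel"]))  # narrow: every term required
print(search_or(index, ["gentoo", "kernel"]))   # broad: any term is enough
```

      With AND, each extra term narrows the result set; with OR, each extra term widens it, which matches the "mix of things" complaint.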

    2. Re:New algorithm = more relevant results by Trepidity · · Score: 2, Insightful

      The web itself has changed too, for reasons other than SEO (though it's sometimes hard to tell which is which). PageRank isn't a universal law of nature, with the "best" result to any particular query being related to how many incoming links a particular site has. Rather, it's a heuristic based on something that often happened to be true--- the most useful information was located on pages at sites that were frequently linked to. It's possible that correlation is no longer as strong as it used to be.
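      For readers who haven't seen the heuristic spelled out, here is a minimal power-iteration sketch of the textbook form of PageRank. The three-page link graph, the damping factor of 0.85, and the iteration count are assumptions chosen for illustration; this is not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to roughly 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c"; "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

      The page with the most incoming links ends up on top, which is exactly the correlation the comment says may no longer hold as strongly.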

    3. Re:New algorithm = more relevant results by dublindan · · Score: 3, Interesting

      I agree. What I hate is that if I search for "foo bar baz", it seems to ignore that I put quotes around it. If I put quotes, I'm looking for EXACT matches, but Google still seems to treat it as foo OR bar OR baz... :'(

    4. Re:New algorithm = more relevant results by dnwq · · Score: 4, Informative

      I don't know about you, but I get exact matches for "foo bar baz".

    5. Re:New algorithm = more relevant results by CAIMLAS · · Score: 4, Insightful

      Too bad this can only be modded to +5. It needs to be made 'sticky' to the top of the thread (and every goddamn Google programmer's forehead, ever).

      Seriously: can we PLEASE have the ability to accurately filter things via syntax include/exclude and grouping again? I know it still 'works' but it doesn't work half a damn. Every once in a while I'll google for an error or some such and I'll have to prune it down to a handful of terms to even get results (and I know there should be more than just a handful for these kinds of things, because it's not uncommon). Google is becoming almost useless for technical searches.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    6. Re:New algorithm = more relevant results by Paaskonijn · · Score: 5, Informative

      Secondly though, if you search for X, you're asked if you meant Y, and your search results already seem to be for the popular Y result they think you meant.

      Try searching for +X.

    7. Re:New algorithm = more relevant results by value_added · · Score: 2, Interesting

      Google used to be a much more technical search ...

      I tend to agree, but IIRC, casual searches for technical terms were never that good. In my case, I invariably still get an unfiltered (read "near-endless") list of links to mailing list posts (identical content hosted by different list aggregators), or my favourite, the same frigging README file stored on what seems to be every other server on the internet. At least in the past, some of us could rely on usenet (as archived by Google groups) searches to separate out the chaff, but today everyone insists that web-forums are the way to go, so the signal-to-noise ratio is worse than ever.

      Granted, there are typically few ads possible for technical searches, so Google has no monetary incentive to improve them, but you'd think some geek employed by Google and trying to find useful information in a web search would step up and suggest an improvement or two.

      Then again, maybe he's searching for things like deals on cameras (or Britney Spears) like everyone else. ;-)

    8. Re:New algorithm = more relevant results by MadMaverick9 · · Score: 2, Insightful

      Well - I guess "EXACT" means different things to us then ...

      In my world "foo bar baz" is not the same as:

      "foo, bar, baz"
      "foo, :bar, :baz"
      "foo = bar = baz"
      "foo->bar->baz"

      Oh well ... could just be me ...

    9. Re:New algorithm = more relevant results by Serious+Callers+Only · · Score: 4, Interesting

      Google seems to ignore punctuation, that's why you'd get those results.

      You put in "foo, bar, baz", it searches for "foo bar baz". It does not search for foo OR bar OR baz, as you suggested, it just strips the punctuation, and then searches for that exact phrase. There's a guide to the methodology you can google for.

      I understand why they omit punctuation, but it'd be nice if you could ask it to search including punctuation easily (not sure if you can), as it makes searching for code or precise phrases (with punctuation) very difficult.
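      The behaviour described above (strip punctuation, then match the exact phrase) can be sketched as follows. The tokenizer rule here is an assumption made for illustration, not Google's actual one, and `tokenize` and `phrase_match` are hypothetical names.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits.

    All punctuation is treated as a separator (an assumed rule).
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(query, document):
    """True if the query's tokens appear consecutively, in order, in the document."""
    q = tokenize(query)
    d = tokenize(document)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# The variants listed earlier in the thread all collapse to the same tokens.
for doc in ["foo, bar, baz", "foo, :bar, :baz", "foo = bar = baz", "foo->bar->baz"]:
    print(repr(doc), phrase_match("foo bar baz", doc))

# Word order still matters, so this is a phrase search and not an OR search.
print(phrase_match("foo bar baz", "foo baz bar"))
```

      Under this rule the quoted phrase is still matched exactly as a word sequence; only the punctuation between the words is lost, which is why code searches suffer.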

    10. Re:New algorithm = more relevant results by PsychoSlashDot · · Score: 5, Insightful

      I could live with the current semantics just fine if there were two Google modes: research and purchase. When I search for "Laserjet 4000" in research mode, I'm explicitly saying that I'm searching for pages ABOUT Laserjet 4000 printers, and absolutely not looking for a way to BUY a Laserjet 4000. Contextually isolating these two modes would be hugely helpful. When I want to buy a Widget and I'm simply looking for the best deals, I don't want a bunch of pages where people are reviewing or discussing the product. When I want to fix my Widget, I don't want a bunch of pages trying to sell me a new one. Sometimes a mixture is good, but for me it usually isn't.

      --
      "Oh no... he found the .sig setting."
    11. Re:New algorithm = more relevant results by DrEldarion · · Score: 2, Informative

      They already have a "reviews" restrict, and they have an entire section dedicated to commerce:

      http://www.google.com/search?q=laserjet%204000&hl=en&output=search&tbs=rvw:1&tbo=1
      http://www.google.com/products?q=laserjet+4000&aq=f

  4. Re:First Post by darkvad0r · · Score: 5, Informative

    stop spending them, that'll do (at least it worked for me)
    alternatively, you could check your settings and set the relevant option to "I don't want to help" (see the FAQ)

  5. What I'd like to see from Search 2.0 by Zocalo · · Score: 3, Interesting

    Actually, I'm mostly fine with the speed and typical results I'm getting at the moment. What annoys me the most about searching is when the first several pages of results are full of links to places that require you to have an account before you can access the answer or download the file. If I could define a blacklist that automatically excludes some of the worst offenders from my queries, that would be worth far more to me than shaving a few milliseconds off each search.
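    Until something like that exists, a personal blacklist can be approximated on the client side by appending `-site:` exclusions to every query. This is a rough sketch: `with_blacklist` is a hypothetical helper, and the blocked domain and query are only examples.

```python
def with_blacklist(query, blocked_domains):
    """Append a -site: exclusion for each blocked domain to the query string."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked_domains)
    return f"{query} {exclusions}".strip()


# Example blacklist; replace with whichever sites you find unhelpful.
blocked = ["experts-exchange.com"]
print(with_blacklist("mysql error 1064", blocked))
```

    The resulting string can be pasted into the search box or wired into a browser keyword search.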

    --
    UNIX? They're not even circumcised! Savages!
    1. Re:What I'd like to see from Search 2.0 by Anonymous Coward · · Score: 4, Informative

      You can see the content of an experts-exchange.com "answer" using the "cached" link under the Google result. Then just scroll down past the bogus posts and you'll see the real posts.

    2. Re:What I'd like to see from Search 2.0 by cyclomedia · · Score: 2, Informative

      And the fact that if you ever search for the name of a piece of software the first 100 results are brothersoft.com, getyourfreeshithere.com, freesoftwarefix.com, warezfactory.com etc etc etc etc

      --
      If you don't risk failure you don't risk success.
    3. Re:What I'd like to see from Search 2.0 by LordLimecat · · Score: 3, Interesting

      All that matters is that your referrer is Google. It doesn't have to be cached; if what you see on the live page is different from what the Googlebot sees, Google will drop them from the results for SEO violations.

  6. Re:First Post by master5o1 · · Score: 2, Funny

    But that's too easy.

    --
    signature is pants
  7. Re:Major Disapppointment by Korin43 · · Score: 4, Interesting

    The least they could do is update the calculator. I mean, why can't I put in "2 pounds of chocolate in cups" and get an answer? I realize that finding out the density of chocolate may be difficult for Google to do, but why not team up with Wikipedia (have people add things like densities to articles, and then Google can crawl that and use it for calculator results)? Or even easier, things that can be found on the periodic table, like "10 kg of lithium in moles" or "atomic weight of calcium".

    There seem to be so many things that it could be much more helpful with, and it can't be that hard since it already can answer questions like "What is the mass of the earth times the speed of light squared?", so why can't I ask for the "mass of the earth expressed as energy" (or possibly "mass of the earth in joules")?

    I guess it's probably just that Google doesn't get many ad clicks when people ask the calculator questions :(
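    For reference, the arithmetic behind those example queries is short. The sketch below uses commonly quoted approximate values (lithium molar mass about 6.94 g/mol, Earth mass about 5.972e24 kg); the function names are made up, and the density passed to `pounds_to_cups` is something the caller must supply, since it depends on the form of the ingredient.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458      # exact by definition
EARTH_MASS_KG = 5.972e24                  # approximate
LITHIUM_MOLAR_MASS_G_PER_MOL = 6.94       # approximate standard atomic weight
GRAMS_PER_POUND = 453.59237               # exact by definition
ML_PER_US_CUP = 236.5882365               # US customary cup


def kg_to_moles(mass_kg, molar_mass_g_per_mol):
    """Convert a mass in kilograms to an amount of substance in moles."""
    return mass_kg * 1000.0 / molar_mass_g_per_mol


def mass_to_energy_joules(mass_kg):
    """Rest energy E = m * c**2, in joules."""
    return mass_kg * SPEED_OF_LIGHT_M_PER_S ** 2


def pounds_to_cups(pounds, density_g_per_ml):
    """Convert a weight to a volume in US cups; needs a density from the caller."""
    millilitres = pounds * GRAMS_PER_POUND / density_g_per_ml
    return millilitres / ML_PER_US_CUP


print(kg_to_moles(10, LITHIUM_MOLAR_MASS_G_PER_MOL))  # "10 kg of lithium in moles"
print(mass_to_energy_joules(EARTH_MASS_KG))           # "mass of the earth in joules"
print(pounds_to_cups(2, 1.0))                         # 2 lb of water, density 1.0 g/mL
```

    The last function shows where the difficulty lies: the conversion itself is trivial, but it cannot run without a density for the specific substance.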

  8. Re:Major Disapppointment by koolfy · · Score: 5, Interesting

    Two words:
    Exalead
    Yauba

    Exalead is more powerful, and Yauba is a little less effective for specific searches like "gentoo bug kernel 2.6.30 fglrx", but it guarantees 100% anonymity and is pretty powerful and useful in some cases.

    Google is not the best search engine on the web. Their new engine is very good, but Google itself hasn't evolved since... I don't know, it's always the same, and we barely see new features added (take a look at Exalead Labs).

    After testing several search engines, it appears that Google is not the one with the best ideas, and that the relevance and engines of others like Exalead aren't bad enough to consider them inferior to Google. Google is the best known, and other well-known engines like Bing are not as powerful as those two lesser-known search engines.

    --
    Segmentation Fault in "Life, Universe and Everything" at line 42. Don't Panic.
  9. Could we please go back to Google Search ~v2003? by CAIMLAS · · Score: 4, Interesting

    I don't know about anyone else, but I used to get much more search-contextual information on fringe information from Google, even when compared to a highly-tailored search. I don't know if Google does its indexing differently now, or if it's indexing/crawling different subsets of data, but the results are not only different, but often less useful in an academic/info-junkie sense.

    For instance, searching for "hammurabi" now results in Wikipedia being the first link. This is true for most searches where there's a wiki page, and for many where the search phrase is simply mentioned in the wp page (yet there is no individual wp page for the topic). A lot of the sites I've got bookmarked from researching superstitions and myth surrounding his code (giants, atlantis, etc.), which are still present, do not show up in the search results today - but did around 2003.

    Likewise, search for anything which might have current cultural significance ('bush war crimes') and then compare it to something that had cultural significance just a couple years ago ('saddam war crimes'). The results are drastically different and (in the case of the former) cater to lazy people; they also make actually finding a -site- (as opposed to just a 'current event' article) on the topic somewhat more frustrating. (This is just an example, though there are plenty of other similar situations - forgive my 3am brain.)

    Now, it might be that Google has actually gotten a lot better at returning pertinent results: so good that those little things I see and go "ohhh interesting! *click*" don't occur nearly as often, and as an info junkie, I view google as having degraded.

    Who knows. Still head over heels better than Bing or anything else out there, as far as I'm concerned. I'm glad more progress on 'searching better' is being made. I just wish they'd not clog the works making -cultural- assumptions about what I'm after and stick to the semantics of my search phrases.

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  10. I can help you by Anonymous Coward · · Score: 3, Funny

    If you don't want to use them, I can do that for you. For some reason, I seem to never get mod points. So... Please PM me your password.

    -Yours, Anonymous Coward

  11. Re:Major Disapppointment by ChienAndalu · · Score: 5, Insightful

    Since when is "putting cruft on search results page so that it is barely usable" and "not implementing sessions and cookies" evolution? Google won because it was nice and clean compared to altavista and yahoo.

  12. Social networking sites ranked lower by Paaskonijn · · Score: 2, Insightful

    I see that name searches for unimportant people (like myself) don't put the Facebook, Netlog, Myspace, ... results on top anymore.
    Progress!

  13. Re:Major Disapppointment by ubrgeek · · Score: 2, Interesting

    I found and started playing around with iseek for my Master's classes and have been impressed with the results. Being able to ask questions using natural language is really helpful when I'm not sure exactly for what terms I'd be searching when I first start looking for answers.

    --
    Bark less. Wag more.
  14. Re:Major Disapppointment by ChienAndalu · · Score: 5, Insightful

    You seem to equate "features" with quality of the search engine.

    Some value

    - speed
    - a clean interface and
    - relevance of the search results (which can be improved by analyzing my previous searches)

    If you want to surf the web anonymously, use TOR. Trusting the site saying "we don't have server logs, PROMISE" is silly.

  15. Re:Major Disapppointment by Anonymous Coward · · Score: 2, Insightful

    The real problem is that the web is ever-expanding in its multimedia capabilities... and our ability to index such media is falling woefully behind. We don't have any magic software to scan through a video, identifying objects, and sorting out major themes to tag it with... that's left to the folks who upload them. The same could be said for pictures and audio... and even, in some cases, text. How many times have you been searching for some form or other that some company keeps as a PDF that is a scanned image from a hard copy (so that the text is not searchable)?

    More hard research needs to be done into automatically creating indexing terms for all of the various media out there. Once this starts to happen, we have a chance (albeit small) of taming the web.

  16. Bye-Bye content spinners!!!! by Tsu+Dho+Nimh · · Score: 3, Insightful

    This is going to mess up the content spinners and the paragraph swappers who are trying to either attract ads or build a link farm. Those who have well-built, informative, content-rich pages can sit back and watch the fun.

    "Content Spinning" explained, kinda sorta

  17. Re:Major Disapppointment by eric_brissette · · Score: 3, Funny

    I doubt it has anything to do with ad revenue. There are too many possible variables for this type of search to be useful.

    Chocolate in what form? chips? a solid brick? syrup? cocoa powder? melted?

    Two lbs of sawdust in cups. What type of wood? Birch? Poplar? Maple? Walnut? Sawdust from a chainsaw or a table saw?

    You have got to be kidding me. What next? Two lbs of filing cabinets in gallons?

  18. Re:SEO results by PotatoFiend · · Score: 2, Insightful

    Oh great, my site drops from position #4 to position #44, with no explanation as to why.

    Conversely, if a search result goes from #44 to #4 simply because someone paid some SEO firm to make that happen, the search results should state so explicitly. When you pay for SEO you're feeding a disease that renders the search algorithms increasingly ineffective. Gaming a public resource is selfish, and with this "reset" by Google you're witnessing how your actions can come back to hurt you in the long run.

    And it makes no sense from an objective relevance standpoint.

    Please explain how paid gaming of the system is objective.

    --
    "Liberty may be endangered by the abuses of liberty as well as the abuses of power." -- James Madison
  19. Re:Major Disapppointment by ChienAndalu · · Score: 2, Interesting

    My head is fine without any tinfoil, thank you. I have much personal information on google and don't care much about anonymity. I often use my real name on the Internet (maybe even here someday).

    But I know the difference between using a site that says "I promise you anonymity" and using Tor.