Slashdot Mirror


Search Engine Learns From User Feedback

An anonymous reader writes "Ian Clarke, founder of the Freenet project, has set up a web search engine that allows users to rate each of the search results it returns. WhittleBit will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results. This could be useful for those cases where Google just refuses to return the search results you want. Could improved interactivity be the next big search engine advancement after Pagerank?"
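The story doesn't say how WhittleBit decides which keywords to add or remove. Purely as an illustration, here is a minimal Python sketch of one possible rule. It assumes each rated result comes with a short text snippet. Words that appear in thumbs-up snippets but never in thumbs-down ones are suggested as additions, and the reverse are suggested as exclusions. The function names, the stopword list and the ranking rule are all hypothetical. This is not WhittleBit's actual method.

```python
import re
from collections import Counter

# Hypothetical stopword list so common words aren't suggested as keywords.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase a snippet and return its non-stopword alphabetic words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def refine_query(query, liked, disliked, max_terms=2):
    """Suggest (terms_to_add, terms_to_exclude) from rated result snippets.

    Hypothetical rule: a word is a candidate to add if it occurs in liked
    snippets but in no disliked snippet, and a candidate to exclude if the
    opposite holds. Words already in the query are skipped. Candidates are
    ranked by how many snippets contain them, ties broken alphabetically.
    """
    base = set(tokenize(query))
    # Count each word at most once per snippet.
    pos = Counter(w for s in liked for w in set(tokenize(s)))
    neg = Counter(w for s in disliked for w in set(tokenize(s)))

    add = sorted((w for w in pos if w not in neg and w not in base),
                 key=lambda w: (-pos[w], w))[:max_terms]
    drop = sorted((w for w in neg if w not in pos and w not in base),
                  key=lambda w: (-neg[w], w))[:max_terms]
    return add, drop


def rewrite_query(query, add, drop):
    """Append +word / -word operators to the original query string."""
    return " ".join([query] + ["+" + w for w in add] + ["-" + w for w in drop])


liked = ["jaguar speed in the wild", "jaguar habitat rainforest wild"]
disliked = ["jaguar car dealer prices", "used jaguar car sale"]
add, drop = refine_query("jaguar", liked, disliked)
print(rewrite_query("jaguar", add, drop))  # jaguar +wild +habitat -car -dealer
```

Counting each word once per snippet keeps one long, repetitive result from dominating the suggestions.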

10 of 269 comments

  1. no it won't replace google. by garcia · · Score: 5, Interesting

    "Could improved interactivity be the next big search engine advancement after Pagerank?"

    In short, no.

    I have tried WhittleBit before (a user had a link to it in his .sig on Slashdot). I was unimpressed with the results the first time (there were 8 or so to work with), and whittling them down with the thumbs-down was of little use when there were so few results.

    I can't see Google's superiority being challenged by this at all. What else would WhittleBit offer me other than this "feature"? I didn't see anything else when I used it (and in fact, I was rather annoyed that it stayed at the top of the screen while I was reading the page it sent me to).

    No thanks, just my worthless .02

  2. Kaltix by bmongar · · Score: 4, Interesting

    I think something like what Kaltix is trying has a better chance of replacing Google. However, I don't see that happening either. I just think Google will learn from the user-based systems.

    --
    As x approaches total apathy I couldn't care less.
  3. I like it. by Doesn't_Comment_Code · · Score: 4, Interesting

    I like the idea of interactive page rankings. I don't think it should be the sole ranking algorithm. But human interaction is just what search engines need.

    I do a lot with Google, and it leaves something to be desired. The goal of Google's approach is to take the ranking of pages partly out of the hands of webmasters, so they can't just trick the spiders. That has worked very well for Google (it serves over 70% of Internet searches). But all page rankings are very cold and calculated. Maybe that cold, calculated rank is a good place to start, and then it's time for human reviewers to fine-tune the list.

    By the way, Google has attempted to achieve this concept of human ranking by watching how long you stay at a page you clicked on. If they rank a page first, and you click it and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    1. Re:I like it. by Thoguth · · Score: 4, Interesting

      By the way, Google has attempted to achieve this concept of human ranking by watching how long you stay at a page you clicked on. If they rank a page first, and you click it and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.

      If that's true, then the way I do searches is counterproductive. I load the Google search page, then middle-click all the links that look the most promising and read them in tabs. No wonder Google's searches have seemed to get worse and worse for me lately: I'm training it to think my most promising results are no good!

      --
      The requested URL /iframe/sig.html was not found on this server.
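Comment 3 describes a dwell-time signal, and the reply explains how tabbed browsing could fool it. Nobody in the thread says how (or whether) Google really does this. The following is only a minimal Python sketch under assumed rules. The server records when a result is clicked and when the user is next seen on the results page. A gap shorter than a hypothetical `SHORT_DWELL_SECONDS` counts against the result, and a longer gap counts for it. No return at all is treated as neutral.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: coming back to the results page sooner than this
# counts as a "bounce" against the clicked result.
SHORT_DWELL_SECONDS = 10.0


@dataclass
class Click:
    url: str
    clicked_at: float             # seconds after the results page was shown
    returned_at: Optional[float]  # next time the user was seen on the results page, if ever


def click_signal(click: Click) -> int:
    """Return +1 for a long stay, -1 for a quick return, 0 if there was no return."""
    if click.returned_at is None:
        return 0  # no later activity on the results page: no evidence either way
    dwell = click.returned_at - click.clicked_at
    return -1 if dwell < SHORT_DWELL_SECONDS else 1


def score_adjustments(clicks):
    """Sum the per-click signals for each URL."""
    totals = {}
    for c in clicks:
        totals[c.url] = totals.get(c.url, 0) + click_signal(c)
    return totals


# Ordinary browsing: click a result, read it for 95 seconds, come back.
print(score_adjustments([Click("a", 1.0, 96.0)]))  # {'a': 1}

# Tabbed browsing: middle-click three results one second apart. Each next
# click on the results page looks like an immediate "return".
tabbed = [Click("a", 1.0, 2.0), Click("b", 2.0, 3.0), Click("c", 3.0, None)]
print(score_adjustments(tabbed))  # {'a': -1, 'b': -1, 'c': 0}
```

Under these assumptions, a reader who middle-clicks several results in quick succession produces exactly the pattern of quick "returns" that the reply worries about. This can happen even when each tab is then read for minutes.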
  4. Ack! Do you know what you're doing? by numbski · · Score: 4, Interesting

    This is a great idea in concept, but the potential for abuse is incredibly high (if it's implemented on a system that actually matters, like Google).

    Imagine for a moment a geek for hire, such as myself, writing a Perl script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs a few different versions of IE on Windows. It then runs searches for hot keywords that my client wants to rank high on. Then it 'mods down' anything that isn't my client's product, and 'mods up' whatever is, or links to, my client's products.

    Set the script to run several times a day at each location. Or write some spyware that does the same thing in the background of a shareware-app-for-hire (Kazaa?).

    You see where I'm going with this? Protections would have to be in place.

    --

    Karma: Chameleon (mostly due to the fact that you come and go).
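Comment 4 only says that protections would have to be in place. Here is a minimal Python sketch of two obvious ones. It assumes the engine can attach some identifier to each voter, such as an account, cookie or IP address; how it does so is left open here. The sketch accepts at most one vote per client for each (query, URL) pair and caps how many votes a client can cast per day. The class and its parameters are hypothetical. A script spread over many servers, as the comment describes, would multiply the number of clients, so checks like these only raise the cost of abuse rather than prevent it.

```python
from collections import defaultdict


class VoteLedger:
    """Hypothetical store for thumbs-up/down votes with two simple protections:
    one vote per client per (query, URL), and a cap on votes per client per day."""

    def __init__(self, daily_cap=50):
        self.daily_cap = daily_cap
        self.seen = set()                # (client, query, url) already voted on
        self.per_day = defaultdict(int)  # (client, day) -> votes accepted that day
        self.scores = defaultdict(int)   # (query, url) -> net score

    def vote(self, client, day, query, url, up):
        """Record a +1 (up=True) or -1 vote; return True if it was accepted."""
        key = (client, query, url)
        if key in self.seen:
            return False  # this client already rated this result
        if self.per_day[(client, day)] >= self.daily_cap:
            return False  # this client has used up today's votes
        self.seen.add(key)
        self.per_day[(client, day)] += 1
        self.scores[(query, url)] += 1 if up else -1
        return True


ledger = VoteLedger(daily_cap=2)
print(ledger.vote("client-1", 1, "widgets", "acme.example", True))    # True
print(ledger.vote("client-1", 1, "widgets", "acme.example", True))    # False (repeat)
print(ledger.vote("client-1", 1, "widgets", "rival.example", False))  # True
print(ledger.vote("client-1", 1, "widgets", "other.example", False))  # False (cap hit)
```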

  5. Re:Cool, but can't last by saskwach · · Score: 4, Interesting

    I think this is for whittling down a person's individual searches. My preferences when I'm searching for something about RJ45 plugs won't affect yours. This could be cool if used in conjunction with Pagerank, so that I don't have to keep clicking on all the little "o"s at the bottom of the results page; it would mean I only have to look at one page of links.

    The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it; I'll just click on the link. It might be useful to have a "thumbs-down all on page" checkbox, which might narrow the search intelligently.

  6. Google is Highly Accurate by (eternal_software) · · Score: 4, Interesting

    "This could be useful for those cases where Google just refuses to return the search results you want."

    That has never really happened to me. Google is fast and extremely accurate, especially when you do a more advanced search (+this and -that).

    I'm not sure I would want to take the time to "rate" search engine results and re-search when I can just fine-tune my search from the start.

  7. What is really needed is... by Anonymous Coward · · Score: 5, Interesting

    What is really needed is a way to separate out commercial sites. Google works great 90% of the time, but when you search for something that draws a response from sites trying to sell something, the results get swamped with commercial noise.

    This would benefit commercial sites too, because when you really are looking to buy something, you would be guaranteed not to be annoyed by anything non-commercial.

    -- YAAC (Yet Another Anonymous Coward)
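Comment 7 doesn't say how commercial sites would be recognized. One option, in the spirit of the spam-filter comparison drawn in comment 9, is a naive Bayes text classifier trained on pages someone has labeled as commercial or not. The minimal Python sketch below uses whitespace tokenization and add-one smoothing. The labels and the tiny training set are made up for illustration, and a real system would need far more data and better features.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training documents

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(text.lower().split())

    def classify(self, text):
        """Return the label with the highest log-posterior for `text`."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            n_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # prior
            for w in text.lower().split():
                # Smoothed likelihood, so unseen words don't zero out the score.
                score += math.log((counts[w] + 1) / (n_words + len(vocab)))
            if score > best_score:
                best, best_score = label, score
        return best


nb = NaiveBayes()
nb.train("buy now cheap price free shipping", "commercial")
nb.train("add to cart sale price", "commercial")
nb.train("history of the apple tree", "other")
nb.train("how apple trees are pollinated", "other")
print(nb.classify("cheap apple sale price"))  # commercial
print(nb.classify("apple tree history"))      # other
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on long pages.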

  8. Something like that by siskbc · · Score: 4, Interesting

    The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it; I'll just click on the link. It might be useful to have a "thumbs-down all on page" checkbox, which might narrow the search intelligently.

    That would help, but the system would have to know why those results are bad in order to tell how they differ from other results that might be more acceptable.

    Here's what I would do. First, instead of Google returning just the most relevant choices, the results need to be a function of both relevance and diversity. So with the typical "apple" search, it would return some Apple Computer results, some Fiona Apple results, and some results about the fruit. All of those would be highly relevant, but it would only give, say, a few of each. You could then click on the more relevant results (if you wanted apple the fruit, you'd click on the three fruit links), at which point it would reject the others and give you more of what you want.

    The key here is that it would have to give diversity at the beginning for you to be *able* to differentiate what you want from what you don't. As far as I know, this is not how Google works now.

    For what it's worth, this algorithm wouldn't be too complicated. I lack the programming ability, but I could write the algorithm in pseudocode (at which point most decent programmers could turn it into C++). It should be quite possible.

    --

    -Looking for a job as a materials chemist or multivariat
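Comment 8's mix of relevance and diversity is close to a known re-ranking technique, Maximal Marginal Relevance (MMR). It is swapped in here as a stand-in and is not the commenter's own pseudocode. This minimal Python sketch assumes each result already has a relevance score and a topic label. Both are hypothetical inputs; a real engine would have to infer topics, for example by clustering. The parameter `lam` trades relevance against novelty. The `narrow` function is the follow-up step the comment describes: once the user clicks results on one topic, keep only that topic.

```python
def mmr(results, k, lam=0.5):
    """Greedy Maximal Marginal Relevance selection.

    results: list of (doc_id, relevance, topic) tuples. Each step picks the
    remaining result with the largest
        lam * relevance - (1 - lam) * (max similarity to results already picked),
    where, as a simplifying assumption, similarity is 1.0 for two results on
    the same topic and 0.0 otherwise.
    """
    picked, remaining = [], list(results)

    def marginal(r):
        sim = max((1.0 if r[2] == p[2] else 0.0 for p in picked), default=0.0)
        return lam * r[1] - (1 - lam) * sim

    while remaining and len(picked) < k:
        best = max(remaining, key=marginal)
        picked.append(best)
        remaining.remove(best)
    return picked


def narrow(results, clicked_ids):
    """Second round: keep only results on the topics of the clicked results."""
    topics = {topic for doc_id, _, topic in results if doc_id in clicked_ids}
    return [r for r in results if r[2] in topics]


results = [
    ("mac1", 0.95, "computer"), ("mac2", 0.93, "computer"),
    ("mac3", 0.92, "computer"), ("fiona1", 0.80, "music"),
    ("fruit1", 0.75, "fruit"), ("fruit2", 0.70, "fruit"),
]
print([r[0] for r in mmr(results, 3)])              # ['mac1', 'fiona1', 'fruit1']
print([r[0] for r in narrow(results, {"fruit1"})])  # ['fruit1', 'fruit2']
```

A plain top 3 by relevance would return three Apple Computer pages. With `lam=0.5`, the first page shows one result from each sense of "apple", and clicking the fruit result narrows the list to fruit pages. Setting `lam=1.0` falls back to ordinary relevance ranking.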

  9. It's called "Relevance Feedback" by gbnewby · · Score: 4, Interesting

    In the academic field of information retrieval, this is called "relevance feedback." It's part of many information retrieval (IR) algorithms, some of which can run automatically (i.e., unsupervised). There is also overlap with the fields of machine learning and even Bayesian processes (see today's other /. story about spam filters; conceptually, spam filtering is actually the same problem that search engines try to solve).

    In Yahoo and other search engines (but not Google, as far as I've seen), you often get a "click-through" that goes to their system before transparently redirecting to the actual URL you clicked. This is relevance feedback. It's true that the system can't determine whether you LIKED the site (i.e., whether it was "relevant"), but at least it's some sort of feedback the system can use for tuning.

    The other most familiar type of system I can think of is Alexa (now owned by Amazon.com, and the brainchild of the Internet Archive's Brewster Kahle). With Alexa, they could count not just that you visited a site, but how long you spent and where else you went. This is at least part of the basis for Amazon's recommendation system for books and other geegaws they sell.

    Can this work in a search engine? Yes, certainly. Does it mean that a search engine that implements relevance feedback will instantly be better than Google? Definitely not! There are many other things (about 20, from what I've heard) that go into the ranking system Google uses. Pagerank is one of them, but there are many other factors (such as term frequency, document HTML structure, etc.). Some of these, notably Pagerank, work poorly on relatively small collections (in the TREC conference, people have almost never found that Pagerank, HITS or similar algorithms improve performance with "only" a few tens of GB of Web documents, or a few million pages).

    Wanna know more about information retrieval? The TREC page above is very good for state-of-the-art research reports (see the Publications area; it's all online and free). More general texts are mostly in libraries, but one good one online is Managing Gigabytes, which covers the IR aspects thoroughly and also has lots of ideas about how to use compression in an IR system (I'm curious whether Google and others do this).
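Comment 9 names the general technique. A classic textbook form of relevance feedback for vector-space retrieval is the Rocchio update, q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant). It moves the query vector toward the centroid of the results marked relevant and away from the centroid of those marked non-relevant. The minimal Python sketch below represents documents as sparse term-weight dicts. The weights alpha, beta and gamma are illustrative defaults, not tuned values, and negative weights are clipped to zero, a common convention.

```python
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse term-weight vectors (dicts)."""
    total = defaultdict(float)
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio relevance feedback on sparse vectors:
    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant),
    with negative weights clipped to zero."""
    new = defaultdict(float)
    for t, w in query.items():
        new[t] += alpha * w
    for t, w in centroid(relevant).items():
        new[t] += beta * w
    for t, w in centroid(nonrelevant).items():
        new[t] -= gamma * w
    return {t: w for t, w in new.items() if w > 0}


relevant = [{"apple": 1.0, "fruit": 1.0}, {"apple": 1.0, "orchard": 1.0}]
nonrelevant = [{"apple": 1.0, "mac": 1.0}]
print(rocchio({"apple": 1.0}, relevant, nonrelevant))
# {'apple': 1.5, 'fruit': 0.375, 'orchard': 0.375}
```

The expanded query picks up "fruit" and "orchard" from the liked results, while "mac" is pushed below zero and dropped.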
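For comparison, here is the link-analysis side that comment 9 mentions. It is a minimal PageRank power-iteration sketch in Python on a toy link graph. It uses the damping factor of 0.85 suggested in the original PageRank paper and treats pages with no out-links as linking to every page, which is one of several common conventions. It models none of the other ranking factors the comment lists.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank.

    links: dict mapping each page to a list of pages it links to; every
    linked-to page must also appear as a key. Returns scores summing to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            out = links[p]
            for q in out:
                new[q] += d * rank[p] / len(out)
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
print(sorted(rank, key=rank.get, reverse=True))  # ['c', 'a', 'b']
```

Page "c" wins because two pages link to it. Page "b" has no in-links and keeps only the baseline (1 - d) / n share.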
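On the compression question at the end of comment 9: a standard trick is to store a posting list as the gaps between sorted document IDs, since small numbers are cheaper to encode. Managing Gigabytes covers several gap-coding schemes for inverted files, including Elias and Golomb codes. The simpler variable-byte code is swapped in here for brevity. This minimal Python sketch uses one common variable-byte convention and makes no claim about what Google or anyone else actually does.

```python
def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, most significant group first;
    the high bit is set on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        groups.reverse()
        groups[-1] |= 0x80  # mark the final byte of this number
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # final byte of the current number
            numbers.append(n)
            n = 0
    return numbers


def compress_postings(doc_ids):
    """Sorted document IDs -> gaps between neighbours -> variable-byte bytes."""
    gaps = [b - a for a, b in zip([0] + doc_ids, doc_ids)]
    return vbyte_encode(gaps)


def decompress_postings(data):
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap  # running sum turns gaps back into IDs
        ids.append(total)
    return ids


postings = [3, 7, 300, 301]
packed = compress_postings(postings)
print(packed.hex())                             # 838402a581
print(decompress_postings(packed) == postings)  # True
```

Here four IDs fit in five bytes, versus sixteen as 32-bit integers. The gain comes from the gaps: in a long posting list, consecutive IDs tend to be close together.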