Slashdot Mirror


Search Engine Learns From User Feedback

An anonymous reader writes "Ian Clarke, founder of the Freenet project, has set up a web search engine that allows users to rate each of the search results it returns. WhittleBit will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results. This could be useful for those cases where Google just refuses to return the search results you want. Could improved interactivity be the next big search engine advancement after Pagerank?"

53 of 269 comments

  1. no it won't replace google. by garcia · · Score: 5, Interesting

    Could improved interactivity be the next big search engine advancement after Pagerank?

    In short, no.

    I have tried WhittleBit before (a user had a link to it in his .sig on Slashdot). I was unimpressed with the results the first time (there were 8 or so to work with), and limiting with the thumbs down was of little use when there were so few results.

    I can't see Google's superiority being challenged by this at all. What else would WhittleBit offer me other than this "feature"? I didn't see anything else when I used it (and in fact, was rather annoyed by the fact that it remained at the top of the screen while reading the link I was sent to).

    No thanks, just my worthless .02

    1. Re:no it won't replace google. by nate1138 · · Score: 4, Insightful

      Perhaps one reason there were so few results returned is the fact that this seems to be more of a proof of concept than a fully functioning engine. Imagine combining a feedback mechanism with an already excellent search like Google. This can't stand alone, but it would be an excellent addition to an engine that already has a huge index.

      One thing that does worry me: what about the potential for abuse? Something like a script that connects to WhittleBit, searches by a keyword important to your industry, and gives all of your competitors' links thumbs-down.

      --
      Where's my lobbyist? Right here.
  2. Cool, but can't last by Dr.+Transparent · · Score: 4, Insightful

    Great idea until the second month when your local viagra spammer's SEO guy moves all his pages to the top of the search for "Futurama" or "Ninja Turtles."

    1. Re:Cool, but can't last by xyvimur · · Score: 2, Insightful

      And that's why it will not succeed. Everything where users are given enough privileges can be turned into `unusable crap' by a group with `bad-intentions'.

    2. Re:Cool, but can't last by saskwach · · Score: 4, Interesting

      I think this is for whittling down a person's individual searches. My preferences when I'm searching for something about rj45 plugs won't affect yours. This could be cool if used in conjunction with pagerank, so that I don't have to keep clicking on all the little "o"s...it makes it so I only have to see 1 page of links.

      The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it, I'll just click on the link...It might be useful to have a "thumbs-down all on page checkbox" which might narrow the search intelligently.

    3. Re:Cool, but can't last by agurkan · · Score: 2, Interesting

      It is possible to delay the serving of pages that require interactive action. Then an automated robot will not be very fast.
      Also, the behavioural pattern of an automated robot can be detected very easily. Imagine a connection from one domain that suddenly submits favorable reviews for a particular page, while no other such reviews are submitted. This should raise a red flag. If the reviews only take effect after such an analysis, I think robots can be defeated.
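
      The red-flag check described above could be sketched roughly like this. It is only an illustration: the function name `flag_suspicious`, the `(domain, page, vote)` tuple layout, and the `min_burst` threshold are all assumptions, not anything WhittleBit is known to do.

```python
from collections import defaultdict

def flag_suspicious(reviews, min_burst=5):
    """Flag (domain, page) pairs where a single domain is the only source
    of positive votes for a page and has submitted at least min_burst of them.

    reviews: iterable of (source_domain, page_url, vote), vote is +1 or -1.
    The threshold is a hypothetical tuning knob.
    """
    # page -> domain -> number of positive votes
    positive = defaultdict(lambda: defaultdict(int))
    for domain, page, vote in reviews:
        if vote > 0:
            positive[page][domain] += 1

    flagged = set()
    for page, by_domain in positive.items():
        if len(by_domain) == 1:  # nobody else likes this page
            (domain, count), = by_domain.items()
            if count >= min_burst:
                flagged.add((domain, page))
    return flagged
```

      Votes from flagged pairs could then be held back from the ranking until someone has looked at them.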

      --
      ato
  3. As long as... by vasqzr · · Score: 2, Insightful


    Ad revenues have nothing to do with the ratings....

    All the good search engines end up corrupting themselves (by making money, which I guess is the point of anything...)

  4. Kaltix by bmongar · · Score: 4, Interesting

    I think something like what Kaltix is trying has a better chance of replacing Google. However I don't see that happening either. I just think Google will learn from the user-based systems.

    --
    As x approaches total apathy I couldn't care less.
  5. I doubt this will fly by The+Bungi · · Score: 5, Informative
    People have been doing experiments like these since the first search engine was rolled off the assembly line. They're prone to abuse and dependent on the goodwill of the user. Imagine if PageRank was based on this - that "SearchKing" dude would have a bot searching for crap and then voting "yes" every time.

    Won't work. Goodwill as we knew it in '95 is gone from the Internet.

  6. hell no by Anonymous Coward · · Score: 2, Interesting

    no, i don't want to have to give feedback in a search, I just want to type keywords and find related results ...

  7. I like it. by Doesn't_Comment_Code · · Score: 4, Interesting

    I like the idea of interactive page rankings. I don't think it should be the one decisive ranking algorithm. But human interaction is just what search engines need.

    I do a lot with Google, and it leaves some to be desired. The goal of Google is to make the ranking of pages partly out of the hands of webmasters, so they can't just trick the spiders. And that has worked very well for Google (serves over 70% of internet searches). But all page ranks are very cold and calculated. Maybe that cold, calculated rank is a good place to start, and then it's time for human reviewers to fine tune the list.

    By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page 1, and you click it, and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    1. Re:I like it. by Thoguth · · Score: 4, Interesting

      By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page 1, and you click it, and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.

      If that's true, then the way I do searches is counter-productive. I load the google search page, and then middle-click all the links that look the most promising and read them in tabs. No wonder Google's searches have seemed to get worse and worse for me lately, I'm training it to think my most promising results are no good!

    2. Re:I like it. by JimDabell · · Score: 2, Interesting

      By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page 1, and you click it, and immediately return to the search page, they penalize that page.

      Do you have a reference for that? According to the HTTP RFC, user-agents aren't supposed to talk to the server when users hit their back buttons, but rather display what the user last saw (despite it possibly being stale). It seems odd that Google would try to circumvent this, especially as there isn't a reliable way of doing so.

    3. Re:I like it. by costas · · Score: 2, Interesting

      Well, if you're excited about user-rankings and feedback, check out the newsbot in my .sig. It focuses on user interaction with the code/algorithm to build not just page rankings but also relationships --between pages, and between users. Try it out, I am guessing you'll like it...

  8. Ack! Do you know what you're doing? by numbski · · Score: 4, Interesting

    This is a great idea in concept, but the potential for abuse is incredibly high (if it's implemented on a system that actually matters, like Google).

    Imagine for a moment, a geek for hire, such as myself, writing a Perl script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs a few different versions of IE on Windows. It then runs searches for hot keywords that my client wants to rank high on. Then it 'mods down' anything that isn't my client's product, and 'mods up' what is, or links to, my client's products.

    Set the script to run several times a day at each location. Write some spyware that does so in the background of a shareware-app-for-hire (Kazaa?).

    You see where I'm going with this? Protections would have to be in place.

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

  9. tweaking for higher results by Ugodown · · Score: 2, Insightful

    Even though Google uses PageRank, often sites that are higher in the results are only there because they had the right keywords in the title. Sites like this have been tweaked with other similar tricks to score higher. Obviously, this new system would be able to get around this. Perhaps, when joined with Google, this could take over when PageRank fails to be applicable. Then we would have something great!

    --
    --- to swing on the spiral...
  10. Body before it gets /.ed by nother_nix_hacker · · Score: 5, Funny

    It was going well until we realised that all people wanted was pron so we just provide that now.

  11. Similar concept... by X86Daddy · · Score: 4, Informative

    I think I found the link somewhere on Slashdot once:

    Gnod.net is a learning system like a search engine that allows you to put in your three favorite authors/musicians/movies and it returns a series of "suggestions" that match, asking you if you like/dislike/haven't heard of each result in series.

    This sort of creature has the potential of placing the final nails in the media cartels' coffins, as it provides what's missing from current P2P and self-production techniques: a recommendation/promotion mechanism.

  12. No. by Anonymous Coward · · Score: 2, Troll

    Does this "search engine" search images? No. Google does.

    Does this "search engine" search 20 years of Usenet? No. Google does.

    Does this "search engine" provide stock quotes, maps, phone numbers, and news? No. Google does.

    Thanks for playing. Google will never lose.

    1. Re:No. by mhesseltine · · Score: 4, Funny

      Is this testing a concept? YES

      Could something like this be implemented in Google? YES

      Is this supposed to replace Google? NO

      Are you a troll? YES

      Thanks for playing.

      --
      Overrated / Underrated : Moderation :: Anonymous Coward : Posting
  13. One word - "abuse" by MrFenty · · Score: 2, Insightful

    This will quickly be abused, much like other rating systems like Amazon's book reviews. Anything worthwhile will ultimately be abused, you can be sure of that.

  14. Google is Highly Accurate by (eternal_software) · · Score: 4, Interesting

    "This could be useful for those cases where Google just refuses to return the search results you want."

    That has really never happened to me. Google is fast and extremely accurate, especially when you do a more advanced search, + this and - that.

    I'm not sure I would want to take the time to "rate" search engine results and re-search when I can just fine-tune my search from the start.

  15. Ouch - major slashdot - mirror of page by Sanity · · Score: 5, Informative
    The server is down - it was totally ill-equipped to handle a slashdotting, unfortunately. I was hoping it would get some testing, but this is a bit much ;-)

    As a poor substitute to being able to play with it (try bookmarking whittlebit.com and coming back in a day or two) I will try to answer people's questions. For the moment - here is the blurb from the front page:

    What is WhittleBit?

    Have you ever searched for something and wished you could tell the search engine that it was totally on the wrong track and it should try again? Well now you can! WhittleBit works much like most other search engines, except it can help you to refine your searches by allowing you to give positive or negative feedback on each search result.

    Simply rate the search results by clicking on the "thumbs up" or "thumbs down" buttons then click on Whittle to get a refined set of search results based on your feedback.

    Tips

    • Even if you visit another site and then return, WhittleBit will remember your search query until you explicitly click the "New Search" button.
    • You can either rate a search result on the results page itself, or visit the page and rate it using the buttons at the top of the page. You will return to the WhittleBit search results after clicking one of the buttons.
    • WhittleBit requires a browser which supports "Cookies" and "Frames" such as Mozilla or Internet Explorer.
    - Ian Clarke, creator of WhittleBit
  16. Not my damn job by loserbert · · Score: 2, Insightful

    I want THEM to tell ME what the good results are, not the other way around. If I wanted to do that I'd write my own search engine. Don't bring some lame ass solution where I have to do all the work.

  17. Sounds Great...but by mstieg · · Score: 5, Insightful

    who wants to wade through results and rank them? I came here to search!

    That's why google is king. It doesn't require you to do *anything*. It barely *allows* you to do anything.

    And it still returns what you need.

    That's the perfect UI.

    1. Re:Sounds Great...but by jrkotrla · · Score: 5, Funny

      hmm...

      You aren't required to do anything.... are barely allowed to do anything..... and this is perfect?

      you must be a Mac user, right?

      --
      In God we trust,
      everyone else we firewall!!
  18. "Free Search" has no place in the commercial web. by Boss,+Pointy+Haired · · Score: 2, Interesting

    Google's PageRank is failing miserably for commercial search. PageRank is fine for academic / informational searches.

    In a commercial environment, it is simply not possible for a free search service to exist that is fair, represents an even distribution of wealth, and is immune from abuse.

    Advertising has to be paid for. "Free Search" is fine for university sites and purely non-profit informational pages, but for a commercial search your position in search engines must be purchased based on the keywords against which you wish to bid.

    Otherwise basic economics breaks down.

  19. great news! by PhysicsExpert · · Score: 2, Interesting

    This seems like a great idea. Google might be number 1 in the search engine rankings at the moment but it would be good to see them have a bit of competition so that they do not use their dominant position for financial gain.

    Here in the lab we're doing some work on using the principles of thermodynamics in order to improve search engines. The second law of thermodynamics states that in a closed system entropy will always increase, which is a lot like the disorder caused by sites spamming themselves to search engines. In addition, the searching patterns of users can be thought of as analogous to the Fermi level of a solid. In theory, applying thermodynamic equations to the process of search engines should allow for more efficient algorithms to be developed. Although this has been known for some time, the process involves solving some fairly hefty quadratic equations which have needed some serious computing power to process. Hopefully though a real leap forward should be no more than a few months away.

    --
    All that glitters has a high refractive index.
  20. how long until by the_2nd_coming · · Score: 2, Funny

    how long until google buys them out?

    I give it 3 weeks after they begin getting rave reviews.

    --
    I am the Alpha and the Omega-3
  21. What is really needed is... by Anonymous Coward · · Score: 5, Interesting

    What is really needed is to separate out commercial sites. Google works great 90% of the time but when you are searching for something that triggers a response from sites trying to sell something, the results get swamped with the commercial noise.

    This would benefit commercial sites because when you really are looking to buy something, you will be guaranteed not to be annoyed by anything non-commercial.

    -- YAAC (Yet Another Anonymous Coward)

  22. How ironic? by CompWerks · · Score: 2, Interesting

    Is it that a google search for whittlebit doesn't even have a link to whittlebit.com.

    --
    If you can read this sig - the bitch fell off.
  23. Kartoo by SillySlashdotName · · Score: 2, Informative

    I have used kartoo and like it.

    It does not "learn" per se, but allows you to select from multiple possibilities using a GUI - and it has been available for a while.

    If I have problems finding something with Google, I use Kartoo.

    --
    Acts of massive stupidity are almost never covered by warranty. --me.
  24. Something like that by siskbc · · Score: 4, Interesting
    The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it, I'll just click on the link...It might be useful to have a "thumbs-down all on page checkbox" which might narrow the search intelligently.

    That would help, but it would have to know why they're bad to know how it would differ from other results that might be more acceptable.

    Here's what I would do. First, instead of google returning the most relevant choices, it needs to be a factor of relevance and diversity. So, with the typical "apple" search, it would return some apple computer results, some fiona apple results, and some results about the fruit. All of those would be highly relevant, but it would only give, say, a few of each. You could then click on the more relevant results (if you wanted apple the fruit, you'd click on the three fruit links), at which point it would reject the others and give you more of what you want.

    The key here is that it would have to give diversity in the beginning for you to be *able* to differentiate things like what you want from things you don't. This is not how google works now, I don't believe.

    For what it's worth, this algorithm wouldn't be too complicated to do. I lack the programming ability, but I could do the algorithm in pseudocode (at which point most decent programmers could reduce it to C++). It should be quite possible.
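
    A rough sketch of the "relevance plus diversity, then narrow by clicks" idea might look like the following. It assumes each result already carries a topic label, which is the hard part and is not solved here; the names `diverse_first_page` and `refine`, the `(url, score, topic)` layout, and the example URLs are all made up for illustration.

```python
from collections import defaultdict

def diverse_first_page(results, per_topic=3):
    """Keep only the best per_topic results from each topic, best score first.

    results: list of (url, score, topic) tuples; topic labels are assumed given.
    """
    by_topic = defaultdict(list)
    for url, score, topic in sorted(results, key=lambda r: -r[1]):
        if len(by_topic[topic]) < per_topic:
            by_topic[topic].append((url, score, topic))
    page = [r for group in by_topic.values() for r in group]
    return sorted(page, key=lambda r: -r[1])

def refine(results, clicked_urls):
    """Drop every topic the user did not click on, keep the rest by score."""
    wanted = {topic for url, _, topic in results if url in clicked_urls}
    return sorted((r for r in results if r[2] in wanted), key=lambda r: -r[1])

results = [
    ("apple.com", 0.9, "computer"),
    ("apple.com/mac", 0.8, "computer"),
    ("fiona.example", 0.7, "singer"),
    ("orchard.example", 0.6, "fruit"),
    ("apple.com/ipod", 0.5, "computer"),
    ("cider.example", 0.4, "fruit"),
]
first_page = diverse_first_page(results, per_topic=1)
fruit_only = refine(results, {"orchard.example"})
```

    With `per_topic=1` the first page shows one computer, one singer, and one fruit link; clicking the fruit link leaves only the fruit results for the next round.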

    --

    -Looking for a job as a materials chemist or multivariat

  25. Post-Google Searching by omnipotens · · Score: 2, Interesting
    I've wished that Google would do this for ages. The possibilities for increasing accuracy are endless with a model for this. I wonder where else this could go. Maybe some sort of integration with another (though possibly encumbered by its relationship with LookSmart) post-Google search engine like Grub? However, this is a BIG step. Once information like this begins to be integrated into a massive database, we could see the next quantum leap in search engine accuracy, and possibly breadth. One thing that could help all of this is to watch what is going on with a list -- here is mine, so far:
    • Teoma
      Is supposedly more accurate than google, but I've found it to be only okay at best
    • Turbo10
      "Searches the deep net" by connecting to site databases to get the most relevant info. A lot of this info, however, comes from Google itself.
    • Grub (a project, not an active engine)
      A distributed search engine project. It would use tons of people's computers as crawlers like seti@home
    • WhittleBit
      Read the story
    So, maybe we'll get somewhere after google (not that google isn't a Good Thing), after all? And.... well, Ian Clarke and his projects is/are/may soon be really rocking the world. Those include:
    • Locutus: www.locut.us
      A giant search system for pre-existing content, aimed at corporations.
    • Freenet: www.freenetproject.org
      An anonymous content-storage system that works as a giant encrypted webserver of sorts.
    • Whittlebit: www.whittlebit.com
      A search engine that learns through user interaction
    • Kanzi: http://cematics.com/site.php/kanzi
      A neat little AI hack that helps webmasters do their job easier
    • Uprizer: www.uprizer.com
      An "edge distribution network" that will optimize content distribution. It uses some Freenet Technology
  26. Google Problems by Ateryx · · Score: 2, Interesting
    Although it is probably biased, as it is by MSN, there was an excellent piece about the faults of Google in a past article

    Unless I read the article incorrectly, this response-feedback-accuracy was the exact cause of the problem with google as shown by msn.

    Just an observation...

    --
    "The truth suffers from too much analysis"
  27. Pagerank cool by MxTxL · · Score: 3, Interesting

    Page rank is cool, uses distributed data to improve search results. Definitely AWESOME in the search engine world.

    BUT i would also like to see the distributed concept applied to searching itself. Something like this idea, but having the engine return results on what were popular click-thrus for searches. From what i can tell (IANA Google Expert) Google isn't keeping click through data on search results (they are on the adwords, but that's different). By tracking click thru data and calculating how long a user stayed at a clicked result before hitting the back button or otherwise returning to google... good insights can be learned. Aggregate this over millions of users with billions of page views... wouldn't take too long to figure out what everyone wants to see for a particular search result. Combine all of that with improving your searches by what others are searching for... i think you are talking a powerful system.
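
    Something along these lines could aggregate click-through and dwell-time data per query. This is a sketch under assumptions: the log format `(query, url, dwell_seconds)`, the 10-second "satisfied" cutoff, and the name `rank_by_dwell` are invented here, and nothing suggests any real engine does it this way.

```python
from collections import defaultdict

def rank_by_dwell(click_log, min_dwell=10.0):
    """Order each query's clicked URLs by the share of visits that lasted
    at least min_dwell seconds before the user came back.

    click_log: iterable of (query, url, dwell_seconds).
    Returns {query: [url, ...]} with the best URL first.
    """
    # query -> url -> [satisfied_visits, total_visits]
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for query, url, dwell in click_log:
        counts = stats[query][url]
        counts[1] += 1
        if dwell >= min_dwell:
            counts[0] += 1

    ranking = {}
    for query, urls in stats.items():
        ranking[query] = sorted(
            urls,
            # higher satisfied share first, then more visits, then URL for stability
            key=lambda u: (-(urls[u][0] / urls[u][1]), -urls[u][1], u),
        )
    return ranking
```

    A quick bounce counts against a page and a long stay counts for it, which is also exactly what a spammer's bot would try to fake.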

    Granted this whole idea may be liable to spamming and all of that... but that's not part of the concept yet. On the surface, it seems like a good idea.

    NOTE: I know other engines track click thrus, but i don't think any of them do it for non-advertising purposes.... if it's purely to improve results then cool. If it's to show you better ads, not cool.

  28. Totally unneeded. by Kickasso · · Score: 2, Insightful

    Web pages are already rated -- by other web pages. Ever noticed these blue underlined chunks of text? They are called links. Each link is a rating that says "Lookie here, I liked it and you might too!" And somebody already uses this rating system in a search engine. Bonus points for correctly guessing who.

  29. Not prone to abuse by blchrist · · Score: 2, Insightful
    I don't understand how this system can be abused. From the post: WhittleBit will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results.

    People are not changing how the search engine ranks the results for other people, it is just slightly modifying your query to produce more precise results. How can that be abused to make trash sites show up with rank 1?

  30. Ok, back up - kinda by Sanity · · Score: 4, Informative
    Ok, it is back up after I killed the "whittling" engine - feel free to play with the UI, but it won't do anything intelligent.

    This was more intended as a proof of concept - rather than an all-out replacement for Google. I was frustrated with the way that Google works really well if you are looking for something easily defined and/or well known, but trying to find something obscure that was "masked" by more popular sites with similar keywords could be a real PITA. WhittleBit is designed to automate the manual process of trying to refine your keyword choice to get the search results you want.
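
    For readers wondering what automating keyword refinement might look like, here is a minimal sketch. It is not WhittleBit's actual algorithm, which is not described in this thread; it simply counts which terms show up in thumbs-up results but not thumbs-down ones and vice versa. The names `tokenize` and `whittle` and the sample snippets are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a snippet and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def whittle(query_terms, liked_docs, disliked_docs, n=1):
    """Suggest up to n terms to add and up to n terms to exclude.

    A term scores +1 for every liked snippet containing it and -1 for
    every disliked snippet containing it; terms already in the query
    are left alone.
    """
    liked = Counter(t for d in liked_docs for t in set(tokenize(d)))
    disliked = Counter(t for d in disliked_docs for t in set(tokenize(d)))
    scores = {
        t: liked[t] - disliked[t]
        for t in set(liked) | set(disliked)
        if t not in query_terms
    }
    add = sorted((t for t in scores if scores[t] > 0),
                 key=lambda t: (-scores[t], t))[:n]
    exclude = sorted((t for t in scores if scores[t] < 0),
                     key=lambda t: (scores[t], t))[:n]
    return add, exclude

add, exclude = whittle(
    ["firebird"],
    liked_docs=["firebird database sql server", "firebird sql database engine"],
    disliked_docs=["firebird browser mozilla download", "mozilla firebird browser release"],
)
refined = " ".join(["firebird"] + ["+" + t for t in add] + ["-" + t for t in exclude])
```

    Because the refinement only rewrites the current user's query, one user's ratings do not change anyone else's results.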

  31. Google do this by Richard5mith · · Score: 2, Informative

    I'm sure I've seen Google do this. I've occasionally seen that links I click on in Google search results get forwarded through another Google URL which is no doubt tracking what I'm clicking on.

    Like a lot of Google features they're testing though, it's very much random and it's been a month since I've seen it.

  32. I don't know about this by ChiefArcher · · Score: 2, Insightful

    I think people will start making their websites look better.. and then make other ones look bad (like it's been said in here).

    What if i get a list of proxies.. write a program and click on each of the links and rate all of them..
    It's easy as that... I don't think it'll work.

    All the porn and viagra sites will be #1

    Chiefarcher

  33. In other news... by ikkonoishi · · Score: 2, Funny

    Server declares "Nobody loves me" before crashing and taking down the search engine which allowed users to rank its results.

    Experts believe this was due to repeated thumbs down given to its site within its own results.

  34. Abuse by Ed+Avis · · Score: 2, Insightful

    So how do you deal with trolls and spammers who will vote up or vote down sites for partisan reasons? Or ignoring that, what about straightforward differences of opinion? (The world may be polarized 50/50 between those who think 'firebird' refers to a database and those who think it is a web browser - at least among the geekier-than-average WhittleBit users.)

    Anonymous feedback won't scale well to the big bad Internet; some kind of login and network of trust is needed.

    --
    -- Ed Avis ed@membled.com
  35. Re:Ack! Do you know what you're doing? by xyzzy · · Score: 4, Insightful

    You're missing the point. The system isn't watching user actions while searching to fine tune OTHER user's results, but to fine tune THAT user's results.

    While you can certainly claim that one user's actions MIGHT indicate relevance for another user's queries, it's certainly true that if a user gives you a clue that the document you have returned is irrelevant, it must be irrelevant.

  36. Google has this too by acm · · Score: 3, Informative

    If you install the google toolbar you can vote for or against pages on an individual basis.

    acm

  37. Commercial sites overflooding search engines by DRWHOISME · · Score: 2, Insightful

    Something needs to be done to separate stories and informative articles useful for research and education from the crass commercial websites that are like SPAM on all search engines. Some sort of separation is needed. Do something about that and I will be happier. Just type in anything about money or business on all the search engines and you will be flooded with irrelevant links.

  38. Better for limited document sets by realfake · · Score: 2, Insightful

    While the idea has plenty of problems for use on a general web search engine, it could work very well to tune results on a site's internal search engine, where the user has no vested interest in one result coming up higher than the others, the user only wants good results.

    It might also have potential, even if the thumbs up/thumbs down are only shown to trusted users. One of the enduring problems in tuning search engines is that the people who build the search engine aren't the people who know the content best. Getting the content people some way to say "yes, this item should come up higher for this term" is a powerful idea, IMO.

  39. Netnose! by notsoanonymouscoward · · Score: 4, Informative

    Not trying to steal the show too much from WhittleBit, but there's another new search engine recently released. Netnose lets the users decide which keywords a web page should be listed under. The search results also include handy identifiers about the page content like whether it has popups, or contains adult material (as decided by the raters).

    --
    I ate my sig.
  40. Re:Pleasure and Pain by commodoresloat · · Score: 3, Funny

    Great idea! Just think of the applications of such technology to pr0n!!!

  41. Google phoning home... by presroi · · Score: 4, Informative
    Could improved interactivity be the next big search engine advancement after Pagerank?


    Well, actually, Google does receive feedback. Once in a while, Google changes its result page the way Alexa does every time:

    You don't get a url to the result back but rather a pointer in a way like www.google.com/result?target=realurl.

    I'm sorry that I can't provide you a real url but I'm confident that someone in this /.-crowd can help me out. Thank you in advance.
  42. Abuse by sageFool · · Score: 2, Informative

    Seems totally open to abuse, and there seem to be issues with people not rating results and with keeping the statistics meaningful. If they can get something up for doing ratings and figuring out if a user thinks a result is 'good' or 'bad' that is easy for the user to use, isn't abusable, and has some kind of statistical validity I will be impressed, but I think it is much harder to do than most people think. Yar!

  43. It's called "Relevance Feedback" by gbnewby · · Score: 4, Interesting
    In the academic field of information retrieval, this is called "relevance feedback." It's a part of many information retrieval (IR) algorithms, some of which can happen automatically (i.e., unsupervised). There is also overlap with the fields of machine learning and even Bayesian processes (see today's other /. story about spam filters -- spam filtering is actually the same problem, conceptually, as search engines try to solve).

    In Yahoo and other search engines (but not Google, that I've seen), you often get a "click-through" that goes to their system before transparently redirecting to the actual URL you clicked. This is relevance feedback. It's true that the system can't determine whether you LIKED the site (aka, whether it was "relevant"), but at least it's some sort of feedback the system can use to tune.
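
    The click-through redirect described above can be sketched with Python's standard `urllib.parse`. The host `search.example`, the parameter names `target`, `q`, and `rank`, and both function names are assumptions for illustration; real engines use their own URL schemes.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def tracking_url(target, query, rank, base="http://search.example/result"):
    """Wrap a result link so the click passes through the search engine first."""
    return base + "?" + urlencode({"target": target, "q": query, "rank": rank})

def record_click(url, log):
    """Log (query, target, rank) for a clicked tracking URL and return the
    real destination the user should be redirected to."""
    params = parse_qs(urlparse(url).query)
    target = params["target"][0]
    log.append((params["q"][0], target, int(params["rank"][0])))
    return target
```

    The log only says that a result was clicked, not that it was liked, which is the limitation noted above.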

    The other most familiar type of system I can think of is Alexa (now owned by Amazon.com, and the brainchild of the Internet Archive's Brewster Kahle). With Alexa, they could count not just that you visited a site, but how long you spent and where else you went. This is at least part of the basis for Amazon's recommendation system for books and other geegaws they sell.

    Can this work in a search engine? Yes, certainly. Does it mean that a search engine that implements relevance feedback will instantly be better than Google? Definitely not! There are many other things (about 20, from what I've heard) that go into the ranking system that Google uses...Pagerank is one of them, but there are many other factors (such as term frequency, document HTML structure, etc.). Some of these, notably Pagerank, work poorly on relatively small collections (in the TREC conference, people have almost never found that Pagerank, HITS or similar algorithms improve performance with "only" a few tens of GB of Web documents -- a few million pages).

    Wanna know more about information retrieval? The TREC page above is very good for state-of-the-art research reports (see the Publications area -- it's all online and free). More general texts are mostly in libraries, but one good one online is Managing Gigabytes, which covers the IR aspects thoroughly and also has lots of ideas about how to use compression in an IR system (something that I'm curious whether Google & others do).

  44. So why *isn't* this being done? by siskbc · · Score: 2, Interesting
    No offense but:

    In general, statements like that are used by people who haven't actually thought through the algorithm in detail, or who don't have good knowledge of algorithmic theory.

    None taken. Put it this way - I could write it in matlab, and I could write it pretty bad in C++. However, I'm not familiar with google's code, and wouldn't be able to integrate it into that. But I could write a version of it, just not as it would need to be, final form. In other words, I'm very familiar with the algorithms involved, that's definitely not the problem. I do work on problems similar to this in grad school - the source of the data is completely different, but the same tools can be applied.

    In specific, your suggestion sounds excellent. Sufficiently excellent that I would be very surprised if Google, with their famously large R&D division, didn't have some very smart people thinking about it or something similar.

    Thanks, and I agree - if they're not doing this, they should be/have been. What I outlined would be reasonably accomplished through new applications of existing decision theory algorithms.

    Thinking about it briefly the first couple approaches I come up with wind up being factorial time. Plus there is a lot of fuzziness as far as how to promote Fiona Apple links but not just lousy Apple Computer ones, not to mention search terms where the "families" of hits are less distinct than for Apple.

    It's not as fuzzy as you'd think, and I think this could be done with less computational overhead than you'd initially believe. Basically, what we have is a classic supervised pattern classification algorithm, where the two classes are "useful" and "not useful." At the point where you tell it the groupings, then it's just a matter of determining what characteristics are common among the groups. You'd have to reduce the results to more ordinal characteristics, but this would be a solution similar to how Mozilla translates emails into vectors of characteristics for their Bayesian mail filters.

    Most of this could be done starting with, say, a few hundred results or so per search. Arranging into categories from here would be fairly trivial, at which point those categories would be presented to the user. The user could then update the relationships as they are determined by the computer, and resubmit.

    Of course, the more samples you use, the more overhead. Also, the more descriptors/features/parameters, the more overhead. Using one way of doing it, the problem would be linear with samples, and O(N^3) with features (due to a matrix inversion). Not all that bad, particularly when the number of features can be capped, and does not grow (necessarily) with samples.
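
    One method with this cost profile (an assumption; the method is not named here) is a least-squares linear classifier solved through the normal equations. With n samples, d features, feature matrix X, and labels y of +1 for "useful" and -1 for "not useful":

```latex
\[
  w = (X^\top X)^{-1} X^\top y,
  \qquad X \in \mathbb{R}^{n \times d},\; y \in \{-1, +1\}^{n}
\]
\[
  \underbrace{O(n d^{2})}_{\text{form } X^\top X}
  \;+\;
  \underbrace{O(d^{3})}_{\text{invert the } d \times d \text{ matrix}}
  \;=\; O(n d^{2} + d^{3})
\]
```

    For a capped d, the first term grows linearly with the number of samples, and the inversion term depends only on the number of features.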

    --

    -Looking for a job as a materials chemist or multivariat