Search Engine Learns From User Feedback
An anonymous reader writes "Ian Clarke, founder of the Freenet project, has set up a web search engine that allows users to rate each of the search results it returns. WhittleBit will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results. This could be useful for those cases where Google just refuses to return the search results you want. Could improved interactivity be the next big search engine advancement after Pagerank?"
In short, no.
I have tried WhittleBit before (a user had a link to it in his .sig on Slashdot). I was unimpressed with the results the first time (there were 8 or so to work with) and limiting with the thumbs down was of little use when there were so few results.
I can't see google's superiority being challenged by this at all. What else would WhittleBit offer me other than this "feature"? I didn't see anything else when I used it (and in fact, was rather annoyed by the fact that it remained at the top of the screen while reading the link I was sent to).
No thanks, just my worthless .02
Great idea until the second month when your local viagra spammer's SEO guy moves all his pages to the top of the search for "Futurama" or "Ninja Turtles."
Ad revenues have nothing to do with the ratings....
All the good search engines end up corrupting themselves (by making money, which I guess is the point of anything...)
If they want people to actually use this, they have to come up with some better way of collecting feedback, that scoring bar that remains on top is very irritating.
I think something like what Kaltix is trying has a better chance of replacing Google. However I don't see that happening either. I just think Google will learn from the user based systems
As x approaches total apathy I couldn't care less.
Won't work. Goodwill as we knew it in '95 is gone from the Internet.
"Just like birds can never understand how to make a pizza"
The pizza I ate last night tasted like it was made by birds.
OddManIn: A Game of guns and game theory.
no, i dont want to have to give feedback in a search, I just want to type keywords and find related results ...
I like the idea of interactive page rankings. I don't think it should be the one decisive ranking algorithm. But human interaction is just what search engines need.
I do a lot with Google, and it leaves some to be desired. The goal of Google is to make the ranking of pages partly out of the hands of webmasters, so they can't just trick the spiders. And that has worked very well for Google (serves over 70% of internet searches). But all page ranks are very cold and calculated. Maybe that cold, calculated rank is a good place to start, and then it's time for human reviewers to fine tune the list.
By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page 1, and you click it, and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
This is a great idea in concept, but the potential for abuse is incredibly high (if it's implemented on a system that actually matters, like google).
Imagine for a moment, a geek for hire, such as myself, writing a PERL script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs a few different versions of IE on Windows. It then runs searches for hot keywords that my client wants to rank high on. Then it 'mods down' anything that isn't my client's product, and 'mods up' what is, or links to, my client's products.
Set the script to run several times a day at each location. Write some spyware that does so in the background of a shareware-app-for-hire (Kazaa?).
You see where I'm going with this? Protections would have to be in place.
Karma: Chameleon (mostly due to the fact that you come and go).
Even though google uses PageRank, often sites that are higher in the results are only there because they had the right keywords in the title. Sites like this have been tweaked with other similar tricks to score higher. Obviously, this new system would be able to get around this. Perhaps, when joined with Google, this could take over when PageRank fails to be applicable. Then we would have something great!
--- to swing on the spiral...
It was going well until we realised that all people wanted was pron so we just provide that now.
I've had BBQ chicken pizza before. Chicken is great on pizza. So birds can actually make very fine pizza.
Now I can look up porn using synonyms! :D
i've had enough of your lack of respect for birds.
they'll learn how to make pizza eventually, its just taking them a little longer. I mean, monkeys learned how to make it....oh wait, thats after they evolved into us...
... that they apparently didn't do a patent search before implementing this. I have a patent covering pretty much exactly what this entails (really!)
I think I found the link somewhere on Slashdot once:
Gnod.net is a learning system like a search engine that allows you to put in your three favorite authors/musicians/movies and it returns a series of "suggestions" that match, asking you if you like/dislike/haven't heard of each result in series.
This sort of creature has the potential of placing the final nails in the media cartels' coffins, as it provides what's missing from current P2P and self-production techniques: a recommendation/promotion mechanism.
Does this "search engine" search images? No. Google does.
Does this "search engine" search 20 years of Usenet? No. Google does.
Does this "search engine" provide stock quotes, maps, phone numbers, and news? No. Google does.
Thanks for playing. Google will never lose.
Warning: fsockopen(): unable to connect to 127.0.0.1:9182 (Connection refused) in /home/ian/whittlebit.com/wqserver.php on line 13
Connection to WQServer failed
I rate it thumbs down (for now)...
This will quickly be abused, much like other rating systems like Amazon's book reviews. Anything worthwhile will ultimately be abused, you can be sure of that.
"This could be useful for those cases where Google just refuses to return the search results you want."
That has really never happened to me. Google is fast and extremely accurate, especially when you do a more advanced search, + this and - that.
I'm not sure I would want to take the time to "rate" search engine results and re-search when I can just fine-tune my search from the start.
Hey assclown maybe you want to tell me where the fuck did all the materia came from???
As a poor substitute to being able to play with it (try bookmarking whittlebit.com and coming back in a day or two) I will try to answer people's questions. For the moment - here is the blurb from the front page:
- Ian Clarke, creator of WhittleBit
I want THEM to tell ME what the good results are, not the other way around. If I wanted to do that I'd write my own search engine. Don't bring some lame ass solution where I have to do all the work.
[ Don't reply to this ]
who wants to wade through results and rank them? I came here to search!
That's why google is king. It doesn't require you to do *anything*. It barely *allows* you to do anything.
And it still returns what you need.
That's the perfect UI.
WhittleBit server whittled down to nothing in 150.030 seconds.
Google's PageRank is failing miserably for commercial search. PageRank is fine for academic / informational searches.
In a commercial environment, it is simply not possible for a free search service to exist that is fair, represents an even distribution of wealth, and is immune from abuse.
Advertising has to be paid for. "Free Search" is fine for university sites and purely non-profit informational pages, but for a commercial search your position in search engines must be purchased based on the keywords against which you wish to bid.
Otherwise basic economics breaks down.
wait, i though materia was the stuff you put in your sword in Final Fantasy 7
This is a great idea in concept, but the potential for abuse is incredibly high (if it's implemented on a system that actually matters, like google).
Check out the voting buttons on the google toolbar.
SCO employee? Check out the bounty
This seems like a great idea. Google might be number 1 in the search engine rankings at the moment but it would be good to see them have a bit of competition so that they do not use their dominant position for financial gain.
Here in the lab we're doing some work on using the principles of thermodynamics in order to improve search engines. The second law of thermodynamics states that in a closed system entropy will always increase, which is a lot like the disorder caused by sites spamming themselves to search engines. In addition the searching patterns of users can be thought of as analogous to the Fermi level of a solid. In theory applying thermodynamic equations to the process of search engines should allow for more efficient algorithms to be developed. Although this has been known for some time the process involves solving some fairly hefty quadratic equations which have needed some serious computing power to process. Hopefully though a real leap forward should be no more than a few months away.
All that glitters has a high refractive index.
You mean came IN, of course...
i would have given a thumbs down to this message, but unfortunately there was no thumbs down at all. ;)
Warning: fsockopen(): unable to connect to 127.0.0.1:9182 (Connection refused) in /home/ian/whittlebit.com/wqserver.php on line 13
Connection to WQServer failed
Consensus is good, but informed dictatorship is better
how long until google buys them out?
I give it 3 weeks after they begin getting rave reviews.
I am the Alpha and the Omega-3
What is really needed is to separate out commercial sites. Google works great 90% of the time but when you are searching for something that triggers a response from sites trying to sell something, the results get swamped with the commercial noise.
This would benefit commercial sites because when you really are looking to buy something, you will be guaranteed not to be annoyed by anything non-commercial.
-- YAAC (Yet Another Anonymous Coward)
Is it that a google search for whittlebit doesn't even have a link to whittlebit.com.
If you can read this sig - the bitch fell off.
What we need are computers that experience pleasure and pain, along with the means to deliver these sensations.
When a search engine delivers good results, the user rewards the engine with a dose of pleasure.
In return for bad results, the user unleashes a blast of pain.
That should teach the circuits a thing or two about delivering the goods!
-kgj
I have used kartoo and like it.
It does not "learn" per se, but allows you to select from multiple possibilities using a GUI - and it has been available for a while.
If I have problems finding something with Google, I use Kartoo.
Acts of massive stupidity are almost never covered by warranty. --me.
Let this guy tell you about it.
Fortunately, most God-believers are getting these kinds of facts pretty straight, but you obviously don't. Are you one of those guys thinking that the Apollo landings were just hoaxes?
That would help, but it would have to know why they're bad to know how it would differ from other results that might be more acceptable.
Here's what I would do. First, instead of google returning the most relevant choices, it needs to be a factor of relevance and diversity. So, with the typical "apple" search, it would return some apple computer results, some fiona apple results, and some results about the fruit. All of those would be highly relevant, but it would only give, say, a few of each. You could then click on the more relevant results (if you wanted apple the fruit, you'd click on the three fruit links), at which point it would reject the others and give you more of what you want.
The key here is that it would have to give diversity in the beginning for you to be *able* to differentiate things like what you want from things you don't. This is not how google works now, I don't believe.
For what it's worth, this algorithm wouldn't be too complicated to do. I lack the programming ability, but I could do the algorithm in pseudocode (at which point most decent programmers could reduce it to C++). It should be quite possible.
-Looking for a job as a materials chemist or multivariat
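The "diversity first, then narrow" idea in the comment above can be sketched in a few lines of Python. This is only an illustration under a big assumption: each result already carries a topic label (here the hypothetical strings "computer", "music" and "fruit"), which in practice would have to come from some clustering step the poster does not describe. The function names `diversify` and `narrow` are made up for this sketch.

```python
def diversify(results, per_topic=1):
    """Build a first page holding at most per_topic results from each topic.

    results: list of (url, topic) pairs, already sorted by relevance.
    The topic labels are assumed to exist; producing them is the hard part.
    """
    shown = {}
    page = []
    for url, topic in results:
        if shown.get(topic, 0) < per_topic:
            page.append((url, topic))
            shown[topic] = shown.get(topic, 0) + 1
    return page


def narrow(results, clicked_urls):
    """Keep only results that share a topic with something the user clicked."""
    wanted = {topic for url, topic in results if url in clicked_urls}
    return [(url, topic) for url, topic in results if topic in wanted]


# Hypothetical result list for the query "apple".
results = [
    ("apple.com", "computer"),
    ("store.apple.com", "computer"),
    ("fiona-apple.example", "music"),
    ("orchard.example", "fruit"),
    ("applepie.example", "fruit"),
]
first_page = diversify(results)
fruit_only = narrow(results, {"orchard.example"})
```

Here `first_page` holds one link per topic, and clicking the orchard link leaves only the two fruit results.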
You mis-spelled 'God' there tough guy. It is a proper noun and thus, capitalized. LEARN TO SHOW SOME RESPECT you illiterate OAF!
Better than my night....my pizza was made WITH birds.
Buy Steampunk Clothing Online!
- Teoma: Is supposedly more accurate than google, but I've found it to be only okay at best
- Turbo10: "Searches the deep net" by connecting to site databases to get the most relevant info. A lot of this info, however, comes from Google itself.
- Grub (a project, not an active engine): A distributed search engine project. It would use tons of people's computers as crawlers like seti@home
- WhittleBit: Read the story
So, maybe we'll get somewhere after google (not that google isn't a Good Thing), after all? And.... well, Ian Clarke and his projects is/are/may soon be really rocking the world. Those include:
- A giant search system for pre-existing content, aimed at corporations.
- An anonymous content-storage system that works as a giant encrypted webserver of sorts.
- A search engine that learns through user interaction
- A neat little AI hack that helps webmasters do their job easier
- An "edge distribution network" that will optimize content distribution. It uses some Freenet technology
Unless I read the article incorrectly, this response-feedback-accuracy was the exact cause of the problem with google as shown by msn.
Just an observation...
"The truth suffers from too much analysis"
Yes, yes we should rely on a single source for searching for information. Any attempts to develop alternatives to Google should be ridiculed.
Stupid diversity and competition - who needs it!
Page rank is cool, uses distributed data to improve search results. Definitely AWESOME in the search engine world.
BUT i would also like to see the distributed concept applied to searching itself. Something like this idea, but having the engine return results on what were popular click-thrus for searches. From what i can tell (IANA Google Expert) Google isn't keeping click through data on search results (they are on the adwords, but that's different). By tracking click thru data and calculating how long a user stayed at a clicked result before hitting the back button or otherwise returning to google... good insights can be learned. Aggregate this over millions of users with billions of page views... wouldn't take too long to figure out what everyone wants to see for a particular search result. Combine all of that with improving your searches by what others are searching for... i think you are talking a powerful system.
Granted this whole idea may be liable to spamming and all of that... but that's not part of the concept yet. On the surface, it seems like a good idea.
NOTE: I know other engines track click thrus, but i don't think any of them do it for non-advertising purposes.... if it's purely to improve results then cool. If it's to show you better ads, not cool.
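For what the click-through idea above might look like in code, here is a rough Python sketch. It assumes a log of (url, seconds before the user came back to the results page) pairs collected for one query; the 10-second bounce cutoff, the smoothing and the name `rerank_by_dwell` are all arbitrary choices for illustration, not anything Google is known to do.

```python
from collections import defaultdict


def rerank_by_dwell(results, click_log, bounce_seconds=10.0):
    """Reorder one query's results using aggregated click-through data.

    results: URLs in the engine's original order.
    click_log: (url, seconds_before_returning) pairs from many users.
    """
    stays = defaultdict(int)
    bounces = defaultdict(int)
    for url, seconds in click_log:
        if seconds < bounce_seconds:
            bounces[url] += 1   # user came straight back: probably a bad hit
        else:
            stays[url] += 1     # user stuck around: probably a good hit

    def satisfaction(url):
        # Add-one smoothing, so a page nobody clicked scores a neutral 0.5.
        return (stays[url] + 1) / (stays[url] + bounces[url] + 2)

    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=satisfaction, reverse=True)


log = [("a", 2), ("a", 3), ("b", 60), ("b", 45), ("b", 5)]
reranked = rerank_by_dwell(["a", "b", "c"], log)
```

With this log, "a" is bounced twice and drops to the bottom, "b" moves to the top, and the unclicked "c" sits in between. As the comment says, nothing here deals with spamming.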
Web pages are already rated -- by other web pages. Ever noticed these blue underlined chunks of text? They are called links. Each link is a rating that says "Lookie here, I liked it and you might too!" And somebody already uses this rating system in a search engine. Bonus points for correctly guessing who.
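To make the "each link is a rating" point concrete, here is a small Python sketch of the textbook PageRank power iteration. It follows the simplified published formulation with a damping factor of 0.85; it is not a description of Google's production ranking, and the three-page graph is invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Page "c" ends up with the highest rank because two pages vote for it, and "b" ends up lowest because nobody links to it.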
But apparently monkeys CAN be taught how to post messages on the internet...
People are not changing how the search engine ranks the results for other people, it is just slightly modifying your query to produce more precise results. How can that be abused to make trash sites show up with rank 1?
This view has been successfully disputed for centuries on the basis of something known as ASTRONOMY. Please, sir, wake the fuck up.
This was more intended as a proof of concept - rather than an all-out replacement for Google. I was frustrated with the way that Google works really well if you are looking for something easily defined and/or well known, but trying to find something obscure that was "masked" by more popular sites with similar keywords could be a real PITA. Whittlebit is designed to automate the manual process of trying to refine your keyword choice to get the search results you want.
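As a rough illustration of "automating the manual process" of keyword refinement, here is a Python sketch. It is a guess at the general shape of such a thing, not WhittleBit's actual algorithm, which the thread does not describe: it counts which words show up in thumbs-up results versus thumbs-down results and suggests the strongest ones to add or exclude. The function name `refine_query` and the sample snippets are hypothetical.

```python
from collections import Counter


def refine_query(query_terms, rated_results, top_n=2):
    """Suggest keywords to add and to exclude from thumbs up/down ratings.

    rated_results: list of (snippet_text, liked) pairs, liked being True/False.
    Returns (terms_to_add, terms_to_exclude).
    """
    query = {term.lower() for term in query_terms}
    liked, disliked = Counter(), Counter()
    for text, is_liked in rated_results:
        # Count each word once per result so long pages do not dominate.
        words = set(text.lower().split()) - query
        (liked if is_liked else disliked).update(words)

    # Score = liked results containing the word minus disliked ones.
    scores = {w: liked[w] - disliked[w] for w in set(liked) | set(disliked)}
    best = sorted(scores, key=lambda w: (-scores[w], w))
    worst = sorted(scores, key=lambda w: (scores[w], w))
    add = [w for w in best if scores[w] > 0][:top_n]
    exclude = [w for w in worst if scores[w] < 0][:top_n]
    return add, exclude


rated = [
    ("firebird database sql server", True),
    ("firebird sql relational database", True),
    ("mozilla firebird browser download", False),
    ("firebird browser tabs", False),
]
add, exclude = refine_query(["firebird"], rated)
```

For this input the suggestion is to add "database" and "sql" and to exclude "browser" and "download", which corresponds to searching again for something like `firebird database sql -browser -download`. Note that only the rater's own query changes; nothing is stored for other users.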
Son, you're not funny. I'm quite sure my IQ is a lot higher than yours. Sorry.
I'm sure I've seen Google do this. I've occasionally seen that links I click on in Google search results get forwarded through another Google URL which is no doubt tracking what I'm clicking on.
Like a lot of Google features they're testing though, it's very much random and it's been a month since I've seen it.
I'm not sure why a search engine would need to "represent an even distribution of wealth."
Any system like this with such a clear need for top ranking is going to invite abuse. You can find that sort of abuse in the local phone book if you'd like to take a look. Notice all the "(A|AA|AAA|AAA) company" listings. What happens with search engine spamming is similar.
Who is it you are suggesting should pay, the users in a Lexis-Nexis style model or the submitters in a Yellow Pages fashion?
It had a similar user-ranked search engine... Going to the website now redirects to Teoma. It had a similar green color scheme to the old HotBot site (which has now been bought out by lycos).
Boy, thinking about all these old search engines really brings back memories. These young people today - they've been spoiled by Google. Back in my day, everybody used a combination of search engines, and even then nothing worked right.
Where is the explanation??? Can not explain it???
All products must reach a critical mass at some point. Otherwise, quality products will go into oblivion, e.g. BeOS. The fundamental flaw with Mr Clarke's work is that it was motivated by a desire to make digital viewing of child pornography undetectable by law enforcement. These kind of products will never reach a critical mass, and in turn never flourish in the mainstream market.
LEARN TO SHOW SOME RESPECT you illiterate OAF!
Congratulations, you've just won the Coherence Award.
Hey asshat don't you see this thread is about JOKES on the subject of God(and birds and pizza)? We don't need your pompous "I know it all" attitude here, go post on a thread that is not offtopic and leave us alone. Also, YHBT, YHL, HAND.
I think people will start making their websites look better.. and then make other ones look bad (like it's been said in here).
What if i get a list of proxies.. write a program and click on each of the links and rate all of them..
It's easy as that... I don't think it'll work.
All the porn and viagra sites will be #1
Chiefarcher
heh, its pretty funny that you called me 'tough guy' when my post included 'final fantasy'
Wow! That is pretty amazing. I can't explain this picture. (and I'm not atheist)
Server declares "Nobody loves me" before crashing and taking down the search engine which allowed users to rank its results.
Experts believe this was due to repeated thumbs down given to its site within its own results.
So, monkeys with high IQ can post on the internet.. boo-hoo, sue me for not being as pointillous as your snobby self.
So how do you deal with trolls and spammers who will vote up or vote down sites for partisan reasons? Or ignoring that, what about straightforward differences of opinion? (The world may be polarized 50/50 between those who think 'firebird' refers to a database and those who think it is a web browser - at least among the geekier-than-average WhittleBit users.)
Anonymous feedback won't scale well to the big bad Internet; some kind of login and network of trust is needed.
-- Ed Avis ed@membled.com
Photoshop. This Morrison picture is all over the place and of course, Jimmy's ghost would take that same pose just for fun down at the cemetery.
If such crap is the cornerstone of your worldview, I really, REALLY feel sorry for you.
Don't choke on your muffins kid! Looking at your sig all I can wonder is how old are you? 13? 15?
You're missing the point. The system isn't watching user actions while searching to fine tune OTHER user's results, but to fine tune THAT user's results.
While you can certainly claim that one user's actions MIGHT indicate relevance for another user's queries, it's certainly true that if a user gives you a clue that the document you have returned is irrelevant, it must be irrelevant.
It has yes/no voting options on its toolbar.
15 AND A HALF! And don't you think that just because I'm younger than you I can't think! I got my opinions, and one of them is that you suck.
I'm not sure why a search engine would need to "represent an even distribution of wealth."
Of course there is no requirement on a search engine to represent an even distribution of wealth, but it is in the SE's own interest to if it does not want to become the spam-fest that is the commercial Google.
When there are thousands of companies providing $service, why should search engines direct the overwhelming majority of traffic to the one site that happens to fit their algorithmic opinion the best?
Any system like this with such a clear need for top ranking is going to invite abuse.
Exactly, that is why I am saying that free search has no place in the commercial Internet.
If you install the google toolbar you can vote for or against pages on an individual basis.
acm
Something needs to be done to separate stories and informative articles useful for research and education from the crass commercial websites that are like SPAM on all search engines. Some sort of separation is needed. Do something about that and I will be happier. Just type in anything about money or business on all the search engines and you will be flooded with irrelevant links.
>Imagine for a moment, a geek for hire, such as myself, writing a PERL script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs...
You need to read the description again:
will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results.
This does not imply that the results of your feedback would affect somebody else's search.
since this got modded up to a 5, i'd guess that the moderators should be reading the description again too.....
What do you mean by "commercial search"? Search engines are for information. Information can be used for commercial or noncommercial purposes. "Advertising" and "commercial information" are two different things. If you're just talking about searching for products or commercial information, "free search" engines work fine; the information is still findable. If you're looking for specific commercial information, go to the company's website and search there. But I don't agree with what you seem to be saying, which is that search engines should be advertising disguised as a reference tool. Screw that. If I want advertising, I can watch TV. Or set up a commercial search engine that announces itself as nothing more than an advertising engine, and you can sell product position all you want. But when I want information -- even information about commercial products -- I want that information filtered by relevance, not by who paid the most money to deliver the information to me. If I want to look up "safe cars" on the web, I don't just want ads from Chrysler; I want information from different perspectives. There is a huge difference between information relevant to commerce and information that is bought and paid for.
Yea...until you get a system that's into S&M.
A little reverse psychology will fix perverse, disobedient systems:
"The masochist says 'Hurt me'
... and the sadist says 'No' ..."
-kgj
While the idea has plenty of problems for use on a general web search engine, it could work very well to tune results on a site's internal search engine, where the user has no vested interest in one result coming up higher than the others, the user only wants good results.
It might also have potential, even if the thumbs up/thumbs down are only shown to trusted users. One of the enduring problems in tuning search engines is that the people who build the search engine aren't the people who know the content best. Getting the content people some way to say "yes, this item should come up higher for this term" is a powerful idea, IMO.
Well I'm 47 and I got the same opinion anyway. This troll's paternalizing is downright ridiculous but eh, isn't that what religious establishment is all about ?
Not trying to steal the show too much from whittlebit, but there's another new search engine recently released. Netnose lets the users decide which keywords a web page should be listed under. The search results also include handy identifiers about the page content like whether it has popups, or contains adult material (as decided by the raters).
I ate my sig.
Well, actually, Google does receive feedback. Once in a while, google changes its result page to work the way alexa does every time:
You don't get a url to the result back but rather a pointer in a way like www.google.com/result?target=realurl.
I'm sorry that I can't provide you a real url but I'm confident that someone in this
... I can find porn that everyone else sees and not end up in popup hell :)
Well on a per-client basis, his point about the embedded spy-ware would certainly be valid.
The Anti-Blog
- Hello sir, you're the one who took that Jim Morrison picture in Paris ?
- Yeah, pretty weird isn't it ?
- Well, I was wondering if you didn't "help it" a little...
- What do you mean ?
- Hmm... We can obviously see Morrison take this pose all over the Net, and we can see the picture messing up on the alpha and...
- I tell you it's the real thing man, believe me.
- It doesn't make sense sir. The perspectives are wrong and the picture has clearly been cut out of some place else.
- I'm telling you it's real. Believe me.
Yeah, of course I'll write to this guy. How constructive.
I can see vendors writing scripts that will at random times access the search engine with searches related to their product, and automatically give "thumbs down" to high ranked results not affiliated with their own products, and thumbs up for their own pages.
For pr0n, this would of course happen an order of a magnitude more often, starting two days BEFORE the search engine launches.
Regards,
--
*Art
Seems totally open to abuse, and there seem to be issues with people not rating results and with keeping the statistics meaningful. If they can get something up for doing ratings and figuring out if a user thinks a result is 'good' or 'bad' that is easy for the user to use, isn't abusable, and has some kind of statistical validity I will be impressed, but I think it is much harder to do than most people think. Yar!
well actually i'm 19...but thanks for the warning....the other day a lemon poppyseed muffin did give me somewhat of a problem...i'll be more careful in the future. maybe its cause i'm not praying enough.
This is not the first time I have seen people make similar dumb mistakes.
I think it has to do with not reading the actual article/sample. /.ers tend to skim a lot.
excitingthingstodo.blogspot.com
...while making assumptions about someone's IQ after one sentence is a sign of great intelligence, of course. Thanks for the tip mate.
Variations of these re-ranking schemes have been around since the 1970's. This sort of scheme is also easy to do implicitly by tracking which results have been clicked on and then using the keywords in those clicked-on results to improve the original search terms. Hotbot.com, among others, used to do this.
Don't go ape pal! HarHarHar!
I guess to get around this we kinda need to create a list of "friends" or people who don't abuse the service. You know, honest people who care about things other than money.
Another way to do this is for each user check to see how they are voting compared to the other votes for that site. If they are consistently opposite the public reaction then they are most likely some sort of troll. Also limiting the number of times someone can submit feedback could cut down on abuses. Thresholds 'n stuff.
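The "compare each user's votes to everyone else's" check above could look something like this Python sketch. The vote format, the thresholds and the name `flag_contrarians` are assumptions made for illustration. One caveat that follows directly from the logic: it would flag an honest user with minority tastes just as readily as a troll.

```python
from collections import defaultdict


def flag_contrarians(votes, min_votes=3, threshold=0.8):
    """Flag users who vote against the other voters most of the time.

    votes: list of (user, site, vote) triples, vote being +1 or -1.
    Returns a sorted list of flagged user names.
    """
    totals = defaultdict(int)
    for _, site, vote in votes:
        totals[site] += vote

    counted = defaultdict(int)
    disagreed = defaultdict(int)
    for user, site, vote in votes:
        others = totals[site] - vote  # consensus without this user's own vote
        if others == 0:
            continue  # the other voters are split, so there is nothing to compare
        counted[user] += 1
        if (others > 0) != (vote > 0):
            disagreed[user] += 1

    return sorted(
        user for user in counted
        if counted[user] >= min_votes
        and disagreed[user] / counted[user] >= threshold
    )


# Hypothetical votes: three regular users and one shill promoting "spam1".
votes = []
for user in ("alice", "bob", "carol"):
    votes += [(user, "good1", 1), (user, "good2", 1), (user, "good3", 1), (user, "spam1", -1)]
votes += [("shill", "good1", -1), ("shill", "good2", -1), ("shill", "good3", -1), ("shill", "spam1", 1)]
flagged = flag_contrarians(votes)
```

In this made-up data only "shill" is flagged. The `min_votes` floor is one possible reading of the "thresholds 'n stuff" part: nobody gets flagged on the strength of one or two votes.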
In Yahoo and other search engines (but not Google, that I've seen), you often get a "click-through" that goes to their system before transparently redirecting to the actual URL you clicked. This is relevance feedback. It's true that the system can't determine whether you LIKED the site (aka, whether it was "relevant"), but at least it's some sort of feedback the system can use to tune.
The other most familiar type of system I can think of is Alexa (now owned by Amazon.com, and the brainchild of the Internet Archive's Brewster Kahle). With Alexa, they could count not just that you visited a site, but how long you spent and where else you went. This is at least part of the basis for Amazon's recommendation system for books and other geegaws they sell.
Can this work in a search engine? Yes, certainly. Does it mean that a search engine that implements relevance feedback will instantly be better than Google? Definitely not! There are many other things (about 20, from what I've heard) that go into the ranking system that Google uses...Pagerank is one of them, but there are many other factors (such as term frequency, document HTML structure, etc.). Some of these, notably Pagerank, work poorly on relatively small collections (in the TREC conference, people have almost never found that Pagerank, HITS or similar algorithms improve performance with "only" a few tens of GB of Web documents -- a few million pages).
Wanna know more about information retrieval? The TREC page above is very good for state-of-the-art research reports (see the Publications area -- it's all online and free). More general texts are mostly in libraries, but one good one online is Managing Gigabytes, which covers the IR aspects thoroughly and also has lots of ideas about how to use compression in an IR system (something that I'm curious whether Google & others do).
In general, statements like that are used by people who haven't actually thought through the algorithm in detail, or who don't have good knowledge of algorithmic theory.
None taken. Put it this way - I could write it in matlab, and I could write it pretty bad in C++. However, I'm not familiar with google's code, and wouldn't be able to integrate it into that. But I could write a version of it, just not as it would need to be, final form. In other words, I'm very familiar with the algorithms involved, that's definitely not the problem. I do work on problems similar to this in grad school - the source of the data is completely different, but the same tools can be applied.
In specific, your suggestion sounds excellent. Sufficiently excellent that I would be very surprised if Google, with their famously large R&D division, didn't have some very smart people thinking about it or something similar.
Thanks, and I agree - if they're not doing this, they should be/have been. What I outlined would be reasonably accomplished through new applications of existing decision theory algorithms.
Thinking about it briefly the first couple approaches I come up with wind up being factorial time. Plus there is a lot of fuzziness as far as how to promote Fiona Apple links but not just lousy Apple Computer ones, not to mention search terms where the "families" of hits are less distinct than for Apple.
it's not as fuzzy as you'd think, and I think this could be done with less computational overhead than you'd initially believe. Basically, what we have is a classic supervised pattern classification algorithm, where the two classes are "useful" and "not useful." At the point where you tell it the groupings, then it's just a matter of determining what characteristics are common among the groups. You'd have to reduce the results to more ordinal characteristics, but this would be a solution similar to how mozilla translates emails into vectors of characteristics for their Bayesian mail filters.
Most of this could be done starting with, say, a few hundred results or so per search. Arranging into categories from here would be fairly trivial, at which point those categories would be presented to the user. The user could then update the relationships as they are determined by the computer, and resubmit.
Of course, the more samples you use, the more overhead. Also, the more descriptors/features/parameters, the more overhead. Using one way of doing it, the problem would be linear in the number of samples, and O(N^3) in the number of features (due to a matrix inversion). Not all that bad, particularly when the number of features can be capped, and does not (necessarily) grow with the number of samples.
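The poster does not name the method, so the following is only a sketch under an assumption: a least-squares linear classifier solved through the normal equations is one technique with the stated cost profile (one pass over the samples, then an O(d^3) solve over d features). The feature names and the tiny training set are made up for illustration.

```python
def fit_least_squares(samples, labels, ridge=1e-6):
    """Fit weights w minimising ||Xw - y||^2 via the normal equations."""
    d = len(samples[0])
    # Accumulate A = X^T X and b = X^T y in one pass: linear in the sample count.
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for x, y in zip(samples, labels):
        for i in range(d):
            b[i] += x[i] * y
            for j in range(d):
                A[i][j] += x[i] * x[j]
    for i in range(d):
        A[i][i] += ridge  # small ridge term keeps A invertible
    return solve(A, b)


def solve(A, b):
    """Gaussian elimination with partial pivoting: O(d^3) in the feature count."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def score(w, x):
    """Positive means 'useful', negative means 'not useful'."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical features per result: [mentions music, mentions computers].
# Labels are the user's ratings: +1 useful, -1 not useful.
samples = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
labels = [1, 1, -1, -1, 1]
w = fit_least_squares(samples, labels)
```

With these ratings, `score(w, [1, 0])` comes out positive and `score(w, [0, 1])` negative, so unrated results could be reordered by score before the next search.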
-Looking for a job as a materials chemist or multivariat
You really got some folks worked up with that one. As for the rest of you, how could you fall for such an obvious troll?
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
I'm not sure why a search engine would need to "represent an even distribution of wealth."
I know exactly why: because he's joking. Look at his user name.
You know, let random folks rate the ratings that other people give, something like that. I know it's a radical idea, but I bet we could come up with something if we tried.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
I just checked the source code on Google.com and they DO have javascript on their main page. So I eat my words from that last post.
If I get time I'll look through all of it and see what it does.
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
That would be a good idea, but how do you keep metamod from being abused as well? Hmmm... a /. styled random system would probably help.
good points.
only a little later than you. Try and find anything to do with money or business or anything that involves entrepreneurship and you will be disappointed.
Since the majority of the posters don't seem to understand this.
...there would be human interaction. Wasn't the whole point of PageRank that results couldn't be manipulated so easily? Yeah, I know there is that link exchange stuff that Google had to deal with, but for the most part it works. As soon as we allow humans to vote, someone will start to screw with the system.
My Hello World is 512 bytes. But it's also a valid Fat12 boot sector, Fat12 file reader, and Pmode routine.
The hit rate has died down enough that I think it might be able to handle it - I switched it back to full functionality about 5 minutes ago and it seems to be coping - let's hope this continues (it is downhill from here - right?).
That's very much the idea.
Of course, many search items might not be this easy to categorize.
That's true, but at that point the most successful solution would be what google does now. In other words, if there isn't any clear substructure to the organization of results, just return the most relevant.
We haven't yet proven that your algorithm is easy
Obviously, since I don't have access to google's database of webcrawls to test it. However, I will say that the problem reduces to pattern classification tasks that are reasonably well characterized. I have a pretty good background in that, and am familiar with techniques that work very well with problems similar to this one. I'm actually fairly confident this would work.
and we also haven't proven that it would be useful for many types of searches.
As I said, it doesn't have to be useful for all searches. It simply has to be useful for the categories of searches at which google now fails. The "apple" search is one of the prime examples of google's failure, as it's pretty damned good at everything else.
-Looking for a job as a materials chemist or multivariat
Your ranking only affects your own search results.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
I have never before seen so many ignorant replies rated 5. Please refer to this.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Your feedback only affects YOUR search results.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
I've seen this already on a search engine called NetNose. Nothing original.
This will greatly enhance your search for free pr0n.
Isn't the internet great!
OH THE SHAME I fell off the wagon and use sigs again!
I checked it out, and gave it the "apple" test. Didn't do bad, but I think it could be done better. I might talk to the guys, thanks for the link.
-Looking for a job as a materials chemist or multivariat
- a) which results actually get loaded by a user
- b) the gap in time between successive result load
- c) the last result a user loads
I can imagine useful interpretations of each factor such as (a) links that get loaded were more useful than those that were skipped, (b) longer times between loads might mean richer relevance to user's search, and (c) user definitively found what they were looking for upon viewing the result. Granted there are common load patterns that would throw these interpretations askew, but the very same thing could be said of the information that google currently uses to rank links, and there is still enough statistically valuable data there to make them a great search engine.
- First they ignore you, then they laugh at you, then ???, then profit.
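The three click signals (a), (b) and (c) listed in the comment above could be combined along these lines. This is only a sketch: the log format, the weights, and the function name are assumptions made for illustration, not anything a real search engine is known to use.

```python
from collections import defaultdict


def click_scores(sessions, last_click_bonus=1.0, dwell_weight=0.01):
    """Score URLs from click logs.

    sessions: list of sessions; each session is a list of
    (url, seconds_since_search) pairs, one per result the user loaded.
    """
    scores = defaultdict(float)
    for clicks in sessions:
        clicks = sorted(clicks, key=lambda c: c[1])
        for i, (url, t) in enumerate(clicks):
            scores[url] += 1.0  # (a) the result was loaded at all
            if i + 1 < len(clicks):
                # (b) gap in time before the next result was loaded
                scores[url] += dwell_weight * (clicks[i + 1][1] - t)
            else:
                # (c) this was the last result the user loaded
                scores[url] += last_click_bonus
    return dict(scores)


# Two hypothetical sessions for the same query.
sessions = [[("a", 5), ("b", 65)], [("b", 3)]]
scores = click_scores(sessions)
```

Here "b" ends up ahead of "a" because it was the last result loaded in both sessions; the weights would need tuning against real data to handle the load patterns that skew these interpretations.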
This could be useful for those cases where Google just refuses to return the search results you want.
You mean, like, where I search google for "Porno and Snuff Films", and I honest to god only want results pertaining to the song by the band "The Lawrence Arms".
Heh, wish they could do that on kazaa.
~Will
sig?
...is the only thing Google is missing. At least that I know of. I know it won't happen anytime soon, though, as it would bring the Google servers to their knees in no time.
But how about enabling regular expressions on your search results, ie. you do a normal search first, and then do a regular expression search to refine the results from your first search? That would've been something, and I guess it wouldn't have required that much CPU power (as compared to allowing regular expressions in the first place).
I'm doing quite a bit of genealogical research, and regular expressions would have been nice when searching for people and dates (i.e. years) related to those people.
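A rough sketch of that two-step idea: run a normal search first, then apply a regular expression only to the results that came back. The result records, the name, and the pattern are invented for illustration.

```python
import re


def refine(results, pattern):
    """Keep only the results whose snippet matches the regular expression."""
    rx = re.compile(pattern)
    return [r for r in results if rx.search(r["snippet"])]


# Hypothetical results from an ordinary keyword search.
results = [
    {"url": "http://example.com/1", "snippet": "Ole Hansen, born 1843 in Bergen"},
    {"url": "http://example.com/2", "snippet": "Ole Hansen wins 1998 chess open"},
    {"url": "http://example.com/3", "snippet": "Hans Olsen, born 1851"},
]

# The name followed later by any year in the 1800s.
matches = refine(results, r"Ole\s+Hansen\b.*\b18\d{2}\b")
```

Only the first result survives the filter. Since the regex runs over a few hundred snippets rather than the whole index, the cost stays small.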
--
Evil Attraction
Ian Clarke announced in this slashdot comment that he's leaving the US. Here's the story at boing boing.
HIV Crosses Species Barrier... into Muppets
Getting user feedback is easy.
Getting useful user feedback of the kind you actually want is hard.
This would be ideal for something like the Google image search, where there are a large number of broken links and the required user feedback is fairly simple: Broken Link.
Based on that user feedback Google could check the page to verify if the link is in fact broken.
The cornerstone of any rating system like this is trust. While it might be useful for improving your own search, I really don't see how this kind of system could be shared among a community and scale properly. I can't imagine it would hold up for very long without a very complicated trust metric.
There is no real interactivity, but Google sometimes encodes links in search result pages in a way so that it can log which links are actually clicked on. Simply by replacing http://site.com/page.html with http://google.com/query+terms+encoded+here/referer/site.com/page.html (or something like that, you get the idea - Google gets the hit, logs it and then forwards the requesting client to the real page).
Other search engines do this all the time.
However, I remember one Google employee saying in an interview that this information is not integrated in the actual search engine - yet.
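A minimal sketch of the click-logging redirect described above. The tracker address and the parameter names are hypothetical; this is not Google's actual URL scheme, just one way to wrap a result link so the hit is logged before the client is forwarded.

```python
from urllib.parse import urlencode, urlparse, parse_qs

TRACKER = "http://search.example/click"  # hypothetical logging endpoint


def tracking_link(query, target_url, rank):
    """Wrap a result URL so the search engine sees the click first."""
    return TRACKER + "?" + urlencode({"q": query, "rank": rank, "url": target_url})


def handle_click(link, log):
    """Log the click and return the real URL to redirect the client to."""
    params = parse_qs(urlparse(link).query)
    target = params["url"][0]
    log.append((params["q"][0], int(params["rank"][0]), target))
    return target  # a real server would answer with an HTTP redirect here


log = []
link = tracking_link("fiona apple", "http://site.com/page.html", 3)
target = handle_click(link, log)
```

After the call, `log` holds the query, the rank of the clicked result, and the destination, which is the raw material any click-based ranking would need.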
Tibianna is a system that uses machine learning to refine a Google search. See http://www.thestaticvoid.net/portfolio/p_tibianna.html
Indeed. And if you want to experiment with relevance feedback, take a look at the Xapian Project for a highly scalable GPLed implementation of the Probabilistic Information Retrieval Model (which is derived from Bayes Theorem, the basis of all those Bayesian spam filters).
The only search demo up at the moment is the search over the site itself, which doesn't show off the relevance feedback especially well - the pages on the site cover a rather narrow topic area, which is fine for searching the site, but less good as a demo. But if you try it you'll see it suggests various related words and you can click on these to add them to the search. Additionally you can click a checkbox next to a hit to indicate that it's relevant - check a few and hit search and the suggested words will be based on those documents you liked.
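To give a feel for how suggested words can come from the documents a user ticked as relevant, here is a small sketch. It does not use Xapian's API or its actual term weighting; the scoring (share of relevant documents containing a term minus share of all documents containing it) is a simplified stand-in, and the documents are invented.

```python
def suggest_terms(docs, relevant_ids, query_terms, k=3):
    """Suggest up to k terms that are over-represented in the relevant docs.

    docs: dict mapping doc id -> text
    relevant_ids: set of ids the user marked as relevant
    query_terms: terms already in the query (never suggested again)
    """
    if not relevant_ids:
        return []
    n_docs, n_rel = len(docs), len(relevant_ids)
    df_all, df_rel = {}, {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            df_all[term] = df_all.get(term, 0) + 1
            if doc_id in relevant_ids:
                df_rel[term] = df_rel.get(term, 0) + 1
    scored = [
        (df_rel[t] / n_rel - df_all[t] / n_docs, t)
        for t in df_rel
        if t not in query_terms
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, t in scored[:k]]


docs = {
    1: "apple fiona singer album",
    2: "apple fiona tour album",
    3: "apple computer mac",
    4: "apple computer ipod",
}
suggestions = suggest_terms(docs, {1, 2}, {"apple"}, k=2)
```

Marking documents 1 and 2 as relevant yields "album" and "fiona" as suggestions, which is the kind of behaviour the checkbox demo describes.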
*sigh* maybe I can test it later.
As an aside, I find it a bit annoying that it uses frames and that the query input is readonly when you have (or would have had) a results page, so you can't enter new terms then click "New Search." Since it uses Javascript anyway, that would have been easy enough.
This Like That - fun with words!
I searched for electronics to see how many hits I got and look what comes back:
"Warning: fsockopen(): unable to connect to 127.0.0.1:9182 (Connection refused) in /home/ian/whittlebit.com/wqserver.php on line 13
Connection to WQServer failed"
Yet the page itself seems to load really quickly
For what it's worth, the way I'd do it would *not* require a subject/keyword taxonomy. To me, that's one of Google's strengths - they don't attempt to evaluate content ahead of time. I wouldn't either - in fact, I think you *have* to develop the substructure/classification relationships based on the results of the search. Otherwise, all you have is a fancy card-catalog system.
I read their patent (5,924,090), and it was fairly badly written (even compared to other patents!). I'm not sure if my method would be covered, as I wouldn't even attempt to use a taxonomy, as I think it would fail based on its very nature. It does seem as if they're trying to "0wnz0r" the entire idea of classifying results, which seems damned broad, and I think people were doing this far before July 2001 when they filed.
The whole taxonomy patent thing is weird, though. As I said, I've seen *numerous* websites offer up pre-designated categories based on search results (Yahoo, etc). Do all these companies license Northern Lights' technology, I wonder? If not, then the patent must be more specific than it initially seems.
As for ease of use, you're right - this wouldn't necessarily be the way you want the search engine to always work. To me, it would be like another version of "advanced search." You don't always want to use it, but you definitely want it there.
-Looking for a job as a materials chemist or multivariat
What google really needs is a Yahoo email style 'This is spam' button. People love to punish so they will press that button if a site is obnoxious enough to piss them off.
Protecting that button from abuse would be a challenge, but I do not think impossible. Slashdot's moderation system seems to work well.
Here is my idea:
Random mojo points ( mojo = credibility ) are distributed to a few thousand random users every day.
These users rate sites as spam like everyone else, but unlike slashdot moderators, they do not know they have been assigned mojo. Nobody knows how much mojo they have.
People can use up to half the mojo they have each day. A person given one point of mojo who rates 6 sites as spam in a day has given each site 1/12 of a spam point, and will have access to 1/2 of his remaining 1/2 of a mojo point = 1/4 mojo point the next day. This is so that the person will be encouraged to rate spam continuously, not just when they think they might have mojo. ( they aren't supposed to know they have mojo anyway )
If the person rates only one site as spam the first day, then that site gets 1/2 a spam point. The next person to rate that site as spam gets 1/4 of a mojo point for agreeing with the first person. He can use up to half that mojo tomorrow. The person after him gets nothing for agreeing with the first person, but gets half of the spam points the second person assigned to the site. The site accumulates spampoints, but mojo is like an Olympic torch that grows half as bright each time it is passed. The analogy holds perfectly if each person rates only one site a day, but if more than one site gets rated by each person, then it is more like a torch that gets dimmed by half and then divided up and passed to multiple runners. The total spampoints assigned to different sites as a result of giving one mojo-point to a random user is 1 + 1/2 + 1/4 + 1/8 + .. + 1/2^n, which approaches 2 mojo points as n grows.
Because each randomly assigned mojo-point results in 2 mojo points being assigned, forgiveness formulas can phase out 2 spampoints for every mojo-point given out randomly.
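The arithmetic in this scheme can be checked with exact fractions. This is only a sketch of the rules as described in the comment (spend at most half your mojo per day, split evenly over the sites rated, half of the remainder usable the next day); the function names are made up.

```python
from fractions import Fraction


def spend(mojo, sites_rated):
    """Return (spam points given to each site, mojo usable the next day)."""
    spent = mojo / 2                     # at most half the mojo can be used per day
    per_site = spent / sites_rated       # split evenly over the sites rated
    usable_tomorrow = (mojo - spent) / 2 # half of what remains
    return per_site, usable_tomorrow


def chain_total(n):
    """Partial sum 1 + 1/2 + 1/4 + ... + 1/2^n of the halving chain."""
    return sum(Fraction(1, 2 ** k) for k in range(n + 1))


# One mojo point, six sites rated as spam in a day.
per_site, tomorrow = spend(Fraction(1), 6)
total_after_10 = chain_total(10)
```

`per_site` is 1/12 and `tomorrow` is 1/4, matching the example in the comment, and the partial sums equal 2 - 1/2^n, so the chain never hands out more than 2 points per seeded mojo point.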
Slashdot uses metamoderation to moderate moderators. This prevents moderators from modding up GNAA and WIPO posts. My idea uses agreement to distribute mojo-points instead of metamoderation.
If you assign too many mojo-points, then you open the door for scripts that create accounts that sit and wait for mojo-points from heaven and then use those ( relatively powerful ) first hand points to attack a competitor's site's google listing.
You want to make these heaven-sent points rare, so that it is not worth the trouble of the raterbot kiddies. Mojo points should be rare enough that the number of accounts you would have to have running as scripts to get a decent number of points would make you an obvious spamraterbot purveyor, and your site could be blocked. Most times this would be unnecessary as they would be drowned out by people with legitimate accounts. It would be impossible to create scripts that gleaned mojo by rating known spam as such since known spam would not appear in google listings and so would not have new spampoints ready to be agreed with. Scripts trading exponentially decaying mojo with each other on lists of known spam sites is harmless. If the scripts decided to rate random sites as spam, then they would not get any mojo for it
Eat at Joe's.
Uraninite grows around here. I found this guy who tells how to purify it w/ cheap stuff from the HW store. You could just go outside pick up the pitchblende, purify it and start yerself a lil' atomic pile. Sort of like a compost heap that you don't need to turn with a pitchfork. Radioactive decay don't need no air and water! Yeah, and some dude says how you can get radium from it too. Man and this kid made a nuke reactor in his parent's flower shed. I think a uranium powered pile would be kewler. I think there's some barium in ammongst the radium in ore... It slows neutrons down so they can cause further decay. I mean, if you put some dynamite and a fuse under the pile then when it got real hot KAPOW!! Any idjut could do it. And you'd have to be an idjut to try! I mean with all that radiation you'd prolly die.. It's kewl how all ya gotta do is pretty much put a bunch of radioactive crap in a pile and then it gets EVEN MORE radioactive and nasty. dude....
Here in the lab we're doing some work on using the principles of thermodynamics in order to improve search engines. The second law of thermodynamics states that in a closed system ethalpy will alway increase, which is a lot like the disorder cause by sites spamming themselves to search engines.
Is this the same lab as "Here in the lab for instance, many of my colleagues have been releasing their scientific papers onto Kazaa instead of through more established journals such as 'new scientist'." ? What kind of messed up lab is this exactly anyways? Got a reference?
And why are you working on applying thermodynamics to search engines? I thought you were working on _particle_ physics? "The word processor is good, although somewhere it is set to autoreplace the word lepton with leprechaun which is proving most annoying as I write my paper on particle physics."
First of all, it looks like you misspelled "enthalpy". Second of all, enthalpy refers to the energy in the system, entropy refers to the amount of disorder. When you're speaking of disorder increasing you're speaking of entropy, not enthalpy. In closed systems enthalpy decreases as entropy increases.
Third of all, if the systems are so similar that the knowledge of one can be applied to the other, how is it that you think you've found a way around the second law of thermodynamics? Either the systems are dissimilar enough that your "expertise" isn't going to help, or they're similar enough that you could apply the results to actual physics and have created a device to generate infinite power rather than futzing around with search engines.
Finally, the above rules are only true in closed systems, which search engines are not. There are people writing new web pages all the time, and new pages getting referenced, and old references being improved, so the second law of thermodynamics doesn't even apply.
You can't even get your terms right and you're trying to apply them to the wrong situation. I'm not even going to bother finding counter examples to the rest of your drivel.
This Space Intentionally Left Blank
---who wants to wade through results and rank them? I came here to search!--
Reminds me of... "Who wants to wade through posts and MOD them? I came here to read!"
But seriously, wouldn't we all benefit in the long run from giving feedback to a search engine (provided the thing can be troll- and bot-proofed)? They'll never build a good AI to take over the world and obliterate life as we know it until we COOPERATE and slowly divulge the network of associations that make us human.
Now excuse me while I go find my tinfoil hat...
...
I did some research on a similar idea in 1999 and 2000 as part of my CS degree. It is currently going through the patent process. Don't worry, I'm a techie like most of you and don't intend to send out "cease-and-desist" letters to anyone with anything remotely resembling my process. My main motivation is to have something nice-looking on my resume, and perhaps to profit if I can sell the patent to a company at some point in the future. It was also very educational to go through the whole patent process on my own with no lawyers...