A New Tack In Search Engine Formulation

Posted by ryuzaki0 on Friday November 3, 2000 @11:38PM from the doh! dept.

An unnamed correspondent writes: "PC World reports that 'big-shot Web directories such as Yahoo and LookSmart' are missing thousands of the best links, which a new startup HotLinks has in their directory by building it from people's bookmarks." This sounds like a smart idea (building from people's own bookmarks), but is it doomed to create in-breeding of links? That is, in a search engine based on bookmarks, will they be able to get enough "new blood"?

BIGGER SECURITY HOLE by mpskeeter · 2000-11-04 00:14 · Score: 4

No, I am not mpskeeter--I clicked on that link below and now I am him...sorry about that.

Do a search for "password"--some of these geniuses have their banking and etrade usernames/passwords up there. Email and xdrive passwords are abundant.

Also, an awful lot of these guys look at illegal pr0n. These bookmarks are right next to the ones showing their personal home pages with pictures of the wife and kids. The FBI and a divorce lawyer or two are gonna have a field day with this.

I tried to contact one of the guys with his bank account open, but, for security reasons, his email addy is not on his profile...

real smart website they got there.

A _big_ security hole by mpskeeter · 2000-11-03 21:01 · Score: 5

Hmmm.. I'm posting this as 'mpskeeter', though
my username at slashdot is totally different. :)

Guess how?
www.slashdot.org/users.pl?op=userlogin&.. etc was
a link on that site, enabling people to easily gather username/passwords.

(Of course, bookmarking such a link is a _bad_ idea; it even says so on the login page) :)

Oh Boy.... by Crewd · 2000-11-03 18:46 · Score: 5

I can see it now, people spamming hotlinks.com with their bookmarks of goatse.cx and Bouillabaisse.

Geek self-referential belief system by waimate · 2000-11-03 18:52 · Score: 4

This would, of course, create a self-referential belief system for geeks, wherein few new notions would enter the collective consciousness, and the group view of the world would be skewed by, er, the group view of the world.
It's like being able to choose what things you want to appear in your own daily newspaper - it's inherently flawed because the most interesting things one encounters are often those one didn't expect to be interesting.
Similarly the very best things to find with a search engine are those things which are not common knowledge. The job of a decent search engine is to flush out gems, not popular opinion.

my evaluation by DeadSea · 2000-11-03 19:05 · Score: 4

Whenever I find a new search engine, part of what I rate it on is how well it can find my homepage, and how easy it is to get my homepage listed.
As far as regular search engines go, it was much faster to get google to crawl my site and list it than anything run by inktomi, altavista, or northernlight. I am very happy with google.
As far as directories go, Yahoo lists two of the 7 sites that I maintain. I have managed to get dmoz listings for 6 of the 7, two of which I didn't submit myself.
This new directory only appears to have one of my sites, and at a URL that has been inactive for almost two years at this point. I'll have to see how easy it is to get stuff listed, but so far I am not impressed.

The Jukebox Phenomenon by MoNickels · 2000-11-03 19:50 · Score: 4

The problem with the bookmark approach is that it will tend to result in the Jukebox Phenomenon.

The short version of this is that current Top 40 radio station rotation systems are reputed to stem from the analysis of a jukebox supplier who noticed the same 40 records kept getting played over and over. This is because when a record gets played once, it tends to get played again, resulting in circular reinforcement, with hits one through 100 charted in a steeply declining curve. This is how current radio programming, music marketing and MTV work today: reinforcement.
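
The reinforcement loop described above can be sketched as a toy simulation. This is only an illustration under assumptions of my own: the function names, the 1,000-record catalogue and the "one extra ticket per play" rule are all made up for the example and are not taken from any real jukebox or radio data.

```python
import random
from collections import Counter


def simulate_jukebox(n_records=1000, n_plays=20000, reinforce=True, seed=0):
    """Return a Counter of plays per record after n_plays selections."""
    rng = random.Random(seed)
    # Every record starts with one "ticket", so all are equally likely at first.
    tickets = list(range(n_records))
    plays = Counter()
    for _ in range(n_plays):
        record = rng.choice(tickets)
        plays[record] += 1
        if reinforce:
            # Assumed rule: each play adds a ticket for that record,
            # making it more likely to be picked again.
            tickets.append(record)
    return plays


def top_share(plays, k=40):
    """Fraction of all plays that went to the k most-played records."""
    total = sum(plays.values())
    top = sum(count for _, count in plays.most_common(k))
    return top / total


reinforced = top_share(simulate_jukebox(reinforce=True))
uniform = top_share(simulate_jukebox(reinforce=False))
print(f"top-40 share with reinforcement:    {reinforced:.1%}")
print(f"top-40 share without reinforcement: {uniform:.1%}")
```

With reinforcement switched on, the 40 most-played records should take a noticeably larger share of all plays than they do when every pick is independent, even though no record started out any better than the others.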

The problem with this approach (in music or data) is that popularity is no guarantee of accuracy, appropriateness or utility. This is represented in the music world by the high cost (real and otherwise) of successful entry into the market. New music (data) is not popular enough to be included, but it can't easily be included without becoming popular.

Personal bookmark collections tend toward the same phenomenon. Besides the inaccuracy stemming from factory-included links (which I would hope they account for), the bulk of entries will come from links that were themselves found through existing search engines, which are, no matter how big, closed data sets: they have boundaries and do not include the entire web. These searches are also happening in only a few places, resulting in the JP. HotLinks will thus tend to include sites that have already appeared elsewhere. A certain number of "missing" pages will be newly included (the user's own sites, work sites, sites of friends) but very few "missing" pages of other kinds, particularly low-traffic pages (such as those with refined and highly specialized content: deep governmental directories, university research labs). In other words, HotLinks' approach is not much different than Google's number-of-times-linked approach or bulk submitting on an engine's "add your site" link, just a larger population sample.

Napster experiences the Jukebox Phenomenon: If I look for Loudon Wainwright III songs, I tend to find lots of iterations of the same three songs and not much else: Dead Skunk, I Wish I Was A Lesbian and the duo with Iris Dement. But if I want to find, say, any song off of the Therapy album, it tends not to come up because it is not as popular. This is because the JP has propagated the popularity of the same three songs. An ideal data source would include the entire data set, popular or not. (I am aware Napster cannot be, and is not designed to be, a complete data set.)

If one's goal is to include more web sites, a more accurate approach than HotLinks' would be to scavenge users' History files. That would, in my case, include a few hundred additional sites a week, although I'm sure the privacy issues would be a problem. If one's goal is to return the most accurate results, an even better approach would be infinite page caching in which a new iteration of a page does not replace the previous entry, but is added to it. In this way, one could search across history as well as data.
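
A minimal sketch of that "add, don't replace" cache, assuming an in-memory store and plain substring matching. The class name PageArchive, its methods and the example URL are hypothetical and only meant to show the idea, not how any real engine does it.

```python
from collections import defaultdict
from datetime import date


class PageArchive:
    """Hypothetical append-only cache: every fetch of a URL is kept."""

    def __init__(self):
        # url -> list of (fetched_on, text) snapshots, in the order stored
        self._versions = defaultdict(list)

    def store(self, url, fetched_on, text):
        """Append a snapshot; earlier snapshots of the same URL are kept."""
        self._versions[url].append((fetched_on, text))

    def search(self, term, start=None, end=None):
        """Return (url, fetched_on) for each snapshot containing term,
        optionally limited to snapshots fetched between start and end."""
        term = term.lower()
        hits = []
        for url, versions in self._versions.items():
            for fetched_on, text in versions:
                if start is not None and fetched_on < start:
                    continue
                if end is not None and fetched_on > end:
                    continue
                if term in text.lower():
                    hits.append((url, fetched_on))
        # Oldest snapshots first, ties broken by URL
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))


archive = PageArchive()
archive.store("http://example.com/", date(1999, 1, 5), "Old page about jukebox repair")
archive.store("http://example.com/", date(2000, 11, 3), "New page about search engines")

# The old snapshot is still searchable even though the page has changed.
print(archive.search("jukebox"))
# Restricting the date range searches "across history".
print(archive.search("page", start=date(2000, 1, 1)))
```

The obvious cost is storage, since nothing is ever thrown away; a real system would presumably need some way to compress or deduplicate snapshots that barely differ.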

--
Wordnik, a dictionary project which aims to collect