Slashdot Mirror


A New Tack In Search Engine Formulation

An unnamed correspondent writes: "PC World reports that 'big-shot Web directories such as Yahoo and LookSmart' are missing thousands of the best links, which a new startup HotLinks has in their directory by building it from people's bookmarks." This sounds like a smart idea (building from people's own bookmarks), but is it doomed to create in-breeding of links? That is, in a search engine based on bookmarks, will they be able to get enough "new blood"?

36 of 99 comments

  1. Re:Most linked order by GrenDel+Fuego · · Score: 2

    I could be wrong, but I believe that's how google works.

  2. Hotlinks does not search content... by MoNickels · · Score: 2

    I should add that Hotlinks does not index nor search content on those millions of links it has.

    If you search for "four score and seven years ago" (the beginning of American president Abraham Lincoln's Gettysburg Address, for our international visitors), you get bupkus. Nothing. Any other search engine will turn up the Gettysburg Address in the first ten links.

    The first problem with this is that it requires too much pre-knowledge. I have to know that it is called the "Gettysburg Address" or get lucky in that someone might have titled their page or link "four score and seven years ago." If I was an Iranian student studying American slavery and somebody passed along the first few lines of the speech as something I should be aware of, Hotlinks is of no help whatsoever.

    The second problem with this is that, in looking at my own links, I see that the titles of the links are often completely irrelevant, inaccurate or vague. What use is a search based on page titles? Even Yahoo returns normal search engine results when it doesn't find anything in its own directory.

    Finally, if I want someplace to store my links, I'll use my own web site.

    --

    Wordnik, a dictionary project which aims to collect

  3. It depends ... by Alien54 · · Score: 3
    It depends on where they get the book marks....

    after all, what kinds of links would you get from everyone who worked at Microsoft? or Sun? Would there naturally be a corporate bias in the culture?

    or a regional bias, or whatever....

    what you would probably need would also be some rating by the internet age of the person (how long have they been online) because the people who have been around awhile probably have a more useful collection.

    and I also wonder how different this is from advertiser tracking of where you go by cookies.

    the best combination might be to combine cookie tracking with an internet search engine database. but there are drawbacks here as well.

    [shrug]

    --
    "It is a greater offense to steal men's labor, than their clothes"
  4. I'm unimpressed by shiffman · · Score: 2
    Being the egotist I am, I decided to try them out by looking for my own pages. My home page was there, sort of. What's interesting, and indicative to me of just how badly their approach works, is that the one link I found is a good five years out of date. It found a bookmark to the site when it resided on my former employer's server. (I acquired my domain almost four years ago.) Worse, it includes an intermediate directory that was made unnecessary almost a year before that.

    All of which suggests to me that their attempt to eliminate human effort will produce a lot of old garbage (I don't clean up my bookmarks; do you?) and the obvious set of well known corporate sites.

  5. Backflip by Jeffrey+Baker · · Score: 3

    Gee, that sounds an awful lot like Backflip, who, by the way, are Fucked.

  6. spiders are better by josepha48 · · Score: 2
    IMHO...

    In my opinion spiders are better. A spider can go out on the web and get the latest links. Then it can keep them up to date. Google does this and so far they seem to have a very good search engine. They can also (I think they do) use 'clicks' to rank the links, bringing the most clicked upon links to the top of the search results. Giving you what people click on most in the search results. NBCi does this too with their global brain technology (whatever that is).

    I don't want a lot, I just want it all!
    Flame away, I have a hose!

    --

    Only 'flamers' flame!

  7. BIGGER SECURITY HOLE by mpskeeter · · Score: 4

    No, I am not mpskeeter--I clicked on that link below and now I am him...sorry about that.

    Do a search for "password"--some of these geniuses have their banking and etrade usernames/passwords up there. Email and xdrive passwords are abundant.

    Also, an awful lot of these guys look at illegal pr0n. These bookmarks are right next to the ones showing their personal home pages with pictures of the wife and kids. The FBI and a divorce lawyer or two are gonna have a field day with this.

    I tried to contact one of the guys with his bank account open, but, for security reasons, his email addy is not on his profile...

    real smart website they got there.

  8. Re:bookmarks everywhere... by aozilla · · Score: 2

    I use different computers too, but I use companion.yahoo.com for my bookmarks...

    --
    ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
  9. Lateral Discovery? by hymie3 · · Score: 2
    I'd *really* like to see lateral discovery added to this (props to matt@csbgroup.org). Did 1,000 people have links to theFatProject ? I want to know what pages the majority of those linkers have in common. Maybe they'd all have links to the Body Modification Ezine. Or perhaps 83% of them linked to the NSA parody site.

    This would be highly cool for finding eclectic stuff. Kinda like, well, the lateral discovery in Napster.
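A minimal sketch of the lateral discovery idea above, assuming the bookmark collections are available as a mapping from user name to a set of URLs. The function name `lateral_discovery`, the user names, and the `.example` URLs are hypothetical; nothing here reflects how Hotlinks or Napster work internally.

```python
from collections import Counter


def lateral_discovery(bookmarks, target):
    """Return (url, share) pairs for pages co-bookmarked with `target`.

    `bookmarks` maps a user name to the set of URLs that user bookmarked.
    `share` is the fraction of users holding `target` who also hold `url`.
    """
    linkers = [urls for urls in bookmarks.values() if target in urls]
    if not linkers:
        return []
    counts = Counter()
    for urls in linkers:
        counts.update(urls - {target})
    # Highest share first; ties are broken alphabetically for stable output.
    return sorted(
        ((url, n / len(linkers)) for url, n in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical sample data.
sample = {
    "alice": {"thefatproject.example", "bme.example", "nsa-parody.example"},
    "bob": {"thefatproject.example", "bme.example"},
    "carol": {"thefatproject.example", "nsa-parody.example", "bme.example"},
    "dave": {"slashdot.org"},
}
related = lateral_discovery(sample, "thefatproject.example")
```

Here `related` lists every page the three linkers share, with the share of linkers holding each one, which is the "83% of them linked to..." figure in miniature.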

    hymie

  10. blank page... by danny · · Score: 2
    When I visit the Hotlinks site, I get a blank page. Oh, another clueless site that assumes everyone wants to enable Javascript and be bombarded with popup windows and other junk.

    Danny, who still prefers Netscape 3.04

    --
    I have written over 900 book reviews
  11. A _big_ security hole by mpskeeter · · Score: 5

    Hmmm.. I'm posting this as 'mpskeeter', though
    my username at slashdot is totally different. :)

    Guess how?
    www.slashdot.org/users.pl?op=userlogin&.. etc was
    a link on that site, enabling people to easily gather username/passwords.

    (Of course, bookmarking such a link is a _bad_ idea, it even says so on the login page) :)

  12. Re:Combination by lgas · · Score: 2
    Actually, it's called 2wrongs.com. Google does do rankings based on linking, we do ranking based on (among other things) frequency of bookmarks, etc.

    FWIW, we are entirely Linux based too.

  13. Combination by KirTakat · · Score: 3

    Perhaps a better idea would be a combination of the two, one that searched like a traditional engine, but used a listing of everyone's links to rank the search, i.e. if there were twenty sites on Linux drivers, but one of them was bookmarked by way more people, that would be the first listed one..
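A rough sketch of that combination, assuming a tiny in-memory index of page text and a separate table of how many people bookmarked each URL. The function `rank_by_bookmarks`, the substring matching, and the sample pages are all illustrative assumptions, not any real engine's method.

```python
def rank_by_bookmarks(pages, bookmark_counts, query):
    """Return URLs whose text contains every query word, most-bookmarked first.

    `pages` maps URL -> page text; `bookmark_counts` maps URL -> number of
    people who bookmarked it (missing URLs count as zero).
    """
    words = query.lower().split()
    # Crude content match: every query word must appear somewhere in the text.
    matches = [
        url for url, text in pages.items()
        if all(word in text.lower() for word in words)
    ]
    # Rank the matches by bookmark count; ties are broken by URL.
    return sorted(matches, key=lambda url: (-bookmark_counts.get(url, 0), url))


# Hypothetical sample data.
pages = {
    "a.example/drivers": "Linux drivers for sound cards",
    "b.example/linux": "Linux drivers and kernel modules",
    "c.example/cooking": "Bouillabaisse recipes",
}
bookmark_counts = {"a.example/drivers": 3, "b.example/linux": 40}
results = rank_by_bookmarks(pages, bookmark_counts, "linux drivers")
```

The content search decides what is eligible and the bookmark counts only decide the order, so a page nobody bookmarked can still be found.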

    --
    /* Of course I'm real, but can you prove it? */
  14. people volunteer the bookmarks by weeble · · Score: 2

    The bookmarks they are reading are those from the websites where you can save your bookmarks online - they are already public property.

    A good thing I think, when your M$ OS blows its top it is the one thing that people forget to backup.

    --
    Slashdot Beta should die a painful death.
    1. Re:people volunteer the bookmarks by kfg · · Score: 2

      Ok, but then that just reduces its value even more through a great deal of sample bias.

  15. Oh Boy.... by Crewd · · Score: 5

    I can see it now, people spamming hotlinks.com with their bookmarks of goatse.cx and Bouillabaisse.

  16. Well it's an intriguing idea at any rate by kfg · · Score: 3

    Of course, I'll also have to come to grips with the idea that they're rooting around in my bookmarks, won't I?

    I mean, crawling the web for publicly accessible sites is one thing, crawling my data to see where I like to go is another.

    Carnivore: The search engine.

    It would certainly come up with the most popular sites though. I'm not even sure that is a good thing. Anybody out there got any bookmarks that they wouldn't want their mom to see? What if it turns out we ALL have that one and it comes up on mom's search? Of course mom might be there already herself and doesn't want US to know.

    On the whole it seems like a "Not a best of the web, but at least very popular with the masses" type of deal and otherwise of limited use as a real research tool. Part of the new "Power shopping on the web" paradigm.

  17. Geek self-referential belief system by waimate · · Score: 4
    This would, of course, create a self-referential belief system for geeks, wherein few new notions would enter the collective consciousness, and the group view of the world would be skewed by, er, the group view of the world.

    It's like being able to choose what things you want to appear in your own daily newspaper - it's inherently flawed because the most interesting things one encounters are often those one didn't expect to be interesting.

    Similarly the very best things to find with a search engine are those things which are not common knowledge. The job of a decent search engine is to flush out gems, not popular opinion.

    1. Re:Geek self-referential belief system by joshv · · Score: 2
      This would, of course, create a self-referential belief system for geeks, wherein few new notions would enter the collective consciousness, and the group view of the world would be skewed by, er, the group view of the world.

      Hmmm.... sounds like what slashdot has become.
      -josh

  18. bookmarks everywhere... by bernhardh · · Score: 2

    well, at least they will have the most up to date list of preconfigured Microsoft and Netscape search engines, media lists, communities etc.
    I *like to type w*w*w*.*s*l*a*s*h*d*o*t*.*o*r*g once in a while, and since i constantly use different computers i never got to use bookmarks really. some are in my head and as long as they can not read my memories, they're pretty lost...

  19. Static pages by Tal+Cohen · · Score: 3

    A vast amount of information is still kept in static pages. Now, please check your bookmarks: how many point to static pages, and how many to dynamic, constantly-updated pages? It's only natural that you have very few, if any, links to static pages, since you've visited them, read the information you needed, and you're not likely to come back (nothing changes!).

    --
    - Tal Cohen
  20. my evaluation by DeadSea · · Score: 4
    Whenever I find a new search engine, part of what I rate it on is how well it can find my homepage, and how easy it is to get my homepage listed.

    As far as regular search engines go, it was much faster to get google to crawl my site and list it than anything run by inktomi, altavista, or northernlight. I am very happy with google.

    As far as directories go, Yahoo lists two of the 7 sites that I maintain. I have managed to get dmoz listings for 6 of the 7, two of which, i didn't submit myself.

    This new directory only appears to have one of my sites, and at a URL that has been inactive for almost two years at this point. I'll have to see how easy it is to get stuff listed, but so far I am not impressed.

    1. Re:my evaluation by DeadSea · · Score: 2

      October stats for a site I manage:

      1436 Googlebot/2.1 (+http://googlebot.com/bot.html)
      1199 Slurp/si (slurp@inktomi.com; http://www.inktomi.com/slurp.htm)
      823 Mercator-1.0
      671 Scooter/2.0 G.R.A.B. V1.1.0
      570 Gulliver/1.3
      548 Ask Jeeves
      517 CrawlerBoy Pinpoint.com

  21. Google already does it? by Howie · · Score: 3

    Isn't this basically how Google's scoring works? They score pages according to how "well-linked" they are, particularly to other well-linked pages. OK, this is using bookmarks, but the premise is the same, isn't it? As soon as Blogger has completed its plans for world domination, most people's favorite links will be online anyway (and they'll all be 'that cool new dancing hamster page').

    --
    "don't fall into the fallacy of believing that Perl can solve social problems. Maybe Perl 6 can, but that's a ways off"
  22. Who bookmarks sites anyway? by Mike1024 · · Score: 2
    Hey,

    I don't know about you people, but I don't often use bookmarks. www.slashdot.org, www.userfriendly.org and www.pointlesswasteoftime.com are the sites I think are good. They aren't bookmarked; I can type the URL faster than I can fiddle with those stupid bookmarks. The sites I bookmark are those that look a little interesting, but either aren't good enough for memorisation or have a long, confusing URL like http://www.rollanet.org/~joeh/10ghz/n6gn_article.html. Wouldn't a better judge of site quality be the number of repeat visitors?

    Michael

    ...another comment from Michael Tandy.

    --
    "Goodness me, how unlike the FBI to abuse the trust of the American public." -- The Onion
  23. But I don't have any bookmarks! by IO+ERROR · · Score: 2
    I am sure I can't be the only person out there who rarely or never uses bookmarks! Hm, I've had this browser config for a year now, and I've accumulated a grand total of three bookmarks. Now since I found all of these through other means, what good is a new search engine? Now I know there are people who put practically every site they've ever visited in their bookmarks (which makes it surprising this site doesn't seem to have any porn) but I only bookmark things I want to find instantly six months from now, or things I might forget to look at otherwise.
    ---
    --
    How am I supposed to fit a pithy, relevant quote into 120 characters?
  24. Wait a minute... by Seumas · · Score: 2

    If so many people have the sites in their bookmarks that they'll be listed on the site's search engine -- who's going to be left to search for the sites, since they're already in everyone's bookmarks?!
    ---
    seumas.com

  25. Interesting, but ultimately doomed.. by Captain_Frisk · · Score: 3

    This is an interesting idea, and if we used bookmarks like we are supposed to, then I imagine that it might work. However, it won't. Here's why:

    1. It requires active participation of internet users. The beauty of other search engines is that you can set a bot to go out crawling, and when it's done, you have a bunch of links. This idea requires that I visit this site, and give them permission to access my Hard Disk. How many users are going to do this?

    2. What percentage of websites out there are even in someone's favorites? I can't imagine that every site is in someone else's favorites.

    3. I use favorites to keep track of sites that I won't remember the URL for. Say I read something and I want to come back to it later, but I don't know where it is. I use it like temporary storage. It's faster to type in a url, especially now with the various autocomplete functionality out there. Thus, if Hotlinks raided my bookmarks, they would find a link to Slashdot postings by John Carmack, a few articles on Image Processing and Edge Detection, and the full text of The Little Prince. It's very specific information, and not the kind of information that Hotlinks is looking for.

    I think this idea is in trouble. Who doesn't use Google anyway?

    Captain_Frisk

  26. Can you say 'demographics'? by DrWiggy · · Score: 3

    A person's bookmarks are an insight into that person. From them, not only could you work out their main interests and hobbies, but their sense of humour, perhaps their political persuasion, and almost certainly their sexual tastes. The information being submitted to hotlinks is invaluable in terms of demographic analysis.

    Of course, the same information could be gathered with the use of persistent cookies and normal search engines - what people search for is just as useful, but when people click away and "surf", what they decide to keep close to hand probably gives a better insight into the person.

    Just a thought.

  27. the power is not the search engine by po_boy · · Score: 3
    To me, the power in this is not that you can search through other peoples' bookmarks, it's that you can store your bookmarks here.

    I use about 3 or 4 different computers in a week, and I actually do use bookmarks in my browsers. This means that I end up bookmarking stuff on one machine, and then not having it at another when I use it.

    I'm not sure how many other people have a similar problem, but this service appears to solve it. The average slashdot reader and myself have webservers and the ability to hack together a few perl scripts, or the knowledge to find and mail our .netscape/bookmarks.html files around to keep our bookmarks synchronized and always available if we want, but how can most people do this?

    I would imagine that this service would be useful to the average multi-computer user. Is this the best solution you have seen for this problem? What other methods do you employ to move bookmarks from one machine to another and keep your bookmark files in sync with each other? Is this the best type of solution we can provide to users for this kind of problem? Do you think that it's widespread enough that a good solution would be used by many people?
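As an illustration of the do-it-yourself route mentioned above, here is a minimal sketch that merges the link entries from several bookmarks.html files. It assumes the files store each bookmark as an `<A HREF="...">title</A>` element; the class and function names and the sample snippets are hypothetical, and folder structure is ignored entirely.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from the <A HREF="..."> entries of a bookmarks file."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entries.append((self._href, "".join(self._title).strip()))
            self._href = None


def merge_bookmarks(*documents):
    """Merge several bookmark files, keeping the first title seen for each URL."""
    merged = {}
    for document in documents:
        parser = BookmarkParser()
        parser.feed(document)
        parser.close()
        for url, title in parser.entries:
            merged.setdefault(url, title)
    return merged


# Hypothetical contents of two machines' bookmark files.
home = '<DL><DT><A HREF="http://slashdot.org/">Slashdot</A></DL>'
work = (
    '<DL><DT><A HREF="http://slashdot.org/">News for Nerds</A>'
    '<DT><A HREF="http://www.google.com/">Google</A></DL>'
)
merged = merge_bookmarks(home, work)
```

Keeping the first title seen is an arbitrary choice; a real sync tool would also have to handle deletions, which a plain merge cannot detect.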

  28. Yes, but... by Malevolent · · Score: 3

    ..most people find websites using existing search engines - therefore the vast majority of bookmarks will be from sites already ordered high up in conventional ranking systems.

    --
    -Tom
  29. Preset Links?? by MathJMendl · · Score: 2

    Ack. What about the predefined internet links, such as Hotmail and Microsoft's links inside of Internet Explorer? If they are on every computer's preset ie links then they would theoretically be on top of many searches in this site. This is not a good thing (TM).

    --


    "I have not failed. I've simply found 10,000 ways that won't work." --Thomas Edison
  30. What? No SEX? by mangu · · Score: 2

    They don't have any links to what must be the most widely sought item on the Internet. At least, it's the most widely spammed item...

  31. The Jukebox Phenomenon by MoNickels · · Score: 4

    The problem with the bookmark approach is that it will tend to result in the Jukebox Phenomenon.

    The short version of this is that current Top 40 radio station rotation systems are reputed to stem from the analysis of a jukebox supplier who noticed the same 40 records kept getting played over and over. This is because when a record gets played once, it tends to get played again, resulting in circular reinforcement, with hits one through 100 charted in a steeply declining curve. This is how current radio programming, music marketing and MTV work today: reinforcement.

    The problem with this approach (in music or data) is that popularity is no guarantee of accuracy, appropriateness or utility. This is represented in the music world by the high cost (real and otherwise) of successful entry into the market. New music (data) is not popular enough to be included, but it can't easily be included without becoming popular.

    Personal bookmark collections tend toward the same phenomenon. Besides the inaccuracy stemming from factory-included links (which I would hope they account for), the bulk of entries will result from links in turn resulting from searches on existing search engines, which are, no matter how big, closed data sets: they have boundaries and do not include the entire web. These searches are also happening in only a few places, resulting in the JP. Hotlinks will thus tend to include sites that have already appeared elsewhere. A certain number of "missing" pages will be newly included (the user's own sites, work sites, sites of friends) but very few "missing" pages of other kinds, particularly low-traffic pages (such as those with refined and highly specialized content: deep governmental directories, university research labs). In other words, Hotlinks' approach is not much different than Google's number-of-times-linked approach or bulk submitting on an engine's "add your site" link, just a larger population sample.

    Napster experiences the Jukebox Phenomenon: If I look for Loudon Wainwright III songs, I tend to find lots of iterations of the same three songs and not much else: Dead Skunk, I Wish I Was A Lesbian and the duo with Iris Dement. But if I want to find, say, any song off of the Therapy album, it tends not to come up because it is not as popular. This is because the JP has propagated the popularity of the same three songs. An ideal data source would include the entire data set, popular or not. (I am aware Napster cannot and is not designed to be a complete data set).

    If one's goal is to include more web sites, a more accurate approach than Hotlinks' would be to scavenge users' History files. That would, in my case, include a few hundred additional sites a week, although I'm sure the privacy issues would be a problem. If one's goal is to return the most accurate results, an even better approach would be infinite page caching in which a new iteration of a page does not replace the previous entry, but is added to it. In this way, one could search across history as well as data.
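The "infinite page caching" idea can be sketched as an append-only store in which each new capture of a page is added alongside the older ones. The `PageArchive` class, its method names, the integer timestamps, and the sample captures below are all hypothetical, intended only to show the shape of the data structure.

```python
from collections import defaultdict


class PageArchive:
    """Keep every captured version of a page instead of overwriting the last one."""

    def __init__(self):
        self._versions = defaultdict(list)  # url -> [(timestamp, text), ...]

    def add(self, url, timestamp, text):
        """Append a new capture; earlier captures are never replaced."""
        self._versions[url].append((timestamp, text))

    def search(self, word, since=None, until=None):
        """Return sorted (url, timestamp) pairs for captures containing `word`.

        `since` and `until` optionally bound the timestamps (inclusive).
        """
        word = word.lower()
        hits = []
        for url, versions in self._versions.items():
            for timestamp, text in versions:
                if since is not None and timestamp < since:
                    continue
                if until is not None and timestamp > until:
                    continue
                if word in text.lower():
                    hits.append((url, timestamp))
        return sorted(hits)


# Hypothetical captures of one page in two different years.
archive = PageArchive()
archive.add("news.example/front", 1999, "Hotlinks launches bookmark directory")
archive.add("news.example/front", 2000, "Backflip announces layoffs")
hits = archive.search("hotlinks")
recent_hits = archive.search("hotlinks", since=2000)
```

Because nothing is overwritten, the 1999 capture stays searchable even though the live page has moved on, at the cost of storage that grows with every crawl.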

    --

    Wordnik, a dictionary project which aims to collect

  32. I don't see a problem if... by GC · · Score: 2

    I don't think that it would be a problem if you ensured that you took bookmarks from people who didn't exclusively use that search engine.

    This search engine formulation requires that other search engines exist and are used by others. If it ever gets popular it will find that increasing its popularity will increase its resistance to getting more popular, an interesting exercise in game theory...

  33. Re:Nonstandard HTML by Chuck+Chunder · · Score: 3

    You mean the "nonstandard proprietary HTML qualifier" onload as documented in the html specs?

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park