Slashdot Mirror


Is The Web Becoming Unsearchable?

wayne writes: "CNN is running a story on web search engines and their inability to keep up with the growth of the web. Web directories such as Yahoo! and the Open Directory Project can take months to add a site and the queue of unreviewed sites is growing. Most search engines are even further behind and are filled with off-topic and dead pages. The trend is toward pay for listing. Will the free, searchable web fade away?" The article gets beyond the "Wowie, so much content, engines can't keep up" typical blather and addesses some of the reasons search engines have a hard time keeping up.

249 comments

  1. Signal to noise ratio by Anonymous Coward · · Score: 1

    of the entire web has degraded so much that it's not the search engines that are full of useless garbage... it's the web itself and these engines simply have indexed what exists "out there". Garbage In, Garbage Out -- still holds true after all these years.

    1. Re:Signal to noise ratio by Inside_Joke · · Score: 1

      That's pretty much true. If there wasn't so much complete trash on the 'Net, then maybe the search engines like Yahoo would have a better chance of returning something other than mindless drivel and dead links.

      What we REALLY need is an Internet Cleanup Crew. People specifically hired to go through all these directories and clean out all the stale, useless garbage. We could reduce the size of the Web to fit on a Zip disk after we got rid of the junk! Hell, getting rid of the pr0n alone would be enough to get it on a CD-R.

      --
      I refuse to answer that question on the grounds that you're an idiot!
  2. Yahoo taking months to add a site by Micah · · Score: 1

    You can say that again. I submitted a site last November and it @!##@% still isn't there in its directory! What's the deal?

  3. Yowie. by Skyshadow · · Score: 4
    Yep, all that content, and yet when there's a slow day at work I can still run out of interesting stuff to look at on the internet.

    Yowie.

    --
    Every year during my review, I just pray the words "slashdot.org" aren't mentioned.
    1. Re:Yowie. by joekool · · Score: 1

      Man, every time you post you must get at least half a dozen restaurant suggestions--I wish I lived somewhere close, so I could at least try a few, instead of just asking my girlfriend if she ever heard of them!

      --

      Slackware: old school feel, new school gear.
    2. Re:Yowie. by krid · · Score: 1

      check out world wraps on california avenue in palo alto. despite the name, it's a great middle-eastern joint, run by some nice folks from jordan (iirc). the food is really good, and the guys behind the counter are quite friendly.

      ya know, it would have been cleaner if you provided a way for us to tell you about gyro places *besides* posting a totally offtopic comment...

    3. Re:Yowie. by Philaretus · · Score: 1

      When I lived in the South Bay, I used to go to a Greek place on DeAnza Blvd south of Stevens Creek called Yiassoo. It's on the east side of the street in what used to be an old Taco Bell. Great gyros - back in 1994 anyway.

  4. Yes by Trepidity · · Score: 1

    Yes.

    Next question?

    1. Re:Yes by pod · · Score: 1

      I've found this to be the case as well. I don't know what the author of the parent post was looking for (porn perhaps?) but everything I've searched for so far has turned up plenty of free resources in the first few pages of hits. In fact, looking to buy something (looking for a supplier) is pretty tough, and sites like Yahoo are more useful in this area.

      --
      "Hot lesbian witches! It's fucking genius!"
    2. Re:Yes by treat · · Score: 1

      Most searches for herbal medicines (e.g. "5-HTP") find you way more hits (especially the high ranking ones) from companies trying to sell you it than actual objective information about it.

    3. Re:Yes by rjamestaylor · · Score: 1
      You're an idiot. Sorry, but it's true. Or, a troll.

      http://www.google.com/search?q=flowers does indeed show JustFlowers on the top of every page -- but within the "Sponsored Link" box. The search results themselves are unaffected by ads. Pay a little more attention before castigating a truly useful web service.

      --
      -- @rjamestaylor on Ello
    4. Re:Yes by rjamestaylor · · Score: 1
      Sorry. Insults not called for. I have no idea how intelligent you are.

      That said, I still take issue with your complaint that ads appear on these search pages. These ads are clearly marked as such and do not influence the search results provided by Google. No harm, no foul.

      A few times I have found what I wanted via the ads and not the results (however, one gross exception was Ximian's buying ads on searches for KDE). And I've never been even slightly confused as to what was an ad or what was a result.

      --
      -- @rjamestaylor on Ello
    5. Re:Yes by rgmoore · · Score: 3
      Yes, free, independent sites ARE tough to find, even with Slashdot's favorite Google. Every time you search for ANYTHING, the first 1000 hits are always for a commercial site.

      Except that this isn't true. If I look up, say, Ronald Reagan, none of the top 5 hits are big commercial sites. They include the White House pages on former presidents, a fan page, the Reagan Presidential Foundation, the Reagan Library, and the Official Reagan Web Site. If I look up Linux Kernel, the #1 site is the Kernel Archives page. Maybe you're looking for data where there just aren't many interesting independent web sites out there, which is not something that can be cured with a better search engine.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    6. Re:Yes by mlas · · Score: 1

      >Every time you search for ANYTHING, the first 1000 hits are always for a commercial site.

      Umm, let's try an experiment: go to google and type in something inherently non-commercial, e.g., "am i hot or not" or "all your base belong to us". I guarantee that the first few links will NOT be commercial. Google's main ranking comes from back-links, which is great 'cause it's inherently difficult to fiddle with the rankings. If an indie is popular and many people link to it, it can come up first. The system ain't perfect, but it's close enough to be a modern miracle, especially if you're good at searching. Remember, folks, use lotsa proper nouns and "quote exact phrases"!
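
      To make the back-link idea concrete, here is a minimal sketch of a PageRank-style score in Python. It is an assumption-laden toy, not Google's actual ranking: the function name, the damping value and the iteration count are all illustrative choices.

```python
def backlink_rank(links, damping=0.85, iterations=50):
    """Toy back-link score: a page ranks higher when well-ranked pages link to it.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative defaults, not known Google settings.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


example_links = {
    "a": ["indie"],
    "b": ["indie"],
    "c": ["indie", "a"],
    "indie": [],
}
scores = backlink_rank(example_links)
best = max(scores, key=scores.get)
```

      In this toy graph the page that three others link to ends up with the highest score, which is the "popular indie can come up first" effect described above.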

      --
      "Luck is the residue of design" --Branch Rickey
    7. Re:Yes by KilljoyAZ · · Score: 1

      I believe Google gets paid by sites wishing to be listed first on a given search. For example, JustFlowers.com is on the top of every page when you search Google for 'flowers.'

      --
      This .sig is currently on hiatus for retooling.
    8. Re:Yes by KilljoyAZ · · Score: 1

      Argh, that's what I meant. No need for insults.

      --
      This .sig is currently on hiatus for retooling.
    9. Re:Yes by KilljoyAZ · · Score: 1

      It wasn't meant as a complaint, it was a poorly-worded observation. Google's one of my favorite search engines too. Anything they can do to keep it free to the end user without sacrificing the quality is ok by me.

      Next time I'll take more time in wording my comments :)

      --
      This .sig is currently on hiatus for retooling.
  5. Call Intercept by Isaac-Lew · · Score: 1
    It's kind of like caller ID and all the other useless services.

    I wouldn't call all of those services useless...there's an interesting one by Verizon called Call Intercept. If the person's number is unavailable or anonymous on Caller ID, they are sent to a message asking them to identify themselves. If they don't, then they don't get through. Great for those "please stay on the line for an important message..." phone calls that telemarketers & bill collectors love :).

  6. Re:Google by crayz · · Score: 1

    This is probably something like what you're looking for, though the word "Aida" can't be found at all with the others.

  7. Re:AltaVista hates Lynx by ninjaz · · Score: 2
    Interesting - I had never submitted a link to Altavista personally to see how the whole process works. After seeing your description of the GIF mechanism, I've tried it and see what you mean.

    Apparently this is an attempt at foiling script-based "ping and, if down, submit as dead" attacks on other people's entries.

    I think a more reasonable way of handling this would be to, e.g., check the site for 2 days in 12-hour increments (to allow for, e.g., eBay's Sunday maintenance windows and such). If there is no positive response during that period, then drop the link.
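
    A minimal Python sketch of that re-check policy follows. The function names, the HEAD request and the default of four attempts are my assumptions for illustration; a real engine would schedule the re-checks 12 hours apart rather than call a `wait` hook inline.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if the site is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404: the server answered, but the page is gone
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def link_is_dead(
    url: str,
    probe: Callable[[str], Optional[int]] = http_probe,
    attempts: int = 4,  # assumption: roughly 2 days at 12-hour spacing
    wait: Callable[[], None] = lambda: None,  # hypothetical hook, e.g. sleep 12 hours
) -> bool:
    """Drop a link only if no attempt during the whole window gets a good response."""
    for attempt in range(attempts):
        status = probe(url)
        if status is not None and status < 400:
            return False  # one positive response is enough to keep the link
        if attempt < attempts - 1:
            wait()
    return True
```

    A site that is down for a maintenance window but answers on a later attempt is kept; only a link that fails every check in the window is dropped.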

    In any case, I was only using that mechanism as an example of a saner way than having 100 votes automatically mark a site as dead, and of how this mechanism could be linked to a browser button (which could work with Altavista if they used my method instead of requiring a multi-submit + enter-text-from-a-GIF reporting process). I don't personally use Altavista's search engine or condone it.

    Sounds like a good title for a trivial patent, even..

    Method of verifying URL availability for a database of URLs

  8. Re:Another way by ninjaz · · Score: 3
    "What if a user falsely claims a site to be dead?" Well, what if it took 100 different IPs claiming it to be dead before it really was considered dead?
    Actually, it is trivial to maliciously get 100 IPs to claim a site to be dead. All you need is a page that gets 100 hits/day and an IMG tag embedding the URL to the dead link reporting page w/ the target URL embedded. Whoever hits the page will unwittingly make a request to mark your target dead from their own IP. Or, script kiddies could create botnets for the purpose of submitting dead links to get high-profile sites delinked, etc.

    The correct way to handle this situation is how the search engines already do - when a link is reported dead, they just make a request to the link. If it generates an HTTP 404 response code, or the site is down, it's marked actually dead.

    I'm not convinced this is always a good idea, though - I've worked for a guy who would battle for top positioning on the search engine with a few competitors. When either of them noticed that the other's site was down, they'd submit the other site as a dead link. I like google's Cached page mechanism, which allows you to view sites that are currently unreachable. Great for when you need docs from a site which happens to be down at the time.

    How about a button in browsers that enables you to mark a page as a dead link?
    This is actually trivial to implement, as shown in Google's toolbar page: http://www.google.com/options/toolbar.html

    Of course, you'd need to use this technique with a search engine that takes dead link submissions. E.g., Altavista and its "Add or Remove a Page" link here: http://web.altavista.com/cgi-bin/query?pg=addurl

  9. Good one! by Juju · · Score: 1

    Thanks for the laugh...

    --
    Black holes occur when God divides by zero.
  10. The *only* way to search the web.. by talks_to_birds · · Score: 1
    ..remains WebFerret.

    (Well, not really, but it's damn good...)

    It's about the only Window$ app I use anymore.

    It's kinda gone downhill after the parent company was bought out by ZDNet but it still really works pretty well.

    It meta-searches about a dozen of the major search sites simultaneously.

    I use it a lot to search for the meaning of obscure error messages and error codes and stuff like that.

    Used to use it a lot for searching out what cryptic .dll filenames were related to...

    t_t_b
    --
    I think not; therefore I ain't®

    --
    I'm on PJ's "enemies" list! Are you?
  11. Google spidering by danny · · Score: 2
    Google used to spider my sites almost twice a month, but it seems to have reduced its crawl frequency since it started indexing dynamic content. Another problem with the latter is that e.g. book pages at Amazon can appear multiple times in search results, as Google follows links from different associate programs.
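
    One way a crawler could collapse such duplicates is to normalise URLs before indexing. The Python sketch below is only an illustration: the parameter names in `TRACKING_PARAMS` are hypothetical, and I don't know how Google or any associate program actually encodes its links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that only identify the referring associate.
TRACKING_PARAMS = {"tag", "ref", "affiliate"}


def canonical_url(url: str) -> str:
    """Strip referral-only parameters so the same page maps to a single key."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    kept.sort()  # parameter order should not create a new "page"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


first = canonical_url("http://Shop.example/book/123?tag=danny-20")
second = canonical_url("http://shop.example/book/123?ref=someone-else")
```

    Both example links reduce to the same canonical string, so an index keyed on it would list the book page once.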

    I've also been kicked from first to sixth on a search for "book reviews" :-(.

    Danny.

    --
    I have written over 900 book reviews
  12. Re:Hmm... by Mawbid · · Score: 1

    Really? Google found it for me on about 2,510 pages.
    --
    Fuck the system? Nah, you might catch something.
  13. A few errors by madprof · · Score: 1

    The piece tries to make a good point about dynamic content that is generated by user input not being indexable which is true.
    Search engines can't type things into forms and get results in an intelligent way.
    It's just a shame that they get confused in their expressions.
    Nice piece generally though. 550 billion web pages is an awful lot..

  14. Unsearchable... nope not here. by _LORAX_ · · Score: 1


    I can usually find what I'm looking for using either Google or AltaVista. The heuristics used by Google are the best I've seen in any search engine. I can usually find stuff that is anywhere from several days to several years old.

    Come on... these are the same people that were claiming that we would all run out of IP's by now. They don't seem to realize that everything adapts.

    1. Re:Unsearchable... nope not here. by silicon_synapse · · Score: 1

      I think we HAVE run out of IP addresses...I could be wrong, but that's never happened before so I doubt it.



  15. Re:possible solution by luge · · Score: 1

    Umm... sure, it would be a great idea if it would work. But the whole proposal depends on the directory structure being harder to spam than keywords. I don't see any reason why it would be any harder to put "teens->education->health" in the directory structure for hardcoreteensex.com than it would be with current keyword-based schemes. I'd love to hear why you think that this would be different than what already goes on... but I'm not holding my breath about being convinced.

    --

    IAAL,BIANLY

  16. Re:Google by sacherjj · · Score: 1

    Darn. We thought we could get that one past you...

  17. There is a way to fix this... by leonbrooks · · Score: 2
    ...on Google, at least: link early, link often. Link to your favourite sites on every page you make, which boosts them in the ratings.

    BTW, AFAIK Google doesn't change rankings for money, it adds those little side-links for money. I do hope they stop adding gingerbread now lest the site end up as cluttered and useless as Deja did.

    --
    Got time? Spend some of it coding or testing
  18. Neurogrid by Julz · · Score: 1

    My friend is developing a system and tools for creating a p2p search network. This seems like one way to interconnect searches and information as it becomes more interspersed throughout the known universe. Have a look at Neurogrid

    --
    When shit hits the fan get some of these https://youtu.be/pY-GncsZ-UE
  19. What I've been looking for by Tuor · · Score: 2

    I'd like to see some specialized search engines, nothing too complicated. What I've been wanting for some time is a search engine of just .edu.

    There are lots of really informative .edu sites out there, but they don't show up well on search engines, and many are buried levels deep, e.g. college.edu/~professor/fall2000/class/topic/lotsofinfo.html

    (btw, if anyone finds a .edu engine, PLEASE let me know!)

    ========= Put my nick in front of the "_". I love my computer

    --
    I love my computer -- You make me feel alright (Bad Religion)
    1. Re:What I've been looking for by Grit · · Score: 1

      Google's advanced search page lets you search restricted to a particular site, which can be used to restrict to ".edu". Use their form, or do searches of the form "key word site:.edu"
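
      If you want to script that, a small Python sketch for building such a query URL might look like this. The helper name is made up, and the URL pattern is simply the plain http://www.google.com/search?q=... form; treat it as an assumption rather than a documented API.

```python
from urllib.parse import urlencode


def site_restricted_search(keywords: str, site: str) -> str:
    """Build a search URL of the form 'key word site:.edu' (hypothetical helper)."""
    query = f"{keywords} site:{site}"
    return "http://www.google.com/search?" + urlencode({"q": query})


url = site_restricted_search("key word", ".edu")
```

      The colon gets percent-encoded and the spaces become plus signs, which is the usual form encoding for a query string.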

  20. Re:Primitive Replacement for a Directory by Fred_A · · Score: 1
    For example, where can I get my oil changed in Paris, France?

    Why should Google replace the yellow pages ?

    Can't you just try www.pagesjaunes.fr like any sane person would (hint you'll get 1510 answers, all right on spot).

    (duh)

    --

    May contain traces of nut.
    Made from the freshest electrons.
  21. directory and trust by apropos · · Score: 1

    It's becoming obvious that scan-type engines are having increasing difficulty with the amount of data on the internet. The bandwidth required by search engines will increase exponentially, and at some point it *will* become unworkable.

    The other alternative is to have webmasters manage the directory themselves. This is problematic because webmasters have a strong incentive to list their website in as many places as possible. Some pr0n kings would do every single listing if they could.

    So you take away the incentive for the listers to list everywhere, or give them a strong enough reason to list only in the correct places. Since the pr0n kings will never get it straight, you'd be better off using "trusted" maintainers. With the wonderful world of PKI cryptography, verifying submissions could be completely automated and your staff of submitters could be *very* large.

    So you make it possible for anyone to become a submitter. It can't be easy enough for the pr0n masters to get a new ID every day, but maybe once every three to six months (say).

    Then if enough complaints (*authenticated* complaints) are lodged, some sort of distributed arbitration process could decide to revoke a submitter's status - and then remove all of that submitter's submissions.

    The distributed arbitration process could take the form of a jury of twelve randomly selected submitters (or submitters with a special arbitration rating?). Basically, people could be polled at random, and anyone willing to be on the jury could examine the facts and make a vote. Perhaps a discussion group could be set up for deliberation.
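
    As a rough sketch of the bookkeeping only (no cryptography here), the jury draw and the revocation step could look like the Python below. Every name is hypothetical, and the simple-majority rule is my assumption; the scheme above doesn't say how votes are counted.

```python
import random


def select_jury(submitters, accused, size=12, rng=None):
    """Draw `size` distinct jurors at random, excluding the accused submitter."""
    rng = rng or random.Random()
    pool = sorted(s for s in submitters if s != accused)
    if len(pool) < size:
        raise ValueError("not enough eligible submitters for a jury")
    return rng.sample(pool, size)


def verdict(votes):
    """True if the jury votes to revoke. Simple majority is an assumed threshold."""
    return sum(votes) > len(votes) / 2


def revoke(directory, submitter):
    """Return the directory with every listing from `submitter` removed."""
    return {url: owner for url, owner in directory.items() if owner != submitter}


directory = {
    "http://example.org/health": "alice",
    "http://spam.example/teens": "mallory",
    "http://spam.example/edu": "mallory",
}
jury = select_jury([f"user{i}" for i in range(30)], accused="user3", rng=random.Random(1))
if verdict([True] * 8 + [False] * 4):
    directory = revoke(directory, "mallory")
```

    Removing all of a revoked submitter's listings in one step is what makes a burned ID expensive, assuming new IDs really are slow to obtain.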

    Hmmm... it would be an interesting example of an online society. Would the system really run itself? If anyone has ideas, email me at tom@alterworld.net (put tom-ok in the subject line or it will bounce).

  22. Yeah by Ravenscall · · Score: 2

    I hardly ever use search engines anymore. Most of the sites that I find are linked directly off of pages that are specializing in what I am looking for anyhow, and I find that the content is usually of a higher quality anyhow.

    Either that or friends will send me links.

    --
    You say you want a revolution....
    1. Re:Yeah by sideshow-voxx · · Score: 1

      There are a few good sites where people publish their own bookmarks - like mybookmarks.com.

      If these bookmarks could be categorised properly we would end up with a directory of quality-tested sites that is self-maintaining and current. That's what we want isn't it?

      --

      "Anybody remotely interesting is mad, in some way or another" - Doctor Who

  23. Offtopic, Funny by Robin+Lionheart · · Score: 1

    > Black holes occur when God divides by zero.

    I once joked to my friend Steve Pearl that God resets the universe's divide by zero errors. He wittily remarked, "So you're saying when we are thrown, God catches us?"

  24. I have an idea... by gatkinso · · Score: 1


    Why don't they distribute the link verification much the same way SETI@Home does? They could then shoot a micropayment (say a penny or so) to the user for every work unit (say 10,000 or so links) that they verified.
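
    A tiny Python sketch of the bookkeeping, using the numbers floated above (10,000 links per work unit, one cent per unit). The function names are made up, and how units get handed out and verified is left open.

```python
def work_units(links, unit_size=10_000):
    """Split the full list of links into fixed-size work units."""
    for start in range(0, len(links), unit_size):
        yield links[start:start + unit_size]


def payout_cents(units_completed, cents_per_unit=1):
    """Micropayment owed to a volunteer, in cents."""
    return units_completed * cents_per_unit


all_links = [f"http://example.com/page{i}" for i in range(25_000)]
units = list(work_units(all_links))
owed = payout_cents(len(units))
```

    With 25,000 links this yields two full units and one partial unit; whether a partial unit earns the full cent is a policy question the sketch doesn't settle.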

    This would primarily be for folks with always on access as it might tend to clog a thin pipe.

    --
    I am very small, utmostly microscopic.
  25. Re:Then why did they refuse me on DMOZ? by leandrod · · Score: 1

    Actually something must have gone wrong there, because I was already an editor and they cancelled my account without a note, an email, nothing.

    Besides in all my attempts to reactivate my account or merely contact them I received no answer at all.

    Now the categories I created are marked "This category needs an editor"... this is absurd!

    Go check the categories I created at http://dmoz.org/Computers/Software/Databases/Relational/ and http://dmoz.org/World/Portugu%eas/Computadores/



    --
    Leandro Guimarães Faria Corcete Dutra
    DBA, SysAdmin

    --
    Leandro Guimarães Faria Corcete DUTRA
    DA, DBA, SysAdmin, Data Modeller
    GNU Project, Debian GNU/Lin
  26. There is a point here by Grit · · Score: 1

    I've experienced both sides of the question. Usually I can find anything I want on Google--- especially if it's technical information, but I've successfully looked up saints, theological arguments, gaming groups, etc. I occasionally supplement this with Citeseer, an excellent resource for research papers.

    On the other hand, I was looking for a replacement rack mount kit for a Cisco switch that had been donated to my research group. Google and Altavista were pretty useless, as far as I could tell; I eventually just had to go to ECost and use their search facility to find the part I wanted.

    So, I can see how users with different desires could easily develop widely divergent opinions about the utility of web search. Perhaps consumer sites are much less well searched? Perhaps one way that search engines can increase their utility is by making partnerships with online retailers to provide indexing of their product descriptions--- I'd be very happy if Amazon books or ECost electronics started showing up in response to my Google searches.

  27. Hmm... by BilldaCat · · Score: 2

    Well, I can't find anything when I search for addesses ..

    --
    BilldaCat
    1. Re:Hmm... by BilldaCat · · Score: 2

      It's amazing that there are 2,510 people who spell at least as bad as Hemos. :)

      --
      BilldaCat
  28. I second this motion! by Darth+Maul · · Score: 1

    Yes, Google has never failed to return useful and active links. What a great resource!

    AltaVista used to be good until they turned into another www.useless-portal-to-everything.com.

    -Mike

    --
    --- witty signature
    1. Re:I second this motion! by beddess · · Score: 1

      actually, altavista used to be good until a friend of mine wrote a spamming script and gave it to hundreds of people, such that a search for anything on altavista turned up nothing but porn.

      --
      "Weasling out of work is important to learn; it is what separates humans from animals. Except for weasels."
  29. Link Confirmation Bot? by PantherX · · Score: 1

    Do search engines have bots that go out and search already indexed pages to check for dead links, changed pages, and etc? I would think that the major search engines would have many of these to make sure their data was up to date... if nobody has done this yet, there's my contribution ;-)

    --
    Sig missing. Reward.
  30. Re:The power of "Word of Mouth" by rinkjustice · · Score: 1
    I haven't found /. through a search engine query, but I did manage to find Everything 2 that way. I was "polluted" at the time, so it made for a strange and lucky night.

  31. Re:Gnutella by ConceptJunkie · · Score: 2

    ...and look how well Gnutella scales.

    If you want 99.9% of Internet traffic to be nodes forwarding search requests and results back and forth, that's the way to go.

    --
    You are in a maze of twisty little passages, all alike.
  32. Re:Searching via Apps by mrzaph0d · · Score: 1

    ah, but if broadband starts getting into a majority of the households, would there be a need for an offline search capability? i mean, i'm usually connected all the time, so it's never a problem to pop open a browser window and do a quick search. i guess it would depend a little on if people start leaving their pc's on all the time. anybody know of "normal" people who like leaving their machines on? i know my girlfriend likes to..

    --
    this is just a placeholder till i send back my real sig from the future.
  33. bad web site design. by nchip · · Score: 2
    Unsearchable net is just a result of ignorance by search bot writers and web site creators.
    • Ignorant robots.txt usage. A site has all its text in a database. Now bots start hitting it. After a while, the admin notices that bots are churning through too much CPU time. To fix the problem, the admin puts up a robots.txt that kills all bots, instead of creating a light, robot-friendly area.
    • Greedy bots. Robot writers usually don't help much, attacking servers in overload bursts and causing mayhem on many sites.
    • Too limited syntax in robots.txt. Why can't we ask bots to visit only during specific hours? Why can't we set a sensible hit rate? Of course, at first most bots would ignore such tags, but forcing
    • The unindexed. If there are no references to someone's personal homepage or scientific article, how could the bots find it? A nightly generated site index could help, but raises some privacy questions if done on a public server.
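
    For the first and third points, here is a hedged Python sketch of what a "light robot-friendly area" plus a hit rate could look like. `Crawl-delay` is a non-standard extension that not every bot honours; the paths and the bot name are made up. Python's standard `urllib.robotparser` happens to understand it, which is why it is used here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep bots out of the expensive database pages,
# point them at a cheap static index, and ask for at most one hit per 10 seconds.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /db/
Allow: /index/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
static_ok = rules.can_fetch("examplebot", "http://example.com/index/today.html")
dynamic_ok = rules.can_fetch("examplebot", "http://example.com/db/query?id=1")
delay = rules.crawl_delay("examplebot")
```

    A polite bot would sleep for `delay` seconds between requests and skip anything `can_fetch` rejects. Visiting only at specific hours still has no widely supported syntax, as far as I know.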
    --
    signatures pending - ansa@kos.to - (dont mail there)
  34. "taken by me" by scotpurl · · Score: 1

    not taken of me.

  35. yeah, but by scotpurl · · Score: 2

    Seven of the first ten have nothing to do with auto repair. Two of them are iffy at best. My grading of one right and two half-right out of 10 answers is still an "F".

  36. Ok, then here's an easy one.... by scotpurl · · Score: 2

    Try to find, using the Google Directory, pictures of Yellowstone National Park, taken by me. No fair using the search function. (However searching the directory for "pictures of yellowstone and scott purl" will result in two misses, and nothing else.)

    Yes, it's "Vanity Web Surfing", but if Google indexes my site, why doesn't it automatically categorize it? (whine whine)

    So, yes, Google is pretty derned good. But it's still not a directory, and the directory it does have covers, what, 1% of the web? 0.01%?

  37. Primitive Replacement for a Directory by scotpurl · · Score: 3

    What the web REALLY needs is a directory. An honest-to-goodness, telephone/yellow pages style directory. This whole nonsense about keyword searching is providing people who just want traffic with a lot of free advertising and listings.

    The phone company provides you with one free listing (unlisted is optional), and makes you pay for each extra category (like in the Yellow Pages -- and if you're not from the U.S., please see http://www.bigyellow.com/supertopics for an example) that you want something listed in. Search engines ought to be replaced with something similar.

    Yes, I know Yahoo and Dmoz try, but they don't go out and actively index sites, making their use limited, and the number of sites even more limited. If Google were to create a Yahoo/Dmoz style directory, that would help. Better yet, if people were forced to provide either META tags, or some information when they acquired their domain (part of whois?)....

    For example, where can I get my oil changed in Paris, France?

  38. Re:The power of "Word of Mouth" by 31eq · · Score: 1

    It was either following a link or a search result. I don't remember which, or the subject, but I bookmarked the site immediately.

  39. Re:Gnutella by patrixx · · Score: 1
    By having each server tell us what they have, we are assured that when someone searches for how to replace a broken window, they won't get what they don't want.

    What's wrong with this, then?

    Google Search: fix a broken window
    "a" is a very common word and was not included in your search.

    Searched the web for fix a broken window. Results 1 - 10 of about 189,000. Search took 0.90 seconds.
    Category: Recreation > Autos > Makes and Models > Mazda > RX-7

    Learn2 Repair a Broken Window
    ... 2torial #0515: Learn2 Repair a Broken Window. Home Run!!! As we know, windows break ... way,
    the "rabbet" is the notch in the window sash that the glass fits into. ...
    www.learn2.com/05/0515/0515.asp - 28k - Cached - Similar pages

    Remodel.com Fix-It-Smart: REPLACING BROKEN WINDOW GLASS
    ... Fix-It-Smart, Home. REPLACING BROKEN WINDOW GLASS Broken window glass can be
    replaced by regular glass or by plastic unbreakable glass. ...
    www.remodel.com/fixit2/REPLACING_BROKEN_WINDOW_GLASS.asp - 15k - Cached - Similar pages

    Remodel.com Fix-It-Smart: REPLACE A BROKEN WINDOW
    ... Fix-It-Smart, Home. REPLACE A BROKEN WINDOW This guide
    was adapted from USDA Extension ...
    www.remodel.com/fixit2/REPLACE_A_BROKEN_WINDOW.asp - 16k - Cached - Similar pages

    ITworld.com - Tweak columns in Explorer and fix a broken ...
    ... OPINION Tweak columns in Explorer and fix a broken Java patch Plus: Tips on drag-and ... printer:
    He drags the icon from one window to another. To do this in ...
    www.itworld.com/jita/3799Win2kFeat/0,,1_3799.html - 32k - Cached - Similar pages

    Glass_and_Windows, Topic 108
    ... I have a broken window, they are old wood windows,
    can anyone help with telling me how to fix it? ...
    www.doityourself.com/archives/Glass_and_Windows_108.htm - 9k - Cached - Similar pages

    Repair a Broken Window Pane with the iVillage Home How-To ...
    ... painting. Becoming soft. Remove stubborn window putty with a heat ... Take a shard of
    broken glass with you to ... STREAK-FREE GLASS CLEANSER FIX A LEAKY GUTTER CLEAN ...
    www.ivillage.com/home/howtoguide/repairandrenovate/articles/0,9449,167075_211955,00.html - 71k - Cached - Similar pages

    Re: Don't fix what isn't broken
    ... 2000 12:48 pm. In Response To: Don't fix what isn't broken (Terri Zamore). ... the light
    of day in OS X. For instance, window management in OS 9 is at the very ...
    www.maccentral.com/storyforum/forums/_news_0011_23.upgradeguy/?read=10 - 6k - Cached - Similar pages

    Centre of Criminology News
    ... HOW MANY CRIMINOLOGISTS DOES IT TAKE TO FIX A BROKEN WINDOW? The following responses
    to this query were provided by faculty, staff and students at the Centre ...
    www.library.utoronto.ca/libraries_crim/centre/crimnews.htm - 35k - Cached - Similar pages

    LifeMinders Home Sample
    ... Unsubscribe. Fix It Projects Replace A Broken Window.
    Maintain Your Gutters Now...Or Pay Later. Gardening ...
    www.lifeminders.com/examples/home_minder.html - 13k - Cached - Similar pages

    Home Upkeep
    ... Fix a Leaky Faucet How to fix most faucets yourself and save
    money. Repair a Broken Window Fix your own broken windows. ...
    www.frugalliving.about.com/cs/homeupkeep/ - 54k - Cached - Similar pages


  40. This was solved years ago, but... by dublin · · Score: 3

    This is a real problem, but the fundamental reason it's a problem is one that's well-understood by library scientists: We only have addresses, not content identifiers.

    To use a book analogy, the entire web is built on Dewey Decimal addresses (URLs), when what we need is those combined with ISBN numbers (URNs).

    I didn't make up the idea of URNs - the concept was first described to me by Peter Deutsch, the inventor of Archie, at Interop sometime in the early 90's, shortly after the web got going. (Back when there were no search engines, and we found out about new web sites by visiting NCSA's What's New page, which for a while, anyway, actually cataloged *every* new web site that appeared, and some of us could claim to have surfed the entire web...)

    The idea behind URNs is that they would be a unique identifier for the content. The same content living on different sites would have several URLs, but only a single URN. This is still needed today, but the problems that kept it from being implemented then are even more intractable today: Who hands out URNs? (IANA didn't want to touch that!) How do you handle versioning? What about dynamic content? Who are the librarians?
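
    To illustrate "several URLs, one identifier", here is a Python sketch that derives the identifier from a hash of the content. This is only one possible scheme and an assumption on my part, not what Deutsch described: the "urn:sha256:" prefix is made up rather than a registered namespace, and a hash dodges the "who hands them out" question while making versioning worse, since any edit yields a new identifier.

```python
import hashlib
from collections import defaultdict


def content_urn(content: bytes) -> str:
    """Derive an identifier from the bytes themselves (hypothetical naming scheme)."""
    return "urn:sha256:" + hashlib.sha256(content).hexdigest()


def build_catalog(pages):
    """Map each content identifier to every URL (address) where that content lives."""
    catalog = defaultdict(set)
    for url, content in pages.items():
        catalog[content_urn(content)].add(url)
    return dict(catalog)


pages = {
    "http://mirror-a.example/paper.txt": b"Same paper, byte for byte.",
    "http://mirror-b.example/docs/paper.txt": b"Same paper, byte for byte.",
    "http://other.example/notes.txt": b"Something else entirely.",
}
catalog = build_catalog(pages)
paper_urn = content_urn(b"Same paper, byte for byte.")
```

    The two mirrors land under a single identifier, so a search engine keyed this way would list the paper once and could fall back to the other address when one site is down.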

    We still desperately need something that fills this need, but it's not likely we'll get it. One last parting thought - in discussing this with Deutsch, he pointed out that these are new problems to us, but that the library scientists had solved them quite some time ago: It is only the typical CS insistence on reinventing everything and dismissing the knowledge of those in other fields that makes the process so incredibly painful... Hubris strikes again.

    --
    "The future's good and the present is nothing to sneeze at." - Roblimo's last ./ post
    1. Re:This was solved years ago, but... by search66 · · Score: 1

      Hear, hear.. Good post my friend. And I, like you, was one of those who visited the ENTIRE web. It was actually quite very. I was one who had a web page then.. (back in 92 or 93 I believe) and it included my notepad created webpage and a link to a text file...whoopeee! I agree... The web sucks. I do some web promotion, and the strings that I have to pull (and cheat) are amazing. From meta tags, to content.. to hidden links.. Silly I tell ya.. SILLY. None-the-less... Overhaul should be apparent. As for my search engine of choice, that would be AltaVista.. well.. until they started to suck..heheh.. They drop web pages out of the blue... argh!

      --
      They called us nerds and geeks.. and now they call us boss.
  41. Dizz-net by jfunk · · Score: 2

    Check out Dizz-net. It's basically an article spawned by a conversation on Slashdot over a year ago that moved to a mailing list.

    We had some cool ideas, but the infrastructure for such a thing would be huge. I have a bunch of interesting messages from the mailing list describing some pretty cool stuff, like having nodes only search for stuff that's near them, network-wise, to lessen the load at critical points. There was also some talk about moderation ("Click here if this link is not relevant to your search") and heuristics to stop common abuses (spider-bait).

    It never happened, because it's pretty heavy stuff to implement properly.

    I'm sure some patent-squatter has a patent on it already, with the full intention of letting someone else do the hard work. :-)*

  42. Are libraries becoming useless? by segmond · · Score: 2

    Are libraries becoming useless?

    Posted by Hemos on 03:53 PM March 27th, 2001
    from the we-talk-and-talk-about-same-crap dept.
    segmond writes: "CNN is running a story on libraries around the world and their inability to keep up with the growth of the number of books published. Libraries such as ones belonging to even the biggest institutions such as Harvard, Yale and MIT can take months to add a book to their collection and the queue of unreviewed books is growing. Most libraries are even further behind and are filled with off-topic and old assembly books about VAX and Z80 programming. The trend is toward pay for listing your book. Will the free, searchable library fade away?" The article gets beyond the "Wowie, so much content, libraries can't keep up" typical blather and addresses some of the reason libraries have a hard time keeping up.

    --
    ------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
  43. browser plug-in by iriles · · Score: 1

    What about a browser plug-in that indexes pages as you view them and submits the results to a centralized database (or decentralized if possible)? This would have the advantage of being able to index every page people go to. The database could even store more detailed information about pages that are more popular. Groups of people with special interests could set up their own private index of the pages they visit. Individuals could even have private indexes of their history and bookmarks.

    the possibilities are endless.
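
    A rough sketch of what such a plug-in might submit to, assuming the shared database is just an in-memory inverted index (word to set of URLs) plus a visit counter used as a crude popularity signal. The `VisitIndex` class, its method names and the example URLs are hypothetical; a real plug-in would also need networking, privacy controls and abuse protection, none of which is shown.

```python
import re
from collections import defaultdict


class VisitIndex:
    """Toy index fed by page views instead of a crawler (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> URLs whose text contained it
        self.visits = defaultdict(int)    # URL -> how many times it was viewed

    def record_visit(self, url, text):
        """Called each time a user views a page; indexes the words on it."""
        self.visits[url] += 1
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(url)

    def search(self, word):
        """Return URLs containing the word, most-visited first."""
        hits = self.postings.get(word.lower(), set())
        return sorted(hits, key=lambda u: (-self.visits[u], u))


index = VisitIndex()
index.record_visit("http://example.com/faucet", "Fix a leaky faucet yourself")
index.record_visit("http://example.org/window", "Repair a broken window")
index.record_visit("http://example.org/window", "Repair a broken window")
index.record_visit("http://example.net/diy", "Fix a broken window or a faucet")
print(index.search("window"))
```

    A private group index would just be a separate `VisitIndex` instance that only its members feed.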

    -ishmael

  44. Re:The trend is toward pay for listing by hemp · · Score: 1

    In my locale, SBC (Southwestern Bell) charges extra if you don't want your phone number to appear in their published phonebooks; otherwise your number will be published in the proper section (white pages for persons, yellow for business, blue for government, etc.).

    I can foresee a time when people pay to *not* be included in search engines.

    --
    Skip ------ See the latest from http://www.anArchyFortWorth.com
  45. Not quite by linuxlover · · Score: 1

    that should be
    # include <math.h>
    we don't want no C++ OOPs here, just plain old C ma'm

  46. Search engines can't find everything... by cr0sh · · Score: 3

    Look up information on the "Invisible Web" - islands typically untouched by search engines, where you need another site to "hop" to these nets of information - cool stuff can abound in these disconnected areas. Here are some links to get started with:

    DirectSearch - Invisible Web Search

    The InvisibleWeb

    WebData.com - Invisible Web Search

    InfoMine - Scholarly Internet Resource Collections

    AlphaSearch - Invisible Web Search

    IIRC, Slashdot even ran an article about this not too long ago - I think this is it, not sure...

    Worldcom - Generation Duh!

    --
    Reason is the Path to God - Anon
  47. Re:possible solution by GoofyBoy · · Score: 3

    You mean the META tag already exisiting?

    --
    The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
  48. Divide and conquer by xixax · · Score: 1

    So it's big? Do you ask every random person in the street for the best place in town to buy comics? You look for people who are likely to be clued in and see where they hang out, or where they recommend. You'll also get a very different answer if you ask a BDSM mistress for a "rack" than if you ask a geek (well, most of the time...).

    You just need to make sure the web has these same cues and communities built into it.

    Xix.

    --
    "Everything is adjustable, provided you have the right tools"
  49. Of course it is. by Sylvestre · · Score: 2

    What do you expect? Pay for listing is the only way search engines will make money. Think about it: Would you use a search engine that charged a little, but provided much better results (ie no dead links, no off-topic stuff)? I think NorthernLight.com does this.

    1. Re:Of course it is. by jmccay · · Score: 1

      I wouldn't use a search engine that charged. More than likely they'd end up charging a company some pay per slot fee to allow companies to get better positions in results. Of course this means all the porn sites will come to the top, and then I'd have to pay to get this?

      --
      At the next eco-hypocrisy-meeting, count the private jets used to get to the meeting. Should be interesting to see that
    2. Re:Of course it is. by eldurbarn · · Score: 3
      Actually, Northern Light does not charge to access its search engine, or to access its classification links of the web.

      It has a second, separate business re-selling articles from trade journals, professional publications, etc., for which you do pay... but less than you would pay to buy the same thing in dead-tree format from the publisher.

      What confuses people is that, by default, the main engine will return hits on both the web and the special collection.

      --
      -Eldurbarn
    3. Re:Of course it is. by Seedy2 · · Score: 1

      It's really a problem of search engine designers having to "outsmart" the wily, lying, deceptive marketing types who (try to) figure out how search engines list things and then do everything to get their sites to the top of the list. Despite the fact that their page may have nothing to do with what was searched for. A pay list just makes it easier to figure out, payment.

      This should be a computer science problem NOT a business problem.

      p.s. Applause for on topic FP! :)

      --
      Nothing to say here... move along
    4. Re:Of course it is. by patter · · Score: 1

      I suppose that people who are complete novices would find it difficult to find sites if they had to search for well known sites like hotmail.

      I would be surprised if this issue even exists when most of our parents (those that aren't techies) are no longer using the net.

      Every kid these days knows to type in companyname.com to get to someone's site, so how would hotmail (for example) be hurt by a porn site paying to be listed as hotmail in a search engine???

      Maybe I just need more coffee this AM, but I don't get it.

      --
      -- If at first you do succeed, try to hide your astonishment. -- Harry F. Banks
    5. Re:Of course it is. by rgmoore · · Score: 1

      It depends on how the system is structured. If you design it well, you may be able to set it up so that it doesn't pay porn sites to try getting listed higher than hotmail when searching for hotmail. You could do that, for instance, by charging the company for views rather than clickthroughs; that way it wouldn't make sense to try listing yourself on topics where you expect your clickthrough rate to be low. Of course if you're doing a straight pay for listing scheme you may also be able to afford to have a person screen the people trying to get a listing and deny those that don't make sense, like trying to get your porn listed pretty much anywhere except for the porn category.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    6. Re:Of course it is. by Monkey-Man · · Score: 1

      What a waste. . .

    7. Re:Of course it is. by truelight · · Score: 1

      If porn sites were allowed to bid on terms such as "hotmail", the search engine would quickly turn into a useless dump of links. Thus - people would stop using it. In effect - the search engines need to disallow such practice to survive. I don't think that using money instead of time/skill to optimize your keywords is very different - neither ethically nor in quality.

    8. Re:Of course it is. by Sven+Tuerpe · · Score: 1

      Would you use a search engine that charged a little, but provided much better results (ie no dead links, no off-topic stuff)?

      Certainly not, since to me it happened several times that information I looked for was available only in a small number of copies on sites unlikely to pay for search engine listing, e.g. some student's homepage or sites of un-organizations like local Linux user groups. I don't want perfect search, I want to find specific information in reasonable time.

      --
      http://erichsieht.wordpress.com/category/english/
    9. Re:Of course it is. by KingAzzy · · Score: 2
      Google has a really neato ad model that anyone can afford. You basically set up a small 2-3 line ad that is linked to certain keywords or phrases. You are billed around $15 per thousand impressions of your ad. You set up the limit that you're willing to pay for and bing! it's all done.

      Very cool and clever idea. Now small businesses can promote their sites without having to invest mega-$$$$ for the traditional "banner ad".

      --
      $ chown -R us:us yourbase

  50. The trend is toward pay for listing by DanThe1Man · · Score: 2
    The trend is toward pay for listing

    Is this really a big deal? Hasn't anyone used the yellow pages in a phone book before? People have to pay to be listed in that, and it's very useful for finding companies.

    1. Re:The trend is toward pay for listing by alen · · Score: 1

      I doubt it. I bet when you open an account with your local telco it automatically adds you to the database that the White Pages is made from. To unlist you requires some human labor. Of course they make money off it, but I doubt it's all as evil as some people think. It's kind of like caller ID and all the other useless services. I bet the current switches are set up for it. All somebody has to do to activate your line for it is to enter the right command in a switch OS. About a 30 second task. Yet it costs $$$ every month.

  51. Ad impressions are increasing! Increasing! by interiot · · Score: 2

    whee
    --

  52. Oh boy, more gloom and doom by Illserve · · Score: 2

    Frankly, this article doesn't depress me as much as the quality of google results impresses me. Whether it's 1% or 100% of the available space, I can very often find exactly what I'm looking for.

    Now maybe there are vast areas of the web unavailable to google searches because of language quirks or protective admins, but so what.

    They have as much a right to exist uncataloged as I do to have an unlisted phone number. If sites want to be indexed, they can register with a search engine. If they don't, and are unreachable, so be it. I don't see what the problem is.

  53. INDEXING search engines are dead by mozkill · · Score: 1

    if you own stock in an indexing search engine, you should dump it now, because distributed search engines are going to replace them. if you don't believe me, just ask all of the young peer-2-peer developers out there, because distributed computing can solve this, and there is a huge hole in the efficiency of the internet that these developers can fill, and THEY KNOW they will be famous if they solve it. it's a race. run. run. run. you're going to lose if you're still wearing your penny loafers.

    --

    -- Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
    1. Re:INDEXING search engines are dead by Zeinfeld · · Score: 1
      if you own stock in an indexing search engine, you should dump it now,

      Agreed, but only because it is a high cost business whose revenues are nowhere near what people hoped for.

      We had distributed peer-2-peer directories back in 1993, that is what harvest did. The results were not good, anything that requires effort on the part of the Web site providers tends to be hard to get off the ground. Ultimately the central index model turned out to work after all if you threw hardware at it. It still does.

      If you want to look up a business on the web then a paid directory is actually likely to be a bit more useful than a non paid one since the companies that have the cash to buy space are more likely to have a bigger stock.

      I no longer use Yahoo and Lycos for much however because the information in their index is so outdated and badly organized. I get the feeling that most of the data is several years out of date.

      Google works and appears to have established itself as a premium brand. It may only index a part of the web but the relevance feedback means that it is indexing the right part...

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
  54. Google by citizenc · · Score: 5

    I don't know WHAT they are talking about -- I can find ANYTHING that I look for on Google -- even sites that I have just created a day or two ago have been found. These people just aren't using the right search engine, dammit! =)

    ------------
    CitizenC

    1. Re:Google by persist1 · · Score: 1

      Muahahahaha... I second this sentiment.

      There is a learning curve associated with searching, but when everything is said and done the default boolean on a search is AND. If you're looking for something in particular - especially on Google - it helps to use lots of keywords.
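
      To illustrate why piling on keywords helps when the default operator is AND: each extra term intersects the result set, so it can only shrink. This is a toy sketch over a made-up index; the page names and the `search_and` helper are hypothetical and say nothing about how Google actually evaluates queries.

```python
def search_and(index, query):
    """Return pages containing ALL query terms (implicit boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # Intersect with the pages for each additional keyword.
        result &= index.get(term, set())
    return result


# Hypothetical inverted index: keyword -> pages containing it.
index = {
    "broken": {"a.html", "b.html", "c.html"},
    "window": {"a.html", "b.html"},
    "repair": {"b.html"},
}

for query in ("broken", "broken window", "broken window repair"):
    print(query, "->", sorted(search_and(index, query)))
```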

      --
      ...When in doubt, think for yourself.
    2. Re:Google by Seedy2 · · Score: 1

      Maybe we need a way to download the entire list of hits that the search engine returns, then run a client side search on THAT information.
      (grep grep perl perl grep, awk?)

      --
      Nothing to say here... move along
    3. Re:Google by Jagasian · · Score: 1

      Yeah, I get the joke. If you can search on google, you are already on the internet!

    4. Re:Google by rgmoore · · Score: 2

      Sure this is a problem, but it's more an example of applying the wrong tool. Google was never intended for comprehensively finding every scrap of information about a particular topic; it was designed to find the few most relevant and interesting sites discussing a particular topic. Using a general purpose tool for a highly specific task is a wonderful way of getting frustrated but not an efficient approach to solving your problems.

      In fact, there are specialized search engines for dealing with specific topics. There are engines specifically for looking for images, ones for looking at specialized topics, and so on. There are also specialized, classified catalogues of information of exactly the kind you suggest are needed out there for people who need to know about them. If, for instance, I want to learn about a specific topic in biology, I might very well start out by looking at PubMed, a special purpose index of biological research articles. You just have to know where to look for the special purpose tools.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    5. Re:Google by akira2001 · · Score: 1

      sorry to inform you, but yahoo and google are two different things. Yahoo is a web directory and google is a search engine. When you search on Yahoo, it searches all of yahoo's directories and displays the results. Recently, Yahoo has started to use Google as a secondary search for searches that turn up very few results. Google is a search engine that uses a spidering technique to cache websites and reference them by keywords.

    6. Re:Google by stain+ain · · Score: 2

      Well, the article has a point: finding the right webpage is difficult, but it has been since all this ever started and is not something particular to Internet.
      The most difficult thing on the internet, to my belief, is to find the very specialized article that you are looking for. The problem is that it may not even exist. Finding the same very specialized article in a huge library full of journals is even more complicated. So what? Next article.

    7. Re:Google by weinerdog · · Score: 2

      Known item searching is dead easy using any search engine, so long as the item is in the database. It's also easy to find something about anything, so people who just want some information without being overly concerned with how accurate or complete that information is can also easily find something to keep them happy.

      Serious research, on the other hand, requires a more quality-conscious search. A researcher will want all of the most relevant information about a topic, and Web search engines do not provide this very well at all. Weighted keyword searching is no substitute for professionally catalogued and classified documents in cases like this. In some cases, researchers will want an exhaustive search: everything relevant about the topic. For example, a Ph.D. candidate would almost certainly begin their thesis by locating everything academic published in their field of study. This is downright impossible with Web search engines: even if their databases were complete, relevancy is so bad that you would probably have to wade through thousands upon thousands of hits to find a hundred or so truly relevant sites. This is especially true of any subject that is susceptible to search engine spamming.

      --
      There's no such thing as Scotchtoberfest!
    8. Re:Google by RedWizzard · · Score: 2

      WTF are you looking for?

    9. Re:Google by RedWizzard · · Score: 2
      I don't know WHAT they are talking about -- I can find ANYTHING that I look for on Google -- even sites that I have just created a day or two ago have been found. These people just aren't using the right search engine, dammit! =)
      They're talking about using the Web for serious research. The article actually misrepresents the problem for hardcore researchers on the Web. The problem is not so much finding information, it's finding information you can trust. But for most other people the problem is just finding the information and it's not just that they're not all using Google, it's also that they don't know how to search properly. They don't know how to formulate queries which are specific enough to weed out the bad pages.
    10. Re:Google by Confound · · Score: 1

      Amen, Persist.

      The only thing I can't find on the internet is decent fanart of Legend of Dragoon. Anything else is easy when you know what search engines to use, or even which sites might link to the stuff you want.

      --
      !-- wit --!
    11. Re:Google by BlowCat · · Score: 1
      This is an example where Google cannot find what I'm looking for, but it finds 14 pages with those words in absolutely unpredictable combinations.

      It looks like you can find everything in all combinations in the death notices.

    12. Re:Google by MeowMeow+Jones · · Score: 4
      A google search on the word "internet" just returned 65,500,000 hits. With that many hits, it makes it hard to figure out how to even get on the internet in the first place, let alone use a search engine!

      --

      Trolls throughout history:
      Jonathan Swift

    13. Re:Google by entraxon · · Score: 1

      I third the motion. Google is pretty good, most of the time, for serious research. At least it's fast. The big problem is that you can't do Booleans. HotBot is good if I want to make sure I get a phrase and words in proximity, but it's almost always way out of date. Sometimes you get lucky, though. I never know what to expect from Northern Light. DogPile is just that.

      --
      Cogito Tute (desiderata nostra eriximus, vestra nunc erigite)
    14. Re:Google by CKW · · Score: 1

      Yeah, but a couple months ago I was just astounded to find out that only 5-20% of all the people on the net know about or use Google. (It must be better now that Yahoo uses Google, so we can add those two numbers up..)

      Even then, searching can be a bit of an art or science. There are people that sit next to me at work, whose IQs must rival mine, who complain bitterly about not being able to find anything using Google. I can find anything within minutes or less.

      Remember all those years in school when people bitched and bitched about "why do we have to learn this shit??".

    15. Re:Google by Tar+Ciryatan · · Score: 1

      Nah, they just have tiny little brains that rattle inside their heads...maybe it will somehow grow...but I seriously doubt it

      --
      -Tar Ciryatan, Angry Hermit-
  55. Progress between Yahoo! and Google by ZahrGnosis · · Score: 4

    The article skims over the fact that search engine technology is progressing fairly rapidly, and that some companies (Google) are creating new technologies that exploit the way the web works while Yahoo! and some others are relying on older technology for some things (like filtering pages by hand for their directory!).

    Google's approach is novel; make the web pages rank themselves. If more people link to your site, it's probably a better site. If few enough people link to it, it probably isn't and besides that it'll probably never be found.
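
    A simplified sketch of "make the web pages rank themselves", assuming a PageRank-style iteration in which each page passes a share of its own score to the pages it links to. This is an illustration of the general idea only, not Google's actual implementation; the `link_rank` function, the damping value and the tiny link graph are assumptions made up for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting links from high-scoring pages more.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "popular", only one to "obscure".
links = {
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular", "obscure"],
    "popular": ["a"],
}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    The page nobody links to much ends up near the bottom, which is exactly the "it'll probably never be found" effect.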

    Web site creators have to do the legwork to get their sites recognized, and going to a general search engine to do it isn't the way. If someone makes a site and tells their friends about it, and their friends like it and link to it, it'll get picked up; that's the way of the web. (At least, it'll get picked up by crawlers like Google, and even ranked highly if enough people link to it).

    Search engine tech has to catch up to dynamic pages yet, but it's the fault of the content creators if they want their pages on search engines but can't code enough alt tags to make their stuff show up.

    In any case, the bulk of the web does work, and good pages get recognition. I've always eventually been able to find what I'm looking for on the web, no matter what the topic. Search engines have to grow like everything else, but so far they're the best thing going and getting better.

    1. Re:Progress between Yahoo! and Google by Jagasian · · Score: 2

      Yeah, but without a truly intelligent AI, search algorithms will always be exploitable. Keyword spamming is the old school method, and with google, maybe a combination of keyword spamming and link spamming (have tons of other bogus sites link to yours) would work.

    2. Re:Progress between Yahoo! and Google by a_hofmann · · Score: 1
      Google's approach is novel; make the web pages rank themselves. If more people link to your site, it's probably a better site. If few enough people link to it, it probably isn't and besides that it'll probably never be found.

      Web site creators have to do the legwork to get their sites recognized, and going to a general search engine to do it isn't the way. If someone makes a site and tells their friends about it, and their friends like it and link to it, it'll get picked up; that's the way of the web. (At least, it'll get picked up by crawlers like Google, and even ranked highly if enough people link to it).

      I agree with you that this approach makes Google a useful search engine to get good hits about a specific topic.

      The problem that many people overlook is that there are many great, informative web pages about topics that just don't draw enough attention to get linked very often. Advanced scientific sites, seldom occurring computer issues and the like don't get linked because there are not many people interested in that kind of stuff...

      The same with mailing list archives. I guess that most software issues have already been discussed and solved, and answers wait to be read in the archives, but then try finding the solution to "extra seldom compilation problem X on quite popular program Y with special system configuration Z"

      As soon as your search involves popular keywords your result gets overranked thousands of often-linked results about the wrong stuff. (And no, there is not always that "great" combination of keywords to reduce results to I love Google as anyone else does, but it surely has its limitations...

    3. Re:Progress between Yahoo! and Google by NaturePhotog · · Score: 1

      In any case, the bulk of the web does work, and good pages get recognition. I've always eventually been able to find what I'm looking for on the web, no matter what the topic.

      I agree that the bulk of the web does work. But one question: how do you know that all the good pages get recognition? There may be a brilliant page on some topic you want, but it's amongst the estimated 99% of the web that Google, Altavista, etc. don't index, so you'll probably never see it. The odds suggest there's a lot of good pages (and thankfully, a huge number of bad pages) that they're missing.

  56. The power of "Word of Mouth" by decipher_saint · · Score: 4
    In the beginning all the best stuff was "word of mouth"... it still is ;-)

    This is how I found /. originally, many moons ago a fellow nerd clued me in.

    Did anyone out there get hooked up to /. through a Search Engine result?

    -----

    --
    crazy dynamite monkey
    1. Re:The power of "Word of Mouth" by cnkeller · · Score: 1

      Basically, using the peer-2-peer revolution (buzzword alert) in advertising is the next thing. Since people tend to key out traditional advertising, read doubleclick, some companies are trying to combine the peer to peer aspect of traditional word of mouth and the web. I don't completely understand it, but someone else out there probably does.

      --

      there are no stupid questions, but there are a lot of inquisitive idiots

    2. Re:The power of "Word of Mouth" by autocracy · · Score: 2

      No, but I've done some searches after finding /. that would have led me to the site.

      I can't be karma whoring - I've already hit 50!

      --
      SIG: HUP
    3. Re:The power of "Word of Mouth" by ConsumedByTV · · Score: 1

      I found it out from a 2600 meeting.


      Fight censors!

      --


      "Not my manner of thinking but the manner of thinking of others has been the source of my unhappiness." - M
    4. Re:The power of "Word of Mouth" by rodrigo1979 · · Score: 1

      I found slashdot while participating in distributed.net's rc5 encryption challenge... Who the hell is slashdot, and why are they in first in the rank? after visiting the site, my prayers were answered

    5. Re:The power of "Word of Mouth" by frostman · · Score: 1

      actually, sorta. i can't remember whether it was google (i think it was) or maybe a link somewhere else (which is maybe link-of-mouth), but i definitely had never heard of slashdot when i first stumbled upon it last year.

      and i guess i'm a medium-geek, not a real guru or anything, but i do know a _lot_ of hard-core geeks in various disciplines (CS, aerospace, biotech, etc) and none of them clued me in to it.

      sure am glad i stumbled though... this is my third main news feed, after the reuters/AP feed on yahoo and, of course, freeB92.net ;-)

      --

      This Like That - fun with words!

  57. Then why did they refuse me on DMOZ? by moonkhan · · Score: 2

    I have been a PHP programmer for 2 years now and I applied to review the PHP sites. They rejected me citing an overabundance of PHP reviewers. Does this mean that they want people to review anything instead of what they know?

    1. Re:Then why did they refuse me on DMOZ? by Phrogman · · Score: 2

      I too applied for that category more than a year ago, and despite the fact that I am both a PHP programmer, and worked for a Canadian Search Engine called Maplesquare as the resident "Cybrarian" in charge of maintaining the database of links and descriptions, I was summarily refused.

      --
      "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
    2. Re:Then why did they refuse me on DMOZ? by err666 · · Score: 1

      Did this happen recently? My entry page says:

      "And for those of you who don't already know, editor feedback has been down for about a week. It is the highest priority of all problems with dmoz,
      and I am working on it."

      You should perhaps try again at a later time or contact a meta editor directly via email. I think cc'ing staff@dmoz.org would be ok, too.

      I am a new editor there, too and I think we need more good editors :-)

      --
      reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))
    3. Re:Then why did they refuse me on DMOZ? by blonde+rser · · Score: 1

      Two jews are sitting at a bar.

      One says to the other "How did that interview for Radio Show host go?"

      "Th-th-they di-di-didn't hi-hire mu-me... da-damn anit se-semites"

  58. Topic Specific Search Engines by Phrogman · · Score: 2

    If you need to find more relevant documents on specific subjects, I recommend using topic-specific search engines. I maintain one for all subjects relating to Paganism and Wicca on my Omphalos website. True, the site submissions have to be manually approved and this can lead to backlogs of site submissions, but since I spider all of the websites I have included in the directory (totalling over 140,000 webpages so far) the relevancy of any search results is raised by the lack of clutter from unrelated websites.

    Similarly, if you are searching for information on Space Exploration try Spaceref where I used to work. Again, the directory is manually generated, and the results are greatly improved overall.

    Nothing guarantees improved relevancy (for general purposes nothing beats Google in this respect), but using specialty search sites helps immensely in many cases.

    --
    "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
  59. Micropayments and Minipayments by Speare · · Score: 2

    If I were to set up a search engine:

    Every unique domain name found would get crawled for free. You paid for a domain name, you must care about your content.

    Every geocities-style cheap personal page would require a small fee to get crawled. Too much schlock; scan only the stuff people care about. You don't wanna pay your own fee? Ask a visitor to pay the fee. PayPal or something newer/better should do the trick.

    Every dynamic page like slashdot, everything2, or real estate listings, would have to have a more expensive agreement in place to get anything indexed. The buck stops at cgi. Waste no time on something that will probably be gone tomorrow.

    Commit the resources it will take to prune and groom the stale dead stuff out of the index, regularly. Dead links are bad business.

    --
    [ .sig file not found ]
    1. Re:Micropayments and Minipayments by jaydub99 · · Score: 1

      You paid for a domain name, you must care about your content.

      Oh yeah. One only need look at the nearest domain to see that in practice. Now why does that site never turn up on google? It certainly gets enough links from /. so it must be useful!

      --

      Please mod me up. My grandma might not make it to the weekend and she always wanted me to hit karma cap.
  60. The Poor Man's Site Announcement Service by goingware · · Score: 2
    Are you tired of all those annoying paid search engine placement services? Ever tried using the free ones, only to be annoyed with tons of ads and to find your URL submissions blocked by the robosubmission filters on the search sites?

    Well, I'm tired of them too, and I write pages that I submit to search engines from time to time, and I've come up with what I feel is the best way to submit links to a bunch of sites:

    Direct links into the pages that have the URL submission forms on a bunch of search engines.

    Keep a text window open with your URL, title, description, for-public-consumption email address and the like, and use "Open Page in New Window" on all these links to manually copy and paste your information into a bunch of search engine submission forms.

    That's it!

    I got all these search engines off the Search Engines Category at the Open Directory Project. If you know of any pages that list a bunch of other search engines (there are many smaller ones, and a lot of special purpose ones) then drop me a line at crawford@goingware.com.

    In my index I provide brief notes about some of the engines, including mentioning whether they refuse to accept submissions without payment. I don't provide links to submission forms for the engines that won't list a site for free, and I'd like to ask you not to support the trend towards paid index and spider placement.

    You should understand that the vast majority of visitors to your sites don't get there through search engines, they get there because other people like your page and give you a link. The main value of search engines is to "prime the pump" so a few people start finding your site and then know to create a link for it.

    Create successful web sites by writing good web sites - see Some Web Application Design Basics for links to a few good pages written by experts that will start you well on the road to an appealing, successful website.

    Thank you for your attention.


    Mike

    --
    -- Could you use my software consulting serv
  61. Problems with DMOZ by frankie · · Score: 2
    the biggest problem is just that a lot of editors aren't active

    Also, I'm not impressed with ODP's handling of new applicants. I applied once last year and received NO reply, not even a rejection letter. I had applied to edit the category of "Personal Pages -- Surnames starting with U". It was to get my feet wet, learn how to be an editor, see how time consuming it might be before adding a more serious category. I mentioned that in my application.

    I resubmitted it in February and successfully received . . . a rejection letter! They decided I have a personal stake in the category (note my last name) and might be biased. Oh no! We must prevent the potential for abuse of Web Pages about people named U*!

    If I'm not allowed to edit for categories that I know something about and I'm interested in, then what exactly should I volunteer for, and why should I?

    1. Re:Problems with DMOZ by bluebomber · · Score: 1

      I had a very similar experience. Didn't thrill me too much. I wouldn't have had any problem with a rejection, but the reasoning was kind of dumb -- I had a connection to the category. Duh. Of course I have a connection to the category! Why the hell else would I be interested in becoming an editor???

      -bluebomber

  62. Re:No one expected Yahoo to scale infinitely by hvoss · · Score: 1

    Does anyone know if there are any search engines 'out there' that help implement this 'two step' process?

    You know, suppose I am looking for something that requires these two steps, but I don't know anything about the subject (That's why I'm searching in the first place). So I don't know what my first search should be like.

    --
    Hans Voss
    ---
    "I have no special talents, I am just passionately curious" -- Albert Einstein
  63. A problem of scope? by debaere · · Score: 1

    I think the problem with the current search engines/directories is that they are trying to index the entire web into one handy-dandy catch-all whiz-bang database. IMHO it's too much for a single system to deal with.

    It seems to me that search engines/directories should start to specialize in specific topics. For example, science, pop culture etc.

    We might have a fighting chance this way.

    Thoughts?

    Dave

    --

    DOS is dead, and no one cares...
    If there's a Bourne Shell, I'll see you there
  64. That's part of the problem. by Smarmy_1 · · Score: 1

    Yellow pages list companies. On the Web, sometimes you don't want that; you want to find non-commercial sites. Fan sites, non-biased reviews and information, etc. That's getting harder and harder to find via search engines. All the commercial sites appear first. Obviously, more and more sites are going commercial to cover costs, but there's still a lot of quality information on non-profit sites, and it's getting harder to find. At least, that's my experience, even with Google. Who knows, it might just be the harsh reality of the future of the 'net. Surfers' standards of quality have risen much higher, and non-commercial sites have a harder time keeping up with the companies that have a whole team running a site to make it look pretty. That doesn't mean the information won't be missed, though. So is it really a big deal? Yes, I think so.

    1. Re:That's part of the problem. by Smarmy_1 · · Score: 1

      No, but thanks for the compliment -- that's how I'll take it. I don't agree with some of the viewpoints that Katz spouts, but the man knows how to write!

      Moderators, I pronounce this post wholeheartedly OT.

  65. Google by Steeltoe · · Score: 1

    If you really think about it, being able to search up on cached contents on Google is actually a GOOD THING. Now if they only would make it an option by the search-criteria, and make their spiders check the more popular links more often, it could really improve their search-results.

    On another note, they should probably spider more news-sites like Slashdot and Freshmeat frequently. There they can get the new links as they arrive, good or goatse.cx.

    On the last note (Yes I promise ;), specialized search-engines are probably the best option if you want some obscure student-paper or 4-year-old newsflash.

    - Steeltoe

  66. Re:Directories are not search engines by dingbat_hp · · Score: 1

    doomed to failure until someone implements something like the Dewey Decimal System for web pages

    Yes, we're stuffed -- but Dewey Decimal isn't the answer (we can do a lot better than that).

    There's an initiative around that's gaining considerable momentum - the Semantic Web. It starts from one bright idea by one guy, but as the guy in question is Tim B-L, then he gets listened to. There are solutions to all this. We've barely started on what we could easily achieve for indexing the web, without even trying for the really hard stuff.

    Once basic semantic level indexing becomes commonplace, through tools like Dublin Core, then take a look at ontological descriptions and projects like DAML.

    There's a huge amount happening in this field research-wise, it just hasn't hit the punter's web yet.

  67. possible solution by zpengo · · Score: 2
    Actually, I've been working on a proposal for a possible solution to this mess. It will never be implemented, of course, because the web is based on tradition and archaic protocols, not on innovation, but I think it's nifty food for thought anyway.

    My idea is to come up with a standard set of headers that provide directory/hierarchy information for search engines. This is much more useful than keywords, et al., because it allows for top-down directories such as Yahoo! and the Open Directory Project. Sites like this could be automatically created simply by crawling the web and organizing sites according to a category specified in their header.

    The problem with keywords is that it's easy to spam them. If you need more hits, just add "bestiality", "Natalie Portman", and "hot sluts" to your keywords. The keywords often have nothing to do with the actual site.

    It would be much harder, however, to spam a directory structure, especially if most search engines limited the amount of directories a page could specify to, say, two or three.

    The header would be easy to implement. It could be done very easily within the comment tags of existing HTML. The only problem is getting people to do it. It would work beautifully if Yahoo! or another large site were to give up on "hand-picked" sites and start letting people specify their own location on the structure. Then anyone who wanted their site to be locatable would specify a hierarchical subject category in their header.
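
    A minimal sketch of how a crawler might read such a header, in Python. The `<!-- category: ... -->` comment syntax, the cap of three categories, and every function name here are assumptions made for illustration; the proposal above does not define a format.

```python
from html.parser import HTMLParser

MAX_CATEGORIES = 3  # assumed per-page cap ("say, two or three")


class CategoryHeaderParser(HTMLParser):
    """Collects hypothetical '<!-- category: A/B/C -->' comments from a page."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_comment(self, data):
        text = data.strip()
        if not text.lower().startswith("category:"):
            return
        path = text[len("category:"):]
        parts = tuple(p.strip() for p in path.split("/") if p.strip())
        # Skip empty paths and silently ignore anything past the cap.
        if parts and len(self.categories) < MAX_CATEGORIES:
            self.categories.append(parts)


def extract_categories(html):
    """Returns the category paths a page claims, at most MAX_CATEGORIES."""
    parser = CategoryHeaderParser()
    parser.feed(html)
    return parser.categories


def build_directory(pages):
    """Maps each claimed category path to the list of URLs that claimed it."""
    directory = {}
    for url, html in pages.items():
        for category in extract_categories(html):
            directory.setdefault(category, []).append(url)
    return directory
```

    Calling `build_directory` on a dict of `{url: html}` yields the top-down structure without a human editor; whether the self-declared categories are honest is a separate problem.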

    Great idea. It'll never happen.

    --


    Got Rhinos?
    1. Re:possible solution by SirWhoopass · · Score: 1
      From the original post:
      It would be much harder, however, to spam a directory structure, especially if most search engines limited the amount of directories a page could specify to, say, two or three.
      The idea is that you only get listed under a few entries.

      Of course, a search engine could use this type of protection with keywords by only using the first two or three keywords listed.

      I would say that the bigger problem with this scheme is limiting the entries to only the "top" page of a site. The directory would be useless if, under Dr Strangelove, it listed each of the dozen or so imdb pages related to the movie instead of only the overview page.

    2. Re:possible solution by ichimunki · · Score: 3

      This also brings up the problem of being able to use multiple pages that are essentially redirects to get around the listing limits. For instance, I make http://www.hotgrits.com/natalie1.html, .../natalie2.html, .../natalie3.html, etc which all are really mirrors of http://www.hotgrits.com/portman.html, which is the main page for my site. The only thing I change is the category for each page so that my site effectively shows up in numerous places in the directory. With a properly constructed CGI program I could be listed in every category without having to work that hard.

      --
      I do not have a signature
    3. Re:possible solution by Big+Sean+O · · Score: 1

      Your http://www.hotgrits.com/natalie1.html link is dead...

      --
      My father is a blogger.
  68. reorganize by daevt · · Score: 1

    if the web is becoming unsearchable:
    make smarter search engines:
    only search part of the net
    very specialised
    reorganize search engines
    reorganize the web

  69. SearchEdu.com by webdoyenne · · Score: 1
    You may want to try this one. "SearchEdu.com index, over 20 million pages in size, covers exclusively education related web sites."

    The company behind it -- MaxBot.com -- also offers SearchMil.com ("Over 1 million military pages indexed and ranked in order of popularity."), SearchGov.com and Search eBooks.com.

  70. You must be joking by lythe · · Score: 1

    I don't know WHAT they are talking about -- I can find ANYTHING that I look for on Google -- even sites that I have just created a day or two ago have been found.

    You're kidding, right? Or have you just not tried it in the last year or two? I submitted my site to Google -- and everywhere else -- two months ago and have yet to see it. And I wasn't about to pay the $199 to Yahoo or Lycos to get it listed. Bastards.

    --

    Slash has nothing to do with Slashdot.

  71. Re:Directories are not search engines by heikkile · · Score: 2
    If yahoo had an option where you could submit a site that you think had off-topic keywords [...] and they would completely remove all occurrences of an offending site from their database [...]

    This would require a lot of human verification, for there are many possibilities for abuse. I could always report my competitors for false keywords, just to keep them out of the listings. And as soon as we get to more exotic topics, who can say if a keyword is relevant or not? And how relevant is relevant anyway - if a porn site does have many pictures of women getting out of girl-scout uniforms, is "girl-scout" a valid keyword?

    There are simple ranking algorithms, that weigh uncommon keywords more, and take into consideration how many keywords the site claims to relate to. These might be more effective.
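
    One way such a ranking could look, as a rough Python sketch. The exact formula (a log-based rarity weight divided by the number of keywords a site claims) is an assumption chosen for illustration, not a description of what any search engine actually uses.

```python
import math


def keyword_score(query, site, all_sites):
    """Scores one site for one keyword.

    all_sites maps a site name to the set of keywords it claims.
    Rare keywords score higher; sites claiming many keywords score lower.
    """
    keywords = all_sites[site]
    if query not in keywords:
        return 0.0
    claiming = sum(1 for kws in all_sites.values() if query in kws)
    rarity = math.log(len(all_sites) / claiming) + 1.0  # uncommon -> larger
    return rarity / len(keywords)  # dilute by how much the site claims


def rank(query, all_sites):
    """Returns matching site names, best score first."""
    scored = [(keyword_score(query, s, all_sites), s) for s in all_sites]
    return [s for score, s in sorted(scored, reverse=True) if score > 0]
```

    A site that stuffs its keyword list pays for it on every term, so nobody has to rule by hand on whether "girl-scout" is relevant.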

    --

    In Murphy We Turst

  72. Combine Efforts / Engines? by ClubStew · · Score: 1

    What about combining efforts? AltaVista already wants to own all search engines (hence the patent), why don't they form deals with other search engines that quite frankly suck (like excite or lycos) and distribute the work load?

    Of course, leave Google out of the mix. They already kick ass.

  73. Re:Web directories could be automated. by BobDowling · · Score: 1

    If META tag spamming is so much an issue then there are algorithms that might help. A simple one would be for the "value" of a particular META keyword to be reduced according to the number of keywords provided.

    So, in a page with a single META keyword "sex" the sex would count as value 1.00. In a page with META keywords "sex, drugs, rock-n-roll" the keyword "sex" would have value 0.33.
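
    The arithmetic above fits in a few lines of Python. The function name and the choice to lowercase and de-duplicate keywords first are assumptions added for the sketch.

```python
def meta_keyword_values(keywords):
    """Gives each distinct META keyword an equal share of a total value of 1."""
    # Normalise and drop repeats so listing a word twice earns nothing extra.
    unique = list(dict.fromkeys(k.strip().lower() for k in keywords if k.strip()))
    if not unique:
        return {}
    share = 1.0 / len(unique)
    return {keyword: share for keyword in unique}


single = meta_keyword_values(["sex"])
triple = meta_keyword_values(["sex", "drugs", "rock-n-roll"])
```

    Here `single["sex"]` is 1.0 and `triple["sex"]` is one third, matching the two cases in the post.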

    --
    Those who do not learn from Dilbert are doomed to repeat it.
  74. Re:Learn the syntax! (RTFM!!!!) by rjamestaylor · · Score: 1
    --
    -- @rjamestaylor on Ello
  75. If you want "information," ask for it. by yerricde · · Score: 2

    Most searches for herbal medicines (e.g. "5-HTP") find you way more hits (especially the high ranking ones) from companies trying to sell you it than actual objective information about it.

    Had you typed 5-htp information into Google, you would see 5-htp information, with Harvard as result #2.

    --
    Will I retire or break 10K?
  76. HTML; +the by yerricde · · Score: 2

    "html" 188,000,000

    But, as usual for Google, the first three results are highly relevant for at least one common sense of the search term. (The first is W3C's official HTML standards site.) I didn't realize how bad AltaVista sucked until I tried it after using Google for a year.

    does anyone find anything better than "and"???

    +a comes close. It seems they're blocking searches for +the.

    --
    Will I retire or break 10K?
  77. Sites you may have missed by yerricde · · Score: 2

    Yep, all that content, and yet when there's a slow day at work I can still run out of interesting stuff to look at on the internet.

    little gamers, penny arcade, goats (not goatse), and badtech: online comics. It'll take a while to browse the entire archive.

    everything 2: nearly half a million writeups on topics from aardvarks to zzyzx.

    --
    Will I retire or break 10K?
  78. P2P advertising explained by yerricde · · Score: 2

    Basically, using the peer-2-peer revolution (buzzword alert) in advertising is the next thing.

    I hope you're not talking about spamming Gnutella.

    some companies are try to combine the peer to peer aspect of traditional word of mouth and the web.

    In this model, surfers are paid to recommend the sites to other surfers. Spedia is a prime example, as was AllAdvantage until it went to a "sweepstakes" scheme. Other examples can be found in the many sites that use Recommend-It.

    Hatten är din, hatten är din, habeetik, habeetik.
    --
    Will I retire or break 10K?
  79. AltaVista hates Lynx by yerricde · · Score: 3

    Of course, you'd need to use this technique with a search engine who takes dead link submissions. Eg., Altavista and its "Add or Remove a Page" link

    AltaVista does not allow submissions from visually impaired users or users of text-based web browsers such as Lynx, Links, or w3m. Its submission page uses a GIF image (burn all GIFs) to display rotated text in various fonts. The user is supposed to read the text and enter it into a field below. But visually impaired users, users on text browsers, and users on browsers whose developers have been cease-and-desisted by Unisys never see the GIF and cannot contribute links to AltaVista.

    --
    Will I retire or break 10K?
  80. Re:I think it can be good by Grab · · Score: 2

    Not quite. Disney can pay zillions to be top in a search for "animation techniques", but they're actually not a reference site for learning how to do animation. Ditto a search for "electronic circuit design" - Intel could pay to be listed on there, but you're not going to find much info about designing electronics on their site. Paying for listing on those kind of things simply increases the noise, whereas Google's system looks for sites which are popular references on a subject.

    But you're right in some ways, too. If you search for "children's toy company" or something (and temporarily ignoring the other 'toys' listed ;-) then pay-per-listing is more likely to show you ToySmart or whoever (are they still going? can't remember), which you actually want.

    Good points and bad points about both. I think the best would be a two-tier system - a pay-per-listing one for commercial stuff (Amazon, etc) and a free one with a reference-check system for information-search purposes. Maybe the pay-per-listing could subsidise the free one?

    Grab.

  81. Re:No it's not. Searchabel indicies == IP theft. by Seedy2 · · Score: 1

    " Wrong answer."

    The web is a publicly accessible resource; if you don't restrict access to your pages then I can bookmark a page deeply embedded in your site, and go directly there anytime I want. I will dispute your "right" to tell me I can't give my list of bookmarks to anyone I want.

    It is the web designer's responsibility to restrict access, if such is needed. If I can go directly to a page and skip your ads (and you don't want that) then you need to redesign your page. Besides I can filter out all the ads so you don't get any hits anyway. (I don't though)

    I don't know where you got your definitions, but you need to look up hacking and IP, I do not think they mean what you think. :)

    Last time I checked, theft meant taking something from someone; I can't take something from you if you don't have it.

    Actually many of these sites are stealing from the companies that pay them for ads. At least I would consider it stealing if I paid money for an ad that merely sat on a "gateway" page, or any page that doesn't have content that will hold the browser's attention.

    --
    Nothing to say here... move along
  82. Re:No one expected Yahoo to scale infinitely by Puck3D · · Score: 1

    Google's only about 2 months behind on their indexing; their cached copy of slashdot is from January 23

  83. Specialized Indexing by Code+Archeologist · · Score: 1

    Searching for a specific site through a search engine is pretty well useless, unless you are a whiz at forming your query properly. The only way I have ever been able to find what I was looking for was to search for index pages relating to the topic I was looking for and then jump from index to index until I found what I needed.

    These indexes tend to keep a small list of sites and tend to check on these sites often for dead links or being off topic.

    The sad fact though is that the larger and more complex a system becomes (as the Internet is becoming), the more chaotic and disorderly it will become. Watching five butterflies fly around is a lot easier than watching a million. And search engines are not going to be able to keep up, because the environment that we ask them to keep track of is billions of lines of text that are constantly changing. Then in this environment we want them to find a specific word that has context to what we want, while at the same time cutting out the superfluous chatter. Impossible... at least impossible now; it is going to take a major breakthrough in search algorithms for this puzzle to be cracked.

  84. Death of the Internet Predicted! by Rimbo · · Score: 1

    News at 11...

  85. guide to guides by fleener · · Score: 2
    The answer to keeping pace with web growth is to have sites like Yahoo be a "guide to guides" instead of "guides to everything." Instead of listing 50 links in a "Cheese" category, list one or two or three links to web sites that are their own mini-portals to cheese.

    The content on mini-portals is a million times better than Yahoo's old haphazard system. I gave up submitting non-commercial links to Yahoo because you wait months before being sure they didn't list you, then resubmit and wait months, then resubmit... etc.

  86. It's not engines which can't keep up, it's users by kalifa · · Score: 1

    When I use google to make a search on a technical topic related to my work, usually the vast majority of the links provided by google are relevant.

    The problem is, there are too many of them for my little hands, my little head, and my little time allocated on earth. Google scales, I don't.

  87. Manufacturing the news by Alomex · · Score: 1
    The article is quite crappy.

    I've been somewhat familiar with web search engine technology since its inception. Over these six years the quality of the results has gone up while the percentage coverage has remained steady.

    So the main premise of the article is moot.

    Also their data is flawed. The article quotes a bogus 550 billion pages which includes dynamic content not meant to be indexed. If we used a realistic definition of what a web page is, the total number of pages out there would rate in the 10-30 billion, tops.

    As computational linguistics improves and usage pattern agents such as copernicus and firefly are refined, I expect the quality of searches to continue improving.... Just recently I came across an article demonstrating amazing automated content subject classification (coming soon to a search engine near you).

    Are these "the sky is falling" printed press articles the forerunner of trolls?

  88. Gnutella by Alexius · · Score: 3

    What ever happened to the peer to peer idea of searching? I remember when Napster and GNUtella started, people were talking about how this might actually alter the way searching was happening on the web. By having each server tell us what they have, we are assured that when someone searches for how to replace a broken window, they won't get what they don't want.
    --------------------

    --
    `Lex - Find Me Here: Text Appeal
  89. Re:What about the Yellow Pages? by cyber-vandal · · Score: 2

    Libraries are government-funded, so everyone has paid for them already. A government-funded search engine might not be a bad idea though.

  90. Re:Directories are not search engines by cyber-vandal · · Score: 2

    How does the Dewey system address that, since a book can also fall into more than one category?

  91. Re:Another way by cyber-vandal · · Score: 2

    And how long do you think it will be before microsoft.com, mpaa.org and riaa.org disappear from all search engines?

  92. Re:The Grumbling public? by frisket · · Score: 1
    One simple and obvious way which they have missed is to run a standard parse on submitted pages: if they pass through without error, add them immediately, else they go into the queue.
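
    A rough Python sketch of that triage. It uses an XML well-formedness check from the standard library as a stand-in for the "standard parse"; a real directory would more likely validate against an HTML DTD, so treat the check itself, and the function names, as assumptions.

```python
import xml.etree.ElementTree as ET
from collections import deque


def parses_cleanly(markup):
    """True if the page is well-formed XML (stand-in for a full validation)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def triage(submissions):
    """Splits (url, markup) pairs into accepted URLs and a manual review queue."""
    accepted = []
    review_queue = deque()
    for url, markup in submissions:
        if parses_cleanly(markup):
            accepted.append(url)      # clean parse: add immediately
        else:
            review_queue.append(url)  # errors: wait for a human editor
    return accepted, review_queue
```

    The check says nothing about whether the content is any good; it only moves the carefully built pages to the front of the line.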

    That way we would at least get reusable info. Still doesn't address quality of content directly, but IMHE those sites which take care of their information format tend to be the ones taking care of their information content as well.

    The fundamental problem right now is that the search engines don't give a damn about content quality or format, just raw hit rates.

    ///Peter

  93. Quality vs. Quantity... by way2slo · · Score: 1
    As far as search engines go, I would much rather have a few results that point to good, informative sites rather than thousands of possibilities. I would think that a project like the ODP has the potential to be more useful in the long run than a webcrawler or a vast indexing of pages. That is if the focus stayed on the quality of the pages they linked to. Let us think for a minute. Information on a web page is either:
    • New and Original
    • A copy, mirror, or just links to something Original
    How much of the web is truly new and original information? You got me. A lot of it, I would guess. However, if a search engine or directory structure would be able to pick the best and most informative sites and link to those they could accurately address the majority of its searches. How do you determine that something is the best site?

    Could an algorithm do this? Perhaps, but so could a staff of people. A web crawler brings in new sites. Then someone on the staff looks at the site and asks "Is this one of the best sources for original information on any topic?" If it is, they add it to the database and associate it with the proper nouns.

    Let's say I try to search for a book titled "foo bar" by "john doe". In the search for "foo bar", I may not find anything if "foo bar" is a common phrase or common words. Same goes for the name. I would get loads of links to sites that mention people with that same name. Why not have the search engine ask for more information if the first search comes up too big? Instead of trying to find it directly, where you may have about the same odds as winning the lottery, ask the user to describe the thing that they are searching for. The user might enter something like "It's a book about widgets."

    From there, the search engine might see the word "book" and pop up a link to Amazon.com or even do a search on Amazon.com and return those results. Have some built-in intelligence that can match up a noun with the best sites about that noun in its database. It could search for "widgets" and return those results. Or even apply the first search to the results of the "book" and/or "widgets" search. How do you design a search engine that can make the association between the noun and the best sites about that noun? (best being the key word in that sentence) Also, the engine would need to have an intuition about how much information is needed. Do I have enough information to give them a highly accurate link or do I need to ask them to describe it more? Do I need them to describe a description? From what perspective are they coming? (perspective does determine relevance to a degree)

    It's like that old saying "you have to have money to make money." In this case, you have to have information to get information. More specifically, you have to pass information about the information that you require. In other words, let the user provide the meta data instead of the database. The database would focus on the "noun-to-best-links" matches. The search engine asks the user questions and breaks the search request down into a set of noun searches which it can reference in its database. The best answers float to the top and if they are not what the user is looking for, odds are they can go to a site and find it there or get more meta data and try again.

  94. Re:Directories are not search engines by Lord+Ender · · Score: 2

    You know, solving the keyword problem wouldn't be too hard. I mean if yahoo had an option where you could submit a site that you think had off-topic keywords (like if it were a porn site and it had the keyword 'girl scouts' or something children might be searching for) and they would completely remove all occurrences of an offending site from their database, then maybe things could be well classified. People would only use on-topic keywords so that they don't get banned from yahoo.

    This would make searches SO much more accurate. It would just take someone to have the balls to say "you are abusing the keywords so now nobody will ever get to your site from our search engine."

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
  95. think. by jborg · · Score: 1

    It will simply not be possible in the future for a single search engine to categorize the whole web. What we need is specialized search engines. Portals are going through exactly the same thing at the moment. No single portal can satisfy all web-related needs of a single person. Therefore, more specialized engines are the answer, and when it comes down to it, I'm ready to pay for such a service. As long as I get what I need..

    --
    /JacQ "Find the metaclass of everything and find God.."
  96. Web directories could be automated. by Pinball+Wizard · · Score: 3
    I think most search engines are ignoring metatags these days because they are so commonly spammed. So, the only way to have a directory these days is to have it be completely manually built. Thus, it is impossible to have a comprehensive web directory, unless you are willing to put up with spam.

    I have a suggestion to anyone who is thinking of implementing a better directory. First, define the categories, and allow any site to submit their site to their categories. Then, introduce moderation to the mix. Allow users of your directory to rank sites in terms of suitability to the category. Allow them to create red flags for people submitting porn to health->teens->sexuality, and so forth. Let the users do the work!
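
    A small Python sketch of what such a user-moderated directory could look like. The class name, the 1-to-5 rating scale, and the rule that three red flags hide a listing are all assumptions for illustration.

```python
FLAG_LIMIT = 3  # assumed: this many red flags hides a listing


class ModeratedDirectory:
    """Fixed categories, open submissions, user ratings and red flags."""

    def __init__(self, categories):
        # category -> url -> {"ratings": [...], "flags": int}
        self.entries = {category: {} for category in categories}

    def submit(self, category, url):
        """Anyone may list a site under an existing category."""
        self.entries[category].setdefault(url, {"ratings": [], "flags": 0})

    def rate(self, category, url, score):
        """Records one user's suitability score (assumed scale: 1 to 5)."""
        self.entries[category][url]["ratings"].append(score)

    def flag(self, category, url):
        """Records one user's red flag for a badly misfiled site."""
        self.entries[category][url]["flags"] += 1

    def listing(self, category):
        """Unhidden URLs in the category, best average rating first."""
        def average(item):
            ratings = item[1]["ratings"]
            return sum(ratings) / len(ratings) if ratings else 0.0

        visible = [(url, data) for url, data in self.entries[category].items()
                   if data["flags"] < FLAG_LIMIT]
        return [url for url, _ in sorted(visible, key=average, reverse=True)]
```

    Like Slashdot moderation, this only works if raters outnumber abusers; the sketch does nothing to stop one person from voting many times.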

    I think moderation works well for sites like slashdot, why not a moderated web directory?

    --

    No, Thursday's out. How about never - is never good for you?

  97. Pay For Placement Engines by wdavies · · Score: 2
    [Disclaimer, I work for GOTO].

    Whilst Google is clearly the best for non-commercial searches, GoTo is apparently the best for commercial searches (if you want a service someone will make money from supplying).

    It nicely gets around the problem of manual classification, by effectively using market forces to make an advertiser classify themselves correctly (or pay for referrals which make them no money).

    Let's say I have a hotel in San Francisco, but bid for the general term Hotel ($1.03). Now I will presumably only get some custom if they were looking for a Hotel in SF - otherwise I just paid GOTO about $1.00 for a useless referral. Better I list myself as HOTEL SAN FRANCISCO; even though this costs more ($1.71), I will have a much higher conversion ratio.
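
    To put rough numbers on that trade-off, here is a short Python sketch. The two bids come from the example above; the conversion rates (2% for the general term, 20% for the specific one) are invented purely to show the arithmetic.

```python
def cost_per_customer(bid, conversion_rate):
    """Average spend per paying customer: price per click / fraction converting."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be in (0, 1]")
    return bid / conversion_rate


BIDS = {"hotel": 1.03, "hotel san francisco": 1.71}  # bids from the post
ASSUMED_CONVERSION = {"hotel": 0.02, "hotel san francisco": 0.20}  # invented rates

costs = {term: cost_per_customer(BIDS[term], ASSUMED_CONVERSION[term])
         for term in BIDS}
```

    With these made-up rates the general term works out to about $51.50 per customer and the specific one to about $8.55, so the dearer keyword is the cheaper way to fill a room.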

    Of course, if I am a US Hotel Chain or Broker, then maybe I would bid on the general Hotel keyword.

    End of self serving Sales Pitch :) Personally I'd like to see us create a GoTogle (TM) :-) that combines the best of both approaches.

    Winton

  98. Search engines won't die by fish4242 · · Score: 1

    In the article, the author talked about search engines dying from lack of use, but I don't think they will. There are always going to be people on the internet who don't know much about the internet, and they turn to search engines to find what they need. They might lose the more experienced clientele, but they will always retain the novice clientele.

    --
    "The heresy of one age becomes the orthodoxy of the next" - Helen Keller
  99. Learn the syntax! (RTFM!!!!) by EvlPenguin · · Score: 1

    That's really all it comes down to. You just need to learn what method of searching gets the best results. Personally, I use google and almost always find relevant pages on the first listing of results. After the second page, however, they get off-topic (what do you expect when you get 2,560,009 results?)

    It helps if you just search for an exact phrase (i.e. "amount of trolling on slashdot in relation to the vernal equinox"). But then again, I didn't need to tell you that.
    --
    #nohup cat /dev/dsp > /dev/hda & killall -9 getty
  100. I think it can be good by truelight · · Score: 1

    I do not think that the pay-for-placement in search engines must be a bad thing. If you have to pay to get into the top positions of a search engine - you are simply more likely to spend time on the site, and the other way around (you spend time - you pay). This makes sure that the top listings are truly relevant for your search, and if they aren't, just scroll past the paid ones until you find good ones. By the way - let's face it - getting high in search engines is not about having a good site. It's about knowing how to optimize your site for the search engines (of course, having a good site will help you a lot). Pay-per-listing is really just making use of money, instead of skill/time.

  101. Re:Not unsearchable yet by truelight · · Score: 1

    The interesting part is here:

    PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
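
    The voting idea in that description can be illustrated with a simplified power iteration. This is only a sketch of the general technique, not Google's actual implementation; the damping factor of 0.85, the iteration count, and the four-page example web are assumptions:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so votes from highly ranked pages carry more weight
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: B is linked to by A, C and D; nobody links to D
web = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

    In this toy graph C ends up well ahead of A even though each has a single inbound link, because C's one vote comes from the "important" page B.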

  102. Re:Dynamic sites by truelight · · Score: 1

    For you non-search engine gurus - many search engines stop at "?" in URLs. Look at the slashdot url in your browser window. Most search engines refuse to index past the "?" sign. Stupid. I think an easier solution would be to make the search engines stop stopping at "?". Google already indexes past it, of course.

    Why in the hells should dynamic content NOT be indexed? All the best sites ARE dynamic with databases.
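
    To see what is lost when a crawler stops at the "?", here is a small sketch. The URLs and the `strip_query` helper are hypothetical; real crawlers of the time may have handled this differently:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop everything from '?' onward, as a query-averse crawler would."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical article URLs; only the query string tells them apart
urls = [
    "http://example.com/article.pl?sid=101",
    "http://example.com/article.pl?sid=102",
]

seen = {strip_query(u) for u in urls}
print(seen)
```

    Both articles collapse into the same address, so such a crawler sees one page where the site actually has many.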

  103. Re:Directories are not search engines by bcrowell · · Score: 1
    As far as I can tell, there's no oversight or "meta editors" and they are sorely needed.

    You're wrong. There are metaeditors, and they have quite a bit of power. They can delete entire categories, for example.

    I know that many of the active editors are people pimping their own sites and ignoring submissions.

    Listing your own site is OK, even encouraged -- having a site in the category is considered a sign that you know something about it. Keep in mind that just getting listed on DMOZ isn't supposed to be hard; it's not like Yahoo, which makes more of a point of being selective.

    Listing your own site as a "cool" site is not OK -- if you know of such a case, you should complain to the editor of the more general category of which it's a subcat. Keep complaining up the tree until you hear back.

    Actually, you hear a lot of people whine about how their applications to become editors got turned down, but the people at DMOZ who review the applications say that by far the most common reason is that the person's application makes it clear they're only interested in self-promotion. So yes, people do try to abuse it, but DMOZ tries to stop it from happening, takes complaints seriously, and is in fact getting criticized all the time for taking it too seriously.

    Open Directory sucks. I work for a fairly large international B2B company and not only is our company not listed, neither are several of our competitors. I eagerly await the day that AOL stops using it so I can stop caring what Open Directory does.

    Maybe you'd have more luck getting listed if you'd learn more about how Open Directory works.


    The Assayer - free-information book reviews

  104. Re:Directories are not search engines by bcrowell · · Score: 4
    Yahoo and the like are doomed to failure until someone implements something like the Dewey Decimal System for web pages and then convinces a large number of webmasters to correctly classify their pages using it. That way a machine can do the hard work and only the person designing the page need do the actual work of making sure the page is classified correctly.

    Well, what you're describing sounds a lot like META KEYWORD tags.

    Having been an Open Directory editor in the past, I don't really think the problem is finding the right pages. Actually the biggest problem is just that a lot of editors aren't active, and it's hard to know who's active, because they're listed as editors even if they haven't logged in or checked submissions for a year. This creates problems for editors who have to cooperate with other editors, and may also give outsiders the impression that Open Directory is overwhelmed in general, when really it's just that the editor they submitted to is AWOL.

    Yahoo is doomed to failure because they don't have enough people working for them. Open Directory works just fine, because they have orders of magnitude more eyeballs working in parallel. No, Open Directory doesn't list every page on the web, and that's just fine with me as a user -- it's more useful because it's selective.


    The Assayer - free-information book reviews

  105. Re:Directories are not search engines by cthugha · · Score: 1

    Well, what you're describing sounds a lot like META KEYWORD tags.

    The problem with meta tags is that everyone has their own idea about how they should be used. I think Ichimunki had something like RDF or Dublin Core in mind when talking about a Dewey system equivalent for the Web. They define standard document properties which make searching through metadata a much easier process.
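
    For a feel of the difference, here is a sketch that pulls Dublin Core style properties out of a page and ignores the free-form keywords tag. The sample page and the `DublinCoreParser` class are made up for illustration; the `DC.` prefix on meta names follows the usual convention for embedding Dublin Core in HTML:

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        # Keep only the standardized properties, skip free-form tags
        if name.lower().startswith("dc."):
            self.properties[name] = attr.get("content") or ""


# Hypothetical page head
page = """
<html><head>
<meta name="DC.title" content="A History of Widgets">
<meta name="DC.subject" content="widgets; manufacturing history">
<meta name="keywords" content="free, cheap, best, widgets, widgets">
</head><body>...</body></html>
"""

parser = DublinCoreParser()
parser.feed(page)
print(parser.properties)
```

    Because the property names are fixed by the standard, an indexer knows which field holds the title and which the subject, instead of guessing at an unstructured keyword list.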

  106. most common word on the internet is... by ponxx · · Score: 1
    "and" according to google on 847,000,000 pages!
    "www" makes 326,000,000
    "it" 262,000,000
    "html" 188,000,000

    (all out of a total of 1,346,966,000 pages according to google, so more than 60% of pages include "and", having fun with stats :) )

    does anyone find anything better than "and"??? (to search type "+and" otherwise it will be ignored as it is a "common word")
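
    For anyone who wants the rest of the percentages, a quick sketch using only the counts quoted above:

```python
TOTAL_PAGES = 1_346_966_000  # total pages reported in the post

# Page counts per word, as quoted in the post
counts = {
    "and": 847_000_000,
    "www": 326_000_000,
    "it": 262_000_000,
    "html": 188_000_000,
}

# Share of all indexed pages containing each word, in percent
shares = {word: 100 * n / TOTAL_PAGES for word, n in counts.items()}

for word, share in shares.items():
    print(f"{word:5} {share:5.1f}%")
```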

    1. Re:most common word on the internet is... by ponxx · · Score: 1
      numbers are cool!
      even 6-digit numbers have around 50 matches!
      most 7-digit numbers have a couple...

      the higher the number the less frequent it is, exceptions being 10, 100, etc.

      it surprised me though that there were not significantly more instances of 42 :), I guess with 23 million matches the couple of HGTTG references don't have much weight ....

    2. Re:most common word on the internet is... by ponxx · · Score: 1

      I concede defeat...

    3. Re:most common word on the internet is... by skbenolkin · · Score: 1

      "+to": 871,000,000
      "+of": 901,000,000

      Be sure to mod this up, since it required so much time and original thought ;-)

      --Scott

      --
      "Frederick, is God dead?" --Sojourner Truth
  107. Re:Directories are not search engines by ichimunki · · Score: 2

    Keywords are not especially helpful in auto-creating directories. They are of limited value because only about 10% of web sites use them at all. Of those that do use them, there is no limit or structure to them. They are easily spammed. This is exactly why they were discarded as useful by SEs a long time ago. I have found keywords and descriptions helpful in my own efforts at classifying web pages because, once verified by a human (me), they could be used as a partial basis for text based searches (in which I also included META descriptions). If no keywords were given I frequently resorted to duplicating the description. If keywords were given, but no description, I could usually find a short excerpt from the site that could be copied and pasted.

    Open Directory works rather well, IMHO, as a directory because the editors have a strong sense of ownership and are given small enough chunks to do that the work is very manageable at the individual level (and they can do it in their spare time easily). But the human element is always going to be a potential issue with any directory. A problem you just don't have with Google.

    --
    I do not have a signature
  108. Re:Directories are not search engines by ichimunki · · Score: 2

    A book can be cross-listed in a card catalog, from my understanding, but since the book can only be in one place on the shelf, it's not a big concern. The librarian simply chooses the dominant topic, or uses one of the 000 general classes (for things like encyclopedias, periodicals, etc).

    --
    I do not have a signature
  109. Re:Directories are not search engines by ichimunki · · Score: 3

    I never said it would be easy! :)

    Having actually tried to implement a DDC based web directory once, I am familiar with the problem that many pages would possibly fall under many categories. This is a problem with any directory-based approach, especially if you list a page in one category and then the page changes enough so that the category no longer applies.

    In your example, I would hope it would not be too much trouble for you to put a different class number into the pages that make up each logical section of your site. Or if the site is small enough, it would likely fall under something like "personal web pages", which may have a number of subclasses itself, and then you'd choose the one you felt appropriate.

    Again, this is a common issue among all directories, where do you put stuff? Do you allow multiple listings/classes per site/page? You still end up having to include some sort of keyword or text-based search so that users are not forced to browse the directory structure, guessing at the classification they are looking for or where it lies in the hierarchy. Text searches also allow for the possibility of searching based on content rather than metadata.

    Most of this is a non-issue, given that Google seems to have rather successfully implemented a non-directory type of engine-- succeeding where Altavista was simply unwieldy. At least that's my impression. I usually find what I want with Google.

    --
    I do not have a signature
  110. Directories are not search engines by ichimunki · · Score: 5

    Yahoo and DMOZ are web directories. This is a very human labor intensive way to categorize the web. Google is actually a search engine. It spiders out and runs an indexing algorithm of some sort to help it respond to queries. These are very different approaches.

    Yahoo and the like are doomed to failure until someone implements something like the Dewey Decimal System for web pages and then convinces a large number of webmasters to correctly classify their pages using it. That way a machine can do the hard work and only the person designing the page need do the actual work of making sure the page is classified correctly.

    Obviously this is fraught with problems similar to those of keyword spamming, but it's either that or build something like DMOZ on a decentralized basis, so that any individual maintainer builds a set of links that are tailored to his/her interests and either uploads them to a central server or provides them as an XML document for an engine to work with.

    --
    I do not have a signature
    1. Re:Directories are not search engines by dagoalieman · · Score: 1

      Yes, but how can you classify my page? You have the small problem that I have a music review section and a book review section, but also have my own texts out there.. I would either fall into one super broad category that 90% of websites would also likely fall into, or several small ones, but the purpose of such a system is to singularly file a page.

      Of course, you could have meant a decimal for each page. But that could also prove impractical. Good idea, but implementation would most likely fall into a category of FUD.

      --
      We don't need no Net Explorer We don't need no Thought control
    2. Re:Directories are not search engines by wizard97 · · Score: 1

      As a Dmoz editor, I could say that Dmoz would be doomed to failure if classifying the internet were its intention. However, Dmoz's objective is to provide a searchable directory of GOOD sites. It will never contain all of them, but it gets the work done when you're searching for a useful site (and I speak from experience, since this was the exact way I found out about Dmoz)

    3. Re:Directories are not search engines by WinterSolstice · · Score: 1

      Well, one thing to point out... Most web sites do not need to show up in searches. I for one could do with fewer sites with misleading meta tags, and a better way to find the sites I'm ACTUALLY searching for. Many of the web sites out there are very cool, and are buried beneath other sites that have better 'listing' services.

      Maybe the answer is not to have a bot find all sites, but to allow webmasters to register with the Library of Congress or something similar. The Dewey Decimal System idea is pretty cool. I would like to see something where I could free-form SQL search an index, then get the (reasonably) small number of sites that matched re-checked for searching against.

      That way, you could continue to scan a tightening set of web sites, all of which actually still exist. The search within search feature on some engines is nice, but it tends to be out of date. I'd be happy to let my computer (or a fee-based agent) scour the web for my personal searches on a daily basis.

      Sure beats looking up latex paint and getting all porno sites. It would be nice to have a NOT or ! feature.

      -WS
      --
      An operating system should be like a light switch... simple, effective, easy to use, and designed for everyone.
    4. Re:Directories are not search engines by Ayende+Rahien · · Score: 1

      Good News Unlimited Image Manipulation Program Tool Kit :-> :->

      --
      Two witches watched two watches.
      Which witch watched which watch?
  111. use pay for listing - if looking for commercials by Garry+Anderson · · Score: 1

    I ignore pay for listing sites - they are nothing but commercials for big business.

  112. Re:Not unsearchable yet by ilsa · · Score: 1
    I read this article earlier and was frankly amazed. In the course of my job I use the Internet as my primary research tool every day. I guess I must just be better at searching than those people!

    We all know that a computer does what you tell it, not what you want. If you search for "cars" you will of course get much more useless information than if you search for "cars sedan mid-sized" or whatever other modifiers you have in mind. I remember once talking to a new internet user who tried to find information about "ants" through some search engine and ended up wading through links about restaurANTS and consultANTS. Okay, so all search engines are not created equal. Isn't that why most of us like Google?
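
    The "ants" story is the difference between substring matching and whole-word matching. A minimal sketch, with invented page titles and helper names:

```python
import re

# Hypothetical page titles
titles = [
    "Ants of North America",
    "Best restaurants in town",
    "IT consultants for hire",
    "Fire ants: a field guide",
]


def substring_match(term: str, text: str) -> bool:
    """Naive matching: the term may appear anywhere, even inside a word."""
    return term.lower() in text.lower()


def word_match(term: str, text: str) -> bool:
    """Match the term only when it stands alone as a whole word."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


print([t for t in titles if substring_match("ants", t)])
print([t for t in titles if word_match("ants", t)])
```

    The first list contains every title, restaurants and consultants included; the second keeps only the two that are really about ants.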

    Perhaps it helps that I am researching very specific topics. Yes the Web is getting bigger, yes there are things it is hard to search for, but I don't really think it is getting worse. As the song says, "If she knew what she wants he'd be giving it to her now."

    --
    -- I Am Not A Terrorist.
  113. Re:Google's crawlers are part of the problem by Everyman · · Score: 1

    This idea is worth some thought. The basic problem is that the richness of the pages we produce in response to a name search is the very thing that is making it worthwhile to have our names represented on Google. A Google-referred user immediately appreciates what our site has to offer -- data visualization of interlinks between names, with clustering, cluster-click selection, etc.

    If this richness is available to the Google user who arrives at our search results page via Google, then the same richness is available to the original crawler that put up the page.

    But I appreciate the suggestion, and it may well be that some balance could be achieved that would bar Google from the "richness" but keep it open, available, and apparent for everyone else. We already do something like this -- the program that does the visualization is blocked to Google, so that the links Google gets are from a program that doesn't have to generate GIFs with client-side image maps, nor Java applets with cluster-clicking.

  114. Google's crawlers are part of the problem by Everyman · · Score: 2

    I run a site that's a cumulative name index of 700 books
    and thousands of clippings. The indexing started in 1983.
    For any name, you can get all the other names that share
    pages with that name throughout the entire database. In
    other words, each name search produces a page that contains
    anywhere from several to several hundred additional names
    -- all pre-linked directly to their own searches, which do
    the same thing. You get the idea.

    It's a bot's worst nightmare. But if you are Google, with
    lots of crawlers to sic on the task, it quickly can become
    my nightmare instead of Google's. Indeed, Google doesn't
    seem to care much.

    Last October I noticed that Google was inclined to stumble
    into our cgi-bin on rare occasions, and actually do a
    decent job of delivering referrals to the name data that it
    got from us. I lifted the robots.txt exclusion to see what
    would happen. No other bots have even delivered referrals
    as consistently as Google, so I can only assume that Google
    is the only bot that's even serious about going after the
    dynamic web.

    Either that, or their algorithms do a much better job on
    our names, which are all listed as surname-first throughout
    our site. If you search for a name in the news as Firstname
    Lastname without quotes, Google will put our Lastname,
    Firstname high on the list due to two facts: Our name is
    part of the anchor description and they give link data more
    points, and secondly, the two words are close to each other
    and this adds to the score (even though they are backwards).

    Google has come by once a month ever since I lifted
    the robots.txt. Each time they spend about 10 days solid,
    24/7, with from three to five crawlers, chasing all the
    name searches. The rate from all the crawlers together for
    those 10 days varies from about two name searches per
    second to several per minute.

    It's very erratic during that time; the crawlers don't talk
    to each other, and there's no detectable pattern that
    they're following. They don't manage to get through the
    entire database of 115,000 names by any means. There is an
    incredible amount of waste and duplication.

    I had to install a load-sensitive thermostat so that when
    our server hits a certain load threshold and it's Google
    calling, it starts delivering "Server too busy" responses
    instead of the search that was requested. That seems to
    work pretty well, but they get all those "Server too busy"
    messages stored in their cache copy for that name.
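
    A thermostat of this kind might look roughly like the sketch
    below. The threshold value, the User-Agent substring, and the
    function names are assumptions for illustration, not the code
    actually running on the site. It answers with HTTP status 503
    (Service Unavailable), which marks the condition as temporary:

```python
import os

LOAD_THRESHOLD = 4.0         # assumed limit for the 1-minute load average
CRAWLER_TOKEN = "googlebot"  # assumed substring of the crawler's User-Agent


def choose_status(user_agent: str, load: float,
                  threshold: float = LOAD_THRESHOLD) -> int:
    """Return 503 for the crawler when the server is busy, otherwise 200."""
    is_crawler = CRAWLER_TOKEN in user_agent.lower()
    if is_crawler and load >= threshold:
        return 503  # Service Unavailable: ask the crawler to come back later
    return 200      # everyone else is always served


def current_load() -> float:
    """1-minute load average of the machine (Unix only)."""
    return os.getloadavg()[0]


# Example decisions with made-up load figures
print(choose_status("Googlebot/2.1", 5.0))
print(choose_status("Googlebot/2.1", 1.0))
print(choose_status("Mozilla/4.0", 9.0))
```

    In a real CGI script the load passed in would come from
    current_load(); ordinary visitors are never turned away, only
    the crawler during busy periods.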

    To put it bluntly, their bots are dumber than toast, and
    if you don't watch them, they can turn your server into
    toast.

    Last November I wrote to Larry Page and offered to send him
    the damn database on CD-ROMs, in discrete HTML files using
    any specification he cared to define, so that his crawlers
    wouldn't have to load down our servers once per month.

    Mr. Page never responded. The letter was e-mailed, faxed,
    and snail-mailed. Someone from google.com did a Larry Page
    search shortly after I faxed it, so I'm pretty sure they
    read the thing. I offered these CD-ROMs for free, and I
    didn't ask for any changes in PageRank or any other
    considerations. It would simply mean that I can get my
    names onto Google efficiently and comprehensively, without
    enduring that 10-day orgy once a month.

    My point is that there is no real effort at Google to make
    any sort of accommodation on a case-by-case basis with the
    so-called "deep web." Until that happens, sites such as
    mine have difficulty in allowing Google's crawlers to run
    amuck once per month. We have other customers to consider.

    1. Re:Google's crawlers are part of the problem by maccallr · · Score: 1

      Can't you easily generate some 'flatter' version of your database and point robots.txt to that? You know, like gateway pages which give links into your database proper, but robots are banned from following. Much easier than burning CDs and so on.
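
      Something along these lines, perhaps. The robots.txt rules and the URLs are hypothetical; the sketch only checks that static gateway pages stay crawlable while the dynamic searches behind them are off limits:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: static pages are allowed, the CGI search is not
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

gateway = "http://example.org/names/a.html"             # static gateway page
dynamic = "http://example.org/cgi-bin/search?name=Doe"  # database search

print(rp.can_fetch("AnyBot", gateway))
print(rp.can_fetch("AnyBot", dynamic))
```

      A well-behaved robot would index the gateway pages and the links on them, but would not follow those links into the database.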

  115. searching is always hard by CharmQuark · · Score: 2
    Though this article is more fluff than the useful information it would indicate to a search engine, it does ask a good question. Are technological advances reducing the ability of search engines? I would say no. Rather it is incompetent and malicious web page designers that are the problem.

    Although technologies such as frames, ASP and JSP, cold fusion, or Flash may make it harder to design a crawler friendly web page, such pages need not be crawler hostile. As the article points out, the issue is how the site handles requests that contain no parameters. The incompetent designer will treat such a request as an error. The more thoughtful designer will display a useful page with appropriate meta tags.

    The second issue is intellectual property and the true number of pages on the web. Suppose we create a site on the history of widgets. This site contains 10 base pages backed by a database of 100,000 widgets. Is the true size of the site 10 or 1 million pages? I would say that its size is 10 pages, and that indexing those 10 pages, 0.001% of the possible pages, makes a complete index. The problem is how to make these 10 pages representative of the site. It may be reasonable that a search of '1145 crusade keepsake widget' might fail, but our design should allow the more general search 'history widgets' to succeed.

    Anyone who has done library research in the pre-computer age knows that it takes skill and determination to find citations. The fact that we have replaced 1 million tiny cards and 1 thousand volumes of indexes with an online database does not mean that search and design skills are no longer necessary. Unfortunately, we cannot assume that users will have the proper search skills, so we, as designers, must learn better design skills.

  116. No one expected Yahoo to scale infinitely by mblase · · Score: 4

    The only "problem" is that the Internet is simply too large for one engine to index. People go to Google expecting to search every web document that's online, a labor comparable to going to your local library and expecting their database to tell you about every book in existence on a particular topic or by a particular author. Even the Library of Congress isn't that comprehensive.

    I disagree with the article's claim that "much of the most interesting and valuable content [on the Web] remains hard to find." I think that the most interesting and valuable content is easy to find, provided that you start looking in the right place. Which means that if I want information on the latest US school shootings, I don't go to Yahoo or Google and search for "school shootings", I go to those sites and search for major news sources (BBC, CNN, Reuters, etc.) and use their up-to-the-minute search engines.

    The role of search engines isn't "shrinking" by a long shot; it's just becoming less comprehensive. Searching on the Web is now a two-step process instead of a one-step process, and you have to apply a little more intelligence than you could back in 1995. If high school students researching their latest humanities paper have a problem with that, well, they should ask us twentysomethings what it was like to have to use card catalogs and microfiche for our own high school projects.

  117. Huh? by Jaysyn · · Score: 1

    Are there people that actually have problems finding stuff on the web? I can't think of one single time that I haven't found at least some relevant information on something I was looking for. Sure you'll get a few dead links, but my problem is usually that there is too much info to sort thru.

    Jaysyn

    --
    There is a war going on for your mind.
  118. Re:No it's not. Searchabel indicies == IP theft. by Zero+Sum · · Score: 1
    >Deeply linking int a site == monetary theft (users can skip many levels of ads and jump right to relevant web page.

    Easy to fix. Session cookies that control how you are allowed to move around a site. If you have "value" and wish to protect it, then do so. Let others who want to be philanthropic do so.

    --

    Zero Sum (don't amount to much). [root@localhost]

  119. Not unsearchable yet by vslashg · · Score: 2
    Sorry, folks, but the web is clearly not unsearchable, at least not yet.

    Google consistently returns good information on every search I make. A fairly superficial, PR-ish overview of their technology is here. The gist of it is that, among other things, the number of links TO a page is considered part of the criteria for ranking. (The theory is that an important or well established page will have many links to it.)

    OTOH, human-edited directories like Yahoo and dmoz are going to have a really tough time as the web continues its exponential growth. I get so many dead links from these services that it's not worth the bother.

  120. Context, not Content by cprael · · Score: 1
    Cool - 242 messages before this, and nobody's bothered to notice the simple fact we worked out 8 years ago. Content means very little without context to wrap around it.

    Sounds pretty trivial and stupid, right? Think it through - people are willing to pay >$100/year to subscribe to something like stratfor, which pretty much recapitulates information you could find with a broad sampling of news feeds. Why are people willing to pay for it? Because it sorts the wheat from the chaff, and puts it into a context that makes sense.

    What's that mean in a concrete sense? Anyone care to take bets on how long it'll take Yahoo to move to a subscription model (very small, sez my money), probably one not too different from the phone book or the newspaper.

  121. Simple Solution by Aciel · · Score: 1

    Add more spidering servers to look through stuff.

    One problem is this, though. It takes hours to find what you're looking for, in most cases. You search for C++ and it finds nothing, because it uses the plus sign for something else. You search for breast cancer and it gives you free XXX hot porno sex sex sex only 500 dollars.
    Why don't they improve this, one might ask.
    Easy: You see those little banners at the top of the screen? Every time you load a page, new ones show up, and therefore they get more advertising in when you load their pages more. Thus, if it's harder to find things, it's your loss, not theirs. Indeed, they gain, as they get more advertising done.

    Aciel
    aciel@speakeasy.net

  122. I'll keep my privacy by KarmaBlackballed · · Score: 1

    The last thing I want my browser doing is reporting my whereabouts to a central registry. My visits to \. are my secret --- even when the page times out.


    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ~~ the real world is much simpler ~~

    --

    --- -- - -
    Give me LIBERTY, or give me a check.
  123. It's all a matter of using the correct phrase by Bob+Abooey · · Score: 1

    That's the deciding factor for the most part. Although I can search for "Lexus dealership New York" and still get a hit for Hank's homepage. Maybe Hank has the word dealership on it somewhere and that was all it took. However, for the most part, if you know the correct words to search on you will get the proper results.


    Yours,
    Bob

    --

    All the best,
    --Bob

  124. Authoritative vs. Popular by charvolant · · Score: 1
    I find Google pretty good at finding leads for things and sites which are actually about the subject I'm interested in. Then comes the process of weeding out lightweight press releases and articles in the search for that elusive authoritative article or reference.

    The lightweight stuff is, justifiably, more popular and gets more links. It's just that when one wants to go beyond Introduction to Duckspeak or Duckspeak Tutorial to Deep Duckspeak Analysis the popularity weighting starts hindering things.

    Working out what's reference material still needs an understanding of context and content. Librarian is probably going to be the job of the 21st century.

  125. What about the Yellow Pages? by alen · · Score: 2

    People pay to advertise in the yellow pages, what's wrong with being charged to list on the Internet?

    1. Re:What about the Yellow Pages? by HongPong · · Score: 1

      What about the white pages?

      --

    2. Re:What about the Yellow Pages? by blonde+rser · · Score: 1

      There is nothing wrong with it but it's not what I (we) want. I want to use a search engine that returns the most relevant and diversified articles... which may or may not be the sites that pay the most money.

      Does this desire of mine somehow morally obligate the search engine not to accept payment for links? Of course not; they are in their legal - and moral - rights to run whatever business model they desire. It just makes it a service I don't want to use; again not because I am ethically opposed to them not listing certain sites - I just would find it more useful to know those sites existed and know how to find them.

      When I'm using the Yellow pages I'm looking for someone to do a service for me which I will pay them for and I know that by looking in the yellow pages I will find some people who can do that. But I don't rely on the yellow pages when I'm looking for someone who can do the service the best. I know that the best person to do this job may or may not be in there. When using a search engine I may just be looking for anyone to answer my question / provide me a service, in which case I don't care if I don't see the little guy. But if I want to use a search engine to answer a very specific question or find the best paper to match my query I will want to use the search engine that is rating pages without outside influence.

    3. Re:What about the Yellow Pages? by stupid_little_rocker · · Score: 1

      Yeah. And what about libraries? Should they only provide resources and information if the publisher of such things pays them to provide it? Or maybe those that pay will have their information shown first to those seeking knowledge. The rest of it can be shoved in some box in the dark corner. . .

    4. Re:What about the Yellow Pages? by defclaw · · Score: 1

      Why should we pay for a search listing??? The Internet was supposed to be free. Plus I do not think that paying search engines will make them any better, because we have already seen the effects of commercializing products. The quality of the search engines will probably decrease because people will be much more interested in making money than developing a good product that returns coherent listings (like google).

  126. mujen.com is a decent search engine by oplspopo112 · · Score: 1

    I came across http://www.mujen.com a while back. It's pretty accurate. Not as fast as others, but probably more accurate.

    1. Re:mujen.com is a decent search engine by zombie_13 · · Score: 1

      mujen is a pretty good one. I don't know anyone else who knows about it. I found it a few months ago, and it's accurate as sh@t. I used to use hotbot and metacrawler, but mujen is better. The only negative is sometimes it's slow.

  127. Re:mujen.com is a very decent search engine by oplspopo112 · · Score: 1

    I think so. google is fast, but I think mujen is better. Plus I don't want to use the same search engine that my moron users use.

  128. Google thriving on bloat... by wrinkledshirt · · Score: 2

    Maybe search engines relying on older methods are having problems, but using Google, I honestly haven't had a problem locating material quickly at all. You just have to have the right approach in searching for things...

    • Forget about possible titles of the page. Choose one to three words that you think will be in the body of the page you're looking for. Choose words that will describe the theme of the website. Use nouns, mostly; verbs have too many modifiers.
    • Avoid negatives, articles ("a", "the") or words that are frequently misspelled or have different international spellings ("colour" vs. "color").
    • Use correct spellings of last names for celebrities. If you can't figure out the correct spelling of a celebrity from the entertainment industry, figure out the name of an associative body of work (movie, tv show), and check out imdb. If you know how to spell "Mystery Men", you'll know how many L's there are in "Garofalo" (or is it "Gerafallo"?). Then head to your search engine armed with that correct spelling.
    • To narrow the search (e.g. "Jordan" might turn up a ton of different references), try to use a second word that will narrow the context (e.g. "Jordan Bulls").
    • Avoid using brand names unless you want .com sites returned first. Chances are they'll show up on searches anyway.
    • Search only on Google (or Google-based engines) as it uses IMO the best methodology for ranking sites -- chances are you'll want to see what everyone else is seeing too, and it's based on referential merit that the sites are ranked.
    • Heh, and if you're searching for a specific porn site, good luck. Pretty much every method possible of getting your site ranked higher in the searches has been used. I mean, have you seen some of those Meta tag listings?

    Like I said, most of this is common sense and redundant to most people who've searched for stuff before. But you'd be amazed how many people have no idea how to find the information they need, when you can get it in less than ten seconds, including the time needed to plan the search and type in the query. I try to use this sort of list when telling people how to find info., sort of like teaching a person to fish so they can feed themselves for a lifetime.

    --

    --------
    Bleah! Heh heh heh... BLEAH BLEAH!!! Ha ha ha ha...

  129. Fluff by vodoolady · · Score: 2

    I was always very happy with Internet searching, so I was surprised to see an article talking about some big Internet content crisis. I see their point about the 'surface' and the 'deep' web, but these are also the same terms used in BrightPlanet's whitepaper on the subject. Since it's pretty obvious that BrightPlanet invented the term, the entire article comes into question: why didn't they draw a distinction between the company whitepaper's thoughts and facts?

    And in the fourth paragraph:

    Despite the ever-ballooning size of the World Wide Web, which some experts claim is on the order of 550 billion Web pages, much of the most interesting and valuable content remains hard to find.

    An unsubstantiated 550 billion pages, or about 100 pages for every living human being? I'm no expert, but that's ridiculous.
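    As a rough check of the per-person figure, here is the division spelled out. The population value is an outside assumption (roughly 6.1 billion people at the time), not a number from the article.

```python
# Back-of-the-envelope check of the "about 100 pages per person" figure.
claimed_pages = 550e9       # page count quoted in the article
world_population = 6.1e9    # assumption: rough world population at the time

pages_per_person = claimed_pages / world_population
print(f"{pages_per_person:.0f} pages per person")
```

    Under that assumption the ratio lands a little under 100, so the rounding in the comment is fair.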

    They quoted the Google people saying how hard it is to search for anything besides text, and then spruced some BrightPlanet PR. It sounds like someone's meeting the quota at Reuters, more of that fantastic deep content we should all pay for.

  130. Re:Searching via Apps by NineNine · · Score: 1

    Offline would only work if you downloaded a massive database of web sites. There IS a lot of desktop searching already, usually through IE plugins (like Yahoo Companion).

  131. Yes by NineNine · · Score: 2

    Yes, free, independent sites ARE tough to find, even with Slashdot's favorite Google. Every time you search for ANYTHING, the first 1000 hits are always for a commercial site. The thing is, it's because the big commercial sites have most information that most people find most useful. Is there a good way to change this? Not that I can come up with, unless an 'alternative' search engine is created that doesn't accept large corporate sites. But realistically, that WAS Google, but even they couldn't live on 0 revenue.

  132. Re:Searching via Apps by MaxQuordlepleen · · Score: 1

    You would be able to even search offline, etc. Price tag included.

    Offline? What's that?

  133. Internet Yellow Pages by On3 · · Score: 1

    Check out this link to the Internet & Web Yellow Pages you can get on amazon.com, I've also found it on the shelves at Walden Books. Hope this helps a bit.

    --
    Microsoft is not the answer. Linux is the answer. Microsoft is the question.
  134. Web-based decision support by maccallr · · Score: 1

    I've found that the engines cover enough to make my new site work pretty well.
    Not for everything, but enough to be getting along with...

  135. Time parameter by sacremon · · Score: 1
    What is needed in many of the present search engines is a way to specify pages that are no more than x days/months/years old.

    Some of the search engines have this, but Google in particular does not. Having this feature would allow one to potentially cull out a lot of dead links, given the half-life of the average link.
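    A minimal sketch of such an age filter, assuming each result carries a last-modified timestamp. The `results` records and their field names are made up for illustration and do not describe any real engine's output.

```python
from datetime import datetime, timedelta

def newer_than(results, max_age_days, now):
    """Keep only results whose last-modified time is within max_age_days of now."""
    cutoff = now - timedelta(days=max_age_days)
    return [r for r in results if r["last_modified"] >= cutoff]

# Hypothetical search results; the field names are invented for this example.
results = [
    {"url": "http://example.com/fresh", "last_modified": datetime(2001, 3, 1)},
    {"url": "http://example.com/stale", "last_modified": datetime(1997, 6, 15)},
]

recent = newer_than(results, max_age_days=365, now=datetime(2001, 4, 1))
print([r["url"] for r in recent])
```

    Passing `now` explicitly keeps the filter deterministic; a real engine would also need a trustworthy modification date for each page, which is its own problem.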

    --
    If you can't beat them, embrace and extend them.
  136. i dont have any problems with google by guest12 · · Score: 1

    not true at all. some of the older search engines (most of you guys haven't heard of them i suspect) don't seem to throw up relevant results but that's because the search strings are not made properly. google seems to use a different technology and the results are FANTASTIC!! av and lycos are getting selective about what they index. yahoo is a directory and not a search engine really. dmoz is so-so. newer ones similar to yahoo are still popping up and do a creditable job given their resources. the way out is to specialise in some group of subjects.

  137. PCs on all the time... by 30F06950 · · Score: 1
    i guess it would depend a little on if people start leaving their pc's on all the time. anybody know of "normal" people who like leaving their machines on?
    My experience has always been that normal people react with amazement when they hear that I leave my machines on all the time, and then a few weeks later start doing it themselves...
  138. Re:Searching via Apps by 30F06950 · · Score: 1
    IMHO, the web searching applications for your desktop are going to be the next wave.
    Sherlock on the Mac has done that for several years... It is pretty easy to extend it to work on arbitrary search engines, too. (via an XML-like config file)
  139. Dogpile by delstar · · Score: 1

    Right now I just use Dogpile.com for my searches. It sends the query to about 15 other search engines (Google included) and shows the top results of each. Works pretty well...
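    The general meta-search idea can be sketched as follows. This is not how Dogpile itself works (it shows each engine's results separately); it is one assumed way to merge ranked lists, and the engine names and URLs are placeholders.

```python
from itertools import zip_longest

def merge_results(per_engine, limit=10):
    """Interleave ranked result lists round-robin, dropping duplicate URLs."""
    merged, seen = [], set()
    # zip_longest yields all rank-1 hits, then all rank-2 hits, and so on.
    for rank_slice in zip_longest(*per_engine.values()):
        for url in rank_slice:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged[:limit]

# Placeholder engines and results, for illustration only.
per_engine = {
    "engine_a": ["http://a.example/1", "http://shared.example/", "http://a.example/2"],
    "engine_b": ["http://shared.example/", "http://b.example/1"],
}
top = merge_results(per_engine)
print(top)
```

    Round-robin merging treats every engine as equally trustworthy; weighting engines differently would be a natural variation.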

  140. Google IS indexing dynamic pages by malibucreek · · Score: 2
    The article asserts that crawlers "can easily get trapped in a dynamically driven site."

    Not so fast.

    While that is true of older, cr@ppier search engines like AltaVista and Inktomi, Google can and does index dynamic pages. (Indeed, more than 60 percent of new users to one of my sites come in via dynamically generated .cfm detail pages that have been indexed on Google.)

    It seems to me that if you want your content to be indexed, getting on Google (and by extension, Yahoo, since Yahoo uses Google results in addition to its directory), is pretty darn easy. I have to say, I'm not nearly as frustrated with search engines as I was in the days B.G. (Before Google)

    --

    Why is it called COMMON sense when so few people have it?

  141. Unsearchable? by spookyfluke · · Score: 2

    Gee, FUCKING TEENAGE SLUTS I wonnder SEX CUNT COCK why PUSSY the PUSSY net PUSSY is GAND-BANG ANAL SLUTS getting so CUM FACIALS hard GOAT SEX to search and TITIES index? They're ASS probably using the wrong search engines and PUSSY aren't "web-savy" PUSSY enough. I can find anyting I LOLITA want GOAT SEX on the THREESOME net... You DOUBLE PENETRATION just have BOOBS to GOAT SEX know ASIAN ANAL SLUTS where and how CUM DRENCHED WHORES to look for it. GOLDEN SHOWERS.

    --
    you.bases.each{|base|base.are_belong_to=us}
  142. Re:Another way by FatHogByTheAss · · Score: 1
    How about a system for pre-indexing an entire site, such that the person who runs it can have a single document at the root of their domain with the index results? A standard could be developed that would even go so far as to map out the existing sub-sites (for AOL personal sites, for example) so that the engine could go to each one for the index documents.

    Already been done to a certain degree. Unfortunately, these guys are about to be inducted into the Fucked Company Hall of Fame.

    --
    You sure got a purty mouth...

  143. Yahoo.com versus uk and ireland by GruffDavies · · Score: 1

    Google gets relevant and recent additions as everyone knows. But what I didn't know until recently is that the UK and Ireland Yahoo uses Google now. A wise move. So why doesn't Yahoo.com? Anyone know? Check out the differences between searches on yahoo.co.uk and yahoo.com.

  144. Time for a distributed search engine design? by Peer+K · · Score: 1

    If this trend continues, would it be an idea to make a search engine design similar to DNS?

  145. Another way by Shoten · · Score: 4

    I think that neither the people who claim that this is impossible nor the people who want to dismiss it are correct. There is undoubtedly a major problem, and it is only getting worse. The flip side of that, however, is that while we are getting farther and farther from having a complete listing of the web in search engines, the ability of end users to find what they are looking for appears to be improving, particularly with the advent of better search engines like Google.

    The solution to indexing the web completely, or much more completely, has to lie in another methodology. How about a distributed solution? Google@home? distributedYahoo!.net? Honestly...there are ways to tackle the problems, and the reason why this entire system exists is because people refused to just shake their heads and say, "Nope, can't do it...sorry!"

    How about a button in browsers that enables you to mark a page as a dead link? Just hit that button and a centralized system gets a reference to the URL currently in your browser. That centralized system is funded by all search engines and all search engines draw from it. Yes, I know..."What if a user falsely claims a site to be dead?" Well, what if it took 100 different IPs claiming it to be dead before it really was considered dead? If you don't get many people hitting the site from a search engine in the first place, then you probably aren't serving it up to too many people.
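    The reporting scheme described above can be sketched roughly like this. The class and method names are hypothetical, and a real service would also have to deal with proxies, shared IPs and deliberate abuse.

```python
from collections import defaultdict

DEAD_THRESHOLD = 100  # distinct reporting IPs required, as proposed above

class DeadLinkRegistry:
    """Hypothetical central registry of dead-link reports."""

    def __init__(self, threshold=DEAD_THRESHOLD):
        self.threshold = threshold
        self.reporters = defaultdict(set)  # url -> set of reporting IPs

    def report(self, url, ip):
        # Using a set means repeat reports from one IP count only once.
        self.reporters[url].add(ip)

    def is_dead(self, url):
        return len(self.reporters.get(url, ())) >= self.threshold

registry = DeadLinkRegistry(threshold=3)  # small threshold for the demo
for ip in ["10.0.0.1", "10.0.0.1", "10.0.0.2"]:
    registry.report("http://example.com/gone", ip)
print(registry.is_dead("http://example.com/gone"))  # only two distinct IPs so far
registry.report("http://example.com/gone", "10.0.0.3")
print(registry.is_dead("http://example.com/gone"))
```

    Counting distinct IPs rather than raw reports is what makes a single user's false claim harmless.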

    How about a system for pre-indexing an entire site, such that the person who runs it can have a single document at the root of their domain with the index results? A standard could be developed that would even go so far as to map out the existing sub-sites (for AOL personal sites, for example) so that the engine could go to each one for the index documents.
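    As a sketch of what such a pre-built index document might contain, the snippet below builds a word-to-pages map for a site and serializes it as JSON. No such standard is described in this thread, so the file name, layout and sample pages are all assumptions.

```python
import json
import re
from collections import defaultdict

def build_site_index(pages):
    """Map each word to the sorted list of paths whose text contains it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return {word: sorted(paths) for word, paths in sorted(index.items())}

# Hypothetical site content, keyed by path relative to the domain root.
pages = {
    "/index.html": "Welcome to my bread recipes",
    "/rye.html": "Rye bread with caraway",
}
site_index = build_site_index(pages)

# The site owner would publish this at a well-known location,
# e.g. /site-index.json (a made-up name), for engines to fetch.
index_document = json.dumps(site_index, indent=2)
print(index_document)
```

    An engine fetching one document per site does far less work than crawling every page, though it then has to trust whatever the site owner claims.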

    I guess that what I mean to say here is that the problem is largely based around the hugeness of the web, and how brute force is no longer enough. But that's not really that big a problem...all that's needed is a bit of creativity.

    --

    For your security, this post has been encrypted with ROT-13, twice.
  146. Site sharing by MeltyMan · · Score: 1

    This is not a well developed idea, but I've been envisioning a gnutella-type sharing community where links are shared and pages mirrored. (I usually think of something like this every time a site is slash-dotted...:) A plugin in your browser could keep track of where you go, and what is to be found there. Then searches could be run against those individual db's by those in the community. Generous individuals could choose to mirror sites that they think are popular or overloaded. Kinda like google, I guess, but with a community of people doing this, it could be more scalable (possibly directly proportional to the growth of the web.) Please lemme know if y'all think this is lame, or potentially feasible.

    --
    "Ummmm..." ...The programmer's "Om."
  147. The Holy Bible is MY search engine of choice. by Flabdabb+Hubbard · · Score: 2

    It may be thousands of years old, but it has stood the test of time. It has no annoying banner ads and very little porn to distract you from what you were actually searching for. What is the name of this search engine? The Holy Bible. No matter what subject you are interested in, the Holy Bible has something to say. From Geekiness, to Installing Linux, to how to get a date, to what to eat. It's all in there. I realise I will be marked as flamebait by the anti-religious slashdot zealots but if just one person is saved by my advice it will have been worth all the negative moderation in the world.

  148. Google seems to be doing ok... by simonsays · · Score: 1

    Now they haven't indexed the entire web but 1.3 billion pages is pretty impressive... I don't need more than that... And if you can't find a page that you like in there then bugger to yah...

  149. Re:recommendation instead of seeking by Sven+Tuerpe · · Score: 1

    A totally new approach could be that you don't search but interesting web resources gets recommended to you by your personal agent. We are currently working on a peer-to-peer system that doesn't exchange files but exchanges recommendations for web sites.

    Nice, but no replacement for traditional web search. When I search the web, I usually search for very specific information, e.g. an XF86Config file for my laptop computer, scientific papers on 3D user interfaces, or a manual for my office telephone. Search engines like Google do a good job pointing me directly at such resources and I believe they do because of their KISS approach of indexing every page they can get hold of and ranking the search results.

    When searching for specific stuff, I'm interested in exactly the stuff I search for, sometimes only a few bits of information, not sites which may contain that stuff. I think it is quite unlikely for my friends or other competent persons to recommend exactly what I'm searching for. They are more likely to recommend sites, i.e. collections of interesting information, and a few outstandingly interesting single items.

    What I'd expect to get recommended with respect to my examples above would be Linux on Laptops, Citeseer, and some Siemens or telecommunications site. But compared to a traditional search engine, these recommendations would not make my life easier. Instead, they would add an unnecessary level of indirection to my search.

    This does not mean your approach is useless, but it covers a different field of gathering information. I think a recommendation system is more suitable for keeping track of what's going on in the world, i.e. find out what's new and cool in one's fields of interest. Your concept is just closer to /. than to a traditional search engine, so it will be used more like /..

    --
    http://erichsieht.wordpress.com/category/english/
  150. searching by deran9ed · · Score: 2

    The trend is toward pay for listing. Will the free, searchable web fade away?"

    It's not a trend, it's companies attempting to keep afloat in what's becoming a bull market. It's amazing to see how companies like google stay in business when they show few methods of collecting any kind of revenue. E.g., the only means of Google obtaining revenue is what? Charging a company for a copy of its search engine? Why would a company pay for a search engine when the market is overflooded with them?

    Ad based revenue, we all know where those click me businesses are going.

    We also know most of the "web rings" never went anywhere, but for a search company to think people would pay for finding something on the net, they'd be shit out of luck, maybe corporations may do this, but I'd just make my own search engine (freely distributed) post it somewhere and let the whole "submit your site for free" revolution take place again.

    Privacy Info

  151. Ever heard about this one? by Henk+Poley · · Score: 1

    Maybe you should try Subme. They use an intelligent search engine. It's quite handy sometimes. You need to know little about the subject and when you find anything that is you just select one of the buttons -, +/-, + That's easy...

  152. Deep web content and other searching problems. by Anemophilous+Coward · · Score: 2
    The article touches on "deep web content" hidden by new technologies such as Active server pages and Cold Fusion. Is this seen solely as a problem from the search engines themselves, or are the sites designed as such the ones complaining?

    If the sites themselves are complaining about no one able to find their content, aren't there ways to help that? Run a query on their database site to generate a possible site list of the content and then provide that list to the search engines. The search engines could then provide a link (found based on a content search) that would put the user on the page where they enter the form (or whatever) information to generate the page needed. Not being familiar with XML, but knowing that it has some features to aid in content grouping, could this be needed to recode the sites in?

    Obviously if the sites themselves don't want this deep content easily viewed except by deep clicking through their whole site, or some pay-per-view system, that is their choice. I feel that they are limiting themselves however. If they think they have robust enough content to be useful to users, they should strive to make that content as widely available as possible.

    Should proprietary websites even be considered as 'Internet-web content'? Those seem to me to be 'Intranet content' which most often should not be seen by the general public (ie: internal company policies only needed by employees of company X). For that information to be set free you should either need a very savvy person to break in from the outside or a traitor from the inside. If it's only certain products listed that the company doesn't want available to the public, well that is too bad for them, I'll just get a quote elsewhere and pay someone else my money.

    "evidence of a widening gap between the deep Web and the freely-accessible 'surface Web,' which could become a clutter of recreational and amateur-oriented content -- the online equivalent of public cable access television or self-published novels." Funny, ever since the late eighties, I've always seen the whole web like this. It's more like the big corporations tried to muscle in on the public cable channel and realized they might be better off on their own channel.

    Not your normal AC.

  153. Status of todays web searching... SUCKS! by skarzin · · Score: 1

    Well, I'll say that searching the web has gotten a lot more difficult if you want to find anything decent. My favorite search engine, www.alltheweb.com (fast search), is becoming less appealing. With every search engine I have tried nowadays, I could be searching for bread recipes, and the first 1000 results to come up are either HARDCORE XXX FREE NO CREDIT CARD or something like "Yes, we have searched all bread recipes and have high quality bread recipes for the taking"; click it, and it's really some site like Best of the Web, which happens to be mysteriously behind most of those "we have high quality" type URLs popping up in searches. Web searching has moved in status down to... SUCKS!

  154. Oil Change? Coming right up! by Big+Sean+O · · Score: 1

    I searched "Paris France Auto Repair" in Google and found the following address:

    Garage Carlos
    9-11 Rue Riquet - 75019 Paris
    +(33) 1 46 07 03 48

    You're Welcome. Next question??

    --
    My father is a blogger.
  155. Dynamic sites by CyberDawg · · Score: 2

    Much fuss is made about the search engines needing to "fix the problem" of not being able to index sites like microsoft.com because the pages are dynamically generated. Is this really a problem?

    Microsoft (or whatever other dynamic site you wish to pick) chose to make their content unindexable. Don't try to make it someone else's problem. Let people who use the search engines find third-party information instead. If the site designers wanted their site in the search engines, it would be there. Many of the sites built with ColdFusion or ASP contain basically static information anyway, and making them dynamic just reduces your traffic.

    Sites like Slashdot are dynamic. A search engine can't be expected to keep up with something that changes every 30 seconds. However, making all of the archives static HTML allows them to be searchable by the engines and takes some load off the server, to boot.

    I went for a "best of both worlds" approach on my personal site by writing a perl site generator. Each time I update the site, I re-run the site generator, which takes about a minute. My server carries a lighter load, but I still have "dynamic" links to related articles and such that the site generator builds.
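    A generator along those lines might look something like the sketch below. It is written in Python rather than Perl and is not the poster's actual tool; the article data, template and file layout are invented for illustration.

```python
from pathlib import Path
from string import Template
import tempfile

PAGE = Template("<html><head><title>$title</title></head>"
                "<body><h1>$title</h1><p>$body</p>$related</body></html>")

def generate_site(articles, out_dir):
    """Write one static HTML file per article, with links to related articles."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for slug, art in articles.items():
        # The "dynamic" related links are resolved once, at build time.
        links = "".join(
            f'<a href="{other}.html">{articles[other]["title"]}</a>'
            for other in art.get("related", [])
        )
        html = PAGE.substitute(title=art["title"], body=art["body"], related=links)
        (out / f"{slug}.html").write_text(html, encoding="utf-8")

# Invented sample content.
articles = {
    "perl": {"title": "Perl notes", "body": "Some notes.", "related": ["sites"]},
    "sites": {"title": "Site generators", "body": "More notes."},
}
out_dir = tempfile.mkdtemp()
generate_site(articles, out_dir)
print(sorted(p.name for p in Path(out_dir).iterdir()))
```

    Because the output is plain HTML files, a crawler sees ordinary static pages and the server does no per-request work.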

  156. The Sky Is Falling, The Sky Is Falling! by Eoli · · Score: 5

    You said the same thing two years ago!

  157. Re:The Grumbling public? by madfgurtbn · · Score: 1

    I don't think anyone is saying that the problem is that people won't pay for it. The task of indexing the web is just too big for humans, too complex for computers, and impossible anyway because the web changes too fast and the content is increasingly stored in dynamic databases, not in (relatively) static text files. The problem is only going to get worse as the rest of the world comes online.

    --
    Send lawyers, guns, and money. Dad, get me out of this.
  158. Re:Searching via Apps by CoachS · · Score: 1
    I'm already using one called "Copernic" (http://www.copernic.com) which has a free version and a pay version. I sprung for the pay version because I like it better -- it's basically a meta-search tool that searches through a large number of other search tools.

    Works pretty well, is updated regularly (they add new search engines all the time) and even includes a number of different "categories" like e-mail, dictionaries, auctions, etc. that let you narrow your search.

    Well worth a look, IMHO, and no...I don't work for them or own any stock in the company.

    -Coach-

    --
    Perhaps the world's greatest tragedy is that ignorance is not impotence.
  159. Google Directory? by tweakt · · Score: 1

    *ahem*
    "The Google Web Directory, organized by topic"
    http://directory.google.com/

    sheesh....

  160. And Freenet? by benii · · Score: 2
    If you've been paying attention to the Freenet scene you'll see that it's impossible to search it. Everything is handled as a key value where you can input keys and find pages. This makes it hard to censor and almost impossible to trace where the server of the page you're looking at is but it also means that there aren't any search engines. All that needs to happen is for authors to give their pages nice keys and everything should work fine. This is a lot like how META Keywords should be working on the Internet today.
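    A toy illustration of why key-only retrieval resists searching. This is not Freenet's actual protocol or API; the hashing step and the class below are assumptions made only to show the idea that a document is reachable by its exact key and nothing else.

```python
import hashlib

class KeyStore:
    """Toy key-value store: data is reachable only by the exact key string."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _hash(key):
        # Assumption for the sketch: keys are stored only in hashed form,
        # so the store cannot be browsed or searched by keyword.
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def insert(self, key, document):
        self._data[self._hash(key)] = document

    def request(self, key):
        return self._data.get(self._hash(key))

store = KeyStore()
store.insert("recipes/bread/rye", "Rye bread recipe")
print(store.request("recipes/bread/rye"))  # exact key: found
print(store.request("rye bread"))          # guessed keywords: None
```

    This is why "nice", guessable keys matter so much in such a system: the key is the only handle a reader has.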

    --
    one thing i can tell you is you got to be free
  161. Re:Victim of its success by m_evanchik · · Score: 1

    That's depressing, and wrong, from a technologically ethical point of view. The web as a medium requires openness. Here's a novel suggestion, if somewhat heretical. Maybe the current slashdot format does not scale as well as might be hoped. I'm not knocking the slashcode. Hey, I'm posting here, aren't I? My point is that maybe it needs to be organized differently than it currently is. Just a thought. Or maybe I don't know how to navigate around in it well enough.

  162. Victim of its success by m_evanchik · · Score: 3

    The Web is a victim of its own success. Now every snake-oil salesman, fanboy and their grandmother has a website.

    Even Slashdot is too big. How the hell are you supposed to follow a conversation this big.

    especially with the goatsex.

    I'm gonna start mailing postcards.

    Excelsior,

    ME

  163. Searching via Apps by Raptor_316 · · Score: 1

    IMHO, the web searching applications for your desktop are going to be the next wave. You would be able to even search offline, etc. Price tag included.

    1. Re:Searching via Apps by Raptor_316 · · Score: 1

      I agree. I figure that the free GNU versions will play catchup again (are there any free *nix ones. Let me do a search...heh..).

    2. Re:Searching via Apps by Tar+Ciryatan · · Score: 1

      Most people already do search for applications (who know where to look) and don't even pay....oh, you know, it's just called hacking.

      --
      -Tar Ciryatan, Angry Hermit-
  164. Wolf! Wolf! by Flying+Headless+Goku · · Score: 1

    That it was wrong then doesn't mean it is wrong now. That it is wrong now doesn't mean when the claim is made again a year or two down the road it will be wrong then.
  165. Slashdot Web Site Ratings by e2718 · · Score: 1

    I'd like to see web sites rated by category in a way similar to what Slashdot does. You could probably create a company based on the idea.

    :-), Elo

  166. No, it's just a big opportunity... by StarPie · · Score: 3

    Actually, this just is a great opportunity for the next Great Search Engine. Look at how well Google has done just indexing a small portion of the web (1%, according to the article). So that leaves the door wide open to anyone who can crack the puzzle of how to keep up with the web. If word gets around that something is better than Google, it'll be huge. You can say "oh, no one can index the whole web accurately," but there is someone out there with the brains and courage to try it -- and succeed.

  167. The Grumbling public? by Tar+Ciryatan · · Score: 1

    Well, I think this article is saying basically, "free search engines are doing a piss poor job....hmm, I wonder if I can find a decent one for free, or will I have to *gasp* pay for one? (which would obviously be superior!)" ah, people, I will never understand them

    --
    -Tar Ciryatan, Angry Hermit-
  168. Spam by Tar+Ciryatan · · Score: 1

    Well, sadly, people seem to enjoy spam here....and I thought this place was full of decent conversations and saying that, lets see how many spammers yell at me.

    --
    -Tar Ciryatan, Angry Hermit-
  169. Magnitude of Problem by Michael+Tanczos · · Score: 2
    Hello,

    I'm one of the authors of Sparkseek, a remotely-hosted search service. I'm also a student at Pennsylvania State University. I want to give you an idea of what kind of problems researchers in the field of internet text retrieval have to deal with.

    Larry Page, one of the co-developers of the Google search engine said in his 1997 research paper entitled "The Anatomy of a Large-Scale Hypertextual Web Search Engine" that the primary benchmark for information retrieval, the Text Retrieval Conference, uses a fairly small, well controlled collection for their benchmarks. The largest benchmark they have available is only 20GB compared to the 147GB from Google's crawl of 24 million web pages. Today, Google has over 1.4 billion web pages in their database and a reported 4,000 node linux cluster.

    One of the problems I have encountered and found difficult to deal with is the sheer amount of redundancy in web content. Anybody who has ever tried a search for any linux command has no doubt encountered hordes of duplicate MAN pages in their results.
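    One simple way to catch the easiest cases of this redundancy is to fingerprint normalized page text and keep only the first page per fingerprint. This is a sketch of that general technique, not a description of what Sparkseek or Google does, and the sample pages are invented.

```python
import hashlib

def content_fingerprint(text):
    """Hash of the text with case and whitespace differences normalized away."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def drop_duplicates(results):
    """Keep the first URL seen for each distinct content fingerprint."""
    seen, unique = set(), []
    for url, text in results:
        fp = content_fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            unique.append(url)
    return unique

# Invented sample: two mirrors of the same man page plus one different page.
results = [
    ("http://mirror-a.example/man/ls", "LS(1)  list directory contents"),
    ("http://mirror-b.example/man/ls", "ls(1) list   directory contents\n"),
    ("http://example.org/man/cp", "CP(1) copy files and directories"),
]
unique = drop_duplicates(results)
print(unique)
```

    Exact fingerprints only catch copies that are identical after normalization; mirrors that wrap the same text in different headers or navigation would need a fuzzier comparison.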

    Not only that, but I honestly don't believe that when it comes to search engines, more is better. I have noticed over the past 6 months, as google has made great increases in its index sizes, that results have consistently become worse and worse. Search engines really need to begin narrowing the focus of their index and creating multiple indexes. Educational institutions should be separated from commercial establishments.. if I'm performing research on some subject, the last thing I want is to arrive at a commercial establishment pitching some product.

    Also, the method google utilizes when creating their indexes creates a huge scalability problem. Their indexes are updated less frequently than ever, and if you read their document that was published in '97, it's not hard to see why.

    Michael Tanczos

  170. recommendation instead of seeking by aharth · · Score: 2

    A totally new approach could be that you don't search but interesting web resources gets recommended to you by your personal agent. We are currently working on a peer-to-peer system that doesn't exchange files but exchanges recommendations for web sites.

    It's much like a good friend suggesting that you look at an interesting web site. You can see all the marketing blurb at http://www.iowl.net/. At the moment this is a seminar paper of some people (including me) at the Wuerzburg University of Applied Sciences. We have a working prototype that will hopefully be released in about a month or so.

  171. I know what they mean by Newtonian_p · · Score: 1
    I know what they mean by search engines having huge queues of sites to index. I submitted my site many months ago to major search engines (e.g. Excite, Altavista, Google) and none of them have added it yet.

    Ironically, when searching for Newtonian's palace, Google will find sites linking to my page but not my page itself.

    --

    There are 2 kinds of people in this world: Those who write in decimal and those who don't

  172. Not Unsearchable... by Splearch · · Score: 1
    ...but unindexable on a global scale by an organization with the size and goals of a corporation. My contention is that smaller portal sites, MetroNets and TopicNets, will sprout up where there is demand for better indexing, categorization, and community. The Internet gives access to the world, but its application ought to be local and personal.

    Splearch