Slashdot Mirror


Better Search Engines

prostoalex writes "Scientific American is seeking better Web searches. They report on all sorts of innovations happening outside the Google-Yahoo-MSN zone that the press is usually reporting on, including GPS-enhanced searches from University of Maryland, Shape Retrieval and Analysis from Princeton, musical search engine from New Zealand Digital Library Project, and some of the projects that A9 and Ask.com have been working on."

35 of 137 comments

  1. What we need is whitelisting by Jailbrekr · · Score: 4, Insightful

    If we can whitelist sites, and reduce the total number of advertisements cluttering the search, the existing search algorithms would work quite nicely.

    It is a pipe dream, I know. :(

    --
    Feed the need: Digitaladdiction.net
    1. Re:What we need is whitelisting by vbdrummer0 · · Score: 3, Interesting
      If we can whitelist sites, and reduce the total number of advertisements cluttering the search, the existing search algorithms would work quite nicely.

      I agree, but why not just eliminate all ads from search results? As far as I'm concerned, they can put real ads all over the result page as long as the results themselves are legit.

    2. Re:What we need is whitelisting by prostoalex · · Score: 2, Insightful

      Theoretically Google should be able to put that into their toolbar. Also you could probably use an extension in Firefox that would modify your query to exclude certain sites. After the list gets long, however, it wouldn't be too effective.
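
      A minimal sketch of the query-rewriting idea, assuming the engine accepts a `-site:` exclusion operator. The function name and the example domains are made up for illustration. It also shows why a long list becomes a problem: every blocked domain makes the query longer.

```python
def exclude_sites(query, blocked_domains):
    """Append a -site: operator for each blocked domain to the query."""
    # Sort so the rewritten query is stable from one search to the next.
    exclusions = " ".join(f"-site:{d}" for d in sorted(blocked_domains))
    return f"{query} {exclusions}".strip()


if __name__ == "__main__":
    # Hypothetical block list; a real extension would load the user's own.
    blocked = {"spam-prices.example", "linkfarm.example"}
    print(exclude_sites("baked gorgonzola", blocked))
```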

    3. Re:What we need is whitelisting by eln · · Score: 3, Insightful

      Kind of like a directory of sites like, say, Yahoo was in the 90's.

      The problem with whitelisting is that a spider-driven site like Google will always end up having a greater quantity of relevant results (as well as a greater quantity of non-relevant results, of course). History to this point has shown that people prefer to deal with a lot of bad results mixed in with a lot of good results rather than having to rely on a small set of "good" results from a directory-driven search engine.

    4. Re:What we need is whitelisting by iminplaya · · Score: 3, Informative

      This is kinda close.

      --
      What?
    5. Re:What we need is whitelisting by Anonymous Coward · · Score: 2, Insightful

      Ads exist just because search engines are managed by some companies that need money.

      Imagine now that the search engine is totally distributed...

  2. Music Search by fembots · · Score: 5, Funny

    a user can record a query by playing notes on the system's virtual keyboard. Or he or she can hum the song into a computer microphone.

    I tried that, but I was so out-of-tune the search engine returned all songs from Britney Spears.

    1. Re:Music Search by k4_pacific · · Score: 4, Funny

      I tried it too, but all the results were blocked for DMCA violations.

      --
      Unknown host pong.
  3. What I want by Anonymous Coward · · Score: 5, Interesting

    Is just some better work done on recognizing essentially similar documents. Like, if I perform a search, and 40% of the returns are the same wikipedia article copied to different sites, it would be nice if the search engine could only show me one (wikipedia). Or, like, if I'm searching for some kind of error I got while using Linux. Most of the returns I get will be various old Linux mailing lists, but only some of them will be relevant to my problem. There must be some way the search engine could logically organize them for me so that I could more clearly identify that block of returns that is most applicable to my problem of the moment.
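
    One common way to approach this, sketched here under assumptions rather than as what any real engine does: break each page into overlapping word windows ("shingles"), compare the shingle sets with Jaccard similarity, and collapse pages that score above a threshold. The window size, the 0.8 threshold, and the greedy grouping are all illustrative choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(pages, threshold=0.8):
    """Greedily group pages whose shingle sets overlap above the threshold.

    pages maps a URL to its text. The first page seen in each group acts
    as that group's representative for later comparisons.
    """
    groups = []  # list of (representative shingle set, [urls])
    for url, text in pages.items():
        sh = shingles(text)
        for rep, urls in groups:
            if jaccard(sh, rep) >= threshold:
                urls.append(url)
                break
        else:
            groups.append((sh, [url]))
    return [urls for _, urls in groups]
```

    A results page could then show one entry per group and fold the rest behind a "similar pages" link. Picking which copy is the original is a separate problem that this sketch does not address.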

    1. Re:What I want by me at werk · · Score: 4, Informative

      CopyScape can do the recognizing of copied stuff, but its purpose is only finding website plagiarism. This, however, would definitely find all the wikipedia forks unless it's a really old copy and the page has had a major rewrite.

      If Google could integrate CopyScape into their search, you would be happy.

      --
      For context, click Parent.
    2. Re:What I want by Anonymous Coward · · Score: 3, Interesting

      What I want is a button that lets me restrict my search for a thing to either a review of the thing, a forum/blog discussing the thing, places to buy the thing, or specs/datasheets on the thing. So many times I type in a product name only to get two dozen "find prices/read reviews on X" -- none of which actually have reviews ("be the first to review X!") or even more than a couple of not-so-great prices. A filter could be done by creating a statistical fingerprint of the page.

      I also want to be able to sort my search based on the amount of grammatically correct (or mostly correct) text on the page. Something would have to keep it from indexing hidden (white on white) words or keywords designed to grab top spots on search engines. There would have to be some more complex grammar checker (and place checker to make sure it isn't a bunch of tiny text at the bottom of the page) to accomplish this.

    3. Re:What I want by JesusQuintana · · Score: 2, Interesting
      "Like, if I perform a search, and 40% of the returns are the same wikipedia article copied to different sites, it would be nice if the search engine could only show me one (wikipedia)."

      Like, I agree. I have done some searches and simply find the same text on page after page. It would be nice if the search engine could provide some sort of hierarchy. It could say here is the authoritative source and here are all the sources that quote it.

      I did say it would be nice, but it really isn't necessary, nor would it seem very feasible. What if the authoritative source changed, but the subordinate sources are not updated? It would seem that this would apply to your Wikipedia example. And how could an algorithm determine the parent source?

      Yes, it would be nice. But better search is also a detriment. If we are an information economy, as the talking heads on television keep telling us, isn't part of our value as information workers our ability to deal with this information? In the old card catalog days, wasn't it a researcher's job to take all of the information that is available and gather and interpret that information in a meaningful way? Isn't that still true of today's information worker? If your CEO can type something into Google and get the answer he wants, do they need you to find the answers for him? If software becomes too intelligent, then doesn't the human mind become obsolete? Frankly, I've always been proud of my ability to parse and understand information and then recapitulate it in a meaningful way.

      "Or, like, if I'm searching for some kind of error I got while using Linux. Most of the returns I get will be various old Linux mailing lists, but only some of them will be relevant to my problem."

      Well, you've probably got several problems here:
      1. Lack of relevant material due to smaller installed user base. (It will always be easier to find the answer to the Windows problem because more people are having it and writing about it.)
      2. Lack of need to publish material. Linux users are generally not computer illiterate so there isn't much need to write thousands of Linux hand holding articles.
      3. Lack of people searching for and linking to material. This makes Google's relevance ranking somewhat useless.


      Of course, all of this could be attributed to poor search criteria. As they say, "garbage in, garbage out." And that takes me back to my previous point. If your CEO can fix his computer with Google, why do they need you?

      In the information economy, knowledge is power and money. And if knowledge is easily obtained, then the laws of supply and demand dictate that the value of knowledge decreases. As a freelance video producer, I am watching the devaluation of my services occur as practically every Joe Schmoe can edit video on their home computer. The latest technological innovations have devalued the video/film production industry. The only people still making lots of money are the stars, because they work proactively to protect their value. I don't know why technology/information workers aren't interested in the same. If I buy a faster computer that enables me to produce videos in half the time, then I have also cut my billable hours in half.

      So, I am all in favor of poor search technology and lazy people. Simply, if I have the patience and the ability to skillfully use technology in ways that others cannot, then I am a valuable commodity. But if anyone can do it, then I'm just like an 18 year old who wants to work for MTV. That's why MTV staffs with interns, because there is an endless supply. If the supply is endless, then you have no value. You're a dime a dozen. You might as well be Chinese.
      --
      You said it man. Nobody f#%ks with the Jesus.
    4. Re:What I want by HugeFatty · · Score: 2, Informative
      I agree that the things you have listed are problems, and that they'd sure be nice to solve. I just wanted to address one of them for now, as I have been trying to deal with it myself.

      The hidden text problem that you mention is a surprisingly hard problem to deal with, as there are so many ways to do it.

      You have:

      • The <font> tag
      • CSS (several ways, such as visibility: hidden or display: none, changing the colors, using z-index, etc.), both internal and externally linked (for which the search engine must download that file while spidering)
      • DHTML positioning over other elements
      • A background image the same color as the text
      • Javascript to generate any of the above
      • Use of nearly identical colors for all of the above (such as #FFFFFF for the background and #FFFFFE for the foreground). In fact, there could be dozens of colors that are all slightly different enough that a human wouldn't be able to detect it without looking very closely, or at all.
      I'm sure there are more that I'm missing, but I think you (meaning everyone...I'm not just picking on the parent here...) get the idea. You pretty much have to render the page like a browser to take care of all of those, which really sucks for us search engine developers trying to fight it, and us users that have to deal with that crap.
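
      To make the last bullet concrete, here is a rough sketch of the easy part only: once you already know the effective foreground and background colors, flagging near-identical pairs is simple arithmetic. The function names and the threshold of 30 are my own illustrative assumptions, and working out those effective colors is exactly the bit that needs a renderer.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def looks_hidden(foreground, background, min_distance=30):
    """Flag text whose color is within min_distance of its background.

    Distance is the sum of absolute per-channel differences, a crude
    stand-in for a proper perceptual contrast measure.
    """
    fg, bg = hex_to_rgb(foreground), hex_to_rgb(background)
    distance = sum(abs(f - b) for f, b in zip(fg, bg))
    return distance < min_distance


if __name__ == "__main__":
    print(looks_hidden("#FFFFFE", "#FFFFFF"))  # near-identical pair from the list above
    print(looks_hidden("#000000", "#FFFFFF"))  # ordinary black on white
```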
      --


      I am clearly fatter than you.
  4. Ask.com! by joshsnow · · Score: 2, Funny

    and some of the projects that A9 and Ask.com have been working on

    I want a search engine with a Genie-Jeeves. Imagine: I snap my fingers, smoke streams from my monitor, materialising into Jeeves, complete with tray, glass and a bottle of that beer I couldn't quite bring to mind when I clicked the search button...

  5. Like Yahoo, Only Cheaper by mstyne · · Score: 2, Interesting

    From The Daily WTF:

    I want a website directory, like a yellow pages, or Yahoo. I want any web user to be able to add a link, under the relevant categories available, like...finance,real estate,travel,games etc. I would like the links to be approved before they appear. I want the search results displayed in the following fashion: A URL text, or URL image, with a little description underneath. I want the following tools - top 50 searches, most popular links, a search facility. A space across the top of the page to insert my own logo.

    --
    mstyne: real name, no gimmicks
  6. metadata by subrama6 · · Score: 2, Interesting

    as we get into video search and the like, aren't searches dependent on the quality of the metadata associated with the item? i just tried video.google.com, and was impressed that typing in "bauer" got me stills from recent episodes of 24. but surely that's based solely on the fact that "bauer" was a tag for the still. at that point, why is new search technology impressive? it's the metadata that makes it possible. am i missing something?

  7. Clusty = Innovative by int2str · · Score: 4, Informative

    Aside from the horrible name, clusty (a clustering search engine) is very innovative and easy to use. I hope more search engines will adopt similar technology soon.

    Link to clusty.com search engine

    1. Re:Clusty = Innovative by lucabrasi999 · · Score: 3, Interesting

      Use vivisimo instead of clusty. It is the same search engine/company, just different names. If you search using Vivisimo, the sponsored links aren't quite as obnoxious. Unfortunately, the Firefox extension uses Clusty, not Vivisimo.

      As for the names, both of them suck big-time. "Vivisimo" and "Clusty". Geez. I remember a few years ago, Price Waterhouse Coopers Consulting decided to change their name to "Monday". I wonder if the folks at Vivisimo hired anyone from PWCC, because their names suck almost as much.

  8. GPS-enabled search by jxyama · · Score: 2, Interesting

    GPS-enabled search would be excellent, as more and more people probably will adopt accessing the web on their cell phones. (already happening in japan, afaik.)

  9. Musical search already exists... by Humorously_Inept · · Score: 2, Interesting

    It has been available as a service on mobile phones for something on the order of two years. The same thing, called TuneTracker, is available in Canada now under the MuchMusic brand. Put your phone up to the mystery tune and you'll get the song title and artist's name back in an SMS message.

    I'd like to see a search engine that can intelligently filter results for the word "review." When I search for a product review, I do not want some hole-in-the-net online store's product page with a link to 0 customer-submitted reviews.

    --

    ~Someday, I hope to be an aspiring author.
  10. Yeah, but when will... by Anonymous Coward · · Score: 2, Funny

    www.findmysocks.com be up and running?

  11. Better Search techniques by rueger · · Score: 4, Insightful

    Nice article which summarized many of the problems with contemporary search engines.

    My experience is that a few years ago you could type say "baked gorgonzola" into Google and be sure to get a useful result pretty near the top. These days though what you want is likely to be on page three or four, after a dozen links to price comparison sites.

    There really is no such thing as a quick Google search any more. It almost invariably involves multiple formulations of your query, and probably trolling through at least two or three pages of results.

    Whether that's because of Google, or the sheer volume of content on the web, or sites that capitalize on Google's weaknesses is something I don't know.

    1. Re:Better Search techniques by rueger · · Score: 3, Insightful

      OK, bad example. Try searching Google for information on say a Sony STR-DE945 receiver and see how far you need to look to find anything beyond retail. Like maybe for a page from the Sony website?

      Or try to find a User Manual for the same item: sony STR-DE945 receiver manual.

    2. Re:Better Search techniques by susano_otter · · Score: 2, Insightful

      Why would you Google for the user manual, instead of just going straight to Sony's website?

      Google's rankings are based in part on what other people care about. The results you're seeing are because people are more interested in finding and using websites where they can buy the product, rather than the manufacturer's official brochure page for the product. And since that page is trivial to find, if you really do need it, it would end up being noise on most Google searches for the product.

      When I need a manual for a server in my datacenter, I don't go to Google. I go straight to the vendor's website. Works every time.

      --

      Any sufficiently well-organized community is indistinguishable from Government.

  12. Vivisimo by Dan667 · · Score: 2, Informative

    Interesting, the first thing I thought is I had seen this with Vivisimo, but I guess no one could spell that so they changed the name?

    http://vivisimo.com/

    But I agree, it is a great search engine and has gotten better as I have used it.

  13. Sure, sure... by susano_otter · · Score: 2, Insightful

    And the moment any one of these other technologies becomes at all useful, except in certain limited applications, the technology will be acquired by one of the search engines that everybody actually cares about (coughGooglecough), and the functionality will be added to their Internet search solution.

    --

    Any sufficiently well-organized community is indistinguishable from Government.

  14. Shape Retrieval and Analysis? Hmmmm... by R2.0 · · Score: 2, Funny

    Does that mean I will be able to search for porn with 38DD's?

    Did I say that out loud?

    --
    "As God is my witness, I thought turkeys could fly." A. Carlson
  15. It's available! by ByteMangler_242 · · Score: 5, Informative

    You can do this in Google: searchterm1 searchterm2 ~bogus. The tilde will look for synonyms. You can see which ones hit back by reading the bold results which are neither searchterm1 nor searchterm2. I use ~howto and ~cheats often.

    --

    Rule of the open mind
    People who are resistant to change cannot resist change for the worst.

  16. better to search information, not pages by AnonymousCactus · · Score: 2, Insightful
    Enhancements to normal search engines are great and will always be important, but better is to go beyond that to searching, indexing and retrieving actual information. Services like AskJeeves and company originally promised true question answering and other, more experimental, projects like UW's Know-It-All promise to operate over information, not webpages.

    Perhaps these are just very generalized search engine enhancements...but I think it's a new way of thinking that will become very important over the next decade as facilitating technologies mature.

  17. Tools for scientific searches? by Pi_0's don't shower · · Score: 2, Interesting

    Anyone here who's a scientist ever try to use "Google Scholar"? Unfortunately, it's not very good. What I'd like to see (as an Astrophysicist) is some way to do a search that combined results from difficult-to-navigate scientific sites, such as NASA's ADS abstract service, the Spires HEP database, and the arXiv.org preprint database. Finding what you need on these individual sites is often a pain, and to be able to search a compilation of them would sure be nice for me...

  18. how about google blocking a domain for good by Sark666 · · Score: 2, Interesting

    When I get pages and pages of crap that we all know are ads, I wish I could just check a box, block this domain from future searches.

    Click on enough of them and a user might just see search results similar to circa 96
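
    Short of the engine offering that checkbox, the same effect can be approximated on the client side by filtering the result list. A rough sketch, assuming you already have the result URLs in hand; the function name and the example domains are hypothetical.

```python
from urllib.parse import urlparse


def filter_blocked(result_urls, blocked_domains):
    """Drop results whose host is a blocked domain or a subdomain of one."""
    blocked = {d.lower() for d in blocked_domains}
    kept = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or anything ending in ".domain", so that
        # "notprices.example" is not caught by a block on "prices.example".
        if any(host == d or host.endswith("." + d) for d in blocked):
            continue
        kept.append(url)
    return kept
```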

  19. Easy (relatively) improvement... by Nobody You Know · · Score: 5, Interesting

    The number one search engine feature that would make my life infinitely easier would be precise proximity operators in search engine syntax.

    (For those who don't have a clue what I'm talking about, LEXIS-NEXIS, among others, allows you to run searches like foo w/5 bar (the word "foo" within 5 words of the word "bar"), or even foo pre/5 bar (the word "foo", followed, within five words, by the word "bar"). Good proximity engines allow you to search not only within x words, but also to order terms, to specify root words within terms, etc.)

    It would be great to have people reviewing and whitelisting page results, but that takes human interaction. Implementing precise proximity operators, though, can give you nearly the same benefits without any of the human cost.

    Many people here have suggested eliminating ad text from search results, but if history is any indication, any algorithmic system that we can come up with to do so will be circumvented pretty quickly. The one way to fix this is to allow me to say that I want the word "modperl" within 10 words of "solaris", rather than just specify any page that contains both terms. That will get rid of 95+% of ads right away.

    Surely, with all the bright people at Google, this is something that they can figure out pretty easily.
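
    For anyone who wants to see how little logic the w/n and pre/n checks need on a single document, here is a toy sketch. It assumes "within n words" means the word positions differ by at most n, which may not match LEXIS-NEXIS exactly, and it scans raw text instead of using a positional index the way a real engine would.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def within(text, first, second, n, ordered=False):
    """True if `first` occurs within n words of `second`.

    ordered=False behaves like w/n (either order).
    ordered=True behaves like pre/n (first must come before second).
    """
    words = tokenize(text)
    first, second = first.lower(), second.lower()
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    for i in first_pos:
        for j in second_pos:
            gap = j - i  # positive when first precedes second
            if ordered:
                if 0 < gap <= n:
                    return True
            elif 0 < abs(gap) <= n:
                return True
    return False


if __name__ == "__main__":
    page = "How to install modperl on a Solaris box"
    print(within(page, "modperl", "solaris", 10))                 # w/10
    print(within(page, "solaris", "modperl", 10, ordered=True))   # pre/10
```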

    1. Re:Easy (relatively) improvement... by joker784 · · Score: 2, Informative

      You mean like this: Google API Proximity Search ?!

  20. Another cool concept in search engine. by geek2be · · Score: 2, Interesting

    This one connects you with people searching for similar keywords. I guess the idea is to have another set of helping eyes.
    site: http://www.chatnsearch.com/