Web Searches For What Lies Beneath
fat_hot writes: "The New York Times has an article [here] (registration required) about specialized search engines which try to drill into the submerged mass of the Internet iceberg to try to limit searches to particular subjects (and hopefully thereby increase coverage of the limited scope)." Considering that a google search for friends' web sites and other good stuff usually turns up more dirt than paydirt, it's pleasant to contemplate more relevance in search engines.
Northern Light strikes me as doing the best job of returning relevant results, going so far as to thoroughly categorize the results by topic. Also has a greater portion of the web indexed than any other engine. The downside is that there is a bit of lag time in adding new domains to the bot's indexing runs...
Google is pretty good at giving relevant results, but it misses a lot of sites. AltaVista is rather thorough, but not very good at relevancy ranking.
These observations are simply based upon my own experiences with these engines, so your mileage may vary. When performing intensive searches, I generally use all three, but I'll often start with Ask Jeeves, which is easily the best meta-search engine out there...
Here is another issue with web searching.
None of the major search engines (even Google) crawl .pdf files.
They may take you to the document (often that is an issue, note google url) but not crawl through the item.
Try throwing this or any other pdf url into google or any other search tool. http://www.census.gov/prod/ec97/97cfd2.pdf
Even the searchpdf.adobe.com engine only searches summaries and is not that large of a dbase.
This is not a technical issue. Most crawlers can handle .pdf material. Even tools like Atomz are capable of crawling this data.
I just yesterday found an essay on this subject. It can be found at http://www.lucifer.com/~sasha/articles/ACF.html He goes on a bit at times, so make yourself some coffee and print it out to save your eyeball(s). It's all about what he calls Automated Collaborative Filtering and Semantic Transport. Of course, I rarely have more than a little trouble finding what I'm looking for, but that may just be that I think I've found the most relevant info, not that I actually have. This paper lays some of the theoretical groundwork for revamping search technology. However, I would be hesitant to give up on the current engines. I think a "smarter" search should be regarded as an addition to the current toolset, not as a replacement. (End user moderation would help cut down on the detritus currently clogging the pipes though!)
These guys are claiming responsibility for it.
Here is a story from Wired about it.
However, I suspect that whatever the answer to the search engine problem actually turns out to be, it will have the following characteristics:
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
That's not to say that it couldn't be improved - I'd love to be able to "for sure" get exactly what I wanted in the top three or four returns, but, often I'm searching for something a bit obscure that is only being described by common words (alas, I can't think of what was vexing me in that department last week).
But, I think my point is still valid even if this super-search engine comes around: The search is only as good as the searcher allows it to be.
Send your friends messages of love at fuck-you.org
Huh? Where'd I say that? Clearly your school isn't doing its part in teaching research & attribution.
On the contrary, I believe many schools *are* doing their part. No, not all, but many. Tragically, school libraries & school librarians have been tremendously short-changed in the past few decades, ironically often in order to fund sexy things like computer labs.
The truth is that the skills one needs to use in a library are even more critical now than they were in the past. As you correctly pointed out, card catalogues are dead; I can't think of any post-HS system that still seriously maintains one. Unfortunately the helpful Reference Librarian willing to walk a random person around and re-teach them the ropes has also been budget-cut out of existence. With the information explosion / the information economy, the ability to search, prioritize, and compile material has become even more critical (not to mention the ability to comprehend the materials).
Corporate knowledge-bases, electronic paperwork, web-based 'employee handbooks', online job searches & apartment rentals; these all require the ability to search for information in an efficient and comprehensive way. Search-engine cluelessness is simply a symptom of a wider problem.
That said again I believe schools are doing a reasonably good job. I know my old elementary & high schools are teaching kids how to use search engines, as is my old university library. My concern is for those out of the educational system.
Reading the directions doesn't seem onerous to me. If one is performing searches and coming up empty or with useless material then figuring out how to fine-tune one's searching doesn't seem to require any great intuitive leap. Yes it would be wonderful to live in a world as trivially comprehensible as the doorbell but lacking that most folks have learnt to READ THE DIRECTIONS.
Generally search engines do a great job of explaining how to use them. There are even search engines that try to out-think the user and parse their natural-language requests into regular search expressions. Google isn't one of these engines; it's a high-powered bare-to-the-metal engine that requires a certain amount of understanding by its users to use. On the other hand there are literally dozens of other engines that *do* walk a person through performing a decent search. The fact that folks pick the wrong tool for the job (a tool they neither know how to operate nor are willing to invest the 1-screen/2-minutes to learn) and then complain about their results seems to be just idiocy on the part of the user (or in this case an article author.)
Yes, the original article clearly set up a straw-man in order to promote these dedicated search engines, on the other hand there are legions of folks who continue to use search-engines every day with poor results and do complain about them.
The solution? I dunno - sell them more lottery tickets?
I don't read ACs: If a post isn't worth so much as a nom de plume to its author then I wont bother either.
That point aside I'm trying to figure out the rest of your posting. You don't like the fact that different search engines use different formats? Well pick one and just use it. You prefer a GUI interface instead of a command-line type one? There are lots of those. You'd prefer a walk-through format? There's lots of those too.
I think you've got a point somewhere but I can't find it. I suppose my only comment would be that folks should, again, pick tools suited for the job. If it's not worth it to them to learn a search syntax then they shouldn't use a search-engine that relies on one (DUH!) Google requires a syntax, many others don't, use one of them.
As to search-engines getting tricked into returning misleading hits, yeah that's a problem but not a big one. So 5% or even 10% of the hits are come-ons to porn sites, there's still going to be ~30% good hits (the rest misses of varying degrees) and that's enough to be productive with.
Finally - don't tell someone not to be "smug and negative", I could insert some comments here about the apparent tone of your posting but that wouldn't be productive, let's just say I don't see those in my posting & drop it.
Most of us recall being brought into the school library and shown how to use the card catalog, given a few assignments, etc. Unfortunately, for those of us out of school there's not that set of skills in place to help searching.
Boolean searches, using key words, supplying partial words, phrases, etc. are all supported by most search engines but few folks understand how to use them.
What's really surprising to me is that folks who use search engines regularly, indeed even rely upon them (journalists, I mean you!), seem some of the most poorly prepared. There are lots of resources for learning how to do a good search, many from the search engines themselves and many more from third parties, yet we still get these perennial "I can't find ..." stories.
Honestly, I'm not into blaming-the-victim but how difficult is it to learn how to perform a good search? One screen of directions? Two minutes of time?
Yes, there's a place for specialized engines handling unique or limited content, but most of the larger, more general purpose engines do nearly as well if properly used. Again, it's dependent on the user to learn how to define what they want; all of the tools in the world are no good if they're not taken advantage of.
That's where specialty search engines like Moreover come in. Eventually, sites like this will let you search those bits of the Web that change often (news sources, weblogs, discussion groups, sites like Slashdot, message boards, financial news, etc.), allowing people to keep up with things as they happen.
Existing search engines are great at finding things that are archived on the Web, but poor at keeping up with what's currently happening. Looking for all the articles on the latest Shuttle mission, as well as what people are saying about it? You might find one or two things about it on Yahoo! or Google, but a search engine like Moreover will find the fluff article on CNN, the more in-depth article on Space.com, and a discussion about the mission on Slashdot. That's pretty powerful.
Your directory search example didn't work too well for me. And while you *can* search Yahoo's news archive, you'll only be searching sources that have a syndication deal with Yahoo. What news search engines provide is the ability to search most major news sources without any non-news sites, and many people want that.
By the way, the NYTimes story mentions moreover.com, which is a great service. But since their search feature only searches headlines, allow me to mention my own project, NewsBlip.com, which performs full-text searches. Give it a try, thanks!
Or a review system. There is a way to do it, although it might be a pain in the ass. Basically, what you need is a web of trust and digital signatures.
For example, suppose I have a list of keywords. I submit my page to a reviewer, and they judge whether or not my keywords are a reasonably good match for my page. If I pass the test, they PGP-sign my page.
Then you just have a modified search engine that only returns pages that have a valid signature by someone who is on a list of authorities that the searcher trusts.
This type of thing could be used for a more general web page rating or reviewing system. It's just that perhaps some reviewers might judge pages solely on the criterion of meta tags matching the content.
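The review scheme described above can be sketched roughly as follows. This is only an illustration under loud assumptions: the reviewer names, keys, page records and function names are all made up, and an HMAC with a shared secret stands in for a real PGP signature (which would use public/private key pairs, so that searchers never hold a reviewer's secret).

```python
import hashlib
import hmac

# Hypothetical reviewers and their keys. HMAC with a shared secret is only a
# stand-in for a real PGP signature, which would use public/private key pairs.
REVIEWER_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}

def sign(reviewer, page_text, keywords):
    """The reviewer vouches that these keywords fit this page text."""
    message = (page_text + "|" + ",".join(sorted(keywords))).encode("utf-8")
    return hmac.new(REVIEWER_KEYS[reviewer], message, hashlib.sha256).hexdigest()

def is_valid(page):
    """Check the stored signature against the page's current text and keywords."""
    if page["reviewer"] not in REVIEWER_KEYS:
        return False
    expected = sign(page["reviewer"], page["text"], page["keywords"])
    return hmac.compare_digest(expected, page["signature"])

def trusted_search(pages, keyword, trusted_reviewers):
    """Return URLs whose keywords match and carry a valid, trusted signature."""
    return [p["url"] for p in pages
            if keyword in p["keywords"]
            and p["reviewer"] in trusted_reviewers
            and is_valid(p)]

honest = {"url": "http://example.org/hamsters", "text": "All about hamster care.",
          "keywords": ["hamsters", "pets"], "reviewer": "alice"}
honest["signature"] = sign("alice", honest["text"], honest["keywords"])

tampered = {"url": "http://example.org/spam", "text": "Buy now!",
            "keywords": ["shopping"], "reviewer": "alice"}
tampered["signature"] = sign("alice", tampered["text"], tampered["keywords"])
tampered["keywords"] = ["hamsters", "shopping"]  # keywords changed after review

print(trusted_search([honest, tampered], "hamsters", {"alice"}))
# ['http://example.org/hamsters']
```

The page whose keywords were stuffed after the review drops out because its signature no longer matches, and a searcher who only trusts some other reviewer gets nothing back at all.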
---
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Interesting indeed. If you go here:
http://www.google.com/search?q=cache:www.georgewbushstore.com/+dumb+motherf******&hl=en
which is the cached link from Google, you'll see the following:
This is Google's cache of http://www.georgewbushstore.com/.
Google's cache is the snapshot that we took of the page as we crawled the web.
The page may have changed since that time. Click here for the current page without highlighting.
Google is not affiliated with the authors of this page nor responsible for its content.
These terms only appear in links pointing to this page: dumb motherf******
Very obviously Google uses words from OTHER websites to link back to websites in searches. I'm not sure I like this. This looks as though by someone linking to my website and putting bad words in their website, I could be affected by it.. anyone able to comment on this?
Yeah,
:-)
For instance, if you do a search for pornography on google, you can often times get a link to Disney.com. The reason for this is because many porn sites, if you click the, I AM NOT OVER 18 link, take you to www.disney.com.
This is both something good and something bad in the way that google indexes.
I have to agree with Arkaein here, it is very odd that someone was able to fool Google into thinking that the GW store was a top linked site. It would be nice if Google were to show you where the reference came from.
I've many times searched the internet with a search program to find nothing that I was looking for. It's very upsetting when you try to find something. I used to like Ask Jeeves (www.aj.com), but it still wouldn't find what I was looking for, then I found google (www.google.com) and was very happy to find that it did in fact find what I was looking for most of the time. Why can't all search engines look for what you type in?
I think part of the problem lies in the fact that they match words all over the website, i.e. if I type in "hot green hamsters" the words Hot, Green, and Hamsters can appear anywhere on the homepage; even if I put them in quotes, the search programs don't always group them together. So a page talking about hot peppers, green peppers, and how hamsters eat the pepper gardens in Mexico would come up in a search, even though it wasn't anything about what I was looking for.
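The difference between matching the words anywhere on a page and matching them as one adjacent phrase can be sketched in a few lines of Python. This is only an illustration: the function names and the sample page are made up here, and real engines tokenize and index far more cleverly.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_words(query, page):
    """True if every query word appears somewhere on the page, in any order."""
    page_words = set(tokenize(page))
    return all(word in page_words for word in tokenize(query))

def matches_phrase(query, page):
    """True only if the query words appear next to each other, in order."""
    q = tokenize(query)
    p = tokenize(page)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

pepper_page = ("Hot peppers and green peppers grow well in Mexico, "
               "but hamsters eat the pepper gardens.")

print(matches_all_words("hot green hamsters", pepper_page))  # True
print(matches_phrase("hot green hamsters", pepper_page))     # False
```

An engine that only does the first kind of match will happily return the pepper page; one that honors the quotes will not.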
...that average people are morons.
IIRC, Google uses an algorithm that, based on a combination of HTML tag size and logged click-throughs would sort the links. Neato-keen.
Well, about a year ago when google was still young and fresh, you could type in your search strings, hit the "I'm Feeling Lucky" and get EXACTLY what you wanted. Blew me away time and time with its strange accuracy.
But, as more of that click-thru data got integrated into the sifting, I got more and more of the crap that the sheep (ie, normal mom and pop AOLer types) wanted to look up. What the hell, man. Don't get me wrong, I still use google, but now I have to scan three pages deep before relevant pages come up.
Dirk
I keep trying to pick fights, but I can't shake this Excellent karma.
5) The number of times a particular page is linked to.
;-)
7) The number of times the linking page is linked.
This is not hard to manipulate. Put lots of links in your pages instead of "keywords".
Once again the porn industry leads the way on the web
instead of pages that look like this
sex sex sex sex sex
sex sex sex sex sex
sex sex sex sex sex
sex sex sex sex sex
they now look like this
sex sex sex sex sex
sex sex sex sex sex
sex sex sex sex sex
sex sex sex sex sex
Also I think google looks at the url to see if it matches the search word this is not difficult to manipulate either
www.foo.com/sex.html has 20 links to www.foo1.com/sex.html etc etc
I remember reading an interview with Tim Berners-Lee(I would post the URL but altavista can't seem to find it) where he was amazed that everyone wanted to be on the World Wide Web...he had thought that it would fragment into little specialized pieces where each interest group would have their own domain ( ie mathematics, chemistry, physics, etc) and search engines would be limited to the domain.
Will we end up there?
Skip ------ See the latest from http://www.anArchyFortWorth.com
What did they expect? Google can't read minds yet.
:)
Almost a year ago (in the beginning of April, to be more exact) they announced this breakthrough technology. I'm not sure why it isn't in use now...
Don't you love April Fools?
Google can't read minds yet
True, Google can't read minds, but if you type "Can Google read minds?" and click the I'm feeling lucky button you'll find out about a nifty feature that's almost as good as being able to read your mind.
Go to Google. You know where to find it.
Punch in "Dumb Motherfucker".
Click "I'm feeling lucky".
-=Best Viewed Using [INLINE]=-
And it's been around for a while as a concept. I used to work for SpaceRef, who maintain an excellent niche search engine devoted to space exploration.
I maintain Omphalos which is a niche search engine devoted to the modern alternative religions (Paganism, Wicca, etc) and related subjects.
All it really requires is a reliable collection of websites focusing on a specific range of subjects and good search engine software to index their pages. The results are often much more relevant than those from the major search engines - although Google is generally an excellent choice IMHO.
"The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
Unlike method xxx, our method yyy does something completely different, unrelated and totally offtopic.
Although you could envision ways of sorting through this example, real-world examples can be far more abstract and disjointed.
-Moondog
What did they expect? Google can't read minds yet.
Bunch of mojacks.
Tony
and I found Harrison Ford - damn good movie tho - too bad the trailer ruined it
The ultimate network admin tool needs HELP!
Yes you are blaming the victim. The basic concepts of searching take less time to learn than fancy terms like "boolean". Ideals are nice, but the devil is in the details. Search engine sites perform a difficult task and some do a first rate job. For that they should be thanked, but nothing is perfect.
What confounds the user mostly are all the syntaxes used to express those concepts. They are different for every site and take some getting used to. It would be neat to see a search engine with more than one line for input. You could have a box for exact phrases, one for any-word matches, an exclusion box... It's not that command line syntax is ugly, it's that most people have better things to memorize.
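A form like that is not hard to picture in code. The sketch below is purely hypothetical (the function name, parameter names and sample headlines are invented for illustration, and no existing engine is being described): three separate inputs replace the +/-/quote syntax, and an empty box simply imposes no constraint.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def box_search(pages, exact_phrase="", any_words="", exclude_words=""):
    """Return the pages that satisfy all three (hypothetical) form boxes."""
    phrase = words(exact_phrase)
    wanted = set(words(any_words))
    banned = set(words(exclude_words))
    hits = []
    for page in pages:
        tokens = words(page)
        token_set = set(tokens)
        # Exact-phrase box: the words must be adjacent and in order.
        has_phrase = not phrase or any(
            tokens[i:i + len(phrase)] == phrase
            for i in range(len(tokens) - len(phrase) + 1))
        # Any-word box: at least one of the words must appear.
        has_any = not wanted or bool(wanted & token_set)
        # Exclusion box: none of these words may appear.
        has_banned = bool(banned & token_set)
        if has_phrase and has_any and not has_banned:
            hits.append(page)
    return hits

pages = [
    "Linda Chavez withdraws as nominee for labor secretary",
    "Cesar Chavez and the farm workers movement",
    "Linda Chavez biography and columns",
]
print(box_search(pages, exact_phrase="linda chavez",
                 any_words="labor nominee", exclude_words="cesar"))
# ['Linda Chavez withdraws as nominee for labor secretary']
```

Nothing here is smarter than the command-line syntax; the user just no longer has to memorize it.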
Another thing that confronts the user is the efficiency of the search itself. Very clever people constantly seek to fool search engines, and occasionally do. The result is garbage to wade through until the search engine can recover. I remember a time when all search.com would retrieve was porn sites. Even Google has been beaten a few times.
Let's not be so smug and negative. Look for the opportunities presented by user confusion. Be happy that these new search engines are coming.
Friends don't help friends install M$ junk.
They searched for "Chavez" and then complained that there wasn't any information on Linda Chavez (the nominee for Labor Secretary).
No, the problem is that Google (and every other major search engine) takes forever (weeks to months) to spider new pages. So just after Chavez was nominated, none of the news articles about her had been indexed. By the time the spiders hit them, she'd already been dumped.
The basic problem is that HTML spidering is a horribly inefficient way of indexing information that is often (especially in the case of news articles) stored in a nice, neat database.
Thanks for proving my point. My company, Thinkstream, is working on search engine technology that overcomes just this sort of problem. Instead of centralized, HTML spider-based search sites that cough up stale data, our technology is centered around live connection to diverse, distributed data sources (especially databases), regardless of storage method.
If your friends have sites but not too many people link to them, they won't rank too highly in Google's eyes, will they?
A Google search for 'dumb motherfucker' will yield George W. Bush's website, how inaccurate could Google possibly be?
"a Google search on "chavez" led to several encyclopedia entries on Cesar Chavez" Would it have fucking killed them to type in "Linda Chavez labor secretary"? And this was very recent news, exactly how quickly do you expect Google to scan the entire internet for updates? How quickly could these 'iceberg drilling' search engines possibly scan the net? It's a deep web right now, what's invisible will bubble to the surface if it's relevant... Maybe they have a point on using the search engines to only scan specific areas, but I think websites which specialize in these areas should license the Google engine instead of Excite's... (you know what I'm talking about right? Every big site has some article you want to find, you go to look for it, you get the worst search interface possible that doesn't return any useful links...)
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
Yeah, and it would be great if nobody stole money and gave to charity, too. It just isn't going to happen. Any system that is A) valuable and B) depends on everyone behaving honestly is doomed to failure. You're never going to get people to stop cheating the search engines as long as doing so is both possible and beneficial to the cheaters. The plain fact is that manipulating the system works, and people are going to keep doing it as long as it keeps working. The only solution is to develop a system that is not easily manipulated.
Perhaps you should try looking at Google, a search engine that actually uses these in a clever way as the key part of its ranking system. It's remarkably effective at finding relevant information and at avoiding the kinds of simple manipulation you complain about. Other ranking schemes (like GoTo.com's straight pay for placement system) are also relatively resistant to manipulation. I think that the long term solution is going to be natural selection; search engines that are easy to manipulate to give lousy results will go out of business and leave behind the ones that are actually useful.
Good luck. The latest versions of Google include over 1 billion pages. Manual sifting for poorly labeled ones just plain isn't an option if your primary goal is comprehensiveness.
There's no point in questioning authority if you aren't going to listen to the answers.
The proverbial iceberg of data on the net lives in databases not accessible to search engines as we know them today. The power and complexity of the little engine that could would be far too sophisticated for the public to be allowed access to. It'll be interesting to see how they pull off the privacy end of the whole thing...
"Helping to keep you two steps ahead of the Thought Police!"
Why does one need cheesy dotcoms to tell us what a directory is?
A directory search limited to U.S. newspapers immediately brings up, say, an explanation by Linda Chavez about her relationship with the illegal alien in question.
If one wants political news, one can go to a political news source. If one wants information on Linda Chavez, one can do a more specific search. If one wants political news about Linda Chavez, one can (this must be getting very complex for your average dotcom founder) search a news archive.
-- Stanislav Shalunov
The premise of the article is good, but I feel that in a way that theory would stunt some acquisition of knowledge. Often in my own web searches, while seeking information about a certain specific subject or theme, I have come across other topics that interested me that had absolutely nothing at all to do with my original criteria. I know that this is commonplace, but it just reiterates the whole miracle of the internet to me: Not just information is available, but all kinds of information.
________
Sorry for the wrong links...
There are spaces in the middle of the two last. Delete them and they will work.
As the article says: "People may know to come to the library, but they probably do not know which reference books to pull off the shelf. Of course, in such cases, patrons can at least consult a reference librarian."
In the example given by the article a "linda chavez" or "linda chavez labor secretary" query would be much better than the ordinary "linda".
Moreover, there exists the problem of determining the category of what is being searched. A trend is the use of AI and ontologies by the search engines, which determine what is really relevant in a page and classify it during the indexing phase based on the different categories (economy, medicine, technology, entertainment, ...) defined by the taxonomy used. In other words the idea is to search the meaning not the words (see also www.oingo.com).
What the article talks about are the knowledge based agents. A quite interesting article can be found at: http://www.cs.technion.ac.il/~cs236512/www-search-lab/ka/KnowledgeAgents.htm
Another interesting link:
- CMU World Wide Knowledge Base (Web->KB) project:
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/index.html
http://channel.nytimes.com/2001/01/25/technology/
That'll get you in without registration.
"There are bad people out there that will try to do bad things." - Microsoft 05/11/00
Seriously folks. The article is just saying use the right tool for the right job. It's a no-brainer. If you want news stories you search cnn.com or another news site. If you are looking for financial information search The Wall Street Journal or another financial page. Search engines like google (get the toolbar, it is great) or AJ are for general searches to get you started out on a topic so you can refine your search from there. Duh.
Pithy, yet ultimately meaningless, phrase expressed with gusto!
Examples:
Searching for "John Smith" should return my friend John Smith and no one else.
Searching for "C++ implementation of Knuth algorithms" should return exactly that, and leave out references to C++, Knuth, or algorithms.
At the very least, large search results should immediately separate the mass of results into categories - i.e. "Jessica Alba" - up at the top should be pr0n - fan sites - commercial sites - etc. Yahoo does this, but there are way too many categories. Really, the web has maybe 10-12 different broad types of sites - commercial, homepages, academic sites, pr0n, multimedia, weblog - you get the point, the list isn't that long. We should be able to filter entire broad categories out of our searches. Altavista does a fairly good job with multimedia searches - unfortunately there still is way too much manual searching - it still doesn't read our minds enough within the broad category search.
Google uses PageRank to determine the order of results, but does it track the sites its users click on after performing a search? No, but it should. Further, it should track users individually and be able to customize its results based on that person's individual personality. The more you use a search engine, the better it should work for you.
I can't stress this enough: A search engine needs to be able to read our minds.
No, Thursday's out. How about never - is never good for you?
Actually, you'd need "news +for nerds" since the 'for' is normally ignored, being such a common word.
my sig's at the bottom of the page.
Yeah, I've read that... It was hysterical! I think I'll go read that again. Thanks for the link.
-----
"The only difference between me and a madman is that I'm not mad." - Salvador Dali (1904-1989)
I asked Jeeves "Where can I find a good search engine?" and was directed to a really good site where I can buy engine parts for my car online.
Thanks for nothing you bastard butler!
I disagree. I continually find close matches using Google, much better than anything I used previously (Hotbot was good for a while).
When Yahoo started using them I rejoiced. It was the best of all possible worlds (good search engine, web of content like the calendar, and hand-picked sites when all else failed).
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
90% of everything is junk
In truth, it may be more than that.
So we come to the needle in the haystack, and how the databases that the search engines consult give priority to different terms, how they index the various sites, and how long it takes.
Of course, for the person truly expert in these things, these are trivial details. They are as obvious as a traffic jam. For the rest of us, it is more a matter of "where did all these cars come from?"
Unlike our computer, there is no central index for the full content of the web. It is a job that is done continuously at a surface level, and takes a month or two or three.
In that context, of course last night's news will not get indexed while we wait.
Just like the tradition of game installation, search engines have been designed to be used by people who have a clue.
Sometimes I swear that until we get a system designed by geniuses to be used by idiots, we will need to have some sort of internet user license or something. Otherwise it is simply a matter of designing systems that can obey the command:
"Do what I want, not what I say."
This is an interesting problem in programming, is it not?
"It is a greater offense to steal men's labor, than their clothes"
The problem is also that people expect search engines to do their thinking for them (an expectation admittedly encouraged by search engines themselves). Search engines are algorithms. The content generated is a transformation of what you put in. If you know jack about what you are looking for this will often show in the results. If you know something to start with, your searches are likely to be more successful. Of course, you can always begin to intuit the workings of particular search algorithms -- that's how people get used to or get proficient with one engine but can't use others ...
Username: cyph3rpunk0
Password: cyph3rpunk
Enjoy the article.
-- I have marked myself unwilling to moderate-- I don't have other accounts to artificially inflate the karma of
Google was great up until a few months ago (maybe even a year), when they started throwing around lots of publicity and at some point they explained to the whole stupid world in simple terms how they made Google better than the rest of the crop (i.e. the PageRank system). At that point, the web-spammers, those pathetic fscks who spend their whole lives making content-free pages with only links and banners and popups, well they figured it out. They started creating zillions of ditzy pages containing trucks of keywords and only one link to the "real" website they were spamming about.
The concept isn't new, it's just the sheer volume that made Google freak out. The reason behind it is that Google counts the number of links leading to one page as an indicator of that page's actual popularity. So the spammers simply created hundreds, thousands of dummy pages with single, prominently-placed links which fooled Google's crawler.
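To see why this works, here is a deliberately naive sketch that ranks pages purely by how many other pages link to them. It is not Google's actual PageRank (which also weighs how well-linked the linking page itself is), and every URL and name in it is invented for illustration.

```python
from collections import Counter

def inbound_link_counts(links):
    """Count how many distinct pages link to each target.

    `links` maps a page URL to the list of URLs it links to.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts

web = {
    "a.example/home": ["useful.example/guide"],
    "b.example/blog": ["useful.example/guide"],
    "c.example/faq": ["useful.example/guide", "a.example/home"],
}
# A spammer adds 50 dummy pages, each with a single link to the promoted site.
for i in range(50):
    web[f"farm.example/page{i}"] = ["spam.example/buy"]

counts = inbound_link_counts(web)
print(counts.most_common(2))
# [('spam.example/buy', 50), ('useful.example/guide', 3)]
```

Fifty throwaway pages are enough to bury a page that three real sites chose to link to, which is why a ranking that counts links without asking who is doing the linking is so easy to poison.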
The temporary solution, as always, will be to come up with a new crawling method that can filter out these poison pages, but of course it will only be a matter of time before someone "cracks" the new crawler. History repeating.
-Billco, Fnarg.com
The example they used at the beginning of the article was fixed, they just typed "chavez" into a search engine, not "linda chavez". Of course they got tons of irrelevant links. You'd think the reporter could have picked up on what a bad example this was. I'm not saying that the search engines don't have flaws, but they could have picked something that demonstrated their point much better.
Upon an unsuccessful search they do not offer you the choice.
Obviously, they have no responsibility to offer it, but it's kind of slimy that the time you want the option the most, it's not offered.
Also exact string searches are a little weird, particularly if you forget the +s for common words like "the."
I have a little experience at this.. Keeping the trash out takes a lot of tedious work.. I usually have to set a day aside to dig through the ton of unrelated crap that my site collects. I started it as sort of a way to keep up with what I thought was cool and exciting as I learned more and more about linux. It ended up being yet another dreaded responsibility.
Are YOU listed?
Well, the article speaks of a lot of things, mostly though, it's links to specialized search engines. It gives the impression that in order to really find what you are looking for, you should use a highly specialized search engine. I disagree a bit on that.
I know there are companies out there that have the technology to "put it all in one", so to speak. I have worked a little with Autonomy, and I gotta say, I am deeply impressed by what it does. They employ a technique called Bayesian inference (named for Thomas Bayes). The technology has to do with "calculating the probabilistic relationship between multiple variables and determining the extent to which one variable impacts on another" - Sounds wild, eh? Well, it is. Together with this, their core engine, called DRE (Dynamic Reasoning Engine), relies on the theory of Claude Shannon, which states that "the less frequently a unit of communication (for example a word or phrase) occurs, the more information it conveys".
The more input you give it, the more accurate it will be. Oh, and it's actually for all kinds of unstructured information - also e-mail.
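The Shannon idea quoted above can be sketched in a few lines. This is only the textbook self-information formula, -log2(p), applied to word frequencies; it is not Autonomy's DRE, and the `self_information` function and sample sentence are invented for illustration.

```python
import math
from collections import Counter


def self_information(text):
    """Return {word: bits}, where bits = -log2(relative frequency of the word)."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: -math.log2(count / total) for word, count in counts.items()}


# "the" appears 3 times out of 8 words; every other word appears once.
sample = "the cat sat on the mat the end"
bits = self_information(sample)

# Print the words from most to least informative.
for word, value in sorted(bits.items(), key=lambda item: -item[1]):
    print(f"{word}: {value:.2f} bits")
```

The rare words carry more bits than the common "the", which is the intuition behind letting unusual terms dominate a match.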
I ramble. You should check it out.
Autonomy also makes Kenjin, which is a piece of software that you install that will understand what you are looking at, and help you search for similar stuff. Kinda kool.
Any technology distinguishable from magic, is insufficiently advanced.
The problem most users have is NOT with syntax and boolean functions, it's simply that they're rarely being logical and specific. I mean really specific.
E.g. - 3 people I work with were trying to find the name of the Abbott & Costello movie with the voodoo-doll-making witch in it (don't ask). They were searching and searching for made-up names, years, actors' names, ad nauseam, when 'Abbott Costello witch voodoo' (click I'm feeling lucky) brought it right up.
The trick is visualizing and then boiling down your desired target text to specific unique words and then searching for those words. Sounds obvious but most people still expect technology to have animate, responsive, understanding qualities like it does in the movies.
I have suggested a "fix" for those who give a crud. SEE This
I believe that maggard is right about schools not doing their part in the information age by teaching kids how to effectively use search engines, especially considering the fact that many schools are moving towards electronic card catalogs.
I personally am self-taught in the 'art' of searching the net. This includes using boolean operators and, as previously mentioned, the use of quotes around phrases. Why can't schools teach these useful skills?
The problem isn't the searches, it's the people who make the webpages.
Why doesn't everyone use metatags properly? What about specifying good (descriptive) title tags?
Plus, don't you think it would be much easier if people actually didn't try to cheat search engines?
In actuality there would be some very easy ways to score pages for relevance then:
1) The number of times a particular word shows up in the keywords, and description of the page.
2) If the word actually appeared in the title of the page.
3) The number of times the word appears in the body of the text
4) The length of the supposedly searched word
5) The number of times a particular page is linked to.
6) The words used in the link
7) The number of times the linking page is itself linked to.
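The seven signals above could be combined roughly like this. It is only a sketch: the `Page` fields, the `score` function, and every weight are assumptions picked for illustration, not anything a real engine is known to use, and it only works in a world where nobody cheats.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    keywords: str                  # keywords and description meta tags, joined
    body: str
    inbound_links: int = 0         # how many pages link here (signal 5)
    anchor_texts: list = field(default_factory=list)  # words used in those links (signal 6)
    linking_page_inlinks: int = 0  # links pointing at the linking pages (signal 7)


def occurrences(word, text):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    return text.lower().split().count(word)


def score(page, word):
    """Weighted sum of the seven signals; the weights are arbitrary guesses."""
    word = word.lower()
    return (
        3.0 * occurrences(word, page.keywords)                        # 1) keywords/description
        + 5.0 * (1 if occurrences(word, page.title) else 0)           # 2) word in title
        + 1.0 * occurrences(word, page.body)                          # 3) word in body
        + 0.1 * len(word)                                             # 4) length of the word
        + 0.5 * page.inbound_links                                    # 5) links to the page
        + 2.0 * sum(occurrences(word, a) for a in page.anchor_texts)  # 6) words in the link
        + 0.1 * page.linking_page_inlinks                             # 7) links to the linker
    )


relevant = Page(
    title="Linux kernel guide",
    keywords="linux kernel",
    body="the linux kernel explained",
    inbound_links=4,
    anchor_texts=["linux guide"],
    linking_page_inlinks=10,
)
unrelated = Page(
    title="Cooking tips",
    keywords="food recipes",
    body="boil some water",
    inbound_links=4,
    anchor_texts=["recipes"],
    linking_page_inlinks=10,
)

print(score(relevant, "linux") > score(unrelated, "linux"))
```

Both pages here are equally well linked, so the difference in score comes entirely from the honest title, meta tags, body text, and link words.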
Wouldn't the world be happier? Personally, I think it would be great if there were an editing team that would simply delete misrepresented pages.
Anyway. That's my two cents.
"i blew a booger that i'd swear had it's own spinal cord" "OUCH" Caroline's Spine
I disagree with the line about searching for stuff on google turning up dirt. If you know how to format a search properly, and which words are key, nearly anything can be found on google.
OTOH, it is always nice to see search technology getting better. There are some simple ideas which would aid searching, such as voluntary self-classification of web sites into general categories (I'm sure this could easily be worked into one or more of the emerging document standards, if it hasn't been already). This would effectively divide the internet into a large number of overlapping sub-nets, as far as searching was concerned -- you could search everything, or just websites pertaining to 'games', etc... I think that a solution along these lines (although probably a better/more complex version) will be necessary before truly powerful searching becomes easily available.
I can't envision some complex algorithm and/or a team of people classifying stuff ever being a strong solution without the aid of enhanced standards for the web.
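As a rough illustration of the self-classification idea, here is a minimal sketch in which each site declares its own categories and a search can cover everything or just one overlapping "sub-net". The site names, category labels, and the `search` function are all invented; no existing standard is assumed.

```python
# Hypothetical self-declared categories; a site may sit in several sub-nets.
SITE_CATEGORIES = {
    "quake-fans.example": {"games"},
    "linux-gaming.example": {"games", "linux"},
    "kernel-notes.example": {"linux"},
}

# Made-up page text standing in for a real search index.
SITE_TEXT = {
    "quake-fans.example": "quake maps and mods",
    "linux-gaming.example": "running quake on linux",
    "kernel-notes.example": "linux kernel patches",
}


def search(term, category=None):
    """Return sites whose text contains `term`, optionally limited to one category."""
    term = term.lower()
    return sorted(
        site
        for site, text in SITE_TEXT.items()
        if term in text.lower().split()
        and (category is None or category in SITE_CATEGORIES.get(site, set()))
    )


print(search("linux"))                    # search everything
print(search("linux", category="games"))  # search only the 'games' sub-net
```

The filter is only as good as the labels, so this still depends on site owners classifying themselves honestly.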
-Robert Thornburg
Why does every fool out there submit stories that I have to jump through a hoop to get to? Give me YOUR stinkin' password or stop advertising for the Times. Do your own legwork and find the same story at another newspaper. Attach THAT URL, and stop working for the NYT, you huckster. Good god, folks, think before you post info-demanding URLs.
"Curiosity killed the cat, but for a while I was a suspect."- Steven Wright
If any system will ever gain self-awareness *without* its programmers' permission, à la Skynet, it will be a search engine.
There are 01 kinds of cars in the world. The General Lee, and everything else.
It may be true that as google indexes more newbie home pages, the average quality of the links it sees is going down, but that's another issue.
If we're talking about specialized search engines, then don't we need some way to know which sites to search? What is the feasibility of creating a system where meta data about a site is entered in a database tied with domain registration? When I go register widgets.com I can specify that I'm commercial, serve North and Central America, manufacture widgets, etc. Meta data about my individual pages could provide more detail, but the meta data at the domain level would direct the specialized search engine to my site in the first place. It just seems to me that even if a search engine is specialized it needs some way to find appropriate sites without brute force searching the net, or they will still have the same problems unless they have the manpower to filter the results.
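A minimal sketch of what such a registration-time database might look like, and how a specialized engine could use it to pick domains before crawling anything. The `DomainRecord` fields, the `domains_to_crawl` function, and the `.example` entries are all hypothetical; no registrar actually collects this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRecord:
    domain: str
    sector: str          # e.g. "commercial"
    regions: frozenset   # areas the site serves
    products: frozenset  # what the site is about


# Hypothetical records entered at domain registration time.
REGISTRY = [
    DomainRecord(
        "widgets.com",
        "commercial",
        frozenset({"north america", "central america"}),
        frozenset({"widgets"}),
    ),
    DomainRecord(
        "euro-widgets.example",
        "commercial",
        frozenset({"europe"}),
        frozenset({"widgets"}),
    ),
    DomainRecord(
        "gadget-club.example",
        "hobby",
        frozenset({"north america"}),
        frozenset({"gadgets"}),
    ),
]


def domains_to_crawl(registry, product, region=None):
    """Pick domains for a specialized crawler from domain-level meta data alone."""
    return [
        record.domain
        for record in registry
        if product in record.products
        and (region is None or region in record.regions)
    ]


print(domains_to_crawl(REGISTRY, "widgets"))
print(domains_to_crawl(REGISTRY, "widgets", region="north america"))
```

The crawler only visits the short list this returns, and page-level meta data can refine things from there.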