Better Search Results Than Google?
Mechanik writes "CNN has an AP article about the next generation of up and coming search tools, which try to cope with the glut of hits that result from 'conventional' search engines such as Google. One tool, Vivisimo, "is like a superfast librarian who can instantly arrange the titles on shelves in a way that makes sense. [...] But unlike libraries, Vivisimo doesn't use predefined categories. Its software determines them on the fly, depending on the search results. The filing is done through a combination of linguistic and statistical analysis." Grokker, another, downloadable program, "not only sorts search results into categories but also "maps" the results in a holistic way, showing each category as a colorful circle. Within each circle, subcategories appear as more circles that can be clicked on and zoomed in on." You have to love the author's use of trying to look for a hotel in France with the terms 'Paris Hilton' as an example of searching gone awry."
...until I can regexp my searches. It would make a whole lot of difference.
Well, Google made a huge leap forward from the old guard of AltaVista & Yahoo, who were in their own way a huge leap beyond what had gone before. We had to expect this to happen sooner or later, but two things spring irresistibly to mind.
:-)
1) Will it gain the enormous foothold in the collective consciousness that Google has acquired? To Google is now a verb... and it gets mentioned on Buffy, which is as good a cultural barometer as we are ever likely to have.
2) Will the UI and secondary services (such as the ODP, and Google Groups) be as good as Google itself?
Also, while I'm sure that it will happen one day, I'll believe it when I use it and not before... Oh, and the Paris Hilton thing? LOL! That sort of anti-result comes back from search engines *a lot*. I was just talking to my mom about searches of that type of ambiguous nature the other day.
Sign the FSF's Anti-DMCA petit
of Antarctica, an old and very clunky Java Yahoo-like engine (sorta). It used a map of Antarctica to drill down into categories and subcategories before putting the user in a 3D world interface at the lowest level. When I interviewed with them, the interviewer did an excellent job of turning me off the technology, explaining that the 3D interface would allow 'billboard and other advertisements' along with the search results formatted in a 'mall or street' of entries.
Gah.
A new search engine comes along that touts its uber-intelligent way of searching. It is hyped by the press but ends up by the wayside. (See Teoma)
I don't get excited about "Google alternatives". Google satisfies my searching needs as it is. Sometimes "knowing what to search for" is better than a super intelligent search engine.
As far as I'm concerned anyone with a clue can produce the results they need with a little bit of practice and common sense. They don't need new search engines.
Clif
clifgriffin > blog
What if you want that glut of hits? Sometimes you have to dig through some pretty obscure hits on a search to get what you want, and categorizing them or putting them in funny circles just complicates the process and can make the search take longer. I'll hang with Google and Teoma, thank you very much.
And I certainly don't want a downloadable search app running, that's just another possible inroad for spyware. I've been burned enough times by apps I thought were "clean" that went off and chewed up enough bandwidth to choke a horse.
Be excellent to each other. And... PARTY ON, DUDES!
Tried it... too many ads, so I don't quite trust it to give me the kind of pure results I seem to get from Google. I'll wait for Google to implement the same kind of categorization system, or at least let other people who have the time test out Vivisimo.
---Technology will liberate us if it doesn't enslave us first.
I wonder what happens if you use Grokker to perform a search for images? It would be cool if the colorful circles could be re-patterned to resemble the image you are looking for.
In this way, searches for something like goatse.cx would be especially topical.
We implemented the same idea for images: take the results from Google Image Search and rearrange them using methods from computer vision.
An article about this is available here: Clustering visually similar images to improve image search engines.
Is there a search engine that can filter out all of those annoying placeholder sites that grab unsuspecting visitors by simply putting every word about a certain subject on a page and then having links to other useless websites? This is 'webspam' as far as I am concerned and the next step in search engine design should be 'placeholder' site aware.
A search engine that ignores specifically commercial sites would also be helpful.
Any ideas on either of these types of features in current or upcoming search engines?
I've been doing a lot of thinking lately about better ways to interface with data, generally with searches but it applies to most anything. Naturally this was inspired by reading some Sci-Fi (Saturn's Race by Niven and someone...the book is in the other room.) I got to thinking, the perfect interface I can imagine is much like an actual room, things laid out visually where you would expect them. The normal 2D GUI has always seemed a bit unnatural to me.
When this is applied to searches, I'd like to see information grouping, like the circles that were mentioned, although I want it more organic: tree structures, book shelves, whatever is most appropriate to the current search, and I want them interchangeable so I can format my view however I think works best. In a web search, I like the idea of a street. The major sites, amazon.com, ibm.com, etc., are all represented by nice-looking storefronts, but there are also dark alleys I can go down to find less reputable places. So in this case, information is arranged by reputation of the source.
I haven't quite figured out how to approach this from a coding viewpoint, but surely there are projects out there that try this. WilmaScope, for example, is a good way to look at certain types of data. Why can't more things have this kind of intuitive interface? 3dDesktop is another attempt at this, but it is a mapping of 2D desktops to a 3D shape. I want more of a visual representation than just a bunch of desktops attached to a sphere. I know there are others out there, but how about some leads? What have you seen/used for intuitive data representation? Why hasn't this taken off?
I tried a few searches on Vivisimo before it went live on Slashdot and I must say I'm impressed. It addresses one of the main faults of search technology today: context. When you perform a search, a tree is displayed showing the different contexts (not categories) where the terms were found. Excellent for ambiguous concepts.
But, and here is the beef, it should be obvious to anyone that there must be an interface change in the short-term future of search. A textbox is a very limited input for expressing a complex search. Using regexps and regexp-like operators is not enough. This Vivisimo is a step in the right direction, but there's a long way to go.
For example, try to make this search using any engine (Vivisimo, Google, Yahoo, Altavista, etc.): who was the red-haired singer that recorded a song with Tom Morello a few years back? At least I can't find an answer, maybe because one of the main aspects I'm using (the red hair) is not as important to anyone else as the other aspects they would use to describe the situation.
There must be an interface revolution in the years to come. Come to think of it, are we still using a textfield to express every possible combination in a Google search? Gross!!!
Life isn't like a box of chocolates. It's more like a jar of jalapenos. What you do today, might burn your ass tomorrow.
...google leaves so much to be desired. Too many paid and crafted links...too many stealth redirects...too many commercial links forced ranked...no AI.
google reminds me of that old pizza commercial with the new employee 'big dummy'. When he finally gets something to do, he runs off exclaiming "I am HELPING!!!" - not
I hope what I am writing is not too off-topic. I have noticed a tendency among people (mostly those in non-technical/non-scientific jobs) to associate top search results with a high level of "authenticity". It is totally overlooked that top results are "popular" but might not be of high quality or authenticity. Of course, a great deal of association can be made between "popularity" and "quality": better things are more popular. However, most often popularity (like power) feeds on itself, i.e. popular links become more and more popular (of course other scenarios exist). There should be some way out, some way to recognize the quality of information. (Slashdot-like moderation of all webpages by a search engine is not a bad idea in theory!) So, until we have search engines that come up not only with popular sites but with more relevant content of high quality, there is a lot of scope for improvement. (For instance, how does an essay written by a college student through online research compare with one written through library research?) Another area where search engines can make great improvement is the search of dynamic pages. "PageRank"-like algorithms suit static data well. For instance, a highly relevant post on some newsgroup posted *recently* might not show up on your search page! I hope Google isn't another future Microsoft (oh! did I mention power/popularity feeding on itself before? :) ) stifling innovation.
Search engines can be a lot, lot, lot better... hope they will be soon!
There is a DEFINITE central structure.
Atoms, modifiers, and conjunctions.
Atoms are character classes (letters, ranges, or bracket expressions), conjunctions of said classes, or a parenthesized expression (like in maths).
You have two conjunctions. The first is concatenation, which is what you get when you put one atom right after another (they both have to appear in that order). The other is alternation (pipe), where either the left atom or the right atom must appear.
Finally, modifiers are an optional number of repetitions for each atom to match. The default is from 1 to 1 (exactly one). * means from 0 to infinity, ? means 0 to 1, + means from 1 to infinity, and {x,y} means from x to y.
That's it.
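As a rough illustration of that structure, here is a small Python sketch using the standard `re` module. The patterns and sample strings are made up for the example, and `full` is just a helper name chosen here.

```python
import re

def full(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Atom: a bracket expression. Modifier {2,4}: from 2 to 4 repetitions.
DIGITS = r"[0-9]{2,4}"

# Concatenation: atoms written one after another must appear in that order.
# Alternation (pipe): either the left side or the right side must appear.
# Parentheses turn the alternation into a single atom; s? is 0 to 1 "s".
PET = r"(cat|dog)s?"

# * is 0 to infinity, + is 1 to infinity.
STAR = r"ab*c"
PLUS = r"ab+c"

for pattern, text in [
    (DIGITS, "123"), (DIGITS, "12345"),
    (PET, "dogs"), (PET, "cow"),
    (STAR, "ac"), (PLUS, "ac"), (PLUS, "abbbc"),
]:
    print(pattern, repr(text), full(pattern, text))
```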
Black holes are where the Matrix raised SIGFPE
I have used Vivisimo a few times but never realized that their method of categorization was quite language independent.
If it really is, then DMOZ, the Human Edited Directory, ought to incorporate dynamic categorizations like this, in fact to the point that someday each user should have his/her own unique categorization of all the websites in the world ...
Meanwhile, are they using the words in the headings to determine categories? Or is it words that have in some way been emphasized? And to do this in a way that transcends language ...
I am really curious as to how the words that determine "categories" in a sentence/para/section/page can be identified and sifted away from less important words. And how to determine the "keywords" that are not as important as "categories" but still more important than the "filler words" on the page. Keyword for Google is what you are searching for. That is easy. But how does Vivisimo take it further and establish it as a category?
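For what it's worth, here is one naive way such grouping could work, sketched in Python. This is only a guess for illustration and not Vivisimo's actual method: it labels each result snippet with the word it shares with the most other snippets, ignoring filler words and the query terms themselves. The snippets, the tiny stopword list and the name `cluster_snippets` are all invented for the example.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny list of "filler words" (an assumption for the example).
STOPWORDS = {"a", "an", "and", "at", "for", "her", "in", "of", "on", "the", "to"}

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def cluster_snippets(query, snippets):
    """Group snippets under the most widely shared word that is neither
    a stopword nor one of the query terms."""
    ignore = STOPWORDS | set(words(query))
    # Count how many snippets each candidate word appears in.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(set(words(snippet)) - ignore)
    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = set(words(snippet)) - ignore
        if candidates:
            # Most widely shared candidate wins; ties break alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Paris Hilton hotel rooms near the Eiffel Tower",
    "Book a hotel at the Hilton in Paris",
    "Paris Hilton seen at a celebrity party",
    "Celebrity gossip: Paris Hilton and her dog",
]
clusters = cluster_snippets("Paris Hilton", snippets)
for label, members in clusters.items():
    print(label, "->", members)
```

The categories here come purely from word counts in the results, so nothing is predefined, but a real system would need far better handling of synonyms, phrases and languages than this.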
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
Since Google places ads to the right of my search request, I don't expect ads in the results of my search which is what is happening now.
human languages, there's no central structure.
I think if you spoke to any linguistics major, they would disagree. If you are interested in structures in human languages, a good place to start is with any of Chomsky's linguistics work, because he studied how words combine into phrases and phrases into sentences (think of it as a tree). In fact, every sentence in every human language is formed from a noun phrase, auxiliary, and a verb phrase. It is kind of similar to token types combining to form successively "larger" constructions in a computer language, but it is more easily recognizable there because computer languages a) barely have any transformational rules and b) have a very limited non-user-defined vocabulary.
====
Crudely Drawn Games
I prefer to believe in the visionary Tim Berners-Lee with his Semantic Web ideas. There is a lot of work in that direction. When we can do searches with semantics, the results will be exponentially better. Until then, my bet is on Google. How much longer? Ten years?
It would be nice if there were a feature that filters out e-store entries. For example, I was looking for a solution to my Logitech RumblePad left analog stick problem. And no matter how refined my search is, I still get thousands of pages for stores selling that gamepad. I don't want to buy a gamepad. But I guess search engines and e-commerce will never be separated. Sadly this is how the Internet works now.
They're still a software firm. Did you interview with Tim Bray of XML fame, perhaps? The web demo I saw way back when used ODP data and a lot of Java.
Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
I searched for "welding control" and got back a list with no places trying to sell me welders or welding supplies. Mostly, I got back useful papers that are not available through my school's library. This is where this idea could shine. Good Stuff. The system does need an overhaul, though. Just my two cents.
"That's not ironic, it's just mean!" - Bender
Grokker has several major downsides as compared with Google right now:
* It's a program, not a web page.
* It only runs on Windows and MacOS X. (More generally: it cares what kind of system it runs on, which Google doesn't.)
* It uses Java.
Basically, it's a step in the wrong direction from Google. Google's homepage is the model of simplicity: no ads, no extraneous information, nothing that isn't specifically focused on getting you the search results you want. Google's search results are clear and unbiased, separate out and clearly label the advertisements, and have just the right amount of Do What I Mean. If Google came out with a version of this, it would just be a set of unobtrusive text links at the top and/or bottom of the page saying "Did you mean: 'Paris Hilton person' or 'Paris Hilton hotel'."
The other reason you don't want a separate client is that when you get the results, you will want to open them in a web browser. So why not use a web browser to find them in the first place? The only thing that might make sense is a browser plugin. Grokker also has a plugin, but it is proprietary, requires Java (which is also proprietary), and only works on Internet Explorer for Windows or Safari for MacOS X.
If Grokker wants to succeed, they need to realize two simple things:
* They should provide a service, not a piece of proprietary software. Provide a Free Software plugin or provide the information to someone who can.
* Text, text, text. Other than the Google logo, there are no images on Google's front page (which I rarely visit thanks to Mozilla's ability to search from the address bar) or their search results. Grokker's results are entirely image-based.
The Paris Hilton Hotel Sex Tape (Rated R)
My only issue with Google is that it has a mildly annoying problem with linking to other search engines. Say, for instance, you search for n. Sometimes, instead of being presented with a list of sites carrying information about n, you're presented with links to other (mostly horrible) search engines. It's just as bad as being served a list of pages that are nothing more than "Google magnets," filled with a bunch of terms close to the topic you searched for, but missing any real content.
That's Google's largest flaw, IMHO.
(1) Subscription Model - Make submissions for website links only accepted after review by human beings. You could then charge the 'searcher' a monthly or yearly subscription fee to access this service. I would definitely pay $5 a month to get a 'filtered' search engine.
(2) Community Ranked and Moderated Model - An open-source, community-driven and moderated search engine that relies on the massive number of visitors to comment on and rank pages they have reached via the search engine result page. A simple plug-in for IE or Netscape, etc., could allow the user to simply click on a scale of 1-5 how useful the site was. Obviously this would be biased against brand-new data, but this is a problem with a subscription service as well. With such a large number of users, this free, community moderation model would be hard to defeat, especially with IP tracking and the ability to constantly change the moderation code.
When I'm feeling peckish, I like to use Kartoo. It searches for items in an interesting way.
__
Thou hast besquirted me, O leotarded one.
In fact, every sentence in every human language is formed from a noun phrase, auxiliary, and a verb phrase
Come now.
This is the problem with Chomsky and his linguistics really. Looking at English and believing that all languages are fundamentally alike and that there is a simple structure to be found.
Chomskyan linguistics is in many ways like looking at C and saying: 'Oh! All programming languages must be procedural and have pointers and use curly brackets to delimit blocks! And because it is obvious how all programming languages must be inherently equivalent this must _actually_ be how every other programming language works, everything else is convoluted C! And anyway, I can't be bothered to learn Haskell. Or Prolog.'
Yes, I'm being unnecessarily flamebaitish.
Vivisimo might not be able to beat Google in the sense that Google beat Yahoo et al. But if their underlying technology is good enough, then maybe Google will buy them and integrate their ideas into Google. That might be their best strategy at this point!
anyone remember northernlight?? Vivisimo isn't much of a revolution, unless you consider their ability to get around northernlight's patent 'revolutionary.'
What I would prefer as an alternative to regexp (since that obviously would be way too much power and too many exploits) is simple logic operators.
Most search engines now have AND and OR, but none have nested logic or shorthand
for example I would love to do this in google: (linux && modems) || ("AT commands" && !windows)
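A nested query like that is not hard to evaluate once it is parsed. Below is a minimal Python sketch of a recursive-descent evaluator for the syntax used in the example (`&&`, `||`, `!`, parentheses, quoted phrases). It is only an illustration: the function names and sample documents are made up, a term "matches" if it occurs as a case-insensitive substring, and no real search engine is claimed to work this way.

```python
import re

# One token: an operator, a parenthesis, a quoted phrase, or a bare word.
TOKEN = re.compile(r'\s*(&&|\|\||!|\(|\)|"[^"]*"|[^\s()!&|"]+)')

def tokenize(query):
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"bad query near {query[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def matches(query, document):
    """Return True if the document satisfies the boolean query."""
    tokens = tokenize(query)
    text = document.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():       # or_expr := and_expr ("||" and_expr)*
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()   # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():      # and_expr := not_expr ("&&" not_expr)*
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():      # not_expr := "!" not_expr | atom
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():     # atom := "(" or_expr ")" | term
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or tok in (")", "&&", "||"):
            raise ValueError("expected a search term")
        return tok.strip('"').lower() in text

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected trailing input")
    return result

QUERY = '(linux && modems) || ("AT commands" && !windows)'
for doc in ["Setting up modems on Linux",
            "AT commands reference for Windows users",
            "A list of AT commands for Hayes-compatible hardware"]:
    print(matches(QUERY, doc), "-", doc)
```

Evaluating one document at a time like this is the simple way to show the logic; an engine working from an index would more likely combine sets of matching documents per term.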
> SELECT * FROM brain_cells WHERE synaptic_rate > 0
0 row returned
That works too. Knowing what to search for takes some practice. ;)
You can learn a lot by searching for serialz and crackz for specific versions of software you have and plan not to pay for.
This practice will teach you how to fine tune a search, why you should not use IE as your primary browser, why you should block popups, how sites front load text to get listed higher in search engines, where not to go for porn, why you should delete or selectively use cookies, and a bunch of other useful tips you can use for all your web searching needs.
Bad boys rape our young girls but Violet gives willingly.
Yes, I'm sure Google is just going to roll over and give SCO whatever they want, and not fight them or anything... It will be a cold day in hell before SCO gets ANY licensing fees from Google.
You're thinking like a linux user, and not the average user.
Honestly, you *must* have had some time in your life when you were trying to find out something on the web and Google wasn't able to easily find it. Another post used the example of the "red haired singer" which is a good one. If a search engine sorted all the websites into "CD Sales" "Performances" and "Tom Malone the Construction Worker" for that person, it'd certainly point them in the right direction.
Your response doesn't apply to most people. People don't want to learn how to work technology, they want technology to "just work" for them.
Altavista has always had the capability to specify not only that separate search items exist together in a document ("AND"), but also that they occur in close proximity.
I can say:
"Knoppix distro" review
to Google, and I get results related to Knoppix, some of them indeed reviews OF Knoppix. I also, however, get useless hits that may mention Knoppix, but review something else further down in the document. I do not get hits restricted to Knoppix reviews.
If I do this with Altavista, I get hits much closer to what I want:
"Knoppix distro" NEAR review
As a law student, I've been doing a lot of searches on Westlaw and Lexis. Some of the handiest search improvements over basic Google:
word1 /s word2 - search for word1 and word2 in the same sentence
word1 /p word2 - search for word1 and word2 in the same paragraph
word1 /4 word2 - search for word1 and word2 within 4 words of each other
word can be replaced with quoted strings. It's amazing how this will enable one to focus a complex search. Moreover, it's simple, easy to understand, and relatively simple computationally.
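To give a feel for how cheap these operators are to compute, here is a small Python sketch. It is not how Westlaw or Lexis implement them: the function names are invented, sentences are split naively on `.`, `!` and `?`, paragraphs on blank lines, and terms are single words matched case-insensitively (quoted strings are left out to keep it short).

```python
import re

def _words(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def within_n_words(text, word1, word2, n):
    """word1 /n word2: the words occur within n words of each other."""
    tokens = _words(text)
    pos1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    pos2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)

def _both_in_some_chunk(chunks, word1, word2):
    # True if any single chunk contains both words.
    for chunk in chunks:
        tokens = set(_words(chunk))
        if word1.lower() in tokens and word2.lower() in tokens:
            return True
    return False

def same_sentence(text, word1, word2):
    """word1 /s word2: both words occur in the same sentence."""
    return _both_in_some_chunk(re.split(r"[.!?]+", text), word1, word2)

def same_paragraph(text, word1, word2):
    """word1 /p word2: both words occur in the same paragraph."""
    return _both_in_some_chunk(re.split(r"\n\s*\n", text), word1, word2)

text = ("Knoppix is a live distro. This review covers its hardware detection.\n"
        "\n"
        "A separate page lists mirrors.")

print(within_n_words(text, "knoppix", "review", 4))
print(same_sentence(text, "knoppix", "review"))
print(same_paragraph(text, "knoppix", "review"))
```

Each check is a single pass over the words of a document, which fits the point above that these operators are relatively simple computationally.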
Recently, I've noticed a trend in 'landing' pages dominating the results, the kind that the search engine optimizers have been saying get you to the top of the engines. Experts have been saying that those don't work on Google, but over the last couple of months they *have* been working apparently. For instance, do a search for "80/20 mortgage". The first 6 results are all clearly the same search engine "bait" and Google appears to have taken it, hook, line and sinker. None of those pages are real content and none of them are either explanations of what an 80/20 mortgage is or even companies offering 80/20 mortgages.
I used this as an example, both because I already was looking for one and because it's a pretty non-geeky kind of thing to search for, rather than looking at results for Linux and complaining about MS entries.
The Glass is Too Big: My Take on Things
Amen to using Google cache at work. Really helps to evade your company's firewall. They can't block Google at the company firewall where I work. There would be a lot of people lost at work without Google.
Firstly, the languages in question are called Inuktitut and Inupiaq. Inuit is the name that the people who speak those languages call themselves. Secondly, they do have noun phrases and verb phrases, just like other languages. However, these languages also demonstrate much more bound morphology than English, so there isn't as much of a need for combinational rules and transformations to denote mood, tense, or even action. Just because Inuktitut isn't as "space delimited" as English is doesn't mean it has different units of meaning. After all, can you say that French has no past tense because you have to conjugate verbs to indicate it? You would be on better ground if Chomsky were the only one advancing this kind of theory, but it is one of the best supported ones in linguistics.
====
Crudely Drawn Games