Google To Gain a Rival?
markpapadakis writes "Seems like Google got itself a new rival, which seems to have the potential to actually challenge our beloved 'G' successfully. Teoma Technologia launched a beta version of its search engine which enhances the link analysis idea, borrowing some ideas from Google and extending them to recognise 'communities' of subjects."
It's not a matter of security. A lot of stuff you don't want indexed because it's temporary or maybe just too variable (dynamically generated). robots.txt is mostly for the search engines' own good, and if they're ignoring that, they'll eventually be full of junk.
If you spent $300 million researching that new way to build a house (granted, I don't know how much Google spent researching indexing topics), and then you started charging to build houses for people, let's say, 300% faster and with 200% more strength at the end, would you want some other company coming in and stealing the process that you spent $300 million developing? How would you plan to recoup the research costs? Would you even spend the money researching in the first place if you KNEW that it would just get ripped off and you wouldn't be able to get any of that research money back because of competition? Funny, patents actually are good for innovation.
You can't see this if you have sigs turned off.
Hmm. I use Google because it finds what I want faster, more efficiently and more accurately than any other search engine I've used.
I love the clean simple fast interface. I love the lack of flashing banner ads. I love the relevance of the text based ads, and the differentiation between those and my search results. I love the categories, and that half the time it'll show me a category listing exactly what I'm after, as well as the normal list of sites. I love the fact that I can have Google in Dutch, despite not speaking that language. I love the site: tag and the difference it makes when looking for UK sites or for something on a specific website. I love the cache and how it insures me against the aging web. I love the sheer breadth of material available. I love the approach and insight of the company, how it focusses on searching, making searching easier, and on being good at searching, and doesn't get distracted by obscure business models. I love the way that occasionally they switch out the normal logo for one that celebrates a given day, and then links that logo to a search result that is relevant.
Oddly enough, the fact that they're running on x thousand PCs running a free operating system doesn't really impact on me at all. I have immense respect for the engineering involved, and for the responsiveness of the site, but I also wonder if a hardcore IBM mainframe might have been cheaper overall.
If MS bought Google, I would still use it. If they started showing banner ads and popups, forcing you to hold a Passport account, or preventing non-IE browsers from viewing the site, then no, I wouldn't use it.
Right now there is no search engine that comes close to the beauty of Google. I recognise that beauty from a technological perspective, regardless of the back-end OS being used.
~Cederic
Some links are scripts. The whole idea behind robots.txt is that some links may never end or won't give fixed results. It's good advice for you to follow as it will keep your spider from spending all day following links and it will keep your search engine from indexing content that will be different the next time it is visited.
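For anyone writing a spider, honouring robots.txt takes only a few lines. Here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `ExampleSpider` agent name and the example.com URLs are all made up for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of script-generated
# and temporary areas, which is what the comment above describes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

# Parse the rules once; a real spider would fetch /robots.txt from the site.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleSpider"):
    """Return True if the (hypothetical) agent may fetch the URL."""
    return _parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/cgi-bin/search?q=1"):
        print(url, "->", "fetch" if allowed(url) else "skip")
```

A spider that checks `allowed()` before every request never wanders into the endless script-generated links in the first place.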
> ANYONE CAN READ IT
Actually, if I found your spider ignoring my robots.txt, I'd block you and you'd never see my site again.
-- Don't Tase me, bro!
I just decided to go take a poke around, and as a test, I decided to perform a search on linux mips. (I've been browsing around recently and doing a bit of hacking on it lately, and I know which sites I found the most relevant for it.)
The results, currently, are pretty similar. The first link on the page pointed directly to the Linux/MIPS HOWTO, which I've been referring to quite often recently. Everything else is quite similar down the rest of the first 10 results as well.
Google still has its advantages over Teoma at the moment though:
It's one of those things that quite frequently are useful when you're searching for something: instead of landing on the main page of the site (if that contains your search terms, and is of course linked more often), you can go directly to the part of the site that addresses exactly what you're looking for.
I really hate it when a site that I want to go visit has pulled its content or moved it around. But if I'm doing a search on Google, or I even know the last known address of a page, I can just head over to the Google cache and often pull up exactly what I'm looking for, even if the content has been moved or deleted on its original server. Sites, unfortunately, do vanish from time to time. It's always nice to be able to access that content when you need it most.
Anyway, that would just be my whole 2c on it.
Dogma: Dead (mostly because your Karma ran it over)
Here's the real link.
Mark Prindle, the most underappreciated genius on the web.
I spent a good bit on an AdWords campaign that picked theKompany and other KDE keywords, following theKompany's claim that such competitive advertising was illegal. Needless to say the KDE camp went all out hit-spamming my ads, and I went through around 10,000x the number of impressions/hour I was supposed to. Google staff were prompt and courteous, fixed the problem, tracked the spammers back to Germany (?) and refunded my money.
As for credibility, they'd be one company that I'd be willing to give my email address to, knowing that they get it and won't be sending me "Important Updates" every month.
Competition is great, but let's not forget the good that Google has done. We need a well funded company to fight off things like the Altavista patent lawsuits on searching.
I don't understand why some folks are so virulently anti-Google. The flak they took for putting up the Deja archives was totally unreasonable, seeing as they had barely got the archives out from under deja.com's decaying body. And their new image search is damn cool.
- some results are totally unrelated to the word(s) I inserted
- results with Umlauts are shown in a wrong character-set, resulting in garbage
- the number of results is only 1/5 ~ 1/10 of what Google or Altavista give for the same search term, so I suppose Teoma has indexed only a 10th of the pages other search engines have
- Oh, they use Helvetica... it looks really ugly on my Win98 box, with some adjacent characters overlapping
- and well, I love Google Groups, the Google Cache, the changing Google Logo, the ability to try the search on other engines...
Teoma has a loooong way to go, but then Google also took 2 years to beat Altavista, so Teoma may have another 2 years ahead of it... Altavista revamped its search algorithm and sped up its interface when Google arrived, and the same will happen again: Google AND Altavista will improve their search once more. Just my 2c.
ms
It's nice, but the problem is that those search engines with bought rankings also "poison" meta search engines. For one request, I got download.cnet.com as the #1 site because it was ranked very high on various sites used by Vivisimo. It had *nothing* to do with the request :-(
I would also appreciate it if all high rankings of a site are displayed. It helps you to find out where you must still submit your own site.
I've already discovered Vivisimo, which is a nice step up from the meta-search-engine garbage of yesteryear. (Disclaimer: I go to CMU, which developed much of the technology behind Vivisimo, but I personally didn't work with it.) Not only does it sort links by relevance, it also categorizes results. I found it very useful when doing a research project last year -- searching for "Japanese Women" on even the most finely tuned search engine turns up pages of results that can be diplomatically called "non-academic."
I doubt it's a replacement for Google, but I recommend it the next time you're searching for a topic that might have several different meanings.
For more information, click here.
The "near" operator is implicitly in every search-- pages rank higher when the search terms are found near each other.
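To make that concrete, here is a toy sketch of proximity scoring. It is only an illustration of the idea; the function name and the 1/gap formula are my own assumptions, not how Google or any other engine actually ranks.

```python
def proximity_score(text, term_a, term_b):
    """Toy score: higher when the two terms appear closer together.

    Assumption for illustration only: score = 1 / (smallest word gap).
    Returns 0.0 when either term is missing from the text.
    """
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    if not pos_a or not pos_b:
        return 0.0
    # Smallest distance, in words, between any pair of occurrences.
    gap = min(abs(i - j) for i in pos_a for j in pos_b)
    return 1.0 / max(gap, 1)


if __name__ == "__main__":
    pages = [
        "linux mips howto",
        "linux runs on many chips including mips",
    ]
    for page in sorted(pages, key=lambda p: proximity_score(p, "linux", "mips"),
                       reverse=True):
        print(round(proximity_score(page, "linux", "mips"), 3), page)
```

A page with the terms side by side outranks one where they are scattered, with no explicit "near" needed in the query.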
--
Slashdot's robots.txt doesn't include /articles where discussions can be found.
Well I did read this (yes, I actually READ the referenced article before posting):
"Currently in beta, the site is primarily intended to demonstrate Teoma's technology to potential partners or buyers."
and be able to filter out the crap.
If google would allow a post-processing phase to apply this sort of logic AV would disappear from my list of search engines.
www.eFax.com are spammers
The money they will get out of this has little to do with the single-user search engine, and everything to do with the data mining. Companies like Google can mine their databases to do marketing queries on a HUGE scale. The search engine is great for the rest of us and a great advertising tool for them, but it is not where the money is.
bash-2.04$
bash-2.04$
bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
How long do you think, assuming that this new technique is valid, will it take for Google to catch up and provide similar results?
They already have 5 of the 6 requirements (as I see 'em):
1. Existing, proven, scalable infrastructure
2. Gob-loads of search engine experience && the programmers/net admins to back it up
3. A better name (Marketing, sadly, does count)
4. ~1.3 billion pages already 'spidered' and waiting to be re-munched using any technique they deem appropriate
5. A lot of high-paying corporate customers (Yahoo!, RedHat etc) which helps pay for everything... and let's face it... money talks.
All they really need is an algorithm.... which shouldn't be a problem for the guys that revolutionized searching in the first place.
My $0.02
DOS is dead, and no one cares...
DOS is dead, and no one cares...
If there's a Bourne Shell, I'll see you there
I don't trust search engines that don't let me get lucky... um... feel lucky...
Oh, and it doesn't seem to have indexed as much of the web as Google yet (admittedly, tested using the not-very-scientific method of searching for myself and my site), but I guess that'll come with time.
Most of us who use Google were fans waay back when their database was a fraction of the size that Teoma's is now, and we still swore by it. It's interesting that some of the same people I have talked to who were militant in their support of Google (i.e., it's "our" search engine!) are now disdaining Teoma.
And I am sure that Google will respond to the challenge with honor - I can't imagine that Google would try a patent challenge. It seems so out of character. But then again, I may be guilty of putting Google on a pedestal just because it was started by other geeks. Though one could make the argument that in today's downturn economy, patent litigation is just good business sense. There are no morals or honor in pure capitalism.
I'll add Teoma to my bookmarks - if they give me better results than Google, I'll switch in a heartbeat. Even if they run M$ IIS !
Don't blame me - I voted for Howard Dean. http://dean2004.blogspot.com
Right, obviously, since the article clearly says the site is just a demo to attract interest from investors. Teoma has not yet decided whether or not to run as a standalone search engine.
PLEASE read the article before posting
It basically involves two weighted lists of sites. Sites in the second list that are pointed to by sites in the first list earn weight points based on the weighted value of those sites in the first list. Sites in the first list earn weighted value based on the sites that they point to in the second list.
You iterate this a few times and you end up with the first list being a listing of "link pages" which have a lot of useful links on the subject. The second list becomes an ordered listing of "authoritative sites", sites that are pointed to by many other sites.
What's really neat about this is that the method has the ability to find separate communities. For instance, search for the word jaguar and this method will give you authoritative sites and link pages for the car, the animal, and the Atari games system quite easily... because each meaning of the word jaguar would have a distinct grouping of authoritative sites and link pages.
What's more, this type of problem can be formulated as an eigenvector calculation for the matrix of link pages and authoritative sites.
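The two-list iteration described above looks like the hubs-and-authorities scheme usually called HITS. Here is a minimal sketch of that iteration in plain Python; the toy link graph, the page names and the iteration count are made up for illustration, and nothing here is claimed to be Teoma's actual implementation.

```python
def hits(links, iterations=20):
    """Hubs-and-authorities iteration on a small link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, auth): two dicts of normalized scores.
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Second list: a site's authority is the summed hub weight
        # of the pages that point to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # First list: a page's hub weight is the summed authority
        # of the sites it points to.
        hub = {p: sum(auth[dst] for dst in links.get(p, ())) for p in pages}
        # Normalize so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: two link pages pointing at two sites.
    toy = {"links1": ["siteA", "siteB"], "links2": ["siteA"]}
    hub, auth = hits(toy)
    print("best link page:", max(hub, key=hub.get))
    print("best authority:", max(auth, key=auth.get))
```

Repeating the update is power iteration, which is why the scores settle on the principal eigenvector mentioned above. Finding separate communities would mean looking at further eigenvectors, which this sketch does not attempt.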
-jef
On a slightly related note, Google's director of operations and head sys-admin gave a great technical presentation last week at the Bay LISA meeting on why Google runs so damn fast. Two of the more interesting things were that the colors on Google's search page were chosen for rendering efficiency, and that they have a team of people who actually count the bytes on their pages to make sure that you are getting all the necessary info with the fewest possible bytes. Considering it was a free talk, it was very interesting for us Linux enthusiasts.
Hopefully this newcomer will go to the same lengths to make their search engine competitive...
there are no stupid questions, but there are a lot of inquisitive idiots
do I have a problem with this. Nope, none whatsoever. Because, if the big pharmaceutical companies can't protect their product then they won't manufacture it. And if they don't, who will? Who else can afford the R&D? It may be that by giving up my rights to this research I will help to provide a cure or prevention for diabetes. I'm happy with that.
Big Pharm spends *by far* more on advertising than research. See here. Also, as a side note, please see here to understand that free-market capitalism in the health care industry doesn't make sense; to quote: "Canada insured 100 percent of its citizens for $2,250 per person in 1998 while the United States expended $4,270 per person insuring only 84 percent of our citizens." Not only that, it's cruel and disgusting to hold people's health ransom for money...
Deregulating the health care industry is more about stable profit for Big Pharm than anything else. Canada's and Britain's citizens would do well to understand what 'American Style' health care really means. Fewer healthy people, higher cost, profiteering at the expense of your health (literally).
What does this have to do with R&D and patents? Patents are weapons used by the health care industry to kill people for money. The 'R&D' they do is to make money. Neither thing has 'beans' to do with healthy people. The R&D should be done by doctors with a lot less attachment to profit motives, which by nature make for an *UNHEALTHY* "Health Industry".
"So how do you motivate people to make others healthy when your only incentive is profit" would be a better question.
I find it quite nice that this search engine totally ignored my robots.txt and scanned my entire site anyway. How can a search engine, so friggin complex and monstrous, ignore the basics of spider etiquette ?
I guess it's time to rename my directories again.
-Billco, Fnarg.com
Google had no ads at the outset either.
Additionally, who's to say that those Google-ites haven't improved their technology over the last year or so? I'm sure many of us have turned exclusively to Google's tried and true system... oh so easy... oh so accurate.
Finally I think we love Google's look and those tiny little modifications they make to their logo on the special (but mostly American) dates.
Hey, if someone can better it, we could all use a search with a button "The right link."
yoink
Some of the better search criteria that led to my rather benign site:
For instance, one of the dominant /. themes is the incessant railing about the evils of IP and patents. Yet Google has what probably amounts to a boatload of patents, and they don't seem to get called on it (nor does Transmeta, or Tivo, for that matter). All the patent references I saw in the earlier comments were along the lines of "hmm, Google has these patents, wonder if we're set for a big patent fight".
I bet if MS owned and operated Google, /.ers would hate it and would never stop editorializing about the consequent coming of the Apocalypse.
This is interesting - they have no ads on their pages. How do they expect to make money (and stay in business)? Not that I am complaining - I like the clean interface of google and teoma.
Didn't Google get all sorts of patents on the concepts used in their search engine? You have to wonder if a patent fight is on the horizon. I for one encourage competition and it'll take some serious innovation to displace my beloved Google, but I say more power to them as long as they aren't just ripping off Google's concepts.
Top Most Bizarre/Disturbing Error Messages
It seems to concentrate on sites rather than pages, which makes it a rather different approach from Google's.
In any case, a link to keep and a technology to watch. There are never too many good search engines. Good luck to them!
--
Rome taught me patience and assiduous application to detail. Virtues which temper the boldness of great, general views.