Google To Gain a Rival?
markpapadakis writes "Seems like Google got itself a new rival, which seems to have the potential to actually challenge our beloved 'G' successfully. Teoma Technologia launched a beta version of its search engine which enhances the link analysis idea, borrowing some ideas from Google and extending them to recognise 'communities' of subjects."
Thank God for that post-posting editing function, eh Hemos? ;)
Within 10 minutes, the link to searchenginewatch.com changed 4 times: teoma%20.html, toema.html, toema. html..
Oh well.
I searched for my name and got a whole lot of pages in French (naturally), but this search engine doesn't seem to know French.
So it groups results under silly "Topics" like "Le" or "De" ( = "the" or "of")
Too bad...
If they want to have this be actually useful they will need more up-to-date scans of the Web than once every 8 or 10 months (!).
> per person in 1998 while the United States expended $4,270 per person
> insuring only 84 percent of our citizens."
Which is why we see large numbers of Americans going across the border for Canadian health care.
oh, wait a minute . . .
:)
>not only that, its cruel
> and disgusting to hold people's health ransom for money...
Far better to make it illegal to own, say, a private CAT scan machine, and hold health ransom to time, while allowing veterinarians to have the same machines to use on pets (which stand idle while people die waiting their turn for the human ones).
hawk
It's not a matter of security. A lot of stuff you don't want indexed because it's temporary or maybe just too variable (dynamically generated). robots.txt is mostly for the search engines' own good, and if they're ignoring that, they'll eventually be full of junk.
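For anyone writing a spider, honoring robots.txt takes only a few lines. Here is a minimal sketch using Python's standard urllib.robotparser; the robots.txt contents, the example.com URLs and the "ExampleSpider" agent name are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: keep spiders out of
# dynamically generated and temporary areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, agent, urls):
    """Keep only the URLs this agent is permitted to crawl."""
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/cgi-bin/search?q=linux",
        "http://example.com/tmp/scratch.html",
    ]
    print(allowed_urls(build_parser(ROBOTS_TXT), "ExampleSpider", candidates))
```

A real spider would fetch /robots.txt from each host before crawling it and refresh that copy now and then.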
This page sums up pretty much what I feel about software patents.
You can't see this if you have sigs turned off.
If you spent $300 million researching that new way to build a house (granted, I don't know how much Google spent researching indexing topics), and then you started charging to build houses for people, let's say, 300% faster and with 200% more strength at the end, would you want some other company coming in and stealing your process that you spent $300 million developing? How would you plan to recoup the research costs? Would you even spend the money researching in the first place if you KNEW that it would just get ripped off and you wouldn't be able to get any of that research money back because of competition? Funny, patents actually are good for innovation.
You can't see this if you have sigs turned off.
And in other news, Netiquette has passed away after a prolonged fight against idiocy on the net. GIF at 11.
--
Imagine someone who created a cure to AIDS and patented it, that person could charge ridiculous amounts of money for this cure even if it was something simple and cheap to make. People who couldn't afford this cure would die simply because John Doe patented his cure.
Interesting you should use this analogy. I've just recently been asked to participate in a genetic study of diabetes (I've been a diabetic since I was 4). All I have to do for now is provide a blood sample. In the information I was given, they specifically say that you have to give up any rights to products that may be derived from research. Specifically, they want to be able to patent the products that they produce from this research so that pharmaceutical companies can license the rights to manufacture them.
So, do I have a problem with this? Nope, none whatsoever. Because, if the big pharmaceutical companies can't protect their product then they won't manufacture it. And if they don't, who will? Who else can afford the R&D? It may be that by giving up my rights to this research I will help to provide a cure or prevention for diabetes. I'm happy with that.
And when is this likely to happen? Never. So we're stuck with having patents and big companies making and selling the products to save people's lives, or no products and a lot of dead people. Make your choice.
Their database is old and doesn't seem to be all that big. They don't seem to honor boolean terms. They'll throw back dozens or hundreds of related pages from the same part of the same site without grouping them or squelching them. No apparent support for fuzzy spelling variations.
And when Apache and Debian show up at the top of a query on "Linux", it throws the sophistication of Google's relevance calculations into relief. Apache and Debian are linked from a ton of web pages, but the overwhelming majority of those pages are message board postings and message board TOCs or things like "This site runs on..." page footers. What this says to me is that Teoma isn't doing a good job of weighting the relevance and prominence of inbound links. It's as though it's going purely on the raw number of times the search term appears in a page linking to Site x regardless of how many are clearly identical and thus probably links from menus and TOCs, and not from the pages' unique content, where a link should count far more.
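To make the point concrete, here is a rough sketch of the difference between counting raw inbound links and collapsing the boilerplate ones. Nobody outside Teoma or Google knows what either engine really does, so this is only an illustration; the tuple layout, the site names and the "same site plus same surrounding text counts once" rule are my own assumptions.

```python
from collections import defaultdict

# Each link is (source_page, source_site, target, surrounding_text).
# All of this data is invented for the example.
LINKS = [
    ("forum.example/t/1", "forum.example", "apache.org", "This site runs on Apache"),
    ("forum.example/t/2", "forum.example", "apache.org", "This site runs on Apache"),
    ("forum.example/t/3", "forum.example", "apache.org", "This site runs on Apache"),
    ("news.example/a/7", "news.example", "linux.org", "a good introduction to Linux"),
    ("blog.example/p/2", "blog.example", "linux.org", "the Linux home page"),
]


def raw_inlink_counts(links):
    """Count every inbound link, footers and menus included."""
    counts = defaultdict(int)
    for _page, _site, target, _context in links:
        counts[target] += 1
    return dict(counts)


def deduplicated_inlink_counts(links):
    """Count a (site, target, surrounding text) combination only once.

    Identical text repeated across one site is probably a menu or footer,
    so it should not count once per page.
    """
    seen = set()
    counts = defaultdict(int)
    for _page, site, target, context in links:
        key = (site, target, context.strip().lower())
        if key not in seen:
            seen.add(key)
            counts[target] += 1
    return dict(counts)


if __name__ == "__main__":
    print(raw_inlink_counts(LINKS))
    print(deduplicated_inlink_counts(LINKS))
```

With the raw count the footer links put apache.org ahead of linux.org; after collapsing the duplicates the order flips.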
Hmm. I use Google because it finds what I want faster, more efficiently and more accurately than any other search engine I've used.
I love the clean simple fast interface. I love the lack of flashing banner ads. I love the relevance of the text based ads, and the differentiation between those and my search results. I love the categories, and that half the time it'll show me a category listing exactly what I'm after, as well as the normal list of sites. I love the fact that I can have Google in Dutch, despite not speaking that language. I love the site: tag and the difference it makes when looking for UK sites or for something on a specific website. I love the cache and how it insures me against the aging web. I love the sheer breadth of material available. I love the approach and insight of the company, how it focusses on searching, making searching easier, and on being good at searching, and doesn't get distracted by obscure business models. I love the way that occasionally they switch out the normal logo for one that celebrates a given day, and then links that logo to a search result that is relevant.
Oddly enough, the fact that they're running on x thousand PCs running a free operating system doesn't really impact on me at all. I have immense respect for the engineering involved, and for the responsiveness of the site, but I also wonder if a hardcore IBM mainframe might have been cheaper overall.
If MS bought Google, I would still use it. If they started showing banner ads, popups, forcing you to hold a Passport account, prevent non-IE browsers viewing the site, then no, I wouldn't use it.
Right now there is no search engine that comes close to the beauty of Google. I recognise that beauty from a technological perspective, regardless of the back-end OS being used.
~Cederic
Some links are scripts. The whole idea behind robots.txt is that some links may never end or won't give fixed results. It's good advice for you to follow as it will keep your spider from spending all day following links and it will keep your search engine from indexing content that will be different the next time it is visited.
> ANYONE CAN READ IT
Actually, if I found your spider ignoring my robots.txt, I'd block you and you'd never see my site again.
-- Don't Tase me, bro!
most of us who use Google were fans waay back when their database was a fraction of the size..
During Google's Beta period, they focused on indexing tech-related sites, specifically Unix/Linux/Perl/etc related stuff.
I think that's why the fanbase on Slashdot grew so quickly - they were exactly the target market. And the expectation that they would spread the word to their technical and non-technical friends has been successful. I know several non-tech users of Google who must have found them by word-of-mouth (seeing how they don't advertise).
--
Business. Numbers. Money. People. Computer World.
I just decided to go take a poke around, and as a test, I decided to perform a search on linux mips. (I've been browsing around recently and doing a bit of hacking on it lately, and I know which sites I found the most relevant for it.)
The results, currently, are pretty similar. The first link on the page pointed directly to the Linux/MIPS HOWTO, which I've been referring to quite often recently. Everything else is quite similar down the rest of the first 10 results as well.
Google still has its advantages over Teoma at the moment though:
It's one of those things that quite frequently are useful when you're searching for something: instead of landing on the main page of the site (if that contains your search terms, and is of course linked more often), you can go directly to the part of the site that addresses exactly what you're looking for.
I really hate it when a site that I want to go visit has pulled its content or moved it around. But if I'm doing a search on Google, or I even know the last known address of a page, I can just head over to the Google cache and often pull up exactly what I'm looking for, even if the content has been moved or deleted on its original server. Sites, unfortunately, do vanish from time to time. It's always nice to be able to access that content when you need it most.
Anyway, that would just be my whole 2c on it.
Dogma: Dead (mostly because your Karma ran it over)
You forgot the most important of all... :-)
No "I'm feeling Lucky" button!
bp
Here's the real link.
Mark Prindle, the most underappreciated genius on the web.
I spent a good bit on an AdWords campaign that picked theKompany and other KDE keywords following theKompany's claim that such competitive advertising was illegal. Needless to say the KDE camp went all out, hit-spamming my ads; I went through around 10,000x the number of impressions/hour I was supposed to. Google staff was prompt and courteous, fixed the problem, tracked the spammers back to Germany (?) and refunded my money.
As for credibility, they'd be one company that I'd be willing to give my email address to, knowing that they get it and won't be sending me "Important Updates" every month.
Competition is great, but let's not forget the good that Google has done. We need a well funded company to fight off things like the Altavista patent lawsuits on searching.
I don't understand why some folks are so virulently anti-Google. The flak they took for putting up the Deja archives was totally unreasonable, seeing as they had barely got the archives out from under deja.com's decaying body. And their new image search is damn cool.
- some results are totally unrelated to the word(s) I inserted
- results with Umlauts are shown in a wrong character-set, resulting in garbage
- the number of results is only 1/5 ~ 1/10 of what Google or Altavista give for the same search term, so I suppose Teoma has indexed only a 10th of the pages other search engines have
- Oh, they use Helvetica... it looks really ugly on my Win89Box, with some adjacent characters overlapping
- and well, I love Google Groups, the Google Cache, the changing Google Logo, the ability to try the search on other engines...
Teoma has a loooong way to go, but then Google also took 2 years to beat Altavista, so there may be another 2 years ahead for Teoma... Altavista revamped their search algorithm and sped up their interface when Google arrived; the same will happen again: Google AND Altavista will make their search better again.
just my 2 c
ms
maybe they will italicize everyone to death.
BilldaCat
How can a company expect to make money (and stay in business) by carrying ads? Maybe you haven't been following the fallout of the dot-conomy.
www.timcoleman.com is a total waste of your time. Never go there.
the number of results is only 1/5 ~ 1/10 of what Google or Altavista give for the same search term, so I suppose Teoma has indexed only a 10th of the pages other search engines have
That's what the article on searchenginewatch says:
Teoma is a crawler-based service and has a collection of about 100 million
URLs. Of course, to be a serious contender in the search engine space, Teoma will
need to grow, and it is planning to do so.
It's nice, but the problem is that those search engines with bought rankings also "poison" meta search engines. For one request, I got download.cnet.com as the #1 site because it was ranked very high on various sites used by vivisimo. It had *nothing* to do with the request :-(
I would also appreciate it if all high rankings of a site are displayed. It helps you to find out where you must still submit your own site.
I've already discovered Vivisimo, which is a nice step up from the meta-search-engine garbage of yesteryear. (Disclaimer: I go to CMU, which developed much of the technology behind Vivisimo, but I personally didn't work with it.) Not only does it sort links by relevance, it also categorizes results. I found it very useful when doing a research project last year -- searching for "Japanese Women" on even the most finely tuned search engine turns up pages of results that can be diplomatically called "non-academic."
I doubt it's a replacement for Google, but I recommend it the next time you're searching for a topic that might have several different meanings.
For more information, click here.
If someone found a cure for AIDS, shouldn't they be able to get paid for that? Shouldn't people who do the most good for the world receive the largest rewards?
And so what if it's expensive? Say you had AIDS and the cure cost $10,000. Are you going to go buy a sports car instead?
You sound like one of those "I'll take what I want and rationalize it later" type of people. Here's a tip: If you don't bother rationalizing, you can take things faster.
The person who can cure AIDS can already "say millions of people who can't afford the cure must die" by just not bothering to cure it. Maybe he thinks the rewards aren't worth the effort.
If you were in charge, maybe they wouldn't be.
Fortunately, I live in a free society where transactions are voluntary. No one is forced to produce, and no one is forced to buy, but we do it anyway because it's in our individual best interest.
Each transaction makes both the buyer and the seller better off. That's why we both say "Thank You" after the transaction. This is why transactions happen, and this is why more transactions are better than fewer.
Your efforts to reduce the incentive for the producer will result in less producer effort and fewer transactions. This makes both the buyers and the seller worse off.
In the case of an AIDS cure, worse off for the buyer means DEAD.
I checked it, and although it does rank relevant matches well, it is lacking on the indexing, as well as on the caching. These are two of google's strongest points.
With Google, I was searching for information which turned out to be on a defunct website. I was able to get what I wanted by searching within the Google caches for the individual URLs linked to. With this new one, that's right out.
Google also got the newsgroups that Deja had. They are still not quite up to snuff (threading still sucks), but Teoma lacks even that.
On the other hand Teoma IS still in beta, and will probably get better. They will continually be indexing the web, and if they are smart will be refining their search process to give more relevant links.
Even if they do rise to a par with Google, I'd still use Google just because the caching of pages is so useful.
There is a civil war coming in the United States. Remember which side has most of the guns
- wonderland child porn
- skinhead porn
- seattle post-intelligencer january 12, 2001 crossword puzzle
- why parents exploit children for pornography
- any telephone number of anyone in Ohio, U.S.A.
WTF?--
The near word is implicitly in every search: pages rank higher when the search terms are found near each other.
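One plausible way to score "nearness" is to find the shortest run of words that contains every search term and rank pages with shorter runs higher. This is only a sketch of that idea, not a claim about how Google actually does it; the function names and the 1/span scoring are my own choices.

```python
def smallest_window(tokens, terms):
    """Length in tokens of the shortest span containing every term.

    Returns None if some term never appears.
    """
    needed = set(terms)
    last_seen = {}  # term -> most recent position
    best = None
    for position, token in enumerate(tokens):
        if token in needed:
            last_seen[token] = position
            if len(last_seen) == len(needed):
                # The shortest window ending here starts at the oldest
                # of the most recent sightings.
                span = position - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def proximity_score(text, query):
    """Higher score when the query terms sit closer together."""
    span = smallest_window(text.lower().split(), query.lower().split())
    return 0.0 if span is None else 1.0 / span


if __name__ == "__main__":
    print(proximity_score("the linux mips howto", "linux mips"))
    print(proximity_score("linux runs on lots of hardware including mips", "linux mips"))
```

A page with the terms side by side scores 0.5 here, and the score drops as the terms drift apart.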
--
Slashdot's robots.txt doesn't include /articles where discussions can be found.
well.. if you're doing the minimalistic thing with 1 graphic for your company name, a single textbox and 1 submit button, you're bound to look like google..
//rdj
No one can understand the truth until he drinks of coffee's frothy goodness.
--Sheikh Abd-Al-Kadir, 1587
Well I did read this (yes, I actually READ the referenced article before posting):
"Currently in beta, the site is primarily intended to demonstrate Teoma's technology to potential partners or buyers."
Also, Google lacks the ability to do things like:
True, many people cannot understand how to formulate a Boolean query correctly, and that may be why Google doesn't feel it's important. However, just like a CLI is better for some tasks than a GUI, a Boolean search is better for some tasks than what Google has now.
www.eFax.com are spammers
and be able to filter out the crap.
If google would allow a post-processing phase to apply this sort of logic AV would disappear from my list of search engines.
www.eFax.com are spammers
There's probably a problem trying to keep track of how many sites link to the site in question, and every time a page changes, you have to recalculate the weights of sites linked from that page.
Then you have to consider stuff like 404 pages that redirect to a page that links to the main site.
After that there are banner ads etc which you may or may not want to exclude from the weighting.
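The bookkeeping part at least can be done incrementally: when a page changes, only the targets it added or dropped need their counts touched. A small sketch, assuming plain inbound-link counts rather than any real engine's weighting; the LinkIndex class and the page names are invented for the example.

```python
class LinkIndex:
    """Track inbound-link counts and update them when one page changes."""

    def __init__(self):
        self.outlinks = {}      # page -> set of targets it links to
        self.inlink_count = {}  # target -> number of pages linking to it

    def update_page(self, page, new_targets):
        """Record a fresh crawl of `page`, adjusting only what changed."""
        old = self.outlinks.get(page, set())
        new = set(new_targets)
        for target in old - new:  # links that disappeared
            self.inlink_count[target] -= 1
            if self.inlink_count[target] == 0:
                del self.inlink_count[target]
        for target in new - old:  # links that were added
            self.inlink_count[target] = self.inlink_count.get(target, 0) + 1
        self.outlinks[page] = new


if __name__ == "__main__":
    index = LinkIndex()
    index.update_page("a.example", ["linux.org", "debian.org"])
    index.update_page("b.example", ["linux.org"])
    index.update_page("a.example", ["linux.org"])  # a.example dropped a link
    print(index.inlink_count)
```

Redirecting 404 pages and banner ads would still need their own filtering before the targets ever reach update_page.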
---
-----------------------
Nicotine free Amish .sig.
The money they will get out of this has little to do with the single user search engine, and everything to do with the data mining. Companies like Google can mine their databases to do marketing queries on a HUGE scale. The search engine is great for the rest of us and a great advertising tool for them, but it is not where the money is.
bash-2.04$
bash-2.04$
bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
How long do you think, assuming that this new technique is valid, will it take for Google to catch up and provide similar results?
They already have 5 of the 6 requirements ( as I see em):
1. Existing, proven, scalable infrastructure
2. Gob-loads of search engine experience && the programmers/net admins to back it up
3. A better name (Marketing, sadly, does count)
4. ~1.3 billion pages already 'spidered' and waiting to be re-munched using any technique they deem appropriate
5. A lot of high-paying corporate customers (Yahoo!, RedHat etc) which helps pay for everything... and let's face it... money talks.
All they really need is an algorithm.... which shouldn't be a problem for the guys that revolutionized searching in the first place.
My $0.02
DOS is dead, and no one cares...
If there's a Bourne Shell, I'll see you there
I don't trust search engines that don't let me get lucky... um... feel lucky...
DOS is dead, and no one cares...
If there's a Bourne Shell, I'll see you there
Ditto (or at least top 5, depending on what search phrase you use, but any obvious ones seem to work). Like I said, not the most scientific method and I guess it's all down to how effective their spiders are. Having said that, I'm still going to take it personally : (
Oh, and it doesn't seem to have indexed as much of the web as Google yet (admittedly, tested using the not-very-scientific method of searching for myself and my site), but I guess that'll come with time.
most of us who use Google were fans waay back when their database was a fraction of the size that Teoma's is now, and we still swore by it. It's interesting that some of the same people I have talked to who were militant in their support of Google (i.e., it's "our" search engine!) now are disdaining Teoma.
And I am sure that Google will respond to the challenge with honor - I can't imagine that Google would try a patent challenge. It seems so out of character. But then again, I may be guilty of putting Google on a pedestal just because it was started by other geeks. Though one could make the argument that in today's downturn economy, patent litigation is just good business sense. There are no morals or honor in pure capitalism.
I'll add Teoma to my bookmarks - if they give me better results than Google, I'll switch in a heartbeat. Even if they run M$ IIS !
Don't blame me - I voted for Howard Dean. http://dean2004.blogspot.com
right, obviously, since the article clearly says the site is just a demo to attract interest from investors. Teoma has not yet decided whether or not to run as a standalone search engine.
PLEASE read the article before posting
Don't blame me - I voted for Howard Dean. http://dean2004.blogspot.com
It basically involves two weighted listings of sites. Sites in the second list pointed to by sites in the first list earn weight points based on the weighted value of sites in the first list. Sites in the first list earn weighted value based on the sites that they point to in the second list.
You iterate this a few times and you end up with the first list being a listing of "Link Pages" which have a lot of useful links on the subject. The second list becomes an ordered listing of "authoritative sites", sites that are pointed to by many other sites.
What's really neat about this is that the method has the ability to find separate communities. For instance, search for the word jaguar and this method will give you authoritative sites and link pages for the car, the animal, and the Atari games system quite easily... because each meaning of the word jaguar would have a distinct grouping of authoritative sites and link pages.
What's more, this type of problem can be formulated as an eigenvector calculation for the matrix of link pages and authoritative sites.
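The two-list iteration described above can be sketched in a few lines. This resembles what the literature usually calls hubs-and-authorities scoring; it is only an illustration, not Teoma's or Google's actual code, and the page names and link graph are invented. Note that plain iteration like this converges on the single dominant community; pulling out the separate jaguar-style groupings would mean looking at further eigenvectors, which this sketch does not attempt.

```python
import math


def normalize(scores):
    """Scale scores so the sum of their squares is 1."""
    norm = math.sqrt(sum(value * value for value in scores.values())) or 1.0
    return {page: value / norm for page, value in scores.items()}


def hubs_and_authorities(links, iterations=20):
    """Iteratively score link pages (hubs) and authoritative sites.

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A site's authority is the summed hub weight of pages linking to it.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # A page's hub weight is the summed authority of what it links to.
        new_hub = {
            page: sum(new_auth[target] for target in links.get(page, ()))
            for page in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical link graph for the example.
LINKS = {
    "cars-links": ["jaguar-cars", "car-reviews"],
    "auto-portal": ["jaguar-cars"],
    "zoo-links": ["big-cats"],
}

if __name__ == "__main__":
    hub, auth = hubs_and_authorities(LINKS)
    print(sorted(auth, key=auth.get, reverse=True))
    print(sorted(hub, key=hub.get, reverse=True))
```

Normalizing after every pass is what keeps the weights from blowing up, and it is the same step you would take in power iteration for an eigenvector.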
-jef
Anyway, none of the ideas used by Google and now Teoma are really new -- academics have been doing this stuff for a while now.
This idea is fine, until you look at the idea that any super intelligent AI like that might censor the links for your own good. You might not find anti microsoft links unless you specified hate and microsoft, for example. Or it might be too much stress in your life to know about the impending comet strike, and so that is left out of the search results, even if you choose to vacation at ground zero.
After all, it is smarter than humans, and hopefully is more moral? The question on what to do with "the questions of morals", and whose "morals to program it with" becomes very disturbing when applied to super intelligent search engines.
Check out the Vinny the Vampire comic strip
"It is a greater offense to steal men's labor, than their clothes"
On a slightly related note, Google's director of operations and head sys-admin gave a great technical presentation on why Google runs so damn fast last week at the Bay LISA meeting. Two of the more interesting things were that Google's colors on their search page were chosen for rendering efficiency and the fact that they have a team of people who actually count the bytes on their pages to make sure that you are getting all the necessary info with the least possible bytes. Considering it was a free talk, it was very interesting for us Linux enthusiasts.
Hopefully this newcomer will go to the same lengths to make their search engine competitive...
there are no stupid questions, but there are a lot of inquisitive idiots
As long as search results are the products of advertisers, won't all search engines suck?
Funny, patents actually are good for innovation.
No, you're wrong. Research will go back to the 'public domain' or 'common knowledge' area and be done at universities, supported by all.
When a 'new, better' method is found, the 'good parts' of capitalism will empower anyone who cares to try to make that thing better, faster and cheaper. If you allow whole realms of knowledge (like ranking pages based upon their a href's) to be patented, you will enable a company to operate *contrary* to what a 'free market' would provide. You cannot have both numerous and broad monopolies and free markets. This is ridiculous and counterproductive.
Note: I myself favour a controlled and planned economy; capitalism enables terrible choices which do not align with healthy sustainable futures (like pollution, consumerism, exploitation of labour and whatnot) - *B*U*T* if the Plutocrats who run the G8 are going to run this capitalism game, they at least have to do it right by not enshrining businesses as all-powerful-gods-of-knowledge... but I digress.
to invest into a product that won't give them good returns
See, again, you're stuck in a rut... why are we only researching medicine that provides a 'good return'? WTF is our motivation again? Oh yeah, it's HEALTHY PEOPLE; when we have an idea on how to make PEOPLE HEALTHY we should INVESTIGATE. BigPharm only investigates what makes a profit..
Do you understand the clear and distinct different motives here?
do I have a problem with this. Nope, none whatsoever. Because, if the big pharmaceutical companies can't protect their product then they won't manufacture it. And if they don't, who will? Who else can afford the R&D? It may be that by giving up my rights to this research I will help to provide a cure or prevention for diabetes. I'm happy with that.
Big Pharm spends *by far* more on advertising than research. See here. Also, as a side note, please see here to understand that free-market capitalism in the health care industry doesn't make sense; to note, "Canada insured 100 percent of its citizens for $2,250 per person in 1998 while the United States expended $4,270 per person insuring only 84 percent of our citizens." Not only that, it's cruel and disgusting to hold people's health ransom for money...
De-Regulating the health care industry is more about stable profit for big-pharm than anything else.. Canada and Britain's citizens would do well to understand what 'American Style' health care really means. Fewer healthy people, higher cost, profiteering at the expense of your health (literally).
What does this have to do with R&D & Patents? Patents are weapons used by the Health Care Industry to kill people for money. The 'R & D' they do is to make money. Neither thing has 'beans' to do with Healthy People. The R&D should be done by doctors with a lot less attachment to profit motives, which by nature make for an *UNHEALTHY* "Health Industry"..
"So how do you motivate people to make others healthy when your only incentive is profit" would be a better question.
Well, if you search for Linux using this thing the first site you get is Debian, not linux.org. (Following that are Freshmeat and Apache, incidentally.)
Google's first hits, on the other hand, are linux.org, linux.com, and RedHat.
Could this be intentional?
I don't give a rip whether they rip off every Google concept ever used. I don't care whether the site is identical to Google in almost every way. No patent should ever be allowed to protect the look and feel of a web site; patents are designed to protect inventions, and that would be the backend part of a web site. The look and feel should be a copyright or trademark issue, and frankly, I don't feel like Google has done anything distinctive in any of these categories to merit a patent or any other intellectual "property" protection.
Yes, Google has a rather nifty approach to ranking and search heuristics, but tough noogies on them if someone else can write software that reacts similarly to the information they gather from the world. As it is, Teoma has some features in their beta that differentiate them significantly from Google (in good ways). Already I've managed to see who is linking to me (and not having to rely on referrer logs for that) and that is way cool (and if Google shows that, I am interested in knowing how to see it).
On a less enthusiastic note, a search on "ichimunki" turned up www.ichimunki.com first (woohoo!) and a whole boatload of Slashdot postings (bleah, who needs that indexed).
I do not have a signature
I find it quite nice that this search engine totally ignored my robots.txt and scanned my entire site anyway. How can a search engine, so friggin complex and monstrous, ignore the basics of spider etiquette ?
I guess it's time to rename my directories again.
-Billco, Fnarg.com
Google had no ads at the outset either.
Additionally, who's to say that those Google-ites haven't improved their technology over the last year or so? I'm sure many of us have turned exclusively to Google's tried and true system... oh so easy... oh so accurate.
Finally I think we love Google's look and those tiny little modifications they make to their logo on the special (but mostly American) dates.
Hey, if someone can better it, we could all use a search with a button "The right link."
yoink
Some of the better search criteria that led to my rather benign site:
For instance, one of the dominant /. themes is the incessant railing about the evils of IP and patents. Yet Google has what probably amounts to a boatload of patents, and they don't seem to get called on it (nor does Transmeta, or Tivo, for that matter). All the patent references I saw in the earlier comments were along the lines of "hmm, google has these patents, wonder if we're set for a big patent fight".
I bet if MS owned and operated google, /.ers would hate it and would never stop editorializing about the consequent coming of the Apocalypse.
I know it's still in beta, but so far, after a few test searches, I'd have to say it's a far cry from Google. I don't find the results as useful for a couple of reasons:
1. Searches default to allow embedded matches, so that a search for '2gb bios limit' yields a bunch of crap about 3.2gb limits, etc.
2. Results give about a line and a half of text from the site for you to review, but it's not necessarily text that contains your search criteria, so it's hard to judge relevance for yourself.
I judge it not ready for prime time.
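The embedded-match complaint in point 1 is easy to reproduce. Here is a small sketch contrasting substring matching with whole-word matching; I have no idea how Teoma actually tokenizes, so both functions are just assumptions for illustration.

```python
def substring_match(query, text):
    """Match if every query term appears anywhere, even inside another word."""
    haystack = text.lower()
    return all(term in haystack for term in query.lower().split())


def token_match(query, text):
    """Match only if every query term appears as a whole whitespace-separated word."""
    tokens = set(text.lower().split())
    return all(term in tokens for term in query.lower().split())


if __name__ == "__main__":
    query = "2gb bios limit"
    for page in ("Workaround for the 3.2gb BIOS limit",
                 "The 2gb BIOS limit explained"):
        print(page, substring_match(query, page), token_match(query, page))
```

Splitting on whitespace is deliberately crude: a regex word boundary would still find "2gb" inside "3.2gb" because the dot counts as a boundary, and a real tokenizer would also have to strip trailing punctuation.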
This is interesting - they have no ads on their pages. How do they expect to make money (and stay in business)? Not that I am complaining - I like the clean interface of google and teoma.
Didn't Google get all sorts of patents on the concepts used in their search engine? You have to wonder if a patent fight is on the horizon. I for one encourage competition and it'll take some serious innovation to displace my beloved Google, but I say more power to them as long as they aren't just ripping off Google's concepts.
Top Most Bizarre/Disturbing Error Messages
It seems to concentrate on sites rather than pages, which makes it a rather different approach from Google's.
In any case, a link to keep and a technology to watch. There are never too many good search engines. Good luck to them!
--
Rome taught me patience and assiduous application to detail. Virtues which temper the boldness of great, general views.