Slashdot Mirror


Teoma Aims To Kill Google

gwernol writes: "SFGate.com has an interesting article on the relaunch of Teoma's search engine. They are trying to topple Google as the leading search engine. If their technology delivers on its promise then it will at least be some real competition for Google which can only be a good thing."

38 of 313 comments

  1. I Beta Tested this by telstar · · Score: 5, Informative

    I was a beta tester for this search engine ... rewarded beta testers with a gift certificate at amazon.com. I wasn't all that impressed to be honest. It was fast but the result-set produced wasn't anything spectacular, and the new search features they added were of the "cute and fuzzy" variety. Nothing that would really yield much productivity. They added an associated topics section, and some visual cues to get to information, but given the choice between that and Google, I'd choose Google any day.

  2. Similar to by Joe+the+Lesser · · Score: 3, Funny

    So Teoma is basically the Kia of the search-engine industry no?

    --
    "I only speak the truth"
    Karma: null(Mostly affected by an unassigned variable)
  3. I don't think so by Kizzle · · Score: 3, Funny

    Even if it is as good as google, it's still not going to kill it. Google is the search engine everyone knows and loves. Unless this engine comes out with some great feature like reading your mind instead of you needing to type, it's not gonna take away too much of google's traffic.

  4. Alas by junkster191 · · Score: 5, Interesting

    I got all excited and went to test it out. Based on my unscientific and arbitrary dozen or so tests of obscure literary phrases, rare medical conditions, and *not* so famous dead people google gave me much more relevant pages every time. Hmm... I don't care how it is supposed to function theoretically, if it doesn't provide results then I'm sticking to google.

    1. Re:Alas by Aanallein · · Score: 4, Informative

      I got all excited and went to test it out. ... google gave me much more relevant pages every time.

      That might be because (according to the article) the new functionality will only be "available beginning at 5 p.m. PST Monday"
      I'm not too certain about the timezones (particularly with daylight savings thrown into the mix; and no, I can't be bothered to look up a worldclock right now), but I think PST time right now is something like 2 a.m.
      So we still have a goodly while to go before we can really see what this search engine is capable of.

      Not that I think this thing can actually beat Google, but at least wait with judging until you've seen the new and improved version of the engine, not what they have now...

  5. Re:Only a search engine by psaltes · · Score: 5, Informative

    Does no one read the article? They are rolling out a new version (which the article was about) tomorrow at 5pm PST! The site that is there now is using presumably months/years old technology. Anyone who's posted so far complaining about some search on Teoma is being fairly silly.

    That said it doesn't sound like the new version will topple google either.

  6. Beta indeed.. by BelDion · · Score: 5, Insightful

    It'll be a while before this Teoma thing can topple Google.

    First of all, no cache. The cache in Google sort of sneaks up on you in its usefulness.. Whether it's because the website is down or because you're looking at an html version of a PDF or word document, you find that you're using the cache all the time.

    More to the point though, how friggin slow is Teoma? I hope it's due to relative newness or something, because it's frightfully slow when running queries. Google flies, click search and the page comes back next to instantly (on a broadband connection anyhow), Teoma seems to be taking several seconds right now. I'd say Slashdot effect, considering where we are, but what kind of poorly designed search engine crumbles under the slashdot effect?

    --

    I am BelDion's .Sig; Who the hell is Jack?
  7. A few notes... by redhatbox · · Score: 4, Informative


    From the Teoma search page:

    "Teoma delivers three types of search results Web Pages: Authoritative sites relevant to your search term. Web Pages by Topic: Top result pages are grouped based on their topics. Experts' Links: Pages contain directories of links for related general subjects."

    Okay, great... but where's the "advanced search" option (such as Google's, at this page)? I know this is a "beta version" of the Teoma site; maybe their advanced search functionality isn't ready for prime time just yet. Or, maybe I've got it all wrong... do they believe their engine is good enough to eliminate the need for advanced search functionality?

    Also of potential interest are a couple of links at the bottom of each search results page. These links let you try your search on AskJeeves.com or DirectHit.com. As I understand it, they're gunning for Google as their biggest competition, but it seems somewhat odd that they'd include links to what most people (at least people I know) consider to be inferior search engines instead.

    Just a couple of thoughts :).

    1. Re:A few notes... by great+throwdini · · Score: 5, Informative

      Also of potential interest are a couple of links at the bottom of each search results page [to] try your search on AskJeeves.com or DirectHit.com. [I]t seems somewhat odd that they'd include links to what most people [...] consider to be inferior search engines instead.

      Complete the thought. Ask Jeeves, Inc. owns both Teoma (September 2001) and Direct Hit (January 2000). The selected URLs prominently display that ownership relation.

  8. Ask.com? by slardy · · Score: 4, Interesting

    How is Teoma attached to ask.com (ask jeeves)? When you get search results all the results are linked to this server: http://tm.wc.ask.com/ Did ask.com buy out teoma?

    --
    http://www.nu-vision.org
    1. Re:Ask.com? by great+throwdini · · Score: 5, Funny

      Did ask.com buy out teoma?

      It isn't too hard to follow the link labeled Press Information at the Teoma site to find another link to the Search Engine Watch report entitled Ask Jeeves Acquires Teoma from October, 2001.

      The good folks at Teoma were even nice enough to excerpt the following:

      "Ask Jeeves has purchased the Teoma search engine, which has attracted interest over recent months as a potential relevancy challenger to Google."

      You may even notice that Ask Jeeves is plastered all over the contact page. I don't think they're hiding the connection between the two brands from anyone.

      Has the use of search engines impaired our ability to follow links from one document to the next?

      Heck, a Google search of your exact question led to the NewsTrove tracking of the assimilation. Then again, the other results were a little iffy. ;)

  9. But will they throw crap at you? by iturbide · · Score: 5, Insightful

    Let's face it. Just how good a search engine is technically is only part of the story. The other part is how much advertising, cookies, links to 'buy a book about whatever on amazon' and all that will they throw at you? You get the idea. This is imho what killed off altavista and loads of other search engines. If people get annoyed enough, are thrown into a portal, or just plain have to wait too long for all that crap to load, they just won't go there.

    If they don't get that right, Google has little to fear.

  10. Reasons to use Teoma over Google by firewort · · Score: 4, Interesting

    Reasons why anyone should use Teoma over Google:

    1) if they don't cave into the demands of the Co$ and delist sites whose outlook on Co$ is less than positive.

    2) if they don't refuse adverts on a very arbitrary basis: they refused non-positive Co$ ads, as well as ads from businesses that sell night vision scopes (and not firearms.)

    see:
    http://www.politechbot.com/p-03325.html - google rejects ads from Co$ critics

    http://www.politechbot.com/p-03260.html
    google rejects ads from firearm-related merchant, accepts SPAM-WARE advertiser.

    Gee, thanks google!

    1. Re:Reasons to use Teoma over Google by cymen · · Score: 3, Funny

      Well one of the two adverts or whatever they are called was to rotten.com... Maybe it slipped through?

  11. Let's Talk About This Tomorrow by Schlemphfer · · Score: 4, Insightful

    Why are we wasting time talking about this search engine now? It launches at 5:00 PM Pacific time Monday. At that point, we'll be able to make useful comparisons to Google.

    --
    I'm generally "Interesting," "Insightful," and even "Funny" here. What the hell happens to me at parties?
  12. It lookes nice and all... by dotgod · · Score: 5, Funny

    but Teoma doesn't have a h4x0r mode like Google.

    1. Re:It lookes nice and all... by dotslash · · Score: 4, Funny

      ...and Google uses Pigeon Clusters (PCs) to rank using its proprietary PigeonRank (TM) system. I kid you not. Check this page out.

  13. Trying it out... by Pathwalker · · Score: 5, Funny
    When it all comes down to a final reckoning, there is only one search engine attribute that we all care about:
    How well we show up when doing a vanity search.
    Let's see how the search engines stack up:

    1. Searching on my real name.

    When I search on my real name on both Google and Teoma, my personal web page comes up as the first hit. Furthermore, on both Google and Teoma, 70% of the hits on the first page directly relate to me, although Teoma has a duplicate link.

    Both engines perform well in this test.

    2. Searching for a handle.

    I have used the handle Pathwalker for years - let's see how well it shows up:

    On this test, Google lists my webpage on the first screen of hits. Teoma on the other hand lists a lot of mystical mumbo-jumbo about finding your path in life; none of the info on ME which I am looking for and care about.

    Google wins this test hands down.

    3. Email searching

    Many of my e-mail addresses have contained the string hungerf3 - let's see how many times each search engine can find this:

    Google finds 1470 hits of that string, all of which appear to relate to me, and of which it considers 21 important.
    Teoma, on the other hand finds only 13, but they all appear to be of generally high quality.

    Still, google wins this test as well through the sheer amount of information related to me which it can dig up!

    Overall, one test was tied, and Google won the others. While Teoma appears to be a good search engine, it just doesn't have enough information about me in it. If they fix this, then I might start using it more...
  14. Charge submissions. by deragon · · Score: 5, Informative

    They charge for submitting a URL. $30.00US for the first one. That could impede the search engine's success.

    References:

    http://static.wc.ask.com/docs/addjeeves/Submit.html
    http://ask.ineedhits.com/

    --
    Remember the year 2000? They promised us flying cars. They delivered the PT Cruiser...
  15. Teoma went down to 'frisco... by Yo+Grark · · Score: 5, Funny

    Teoma went down to 'frisco. They was lookin' for eyes & minds to steal. They were in a bind 'cause they were way behind, and they was willin' to make a deal, when they came across this engine servin' up those webpages nice and hot. Teoma jumped up on a silicon stump and said, "Boy, let me tell you what. I guess you didn't know it but I'm a search engine, too. And if you'd care to take a dare, I'll make a bet with you. Now, you return a pretty good search, boy, but give Teoma their due. I'll bet a RAM Disk of gold against your soul, 'cause I think I'm better than you." The engine said, "My name's Google, and it might be a sin, but I'll take your bet, and you're gonna regret, 'cause I'm the best that's ever been."

    C'mon Google, raise up your cache and kick some ass, 'cause hell's broke loose in searches.

    Teoma deals the terms of agreement. "And if you win you get this shiny RAM Disk made of gold. But if you lose, Teoma gets your archive whole."

    Teoma opened up their HD case and said, "I'll start this show." And searched "fire flew from his fingertips" and returned "The Path of The Arcane." Boy, they indexed slow. Their ram made an evil hiss, a new search missed and by the phrase resulted this: "The Path of The Arcane."

    When Teoma finished, Google said, "Well, you're pretty good, you face-lifted son, but sit down in that chair right there and let me show you how it's done.

    Searchin' for relevance, go cache go. Returned "The Devil Went Down to Georgia - Charlie Daniels Band" oh oh oh, Feelin' Lucky in the first search just go. Google, does your site bite? No, man, no.

    Teoma bowed their head because they knew they'd been beat. And they laid that golden RAM on the ground at Google's feet. Google said, "Teoma, just come on back if you ever want to try again. 'Cause I told you once, you son of a gun, I'm the best that's ever been."

    - Moral of the story? It takes a second rate search engine to bring doubt, before we fully appreciate the term: "In google we trust", which, surprisingly was found on google, but not Teoma.

    -YoGrark

    --
    Canadian Bred with American Buttering
  16. Ugly cheap logo by Megs · · Score: 4, Insightful

    Okay, fine, they're allegedly going to bring out the Google-killing version tomorrow (News for April Fools, Stuff that makes for a really good laugh).

    The real question is, are they going to get rid of that lame, butt-ugly logo that just screams "cheap knockoff"?

    Also, in my profoundly unscientific survey of two friends on AIM, neither of them were able to correctly recall the name Teoma. Just because it means something cool doesn't mean that it will actually be a cool name...

    Meghan

    --
    Ask me about LOOM(TM).
  17. Re:Time. by nathana · · Score: 3, Informative

    Are they going to do the same thing google is doing, and let companies pay to have their pages come up in results more frequently than others?

    Gaaaah! How many #$@#!$-ing times is this particular piece of FUD going to be spread?? Google DOESN'T do this. Google does allow companies to pay to have their text ads rated higher for given keyword searches, but this doesn't influence the stupid search results!

    Sorry, Renraku; nothing personal. I'm sure you weren't purposefully trying to spread misinformation: you were probably misinformed yourself (most likely by the Slashdot article that started all of this paranoia). But I've seen this one enough that it's really starting to get to me...

  18. If it's slashdotted... by Warped-Reality · · Score: 5, Funny

    Here's a Google Cache of the site

    :)

    --
    This is not the greatest sig in the world, no. This is just a tribute.
  19. Re:I care nothing for Scientology or firearms by firewort · · Score: 4, Interesting

    I don't believe I claimed that Teoma might be superior- I put forth the idea that Teoma might be worth using if they weren't spineless and indecisive like Google. Because the new Teoma isn't up at the time of this posting, it's a little hard to tell for certain.

    Google is a fine search engine, but I much prefer the tools I use to not be influenced by what I consider to be poor politics and poor policies.

    What next? France will ask Google to remove any links for neo-Nazi or pro-Nazi sites? Sites that detail history regarding Nazi Germany in any fashion?

    Censorship is a slippery slope.

    How can I judge what are relevant search results if the search engine is censoring some of the valid results? Certainly, a search engine's job is to display only sites it finds relevant, but out and out censorship should play no role in that task.

    --

  20. Ten Minute Searching Score by sam_handelman · · Score: 5, Insightful

    I have evaluated a hit as relevant if it contains information related to the question asked. General information about Greece, or about the nutrient value of artichokes (but not containing specific info as to their vitamin content), I did not count as relevant. Pretty subjective, of course.

    Query (relevant hits of top 5)                      Google  Teoma
    Religious Intolerance by the Greek Orthodox Church  5       2(1)
    Nethack 3.4 Spoilers                                5       0
    Vitamin Content of Artichokes                       4       0
    Average Velocity of Asteroids                       4       0
    Who won the peloponnesian war?(2)                   5       5
    Samuel Handelman Columbia University(5)             2       0
    Harry Noller University of California Santa Cruz    4       4
    Edward Dratz University of Montana Bozeman          5       3
    Dangers associated with mercury thermometers        2       0
    Did Turing have any children?                       0       0
    okay
    Autobiography of Alen Turing(3)                     5       2
    Isaac Asimov's Middle Name(4)                       3       2

    Anyway, my time is up. avg. 50 seconds to run and squint at each query.

    Subjectively, to all of these queries, the #1 hit on google contained the answer to my question (the EXACT vitamin content of artichokes, the NAME of the side that won the war), while with Teoma, even though the hits were relevant to the question, it was not clear if the information I sought was actually in the returned result; except for my former faculty advisor and his colleague, which Teoma found just fine.

    (1) I'm counting the Scientology hit as relevant.
    (2) Google corrected my spelling, which Teoma did not. I'll accept that from a Beta.
    (3) Turing didn't write one. It was a trick question. Any link to a review, specifically, of any of three (that I found) biographies of Alan Turing I counted as a hit.
    (4) I didn't get his middle name, but it turns out he wrote a story called "Middle Name" which swamped the results. Google found specific references to the story, whilst Teoma returned links to lists of Asimov's fiction, but I generously scored both as hits.
    (5) when I put my name in quotes Teoma either a) cannot find any matches or b) doesn't understand what the quotes mean. I assume b since none of the hits it finds without quotes mention me.

    Anyway, I'm satisfied in calling that statistical significance (95% chance) that google is better.
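
    For anyone who wants to check the significance claim, here is a minimal sketch using a two-sided sign test on the per-query scores in the table above. The choice of test is an assumption on my part (the post does not say how the 95% figure was reached), and the function name sign_test is made up for illustration. With only twelve hand-picked, subjectively scored queries, treat the result as a rough indication and nothing more.

```python
from math import comb

# (Google, Teoma) relevant hits out of the top 5, copied from the table above.
scores = [(5, 2), (5, 0), (4, 0), (4, 0), (5, 5), (2, 0),
          (4, 4), (5, 3), (2, 0), (0, 0), (5, 2), (3, 2)]


def sign_test(pairs):
    """Two-sided sign test on paired scores; tied pairs are dropped."""
    wins_a = sum(1 for a, b in pairs if a > b)
    wins_b = sum(1 for a, b in pairs if b > a)
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Chance of k or more wins out of n if each engine were equally likely
    # to win a query, doubled to cover both directions.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


google_wins, teoma_wins, p_value = sign_test(scores)
print(f"Google wins: {google_wins}, Teoma wins: {teoma_wins}, p = {p_value:.4f}")
```

    Ties (three of the twelve queries) carry no information in a sign test, so only the remaining queries count toward the p-value.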

    --
    The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
  21. Re:Only a search engine by waytoomuchcoffee · · Score: 5, Informative

    Does no one read the article? They are rolling out a new version (which the article was about) tomorrow at 5pm PST! The site that is there now is using presumably months/years old technology

    Um, the site up right now is in BETA. And the article clearly states "After spending the past six months perfecting the technology, Gerasoulis and his development team on Monday evening will roll out a souped-up search engine".

    Are you saying that the current beta was discarded "months/years" ago and no one remembered to take it down, and that the "new" search engine to debut tomorrow didn't go through a beta stage first?

  22. Slightly OT: Google and the Google Toolbar for IE by Joe+U · · Score: 3, Interesting

    I know it's an IE thing right now, but the Google Toolbar is one of the more useful browser addons ever.

    As an experiment, for a week, I turned off the address bar and used the Google toolbar for everything. I was really impressed by the results.

    Turning the address bar into a search engine is a great idea, one that Google should think about enhancing. If done right, a Google Address bar could make the current DNS system much less important, and that's just a start. There are a lot of possibilities with a setup like this.

    In the end, I turned the Address bar back on to get an idea of what site I was on at the moment, it's easy to lose track without the URL line. However, I did not get rid of the toolbar, and I use it daily.

  23. Re:Slightly OT: Google and the Google Toolbar for by lkaos · · Score: 4, Informative

    Check out mozilla. The address bar _is_ a google search engine :)

    --
    int func(int a);
    func((b += 3, b));
  24. Teoma ranking by Animats · · Score: 3, Informative
    Well, all my sites have the #1 ranking for the usual keywords, and I didn't do a thing to make that happen. So I can't complain.

    Teoma is sluggish, but that can be fixed with money.

  25. +1 Funny? by great+throwdini · · Score: 3, Funny

    I wrote the above, but I don't understand the mod.

    Asking Jeeves the question posed by the OP ["Did ask.com buy out teoma?"] would have been funny. If only because it prominently returns this helpful link:

    Where can I learn about Ask Jeeves' acquisition of Teoma Technologies?

    This result might also be viewed as funny, in that it partly refutes this claim that Ask Jeeves is considered by most an "inferior search engine." Looks to me like it can handle hasty questions from five-digit Slashbots just fine.

  26. I like it. Here's why... by eldurbarn · · Score: 3, Informative

    Inserting tongue slightly in cheek:

    I searched on keywords that represent products that I sell on-line. In each and every case, my page was #1 on the list.

    I suppose this may change when they go "live and in color", tomorrow... but, for now, I can live with it ;-)

    --
    -Eldurbarn
  27. My study of Google, AllTheWeb, Teoma, and WiseNut by dh003i · · Score: 5, Informative

    Here's my simple study. I type in words at each search engine, and look at how many results I get. I rank them in order of most to least results, and I've put my (sometimes comical) comments below the results from each query.

    QUERY 1: LESBIAN

    AltaVista: 29,176,797
    Google: 11,600,000
    WiseNut: 8,282,738
    AllTheWeb: 1,166,487
    Teoma: 442,000

    Congrats to the pervs at AltaVista for having nearly 30 million results on "lesbian"! The jack-offs at Google come in a distant second at nearly 12 million results on Lesbians. Nice job to the occasional wankers at WiseNut on their 8-million results. AllTheWeb? Only 1 million results? Don't you guys jack off at all? What right does a search engine have to call itself AllTheWeb if they only get 1 million results on a query for "lesbian"? Teoma gets the "nice try" pat on the back. Grow some nuts, Teoma, then come back and play with the big boys.

    Now, lets try something a little bit more sparse.

    QUERY 2: Michael Jordan.

    AltaVista: 27,980,822
    Google: 1,320,000
    Teoma: 245,000
    AllTheWeb: 205,054
    WiseNut: 72,998

    Again, AltaVista comes out on top at 28 million. This is questionable, but probably accurate. AltaVista has really indiscriminate searching technology, and doesn't try to eliminate redundant or very similar pages (or subpages) like Google does. But, strictly by the numbers, again, Google comes in a distant second at 1.3 million. Teoma actually comes in somewhat respectably this time at 245 thousand; still, it's not in the same league as Google or AltaVista. AllTheWeb again comes up short and disappointing, especially given its name. Guys, don't call your engine AllTheWeb if it only returns about a sixth as many results as Google does! WiseNut apparently isn't too wise at only 73 thousand results for MJ. Come on guys, get with it. MJ may have been retired for 2 years, but he's still big news.

    On to something a bit more obscure:

    QUERY 3: Leilani Rios

    For those of you who don't know, Leilani Rios is a stripper who was kicked off her run team for stripping to pay her way through college. What BS. This is a recent development; so this query sort of tests for how updated the search engines are.

    Google: 1,870
    AllTheWeb: 723
    AltaVista: 567
    WiseNut: 426
    Teoma: 74

    Well, I can hardly say this is surprising. AltaVista (~600) is finally dethroned, Google revealed as king (~2k). While I'm here, I should eat some crow for earlier criticisms of AllTheWeb (~700). Perhaps they don't deserve the title AllTheWeb, but 723 results on this query isn't bad. Still, not even half of what Google returned. WiseNut again occupies the low mediocrity position with 426 results. Teoma...Teoma Teoma Teoma, coming in with a sorry 74 results. Come on guys, this is recent news, but its also big news. The girl was in PlayBoy magazine for christ sake! Again, Teoma, spend some time growing up, grow some balls. Then come back and play with the big boys.

    In the interests of fairness, I'll do another query for a person who recently became news.

    QUERY 4: Katie Sierra

    AltaVista: 68,416
    Google: 37,200
    AllTheWeb: 25,447
    WiseNut: 21,184
    Teoma: 4,740

    Welp, AltaVista's back on top again at 68k, though I doubt the validity of it. Remember, AV doesn't sort out very similar pages, as does Google. Google comes in second at 37k. AllTheWeb, again, not bad, though certainly not "all the web" at 25k. WiseNut again comes in on the short side of mediocrity. Teoma...welp, you're beginning to see the pattern. Come on guys, this is sorry. I might find more results than that for Katie Sierra by just searching slashdot! (;-).
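
    The placings for queries 1 through 4 can be tallied mechanically. This is a minimal sketch that ranks each engine by reported hit count per query and averages the ranks. The counts are copied from the lists above; the query labels Q1 to Q4 and the function name mean_ranks are made up for illustration. Keep in mind that a raw hit count says nothing about relevance, so this only summarizes the numbers as posted.

```python
# Reported hit counts for queries 1-4, copied from the lists above.
counts = {
    "Q1": {"AltaVista": 29_176_797, "Google": 11_600_000, "WiseNut": 8_282_738,
           "AllTheWeb": 1_166_487, "Teoma": 442_000},
    "Q2": {"AltaVista": 27_980_822, "Google": 1_320_000, "Teoma": 245_000,
           "AllTheWeb": 205_054, "WiseNut": 72_998},
    "Q3": {"Google": 1_870, "AllTheWeb": 723, "AltaVista": 567,
           "WiseNut": 426, "Teoma": 74},
    "Q4": {"AltaVista": 68_416, "Google": 37_200, "AllTheWeb": 25_447,
           "WiseNut": 21_184, "Teoma": 4_740},
}


def mean_ranks(counts):
    """Average placing per engine, where 1 means the most results for a query."""
    totals = {}
    for per_engine in counts.values():
        ordered = sorted(per_engine, key=per_engine.get, reverse=True)
        for rank, engine in enumerate(ordered, start=1):
            totals[engine] = totals.get(engine, 0) + rank
    return {engine: total / len(counts) for engine, total in totals.items()}


ranks = mean_ranks(counts)
for engine, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(f"{engine:10s} {rank:.2f}")
```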

    Next is a personal query for a website of mine that's minor and unfinished:

    QUERY 5: "Here is a listing of links to several sites that either argue against"

    I used quotes this time because I'm specifically seeing if these search engines will produce a result for my web page (or one with those exact words, if any other has those exact words).

    Google: 1
    AllTheWeb: 1
    Others: 0

    Welp, what can I say? Google/AllTheWeb apparently appreciates even my trivial, marginal, unfinished thoughts. How dare AltaVista, WiseNut, and Teoma not have my trivial unfinished web page catalogued! No, just joking. I didn't really expect any search engine to have my page in it. But Google/AllTheWeb gave me a pleasant, ego-stroking surprise. This was what really impressed me with Google/AllTheWeb. What actually happened is I forgot about my web site (that is, its address) and typed in "pessimistic views" at Google(then today at AllTheWeb)...the first web page listed looked familiar and I wondered why until I realized it was a page I created years ago. Kudos to Google and AllTheWeb for including the "little guy".

    Well, that's it. You guys get the picture. Google is still king. AltaVista does a good job at faking it, but we all know that AV doesn't distinguish well between duplicate or very similar pages. AllTheWeb, impressive, but certainly not all of the web. WiseNut, I've never heard of before, but you did half-ass. Teoma...you made it up to 3rd in ONE category. Not 1st, not even 2nd. But, not being on the bottom rung just didn't feel right to you. Feel good to be back home? Here are my preferences for search engines and why:

    1. Google. Provides a lotta search results, well organized, and many great features.

    2. AllTheWeb. Before I discovered Google, you were my girl, but now your just my whore ;-). No, really, AllTheWeb has its uses. Its a techie search engine with lots of neat advanced features, and I love the FTP / MP3 search options.

    3. AltaVista. AV, though I'm sure you have (metaphorically speaking) fake breasts and a pushup braw, I still have a fond spot for you. Before I discovered AllTheWeb and Google, you were my girl. But now your more like the ex-wife who keeps on nagging me. AltaVista's kinda the thing I goto when I'm feeling nostalgic for my first car. Not really much use, but still got a little soft spot for ya.

    4. WiseNut. Never heard of this search engine before and there's obviously a reason for that. WiseNut seems to be, to me, the very definition of mediocrity. I'll keep an eye on you and see if anything good comes of you, but I'm about as hopeful for that as I am that Enron execs will be found "innocent".

    5. Teoma. Well, you did pretty shitty in every category. But you've got an excuse -- you're the new kid on the block. Seriously, Teoma has some potential. I like the way I get fast results, and I like the no-nonsense interface. I also like the more advanced way in which you organize things. I'll put you on my list of possibly up-and-coming search engines. But don't kid yourself yet. You're nowhere near the league of Google.

    Despite my harsh, sometimes funny, tone in this post, all these engines are good. But "good" (i.e., AltaVista, Teoma, WiseNut), just doesn't cut it when you have GREAT engines like AllTheWeb, and when you have THE ENGINE, aka Google.

  28. Re:Where's Teoma's caching? by Dwonis · · Score: 3, Informative

    Be patient. Caching takes up a lot of storage, which costs a lot of money, which Teoma doesn't have yet.

  29. The information retrieval technologies involved by gbnewby · · Score: 3, Informative

    Their "jobs" link mentions a variety of technologies, including LAPACK. LAPACK is a library of numerical linear algebra routines (there's a C version, CLAPACK, but LAPACK itself is FORTRAN). My guess is they're using, among other things, techniques related to latent semantic indexing (LSI) and vector space models (VSM) for their ranking.

    Unless you're an Information Retrieval Wienie (like me), you might not know about LSI and the VSM. The cool thing is that these are methods that work really well in the laboratory, but have scaling problems so are not found much in large-scale systems.

    Google, we know, uses PageRank to rank pages based (partially) on the "authority" of the page. It's not clear whether Teoma uses this or not (it is patented). LSI is also patented (by Bellcore), but VSM is not.
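
    To make the "authority" idea concrete, here is a minimal sketch of link-based ranking by power iteration, in the spirit of the published PageRank description. It is a toy under stated assumptions, not Google's or Teoma's actual system: the four-page link graph, the function name pagerank, the iteration count, and the handling of pages with no outgoing links are all choices made for illustration. The damping value 0.85 is the one commonly quoted for the published algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

    Page "d" has no inbound links, so it keeps only the baseline share, while "c" collects rank from everyone else.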

    For both Google and Teoma, they seem to use hybrid approaches:

    - Word occurrence, with weighting (weight of a term in a document; weight of a term in a collection). This is fundamental to all search engines (it's part of what distinguishes an information retrieval system from a database).

    - Statistical relations among words and documents (e.g., VSM and LSI techniques -- there are many variations). These look at either a term by document matrix (where each cell is a term count), or term by term matrices (where each cell is a measure of the terms' pairwise relatedness).

    - Clustering, to eliminate duplicates and identify groupings (Teoma seems to do this; Google does this in their directory. This is NorthernLight's claim to fame, and is patented)

    - Authority ranking (it's not clear whether Teoma does what Google does, but this is probably a part of the mix)
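
    The first two items in the list above (term weighting and the vector space model) can be sketched in a few lines. This is a minimal illustration, assuming a whitespace tokenizer, one common TF-IDF variant (raw term count times log(N/df) + 1), and cosine similarity for ranking. The three sample documents and all function names are made up, and nothing here is claimed to be what Teoma or Google actually run.

```python
import math
from collections import Counter


def build_idf(docs):
    """Weight of a term in the collection: rarer terms score higher."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def vectorize(text, idf):
    """Weight of a term in one document: term count times collection weight."""
    tf = Counter(text.lower().split())
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return (score, document) pairs, best match first."""
    idf = build_idf(docs)
    q = vectorize(query, idf)
    return sorted(((cosine(q, vectorize(d, idf)), d) for d in docs), reverse=True)


# Hypothetical three-document collection.
docs = [
    "vitamin content of artichokes",
    "history of the peloponnesian war",
    "artichokes recipes and cooking tips",
]
results = rank("artichokes vitamin content", docs)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

    LSI goes one step further by factoring the term by document matrix (that is where a linear algebra library like LAPACK would come in), which this sketch does not attempt.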

    Each search engine has its own recipe for how these and other factors are combined. If only they would share (and stop getting software patents)!

    ...Greg

  30. Re:Where's Teoma's caching? by grytpype · · Score: 3, Funny

    Maybe Teoma could just link to the Google cache.

    --

    - Have a picture

  31. Having read the article.... by Metrol · · Score: 5, Interesting

    Teoma is going to have one heck of a time ramping up to the kind of processing that Google is doing, if for no other reason than the kind of money they're going to need to put Redmond's way. Have a look. It's no wonder they couldn't put together the financial resources.

    Now, let's just pretend that the technology that Teoma is using is roughly equivalent to Google's. Google is up to what now, 7000 servers? That's 7000 copies of Win2k, each including a full Internet hosting license, which is a fair bit more than your usual in house licensing.

    Did they write their own DB, or are they fully into the MS world with SQL Server? We're talking about some serious bucks here that cannot be devoted to expanding hardware.

    On the other hand, Google can devote 100% of their cash investments to hardware and research. Adding a brand new > 1G box with a couple of monster drives costs maybe $600-$700.

    --
    The line must be drawn here. This far. No further.
  32. /. Crowd by sean23007 · · Score: 3, Funny

    I think this will be taken up by the Slashdot crowd, if only for one reason. A simple search for the word "Microsoft." On Google, the first link is to Microsoft's corporate website, the second is to the Internet Explorer Home Page, the third is to Microsoft Help and Support, etc. Teoma yields the same first result, www.microsoft.com, but the second result says Boycott Microsoft and the third is a link to information about the US vs. Microsoft court case.

    Now which one of these is more geek friendly? (By the way, I used this Google.)

    --

    Lack of eloquence does not denote lack of intelligence, though they often coincide.
  33. Re:Where's Teoma's caching? by rseuhs · · Score: 3, Insightful
    First, you don't need a database because you just have to fetch pages, the search-index is either unrelated to this or needs it anyway.

    Secondly, downloading gigabytes of data is not free, it costs bandwidth. Consumer-prices around here are about 0.05 $ per Megabyte, let's assume that Teoma pays 0.01 $ per Megabyte (Yes, I know that they probably don't pay on a per-megabyte basis, nevertheless they have to pay for their bandwidth one way or the other. If anybody knows how much this costs more exactly, please feel free to correct me).

    To download 10 Terabytes would cost $100,000. Cheap IDE hard drives cost about $2/GB, so storing 10 Terabytes would cost about $20,000, or a fifth as much. (Please note that $2/GB is a retail price; if you actually buy 10 Terabytes of hard disks, I guess you will get some kind of discount ;-)

    If you also take into account that you have to reindex sites frequently (let's assume monthly), the yearly cost of operating the search engine is 12 x $100,000 = $1.2 million in bandwidth alone, or 60 times the cost of "storing the web".

    So unless I'm completely off-scale with my assumptions, the cost to maintain a cache is actually negligible compared to the cost of basic search-engine operation.
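
    The arithmetic above is easy to rerun with different guesses. A minimal sketch follows; every input is one of the guesses from this post (10 Terabytes, $0.01 per Megabyte, $2 per Gigabyte, monthly recrawl), not a measured price, and the variable names are made up. Decimal units (1 TB = 1,000 GB = 1,000,000 MB) are assumed.

```python
# All inputs are the post's guesses, not measured prices.
WEB_SIZE_TB = 10
BANDWIDTH_USD_PER_MB = 0.01
DISK_USD_PER_GB = 2.0
CRAWLS_PER_YEAR = 12

# Decimal units are assumed.
MB_PER_TB = 1_000_000
GB_PER_TB = 1_000

crawl_cost = WEB_SIZE_TB * MB_PER_TB * BANDWIDTH_USD_PER_MB   # one full download
storage_cost = WEB_SIZE_TB * GB_PER_TB * DISK_USD_PER_GB      # one cached copy
yearly_crawl_cost = crawl_cost * CRAWLS_PER_YEAR
ratio = yearly_crawl_cost / storage_cost

print(f"one crawl:     ${crawl_cost:,.0f}")
print(f"cache storage: ${storage_cost:,.0f}")
print(f"yearly crawls: ${yearly_crawl_cost:,.0f} ({ratio:.0f}x the storage cost)")
```

    Changing BANDWIDTH_USD_PER_MB is the quickest way to see how sensitive the conclusion is to the bandwidth guess.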