Could IBM Shake up the Search Engine World?
overshoot writes "IBM has just tossed a bucket of chum into the whole search showdown, which Microsoft thought was between them and Google. Apparently, IBM Research has developed a 'key facts' search technology (as distinct from 'key words') over the last several years. Now they're going public with it -- by putting it on SourceForge under an OSS license!" (According to the article, it's expected to show up on SourceForge by the end of this year, not immediately.)
...resulting in 100% consolidation?
Heheh...not.
Procrastination -- because good things come to those who wait.
That is all.
The search bar on your site barely works as it is.
It will be funny if sf.net denies them. But then, I guess they got a deal with them already.
Cheers,
RoadkillBunny
Now FOSS will destroy Google as well as Microsoft.
Companies that are going bankrupt (like the 321 Copy software company) or CloneCD should also release their programs under FOSS before going under, to destroy their opponents as well; everyone except the information monopolists would benefit.
I'll stick to letting Google know every single detail of my life thanks.
Yay, now EVERYONE can make their own search engine and say how they are SO much better than everyone else's!
Yay, I have a sig.
IBM is really into open source lately
Their software is horrible. Ever worked with DB2? It sucks bigtime.
How will this compare with, say, something like Spotlight?
This is where Google comes in and buys IBM out in full.
Go to the w3.org and put Slashdot.org through the validator.
I applaud IBM for taking this stance and entering the hotly contested search engine world.
More competition is better. I would enjoy more innovation. They do have a hard long road to follow however, and they may find it difficult.
Check out my journal if interested in a difficult problem.
Check journal for info on Anti-TextBook, an idea by me.
Is dying a very slow and painful death... at least in the software market.
wfp2.almaden.ibm.com - - [08/Aug/2005:15:48:34 -0400] "GET /robots.txt HTTP/1.0" 200 69 "-" "http://www.almaden.ibm.com/cs/crawler [fc7]"
:)
wfp2.almaden.ibm.com - - [08/Aug/2005:15:48:38 -0400] "GET / HTTP/1.0" 200 41317 "-" "http://www.almaden.ibm.com/cs/crawler [fc7]"
I've been getting once a day connections on my server from ibm for quite some time now (a year or so). Doesn't surprise me in the least.
Believe in Technology and AMAZE yourself. -- RIP ZDTV/TechTV
as in freedom, not free as in beer.
From TFA: "While simple but powerful keyword searches have revolutionized how Internet users locate and retrieve information, IBM is looking to transform how office workers sift through the piles of data stored inside organizations."
The posting implies that IBM is entering into competition with MS and Google. I saw no indication that IBM intends to launch a web search engine.
omg i think i love you.
Now I think Microsoft has a big problem... Now they really should start becoming innovative... And Google finally could have a nice open source competitor. This will increase innovation in giant leaps and of course would make it hard for Microsoft ever to beat Google. This will be a worthy test of the power of open source!!!
IBM is pretty crazy when it comes to advanced research in any of its fields.
I have heard stories from researchers there that IBM has its own terminology for a lot of technical EE/CS stuff, as they discovered it way before the world did but were so secretive they didn't publish any of it.
I wouldn't be surprised if IBM has enough tech in search to seriously knock down Google!
This OSS thing comes as a surprise, as it contradicts their secretiveness about their research.
a bucket of chum into the whole search showdown,
This is an awful mixed metaphor. How does Slashdot expect its readers to navigate the treacherous IT seas with such poorly-seasoned and half-baked information?
MSN thought it was between them and google?
http://news.yahoo.com/news?tmpl=story&u=/cmp/20050722/tc_cmp/166401634
Sorry Bill, but if anything it's between Yahoo (22% share of all searches) and Google (47%).
Not to mention most of those MSN searches (12%) are from IE users who don't know how to change their browser's start page.
Unstructured Information Management Architecture SDK. The UIMA SDK (Software Development Kit) is an all-Java™ implementation of the UIMA framework, and it supports the implementation, description, composition, and deployment of UIMA components and applications. It also supports the developer with an Eclipse-based development environment that includes a set of tools and utilities for using UIMA.
Go you crazy Java dudes, go.
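To make the "key facts versus key words" distinction concrete, here is a toy sketch of the general idea of a composed analysis pipeline: small annotators each add labelled spans to a shared document, and a later annotator combines earlier spans into a "fact". Everything here (the `Document` and `Annotation` classes, the annotator functions, the nearest-percentage heuristic) is hypothetical and written in Python for brevity; it is NOT the UIMA Java API and not how IBM extracts facts.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str        # label such as "Company" or "MarketShare"
    begin: int       # start offset in the document text
    end: int         # end offset (exclusive)
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def company_annotator(doc):
    """Hypothetical annotator: tag a few known company names."""
    for m in re.finditer(r"\b(IBM|Google|Microsoft|Yahoo)\b", doc.text, re.IGNORECASE):
        doc.annotations.append(Annotation("Company", m.start(), m.end()))


def percent_annotator(doc):
    """Hypothetical annotator: tag percentages and store their numeric value."""
    for m in re.finditer(r"(\d+)%", doc.text):
        doc.annotations.append(
            Annotation("Percent", m.start(), m.end(), {"value": int(m.group(1))}))


def share_fact_annotator(doc):
    """Toy heuristic: pair each company with the next percentage after it."""
    companies = sorted((a for a in doc.annotations if a.kind == "Company"), key=lambda a: a.begin)
    percents = sorted((a for a in doc.annotations if a.kind == "Percent"), key=lambda a: a.begin)
    for c in companies:
        nxt = next((p for p in percents if p.begin > c.end), None)
        if nxt is not None:
            doc.annotations.append(Annotation(
                "MarketShare", c.begin, nxt.end,
                {"company": doc.text[c.begin:c.end], "share": nxt.features["value"]}))


def run_pipeline(doc, annotators):
    """Run annotators in order; later ones can build on earlier annotations."""
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(
    Document("Yahoo has 22% of searches and Google has 47%."),
    [company_annotator, percent_annotator, share_fact_annotator])
print([(a.features["company"], a.features["share"])
       for a in doc.annotations if a.kind == "MarketShare"])
```

A keyword index over that sentence could only tell you it contains "Google" and "47"; the last annotator records the relationship between them, which is the kind of thing a fact-oriented query could then filter on.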
How we know is more important than what we know.
...is for you to stop anthropomorphizing it.
I, for one, welcome our new chum-tossing search-engine overlords...
KDesksearch?
...in the next KDE :D
KDeskfinder?
Koogle?
Kahoo?
is a P2P layer on top of this complete with efficient, distributed and secure search. A good P2P search engine is still missing and (IMHO) one of the more important things needed, last but not least for political reasons (privacy, censorship etc.).
:-)
That would make it possible to give back control of every aspect of the 'web experience' to the user.
Ok, I'm dreaming
The important information is simply the url http://www.alphaworks.ibm.com/tech/uima/
The name is different, but how are "key facts" different than "key words"? The article only seemed to say that it will be used by businesses internally.
On a nostalgic, somewhat related note, does anyone remember Scopeware? Unfortunately, it seemed that when that story was posted, most of the /. crowd didn't really RTFA and dismissed it as being silly. They were trying to do exactly this kind of thing: provide intelligent ways for businesses to manage their heaps of data. That company is dead and gone now, but it seems like everyone is starting to pursue this kind of thing, like MS with their relational database file system thingy in Longhorn. I guess they were just a few years too early.
...will sure light up. There will be so many people trying to out-do the not-doing-evil of all of the other search engines that they'll have to resort to being evil just to prove how not evil they are.
Don't disappoint your bird dog. Go to the range.
I'm not sure if this is feasible, as it would be hard to ward off spammers, but is there any chance that we could see an OSS distributed search system that works like SETI@HOME?
Maybe I'll patent it, before Epicrealm does...
Religion for nerds. Stuff that really matters
which Microsoft thought was between them and Google.
Where did this come from? It certainly wasn't part of the article. With BAIDU's IPO, and Yahoo expanding its index count to 20B pages (almost 4x Google's count), I seriously doubt that anyone in the search engine business thinks they can predict who will dominate in a few years - it's possible that the next "pagerank killer" is written by some CS grad students or by a search engine company that hardly anyone has heard of (yet).
...which Microsoft thought was between them and Google.
I think it still is pretty much between them (and perhaps Yahoo), as IBM is obviously not actively pursuing this market. At first glance it appears that they wanted to give search engines a swing, and in the end decided not to go after it. However, being IBM, instead of burying their research they released it to the public so others can benefit from it.
While this is good, Microsoft and Google really have nothing to worry about. It's not like Big Blue is starting up its own web search portal.
"What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
No, the posting (at least tried to) implies that IBM is changing the rules on the search game.
Chum are the bait that you throw to sharks to get them fighting each other.
Lacking <sarcasm> tags,
"However, the technology has not existed to allow software to search out and make sense of these disparate forms of data."
Surely the technology has been around for a while (http://www.w3.org/RDF/)? It's just that no one is using it?
So Google and MS will incorporate the "key facts" code into their products. That won't exactly shake up the search engine world. It will (possibly) improve it for everyone, and maybe (if "key facts" works better than their proprietary "key words" functions) even let another engine compete in their category. The latter might shake something up. But, like every other mass human activity, this competition is fought over brand names. Google cleverly established a terrific brand, through careful simplicity and consistency in graphic and info design. This IBM release would merely grant more substance to the existing brands, and some substance to any newly emerging one, which would have to establish its own competitive value, largely through style.
IBM's move does have the power to shake up the open/proprietary software jihad underway. If Microsoft used their open code, it would be hard for MS to claim that open source is inherently bad, or proprietary code is inherently superior. Google would demonstrate the same argument, but no one complains about Google's code remaining proprietary, because it mainly runs on their servers, which few people yet demand should be opened to outsiders. These are the kind of subtle strategic moves that let IBM continue to pull the strings of the entire industry. Success that generates more business and flexibility for IBM, in the mixed open/proprietary space it's carving for itself, will also demonstrate another powerful idea. American corporations can achieve market influence through strategic deployment of basic R&D. Not just through proprietary products, but also through manipulation of competitors who adopt open tech they create.
All in all, this looks like a smart move by IBM. Let's hope 1> this rumor is true; 2> the tech is really good; and 3> we're not already too far gone down the entrenched lines between our corporate jihadis to get the benefit of the mutual cooperation that this tech could enable, to great mutual benefit.
--
make install -not war
Call me skeptical, but there are many things that appear on slashdot which are lauded as "FROM IBM" and therefore glorious, while I seldom see that anything coming from IBM is ever that glorious. They are slow and monolithic. Developerworks also isn't IBM, which is an error a lot of posters make -- Developerworks is a site IBM runs that pays people for articles, and usually the information quality is average or marginal at best. IBM research also doesn't succeed in making a lot of great products, hence they need to open source some things of marginal quality here and there to maintain street credit and trump the IBM name.
Large behemoths like IBM manage to innovate against their nature, and even that happens only occasionally. The machine resists...
Anything interesting happens on the fringe, never at the big 800 pound gorilla of bureaucracy.
and useless comments to insightful comments I have ever seen. You know, like this one.
to know I'm looking for amateur or anal when I search for 'a'?
And from the Slashdot summary... IBM has just tossed a bucket of chum into the whole search showdown, which Microsoft thought was between them and Google.
No, IBM's technology has little to do with Google's, Yahoo's or Microsoft's search technology. This isn't a competition until any of the three introduces similar technology. Reading the article's third paragraph would clarify this, and would make the summary a little more accurate, too.
For he today that sheds his blood with me shall be my brother.
What, expecting free mod points on a worn out cliche?
Don't let taps, bottles, and cans hold it back. Let beer flow freely, as in FREEEEEEEDOM!!!
About 8 years ago, when I was writing software for OS/2, I ran across an interesting extension that IBM had for its DB2 software, called (I think) the Ultimedia extensions. These would allow you to search photos for a type of object that it understood. So you could tell it to search for all pictures that had a red ball and a tree...and it would return a list of all photos with those two objects. It was really interesting, but I have not heard anything about it since then...
ttyl
Farrell
CAN-CON 2019 - Ottawa's only book-oriented Science Fiction Convention! October 18-20, Sheraton Hotel, Ottawa, Canada
It's available now. As the article says:
UIMA technology is expected to be made available through open-source software site SourceForge by the end of 2005. The UIMA framework can currently be downloaded free of charge from IBM AlphaWorks at http://www.alphaworks.ibm.com/tech/uima/.
So, I ask, why wait for it to appear on SF if we can get it now?
I'm in agreement here. If anyone can see the algorithms, then it's going to be pretty easy to manipulate the results and ruin the efficiency. Perhaps this will be the first example of the limits of OSS due to the necessity for secrecy.
Someday I intend to make a great OS program called "KK" just to throw them for a loop when they try to name the KDE version of it :P
...who famously had Hamlet wonder:
"Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,..."
You can download the Unstructured Information Management Architecture SDK from alphaWorks and take a good look at how to analyze unstructured information (text, audio, video, images, etc.) to discover, organize, and deliver relevant knowledge.
The key to search engines, whatever their underlying ranking algorithm, is trawling through the couple of billion pages on the net to generate the data to be searched.
Obviously most of us simply don't have the bandwidth or the computing power & storage to do that.
So are IBM treating the search engine source release as a hypothetical interest for people who can't actually make practical use of it, or are they going to give access to their own trawled data?
If the latter, then this is very significant.
If their search engine is anything like their web site, then MS and Google have nothing to worry about.
There have been many times when I have known what something is or does (since I've seen it in action), but not what it is called. If I could search for information on the basis of known facts, rather than just guessing at search terms, I think I would have much quicker success at such searches. I can usually find whatever I needed to know, but it can take weeks if I don't know the words to search for. Sometimes it takes joining mailing lists or asking people personally. Yeah it works, and the current system is immensely better than going to a library to hunt for something, but it can still be improved upon.
For example, I was looking for a particular type of flute, smaller than a normal flute but larger than a piccolo, but with the same standard keywork and fingering system. I knew such a thing existed, having seen it in use in a flute choir, but I didn't know it was called a "treble G flute". Instead I had to search on what I did know -- it's a flute, and used in a flute choir -- and pick through the truly staggering number of hits myself in hopes of finding what I'm looking for. If I could have automagically narrowed that down with specifications such as "smaller than a concert flute", "larger than a piccolo", "made of metal", and "has Boehm system closed keywork", I would have had very few hits to search through and most of them would have been relevant. Google reduced the whole world down to a haystack to search for that elusive needle. Searching by facts might have reduced that down to a teacup.
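The difference between the two approaches in the flute story can be sketched in a few lines. This is a toy illustration under stated assumptions: the `Instrument` records, the catalogue, and especially the lengths are hypothetical placeholder values, not measurements, and none of this is IBM's code. A keyword search returns everything mentioning "flute"; a fact search keeps only items that satisfy every stated constraint.

```python
from dataclasses import dataclass


@dataclass
class Instrument:
    name: str
    family: str
    length_cm: float   # illustrative placeholder values only
    material: str
    keywork: str


# Hypothetical catalogue standing in for "the whole world".
CATALOGUE = [
    Instrument("piccolo", "flute", 32.0, "metal", "boehm"),
    Instrument("concert flute", "flute", 67.0, "metal", "boehm"),
    Instrument("treble G flute", "flute", 50.0, "metal", "boehm"),
    Instrument("alto flute", "flute", 86.0, "metal", "boehm"),
    Instrument("recorder", "flute", 33.0, "wood", "none"),
]


def keyword_search(word, items):
    """Keyword-style search: every item whose text mentions the word is a hit."""
    return [i for i in items if word.lower() in f"{i.name} {i.family}".lower()]


def fact_search(items, *predicates):
    """Fact-style search: keep only items that satisfy every stated fact."""
    return [i for i in items if all(p(i) for p in predicates)]


by_name = {i.name: i for i in CATALOGUE}
hits = fact_search(
    CATALOGUE,
    lambda i: i.family == "flute",
    lambda i: i.length_cm < by_name["concert flute"].length_cm,  # smaller than a concert flute
    lambda i: i.length_cm > by_name["piccolo"].length_cm,        # larger than a piccolo
    lambda i: i.material == "metal",                             # made of metal
    lambda i: i.keywork == "boehm",                              # Boehm-system keywork
)
print(len(keyword_search("flute", CATALOGUE)))  # 5: every record mentions "flute"
print([i.name for i in hits])                   # ['treble G flute']
```

The hard part in practice is not the filtering but getting reliable facts like "length" and "keywork" out of unstructured pages in the first place.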
Mal-2
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
This is good news anyway. Keyword/phrase searching becomes less useful as the universe expands. I have 11000 texts fully indexed with swish-e and I get way too many hits unless I use phrases. If I knew what phrase was in the books I sought, I would not need the search engine.
I love search engines because I cannot figure out how to organize a file cabinet or a hard drive...
A problem is an opportunity http://mrpogson.com
That was a long time ago in a galaxy far, far, away. eBay now runs on Sun.
This is how a commercial enterprise does "Open Source". Please take note!
No wonder my IBM stock is tanked.
If this radical new technology is anything like the new, improved "Deep Blue" search backing IBM's support pages, it's a real piece of junk, almost like AltaVista circa 1998.
Conformity is the jailer of freedom and enemy of growth. -JFK
That's just the hardware (and operating system and JVM). And not necessarily all eBay's hardware. The original poster is correct: eBay uses IBM WebSphere software.
Interesting....
I thought IBM tried to patent everything and anything plausibly patentable that came across the desk of someone on their research team.
If they patent everything, they can be pretty sure that they'll be able to extract some pretty hefty licensing fees from the industry at large. However, if they keep too many things under wraps, while they might gain a competitive advantage for a product that they're bringing to market relatively soon, they risk losing the ability to file for all of the relevant patents. For example, someone in another research lab might simultaneously make similar discoveries and file for some patents. Thus, in the worst case, IBM could be forced to pay heavy licensing fees to the second company, for tech that IBM originally discovered.
So, I guess, I'd like to know: under what conditions does IBM tend to keep things under wraps, and when do they opt for the patent?
In the trials for the new mainframe they were searching the entire net, but not for your typical search reasons (ex. Searching on an address), but to find relationships and patterns. Evidently they were getting some really interesting results searching on predictive patterns for stocks (finding tell-tale relationships that indicated when something was going to move) or in evaluating government actions. A lot of discussion I sat in on was on how they could use the tech to find patterns across thousands of sources.
Anyway, the net net is they were trying hard to find a way to sell this tech, part of their new efforts to monetize technologies like this (IBM has this great weather predictive technology for micro-cells that still hasn't seen the light of day). Guess they couldn't find a good way to sell it directly, so releasing it this way to the world is pretty interesting. Though it wouldn't surprise me if, when they ran the numbers, they found they'd sell more hardware and services than the software would ever generate, if it was adopted by other companies.
IBM has just tossed a bucket of chum into the whole search showdown, which Microsoft thought was between them and Google.
Yahoo says, "I'm nawt dead yet." "It's just a flesh wound."
If it's open source, what's to stop jerks from making it ignore robots.txt? This is gonna help phpBB and other things that currently have asstastic search capability, but it's gonna be the next big thing for DNS attacks, wurms, etc.
Maybe someone'll come up with some sort of SPI-thingy that sniffs out the indexing weasels from the good guys and bloxorz them!
i call it, realsearchengines.org - the only place to register as a legit search engine indexer, and to report "search-spammers"
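For context on the robots.txt worry: honoring robots.txt has always been voluntary on the crawler's side, open source or not. A polite crawler checks it before fetching, for example with Python's standard `urllib.robotparser`; a rude one simply skips this step. The rules, user-agent name, and URLs below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical site rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url, agent="ExampleCrawler", robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits `agent` to fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them
    return rp.can_fetch(agent, url)


print(allowed("http://example.com/public/page"))  # True
print(allowed("http://example.com/private/x"))    # False
```

Nothing in the protocol enforces this check, which is why server-side measures (rate limiting, blocking abusive agents) end up doing the real policing.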
10 tads = 1 few
10 fews = 1 some
10 somes = 1 alot
10 alots = 1 load
10 loads = 1 buttload
10 buttloads = 1 assload
10 assloads = 1 shitload
10 shitloads = 1 fuckload
I do not have the book here or I would give the non-metric chart; you know how hard it is to remember how many hogsheads are in an imperial buttload?
A blog about stuff.
You almost make it sound as if this is the first OSS search engine out there. Apache Jakarta's Nutch, a subproject of Lucene, has been around for over two years. I haven't done tons of research on the subject, so I'm betting Nutch isn't the only one.
Both "both" are not both needed. :-) for you idiot moderators on crack.
Infuriate left and right
It's open source, right?
Google likes open source.
If this is something really useful, why wouldn't they take it on board? Assuming, of course, that Google don't have something that whoops UIMA and aren't talking about / releasing it. In which case they could use it to augment their research.
So it may not be Google vs Microsoft vs IBM.
It may be (Google & IBM) vs Microsoft.
The only key facts I need to know around here are about 30,000 events per second on my network. splunk does the job, it's free, and it isn't a Trojan GNU packed with IBM consultants waiting for nightfall.
...though they stole most of it from SCO.
Yeah, the patent-everything method sounds more plausible for making money. I was pretty surprised when I heard the stories too. As an example, apparently the LSB and MSB are backwards at IBM, so their "MSB" is our "LSB", or something like that. Other things like dual-rail dynamic domino logic are called something wholly different, I heard... I'm not sure about their conditions for releasing things versus keeping them under wraps. That's an interesting question; I'd like to know myself.
This doesn't inspire my confidence... perhaps the first order of the day when going over the lines of code would be to see if the application "phones home".
I have been working with search engines for some time now, and the 'concept search' that IBM is mentioning is nothing new. Actually, the Cambridge UK-based company Autonomy http://www.autonomy.com/ has been market leader in this field for years. IBM even has some specialists on Autonomy working for them... makes you wonder...
To which Paul Allen responded from the deck of his yacht, "We're going to need a bigger boat!"
Seth
$5 / month hosted VPS on linux = awesome!
Fellow /. viewers,
Why not do the following: Several of us have access to sufficient infrastructure (own/lease with diskspace to spare, plus a bandwidth surplus).
Why do we not combine that in a distributed search environment with mirrored nodes with this technique of IBM. The addition of the distributed technology to spider and index the web will be a significant challenge, but the concept is I think pretty appealing. I for one will be willing to "donate" the necessary domain and starting facilities.
Anyone who is interested, you know how to find me.
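One common way to split a search index across volunteer machines, as proposed above, is to partition the inverted index by term, so each node stores the posting lists for its share of the vocabulary and a multi-word query contacts only the nodes owning those terms. The sketch below assumes hypothetical node names and keeps everything in one process; it uses plain hash-modulo placement for simplicity, whereas a real deployment would more likely use consistent hashing (so adding a node doesn't reshuffle every term) plus replication for the mirrored nodes.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical volunteer hosts


def node_for(term, nodes=NODES):
    """Deterministically map a term to the node that stores its postings."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def build_sharded_index(docs, nodes=NODES):
    """Build one inverted index (term -> set of doc ids) per node."""
    shards = {n: defaultdict(set) for n in nodes}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            shards[node_for(term, nodes)][term].add(doc_id)
    return shards


def search(shards, query, nodes=NODES):
    """AND query: intersect posting lists fetched from each term's node."""
    result = None
    for term in query.lower().split():
        postings = shards[node_for(term, nodes)].get(term, set())
        result = set(postings) if result is None else result & postings
    return result if result is not None else set()


docs = {1: "ibm uima search", 2: "google search", 3: "ibm mainframe"}
shards = build_sharded_index(docs)
print(search(shards, "ibm search"))  # {1}
```

Crawling and spam resistance, which other posters raise, are the harder problems; this only shows where the index data would live.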
My wife's sketchblog Blob[p]: Gastrono-me
I'm skeptical about their technology. I'm sure that if it is so "good", then IBM will try to make some money out of it... and if it is not so "good", they'll let it go on SF.net and take some populist credit for it... and maybe geeks will stick with it, and they will make Google and Search.MSN weaker, not necessarily through open-source quality search engines but through swarms of emerging not-as-good-as-Google specialized search engines...
Anyway, IBM will benefit by open-sourcing the market segment that its competitors are dependent on...
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.
Any chance this OSS project could be made into a distributed app that is hosted on thousands of individual web servers in some sort of cooperative?
Imagine if any group of people could develop a search engine that (through funky DNS or distributed scripting) they could easily host themselves, providing internet searches that have certain intentional biases.
Like a "Google for gamers" or "Google for crackers" or "Google for linguistics" - but all independently hosted.
Sounds like "Free as in 'Look, I'm cool like Google'"
World Changing - News for Humans, Stuff about our planet
Exactly how do the IBM guys manage to make money from this? They just seem to be donating it to biggies like Google and MS. Surely they aren't in the mood for social service unless it benefits them.
Don't waste your time reading stupid sigs like this.
After searching for a whopping 5 minutes and even googling (gasp!), I couldn't find any decent article about what this actually is, just lots of info on how to use it. It looks like there is a new query language, so it might be interesting for query expansion. But how does it extract these key facts from the documents? Does it do real natural language analysis? Or does it just guess by looking at the document terms like every other search technology? Or is it just a framework that doesn't really do anything by itself? It sure looks like it when skimming http://www.research.ibm.com/journal/sj/433/gotz.html, so no revolution yet, sorry.
I work for IBM. If we're releasing the search engine we use on our internal site, the only thing that we can be hoping is that someone will fix the damn thing.
Or maybe we're throwing it out prior to licensing Google's algorithms (IBMers can only hope). Most of the hits we get from the internal site in response to a query are useless.
I don't know, it seems pretty clear to me: the reference is to shooting fish in a barrel.
What I hope this is used for is the Linux desktop. Searching in Linux sucks. For the most part that is ok, if you can install Linux (and use it) you know where your stuff is. But if Linux expands to an average end user, a search that works would be a great boon. "IBM's version of Google Desktop Search." Google could fill this gap itself, but it hasn't released ANY software for Linux, so once again Big Blue steps up and contributes something useful. I hope it is incorporated.
How much you want to bet that this technology did NOT come from IBM's India campus?
You still have to buy the software that will plug into the framework in order to actually process the information, though some open source projects are certain to come along.
This is interesting stuff, but not as thrilling as the article would suggest. Imagine if google open sourced their systems software, except the part that does the whole PageRank thing.
The meek shall inherit the earth, in 3 by 6 plots. - Lazerus Long
They'd only have to pay licensing fees if they sold a product embodying the patent w/o having independently developed the invention before the patent application was filed. They would perhaps have to deal with lawyers' fees, but they always do.
I like how the author points out Google as being the "world's largest computer company" in the same article as IBM. Apparently having the company name International Business Machines, having $100B in assets and revenues of $100B a year will not trump a hyped-up dot-com company with $3B in assets and revenues of $3B a year. Surely Google is the world leader in search, but when did that become the only function of computers?
There's a new serious effort to make an open source search engine, and I can see there's a lot of interest about this in this thread. You can get a preview of our opening documents at the Openzuka forums, http://openzuka.org/phpBB2/viewforum.php?f=2.
At the least, I see distributed processing possible in doing the work of indexing (and analyzing) the internet, and passing this on to a central server.
Send me your e-mail address or other contact and your level/area of interest, and I'll let you know how things develop! http://bemweb.com/contact/
benjamin, Agaric