Post-Googleism At IBM With Piquant

Latent Sematic Indexing by LISNews · 2004-12-26 01:07 · Score: 5, Informative

They don't come out and say it, but it sounds like it's just a big ol' LSI System. It works really well for some types of searching, but I'm not sure if such a thing would outperform Google for a general purpose search engine.

"Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent."
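To make the "words in common" idea from the quote concrete, here is a minimal sketch in Python. It is not LSI itself: real LSI additionally runs a singular value decomposition over the term-document matrix, and that step is left out here. The sketch only compares bag-of-words term counts with cosine similarity. The function names and the three toy documents are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one document (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy collection (assumed for illustration only).
docs = {
    "d1": "the prime minister of canada spoke in ottawa",
    "d2": "canada prime minister visits ottawa",
    "d3": "segmentation fault in the perl module",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

# d1 and d2 share many words, so they score as "semantically close";
# d1 and d3 share only stop words, so they score as more distant.
sim_close = cosine_similarity(vectors["d1"], vectors["d2"])
sim_far = cosine_similarity(vectors["d1"], vectors["d3"])
```

Note that nothing here knows what "prime minister" means; the closeness falls out of word overlap alone, which is the point the quote makes.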

Re:Latent Sematic Indexing by SpinyNorman · 2004-12-26 01:21 · Score: 5, Informative

Actually it sounds more like CYC-lite.

The LSI system, despite the name, knows nothing about semantics. It just ASSUMES that words that frequently occur near each other are semantically related.
Re:Latent Sematic Indexing by ragnar · 2004-12-26 01:23 · Score: 5, Informative

I thought the same when I read this. I've met the people at NITLE who are developing an implementation of LSI. It is impressive and they have a download of their software available via CVS. For persons interested in this area of research it is worth the while to look at what NITLE is doing.

--
-- Solaris Central - http://w
Re:Latent Sematic Indexing by timeOday · 2004-12-26 01:29 · Score: 4, Interesting

I'm not sure if such a thing would outperform Google for a general purpose search engine.
The short answer is no, because traditional information retrieval methods like LSI are easily fooled by spammer tricks like keyword stuffing.
The genius behind Google's success was paying *less* attention to the content of a page when categorizing it, and relying on links *to* the page instead. Why? Because of spammers.
Think about hiring for a job. You don't limit yourself to interviews with candidates, because they're highly motivated to deceive you. So you look for references. Certification is an example of this - somebody besides the person himself who will vouch for his competence. An even better reference is somebody you know and trust who thinks highly of the individual (which is why personal networking is so important to getting hired).
Google's PageRank is analogous. Instead of looking at the content of a page, you rely heavily on links to the page, especially links from more trusted sources. This helps defeat spammers, who use all manner of tricks to make their crap look good to search engine spiders.
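The link-based ranking described above can be sketched as a small power iteration. This is a textbook-style simplification, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the four page names are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: nobody links to the keyword-stuffed page,
# so its own content cannot raise its rank.
links = {
    "university": ["reference_page"],
    "news_site": ["reference_page", "university"],
    "reference_page": ["university"],
    "keyword_stuffed": ["reference_page"],
}
ranks = pagerank(links)
```

In this toy graph the keyword-stuffed page never rises above the baseline share, however many keywords it contains, because rank only flows in through links from other pages.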
Re:Latent Sematic Indexing by Anonymous Coward · 2004-12-26 01:40 · Score: 1, Funny

I'm sure Prime Minister Poutine will be happy to hear of this development...
Re:Latent Sematic Indexing by Haydn+Fenton · 2004-12-26 01:57 · Score: 5, Informative

For other natural language processing technologies being researched and/or developed by IBM, check out their NLP Research page. They have quite a few different technologies in this field, which I wasn't aware of.
I for one, welcome our new semantic web overlords! It's really great to hear that something based on semantic technologies is finally breaking through. This could be the dawn of a new era :)
I know this is very optimistic, but how long do you think it will be before we have something like this combined with something like Google? The amount of knowledge readily available will be mind-bogglingly huge. Imagine having a text service on your mobile, you text off a question to something and get an answer immediately back. All knowledge available everywhere, any time, that would be a great thing. Heck, it's even quite scary to think about it.
Re:Latent Sematic Indexing by Haydn+Fenton · 2004-12-26 02:03 · Score: 4, Informative

Yep, a little digging shows that it does indeed use CYC technology, or at least, according to this site (Google's HTML version of a PDF).
Re:Latent Sematic Indexing by tootlemonde · 2004-12-26 02:51 · Score: 2, Informative

it sounds like it's just a big ol' LSI System
A Perl implementation of LSI can be found at Building a Vector Space Search Engine in Perl.
However, there are at least three problems. First, it doesn't look like LSI can answer questions like "Who is the Prime Minister of Canada?"
Second, the approach is patented by Telcordia Technologies.
Third, there are scalability problems with LSI. The author of the Perl article writes:

For all its advantages, LSI also presents some drawbacks. The poor scalability of the singular value decomposition (SVD) algorithm remains an obstacle to indexing very large collections. While techniques have been developed for making incremental updates to a scaled collection, these changes typically cannot exceed a certain threshold without triggering a rebuild [7,8]. These constraints make LSI ill suited to the kinds of large, rapidly changing document collections typically found on the Web.
A further disadvantage to LSI is the difficulty in interpreting the underlying reduced term space [4]. This makes it difficult to select an optimum number of singular values to retain in the SVD for a given collection, or allow domain expert adjustment of relevance values in the reduced space once the SVD has been calculated.

As a result, the author is now pursuing something called Contextual Network Graphs and has written a Perl module that was updated as recently as last August.
Re:Latent Sematic Indexing by Anonymous Coward · 2004-12-26 04:03 · Score: 1, Interesting

The short answer is no, because traditional information retrieval methods like LSI are easily fooled by spammer tricks like keyword stuffing.

That depends upon how you apply it. For instance, with a comprehensive database built up with that technology, when you search for Hilton, it might be able to respond with "the family or the hotel chain?", and return categorised results.
Re:Latent Sematic Indexing by MasonMcD · 2004-12-26 04:10 · Score: 2, Insightful

From the article:

MR. CICCOLO, the search strategist, said that in a way his team was trying to match - and reverse - what Google has achieved. "As Google use became widespread, people began asking why it was so much easier to find material on the external Web than it was on their own computers or in their company's Web sites," he said. "Google sets a very high standard for that Web. We would like to set the next standard, so that people will find it so easy to do things at work that they'll wonder why they can't do them on the Internet."

They seem to be explicitly targeting intranets or known good databases, so the spammer issue might be moot.

This raises another issue, however. Will this technology become so useful as to lead to the bad old days of proprietary information dbs a la Lexis/Nexis? I'm assuming the indexing will have to take place on company-owned servers.
Re:Latent Sematic Indexing by Elektroschock · 2004-12-26 04:44 · Score: 1

Yes, it sounds like Cyc, and Cyc is just too ambitious a project, see

http://www.opencyc.org/

For me it has no use at all.

IP, html, google work very well because they are simple. There are "better", complicated systems, protocols, ideas. But they are not useful yet.

I think it sounds like a honey trap for investors who want to waste their money and I really wonder whether they will file a "software patent" or do other crap :-)

The prime minister detection is a very simple issue.

AI does not work, because it is not designed for a customer, it is designed to convince the public funding agencies.
Re:Latent Sematic Indexing by Master+of+Transhuman · 2004-12-26 09:55 · Score: 1

"the bad old days of proprietary information dbs a la Lexis/Nexis"

Those days never left. As information brokers know, there is still more accurate, structured info locked up in fee-paid databases than there is on the Net - and the ability to know where those databases are and how to search them is where information brokers make their money.

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Re:Latent Sematic Indexing by St.+Arbirix · 2004-12-26 10:34 · Score: 2, Funny

They don't come out and say it, but it sounds like it's just a big ol' LSI System.

Actually they did that on purpose. The press release was actually a test for Piquant to see if it could figure out that it was really just a rehashed older idea.

--
Direct away from face when opening.
Re:Latent Sematic Indexing by otisg · 2004-12-26 13:50 · Score: 2, Informative

Not only that, but this stuff is also patented, see: here.

--
Simpy
Re:Latent Sematic Indexing by Anonymous Coward · 2004-12-26 14:27 · Score: 0

Dear Sir, Thank you for your brilliant proof of the impossibility of AI (whatever the hell you mean by that) via reductio ad absurdum. Your subtle proof will have the AI community reeling for years trying to recover from your devastatingly insightful revealing of the inherent contradictions implicit in the discipline of artificial intelligence.
On behalf of your alien overlords, let me say 'thank you', you stupid mutha fucka!
Yeah, dude, 'tis all 'bout da customers! 'Tis how all progress occurs...
Re:Latent Sematic Indexing by Anonymous Coward · 2004-12-26 15:27 · Score: 0

They don't come out and say it, but it sounds like it's just a big ol' LSI System. It works really well for some types of searching, but I'm not sure if such a thing would outperform Google for a general purpose search engine.

Sorry, but you are just wrong. LSI (latent semantic indexing) isn't capable of doing what is described in the article. Clearly, deeper semantic analysis is taking place, probably with some type of analogy engine or semantic pattern matcher doing the heavy lifting. Sounds like the system is still in the development stage though. Such systems generally require large ontologies (think graph not tree) of concepts that allow for similar semantic patterns to be analyzed, represented, and matched. These take a long time to make and refine. I imagine that this is what still needs to be done before the system is ready for prime time.
Re:Latent Sematic Indexing by mikkom · 2004-12-27 00:17 · Score: 1

Another wonderful proof of how great software/algorithm patents are.

--
My quality social news site.com.

Re:so..... by henrycoderm · 2004-12-26 01:07 · Score: 0

"...the system responded correctly to the question, 'Who is Canada's prime minister?' even though those exact words didn't appear in the article. What do you think?" (emphasis mine)

Wow by setagllib · 2004-12-26 01:08 · Score: 4, Insightful

That's pretty impressive. It takes quite a clever AI to read between lines and connect concepts, but I have to wonder how much of its 'understanding' was hard-coded rather than purely abstract. Would it be trivial to just stick in another language database and have it read translations of the article the same way?

Nevertheless it makes me feel like all the programming and design I've ever done is pathetic and I will never amount to anything. That's how it is in the software industry - always someone out there who makes you look bad.

--
Sam ty sig.

Re:Wow by EpsCylonB · 2004-12-26 01:13 · Score: 2, Insightful

That's how it is in the software industry - always someone out there who makes you look bad.

That's how it is in Life.
Re:Wow by smchris · 2004-12-26 02:00 · Score: 2, Interesting

I have to wonder how much of its 'understanding' was hard-coded rather than purely abstract.

Baby steps, but the sort of essential baby steps that accumulate real technological progress. When the system discovers its _own_ non-trivial and useful rules, when it spontaneously parses our input to reply upon a self-generated "Oh, you mean......", then it gets scary.

Epistemology is a big word.
Re:Wow by Captain+Scurvy · 2004-12-26 02:06 · Score: 0

That's how it is in Life.

For many people, software development IS life.
Re:Wow by Alsee · 2004-12-26 02:09 · Score: 1

What is life? Is it a big download? Do you have a .torrent link?

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
Re:Wow by eonish · 2004-12-26 02:20 · Score: 1

If we make it analyze all the religious texts, scriptures and books.. Can it answer the question "What is the meaning of life"?
Re:Wow by PsiPsiStar · 2004-12-26 02:33 · Score: 1

Just don't give it Nietzsche.

--

___
It's the end of my comment as I know it and I feel fine.
Re:Wow by lachlan76 · 2004-12-26 02:48 · Score: 1

lachlan@localhost $ analyse -q "What is the meaning of life"
Segmentation Fault
Re:Wow by No.+24601 · 2004-12-26 02:57 · Score: 1

That's how it is in Life.
If that's what you care about.
Re:Wow by miu · 2004-12-26 03:10 · Score: 1

I doubt Nietzsche could do any permanent harm. At worst exposure will lead to the program wearing lots of black and scowling at people while telling them "I will destroy you!", but it will get over it soon enough.

--

[Set Cain on fire and steal his lute.]
Re:Wow by forkazoo · 2004-12-26 04:01 · Score: 2, Funny

lachlan@localhost $ analyse -q "What is the meaning of Life, the Universe, and Everything?"
42

lachlan@localhost $ analyse -q "Is there a God?"
There is now!
Re:Wow by Flyboy+Connor · 2004-12-26 05:56 · Score: 1

The AI was not exactly reading between the lines. As I understand it, based on an analysis of the contents of one document, the system looked for other documents which were closely related. Those other documents might very well contain the answer to the question directly.
While it is still an interesting application that can reliably indicate related documents, it is not new: at the institute where I worked 5 years ago, a similar application was developed, which was able to identify keywords which belonged to a document even if those keywords did not appear in the document. It was correct in 80-90% of its keyword guesses.
Re:Wow by Anonymous Coward · 2004-12-26 06:07 · Score: 0

You obviously understand nothing of linguistics. The syntax of most languages is very similar and breaking sentences down into their pieces with decent success is pretty easy given a list of categorized words.

There is also no AI involved. Also, RTFA and stop wondering out loud. Thanks
Re:Wow by igny · 2004-12-26 06:36 · Score: 1

If we make it analyze all posts in this discussion, can it answer the question asked by the story submitter "What do we think?"

--
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Re:Wow by Anonymous Coward · 2004-12-26 18:46 · Score: 0

Ha ha. That is hilarious! Ab-so-lute-ly hilarious!!! And oh so original.
Have you ever actually read Nietzsche? Or have you just seen his name on new-age fortune cookies?
Re:Wow by setagllib · 2004-12-27 00:12 · Score: 1

It depends on your definition of 'AI'. If people call the small algorithms defining how some characters should behave in basic games 'AI', this certainly is, even if it doesn't learn on its own.

TAI (think about it) and stop flaming. Thanks.

PS: If you think the syntax of most languages is very similar, you haven't ever once spoken fluent Russian. SystranSoft's translator, for instance, in spite of its occasional success in translating English/French, can't get a single thing right in Russian. It's too flexible and underdetermined to easily parse. If you throw in slang it's virtually impossible without knowing the context and culture itself. Watch a movie translated by a certain 'Goblin' and you'll know what I mean. 'breaking sentences down into pieces' my ass.

--
Sam ty sig.
Re:Wow by miu · 2004-12-27 07:29 · Score: 1

Like lots of 16 year olds I read a lot of Nietzsche. He is impressive in a way guaranteed to impress a 16 year old. Anyone who has lived more than a couple years in the real world can spot that he is a BS artist. His philosophy is very agreeable to anyone susceptible to righteous anger at the fact that the world will not reward them merely for being smarter than average.
There is some actual substance to some of his work, but it is fairly thin and most of it is covered better by other philosophers. So have you read Nietzsche or just scanned him looking for things to agree with so you can sneer at everyone who does not recognize how great you are?

--

[Set Cain on fire and steal his lute.]

Reg Free by bendelo · 2004-12-26 01:09 · Score: 5, Informative

Reg-free link

MOD DOWN -1 KARMA WHORE by Anonymous Coward · 2004-12-26 01:09 · Score: 0

n/t

Sounds impressive by Timesprout · 2004-12-26 01:09 · Score: 4, Funny

Till you realise the computer answered 'some asshole' which could be any prime minister in the world really.

--
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe

Re:Sounds impressive by Anonymous Coward · 2004-12-26 01:12 · Score: 0

hey, it could've answered with this guy. Although not a PM, he still fits the description of what they are.
Re:Sounds impressive by Alsee · 2004-12-26 02:15 · Score: 0

Till you realise the computer answered 'some asshole' which could be any prime minister in the world really.

You should see what it answered when I asked "Who is president of the United States". I couldn't get it to stop. I had to hit the power button and reboot.

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
Re:Sounds impressive by PsiPsiStar · 2004-12-26 03:24 · Score: 1

You should see what it answered when I asked "Who is president of the United States". I couldn't get it to stop. I had to hit the power button and reboot.

It was probably trying to recount the votes. Either that, or it had received some threatening e-mails from the Diebold voting machines down the block.

--

___
It's the end of my comment as I know it and I feel fine.
Re:Sounds impressive by FireBreathingDog · 2004-12-26 05:19 · Score: 0, Offtopic

Heh heh heh. I love it. Seeing your sig is like seeing all those people with Kerry bumper-stickers still on their cars. Calling the president Hitler, an idiot, an evil genius, a Nazi chimpanzee, etc., didn't win you the election. Imagine that. Maybe next time you guys will realize you need a little more than childish name-calling to convince voters to go your way.
Then again, seeing how you've all been after the election makes me think you still don't get it. Calling 62,000,000 people idiots because they didn't vote the way you wanted them to isn't going to make them any more likely to vote your way in the future.
...and we're the ones you call idiots. Sheesh.

--
Shame on Google.
Re:Sounds impressive by Anonymous Coward · 2004-12-26 09:27 · Score: 0

Exactly! Why do those people still not realize that gay marriage and abortion are the only issues that count for presidential elections?
Re:Sounds impressive by Daedalus-Ubergeek · 2004-12-26 11:03 · Score: 1

Till you realise the computer answered 'some asshole' which could be any prime minister in the world really.

Don't you mean "some 'eh'-hole"?
Re:Sounds impressive by Alsee · 2004-12-27 01:27 · Score: 0

Heh heh heh. I love it. You never even disputed that my sig was true. In fact your insults just prove my point. Bush has divided the nation and sparked more anger and divisiveness than any recent president, perhaps even since the civil war.

Sure we can get bogged down in an endless battle over which side is right and which side is wrong, but I've cleverly sidestepped that morass. I've already won the argument. You cannot deny the anger and insults on both sides, the insults you quoted and the insults you made yourself. It was Bush himself who originally campaigned on the promise "I'm a uniter, not a divider". Well, Bush has spectacularly failed or violated at least that promise.

Even if we assume for the sake of argument that Bush is some sort of saint, you cannot deny that he has severely alienated half the country. That half the country considers him evil and/or stupid and/or a liar. And even if they are the ones that are stupid and wrong, well Bush has STILL been a disaster in creating that impression in half the country.

Even if we assume everything else Bush has done has been Great and Wonderful and The Right Thing To Do, the fact that he has caused so much anger and divisiveness in the country is in itself bad and harmful to the country.

I am sorely tempted to document the fact that the majority of Bush supporters are/were spectacularly misinformed*, but that would be wading back into the morass I specifically avoided by making the point the indisputable fact that Bush has been divisive.

* Perfectly intelligent people can be misinformed, especially when they have been deliberately deceived.

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Enterprise... by ajaf · 2004-12-26 01:12 · Score: 0

We are closer to building a real Enterprise...

"Computer, tell me the difference between a male and a shemale"

--
ajf

Re:Enterprise... by Timesprout · 2004-12-26 01:18 · Score: 1

I think you need to be very careful who you try chatting up when you have a few beers onboard mate.

--
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe

ask jeeves... by gl4ss · 2004-12-26 01:12 · Score: 1

i remember that it used to buff itself as an answerer to such questions back in the day..

must have been pre-google since i used it sometimes

--
world was created 5 seconds before this post as it is.

Re:ask jeeves... by Anonymous Coward · 2004-12-26 02:54 · Score: 0

correct, although it wasn't / isn't if it still does so, a very good one. this didn't stop them charging $1 million for companies to use their tech.

Trust Issue by Flamefly · 2004-12-26 01:16 · Score: 5, Interesting

On a global scale this system tends to fall apart, there is a constant issue of trust when dealing with what looks to me, to be the holy grail of the semantic web.

What if 2 sites said the Prime Minister of Canada was Santa? Explicitly said it; would that overwrite the linked information? How would the system know what is right? You can't always just pick the majority answer, so you need to set up little areas of trust: "I trust www.thisplace.com and everything it says", and that site in turn will say "I trust www.overhere.com". But who allocates the trust? Couldn't those people be biased?
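The chained "I trust A, and A trusts B" idea can be sketched as trust that weakens with every hop. This is only an illustration under stated assumptions: the decay factor of 0.5, the function name, and the third site are hypothetical, and the sketch does nothing to answer the harder question of who picks the starting sites.

```python
from collections import deque


def propagate_trust(trusts, seeds, decay=0.5):
    """Spread trust outward from seed sites.

    `trusts` maps a site to the sites it vouches for. Seeds start at 1.0,
    each hop multiplies trust by `decay`, and a site keeps the best score
    it receives over any path.
    """
    scores = {site: 1.0 for site in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        candidate = scores[site] * decay
        for vouched in trusts.get(site, []):
            # Only revisit a site when a strictly better score is found,
            # which also guarantees termination on cyclic trust graphs.
            if candidate > scores.get(vouched, 0.0):
                scores[vouched] = candidate
                queue.append(vouched)
    return scores


# The two site names come from the comment above; the third is made up.
trusts = {
    "www.thisplace.com": ["www.overhere.com"],
    "www.overhere.com": ["santa-is-pm.example"],
}
scores = propagate_trust(trusts, ["www.thisplace.com"])
```

A site nobody vouches for simply has no entry in `scores`, which a caller would presumably treat as zero trust. The bias problem remains: the whole result depends on which seeds were chosen.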

The semantic web will have a fantastic impact on the world, but the trust issue is something that needs to be addressed, and I don't see how it can ever, globally be done.

More likely we would have systems like this for individual sites, or intranets, trusted circles that would be unlikely to contradict themselves.

hopefully one day, if we truly get a global semantic web, we can see if the answer really is 42 :]

Re:Trust Issue by Anonymous Coward · 2004-12-26 01:27 · Score: 0

That is not an issue since the sites will be ..tataam.. signed
Re:Trust Issue by Anonymous Coward · 2004-12-26 01:30 · Score: 1, Insightful

One way of trusting is based on what google currently does for page relevance. Trust a site based on the number of other sites that link to it. In that way you could get a 'rough' idea of how trustworthy the site is.
Re:Trust Issue by KinkifyTheNation · 2004-12-26 01:37 · Score: 0

How can you really trust any random information from the internet?

There is plenty of information on the internet that may be "wrong", and most people take this information, accept it, and learn from it.

How can you teach a computer to know if something IS true without manually educating it using knowledgeable persons (professors, doctors) in the particular area?

Wouldn't that defeat the purpose of letting it spider and index on its own?
Re:Trust Issue by ctr2sprt · 2004-12-26 01:47 · Score: 4, Interesting

All search engines return a bunch of results ordered by those it thinks most likely address your search terms. One very simple way of ranking the results is popularity (number of pages with the same answer to your question). You could fine-tune the popularity index with a Google-ish reference counting algorithm.
One of the neatest approaches of this technology, I think, is the ability to eliminate search results. Anyone who's ever used Google to troubleshoot a problem knows that the first thirty or forty matches will all be the same: web mirrors of mailing lists or USENET posts. Using a vaguely semantic technology like this, Google could say, "Hey, all these pages are effectively identical" and collapse them into a single result.
This would be terribly useful for me, since I usually start my troubleshooting searches with an error message. Error messages in the Unix world being quite standardized, this nets me at least ten irrelevant "threads." Since each "thread" is duplicated about ten times in the Google results, that means the question I'm actually asking may not appear until page 5 or later. But using result grouping like this - which Google tries and is generally unsuccessful at - would mean I'd see my question asked on the first or second pages. Big improvement.
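The result-collapsing idea above can be sketched with word shingles and Jaccard similarity, a common near-duplicate technique. This is not how Google or Piquant is known to do it; the 0.6 threshold, the shingle size, and the example URLs and texts are assumptions for illustration.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse_results(results, threshold=0.6):
    """Greedy grouping: a result joins the first group whose representative it resembles."""
    groups = []  # list of (representative shingle set, [urls])
    for url, text in results:
        current = shingles(text)
        for representative, members in groups:
            if jaccard(current, representative) >= threshold:
                members.append(url)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((current, [url]))
    return [members for _, members in groups]


# Hypothetical search hits: two mirrors of one mailing-list post, plus one other page.
results = [
    ("mirror1.example/post/1",
     "make fails with error cannot find -lssl on freebsd after upgrade"),
    ("mirror2.example/msg/77",
     "make fails with error cannot find -lssl on freebsd after upgrade today"),
    ("forum.example/t/9",
     "how do I install the openssl development headers on debian"),
]
collapsed = collapse_results(results)
```

Here the two mirrors land in one group and the unrelated page stays on its own, so a results page could show one entry per group instead of one per URL.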
Another nifty trick would be an actual, working "related pages" link. So let's say I find my question, but, as is all too common, it's a question without an answer. I click on the link, the search engine does its magic, and it pulls up (perhaps) technical details on the software in question or alternate solutions to my problem. This is definitely going to be harder to implement than my other idea (perhaps even impossible for now), but it'd be really nice. It could make navigating the Internet like navigating Wikipedia or amazon.com.
Ah well. I can dream.
Re:Trust Issue by DarkMantle · 2004-12-26 01:53 · Score: 1

hopefully one day, if we truly get a global semantic web, we can see if the answer really is 42 :]
But then we'll cease to exist, and be replaced with something even more strange and unexplainable.

--
DarkMantle I been bored, so I started a blog.
Re:Trust Issue by SharpFang · 2004-12-26 01:59 · Score: 1

Or what would the system answer to "Who is Bush?"
"Bush is the president of *": 888 results.
"Bush is an idiot": 5,830 results.

Actually correct? Who cares? Politically incorrect and that's what matters!

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Re:Trust Issue by shwouchk · 2004-12-26 03:06 · Score: 0

you obviously haven't read the article... this isn't meant to compete against the vanilla google, but rather against google desktop search and its competitors.... I.B.M. says that its tools will make possible a further search approach, that of "discovery systems" that will extract the underlying meaning from stored material no matter how it is structured (databases, e-mail files, audio recordings, pictures or video files) or even what language it is in.
Re:Trust Issue by Anonymous Coward · 2004-12-26 15:31 · Score: 0

On a global scale this system tends to fall apart, there is a constant issue of trust when dealing with what looks to me, to be the holy grail of the semantic web.

This system has nothing to do with the semantic web. The semantic web is the stupid idea to make all web content producers describe the meaning of their page in logical terms. This system reads raw text (and maybe some other formats) and because it has a richer representation of the information stored at each source, can better find what you want. But the two ideas have the same goal.

I wonder... by Raul654 · 2004-12-26 01:17 · Score: 4, Interesting

Using Google means that this would have to contend with a lot of noise - looking for one nugget of information on the internet will tend to yield a low signal-to-noise ratio. I wonder what would happen if instead, you were to run it using Wikipedia as a back end (full disclosure - I'm a Wikipedia admin). There'd be less information, but I suspect the quality of the results would be better.

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton

Re:I wonder... by Anonymous Coward · 2004-12-26 06:48 · Score: 0

I wonder what would happen if instead, you were to run it using Wikipedia as a back end (full disclosure - I'm a Wikipedia admin).

Well, then why don't you approach IBM's research group and ask them that question yourself? It would be a useful technology demo which might not have occurred to them, and as a wikipedia admin you would be the appropriate person to suggest the idea to them.

Prolly a hand-picked question by Ancient_Hacker · 2004-12-26 01:22 · Score: 3, Insightful

One example is meaningless. To get a realistic idea of how useful this system is, we'd like to see what it says if you ask several dozen questions. For all we know this was the one question out of 100 that it answered correctly.

Re:Prolly a hand-picked question by Quixote · 2004-12-26 02:02 · Score: 4, Funny

Any sufficiently advanced technology is indistinguishable from a rigged demo.
-- Andy Finkel, computer guy
Or, conversely,
Any sufficiently rigged demo is indistinguishable from an advanced technology.
-- Don Quixote, slashdot guy
;-)
Re:Prolly a hand-picked question by stephanruby · 2004-12-26 02:04 · Score: 1

One example is meaningless. To get a realistic idea of how useful this system is, we'd like to see what it says if you ask several dozen questions. For all we know this was the one question out of 100 that it answered correctly.
And for all we know, the programmers were given the article(s) and the question(s) before they wrote the program. To get a realistic idea of its usefulness, they should really post it on the web as an experimental app. If it's any good, people will use it.
That's what I like about Google, they test their experiments on a segment of the population before they start hyping it.
Re:Prolly a hand-picked question by hobo2k · 2004-12-26 07:35 · Score: 1

Yeah, I'd love to see how it does on the reading comprehension section of the SATs.
Re:Prolly a hand-picked question by Inthewire · 2004-12-27 18:20 · Score: 1

Rephrase "a segment of the population" and try again.

--

Writers imply. Readers infer.

AI research is still in the Dark Ages by Anonymous Coward · 2004-12-26 01:23 · Score: 2, Funny

The solution to functional, robust and real AI is not better software or better hardware. Real AI will never be implemented on silicon chips.

We must integrate ourselves with computers to a point at which the living being and computer cannot be separated anymore. The perfect union of the biological component (wetware) and computer (hardware) will mark the end of the human race - and the birth of something new and wonderful.

Obviously this will face strong, religious and quasi-religious (ethics) resistance from the old guard but it will pass with the fools themselves.

Re:AI research is still in the Dark Ages by Anonymous Coward · 2004-12-26 07:42 · Score: 0

hmmm, define artificial in artificial intelligence, then answer me : where is the artificial intelligence part in your proposition?
Re:AI research is still in the Dark Ages by Kippesoep · 2004-12-26 10:25 · Score: 1

Resistance? Resistance is futile! You will be assimilated. Seriously, don't you think there's a reason the Borg or, for that matter, pretty much any cyborg fantasy, are portrayed as being evil or at least having the potential to be not "something new and wonderful", but "something new and terrible"? Like many /.-ers, my work heavily involves using computers, but on some level I am afraid of them (or rather, what technology may eventually become). Irrational? Maybe...
Re:AI research is still in the Dark Ages by Anonymous Coward · 2004-12-26 13:35 · Score: 0

Resistance is futile...

Canadian Prime Minister by Anonymous Coward · 2004-12-26 01:25 · Score: 4, Funny

I for one congratulate Canadian Prime Minister Tim Horton for running a great campaign and his wife Wendy for her fantastic chain of restaurants!

Re:Canadian Prime Minister by djeddiej · 2004-12-26 02:06 · Score: 1

All your Base Prime Ministers belong to Canada

--
just a web application developer and instructor in Toronto, ON Canada

Comparision by mahesh_gharat · 2004-12-26 01:25 · Score: 1

Is that system capable of finding Paris Hilton when you search for the letter "P" instead?
This reminds me of the famous quote "Artificial Intelligence usually beats real stupidity"

Re:Comparision by Anne+Thwacks · 2004-12-26 02:24 · Score: 1

Artificial Intelligence usually beats real stupidity

Those of us over 18 have generally found this to be the other way round, as in: a small amount of real stupidity beats any amount of artificial intelligence.

--
Sent from my ASR33 using ASCII

garbage in, garbage out by bigmo · 2004-12-26 01:33 · Score: 1

While this is pretty impressive stuff, I think we should be wary of how it gets "information" to digest and correlate. If it gets high quality, well researched articles, it will potentially be a great tool to get the "highlights" on a subject and provide a starting point for your own research. However, if it is given less qualified articles to index, it will develop a poor and possibly perverse view of a given subject. Poorly informed people tend to talk the loudest and longest, so I'm concerned about a "finder of fact" set loose on the internet. Likewise I'm concerned about that same "finder of fact" given a limited set of information filtered by people, even if they're well meaning.

I suppose this is true of all information gathering, computerized or not. It's the potential efficiency of a system like this that scares me.

Re:garbage in, garbage out by 36-bitter · 2004-12-26 01:40 · Score: 1

Ah, but maybe there are patterns that can be used to score some articles as probably low-quality. Like your observation that "poorly informed people tend to talk loudest and longest." Throw in a penalty for dodgy spelling and I think it might be pretty good.

I'd like to see that article. by Anonymous Coward · 2004-12-26 01:36 · Score: 1, Interesting

If the article doesn't come out and state that Paul Martin is the Prime Minister then how could anyone--including a computer--know that for sure? I think the submitter was stretching the truth a bit when he said the words "Prime Minister" don't appear in the article. Can you imagine an article about George Bush that didn't use the word President?

Re:I'd like to see that article. by Anonymous Coward · 2004-12-26 03:20 · Score: 0

I could imagine an article about George Bush that used the word "psycho"
Re:I'd like to see that article. by Anonymous Coward · 2004-12-26 06:42 · Score: 0

The exact words "Who is Canada's prime minister" did not appear in the article, but presumably the words "Canada", "prime minister", and "Paul Martin" did, and it was able to figure out that they went together, and answer the question phrased in that form.
Re:I'd like to see that article. by RyuSoma · 2004-12-26 09:29 · Score: 1

Can you imagine an article about George Bush that didn't use the word President?

I suspect many of us wish we could.

A higher standard for a machine than a human? by Anonymous Coward · 2004-12-26 01:36 · Score: 0

A human also extracts contextual information from articles and reaches conclusions or "beliefs" that it then gives you in response to a question.

You may believe the human because you trust its judgement and because hell, it can't remember exactly where it heard that the PM was Santa.

You don't have to believe a machine as a badge of friendship the way a human will ask you to. The machine will also be able to tell you exactly where it read something that made it think the PM was Santa.

You can then go to that site and say "Ah, it's a joke," or "Ah, the machine was too stupid to understand."

Moreover, the primary purpose of such a machine is not to answer questions straight out, but to point you to sites. So if you ask for sites about the biography of the Prime Minister it will hand you an article which might imply to a stupid person that the PM is Santa. You, a not stupid person, will then say "This isn't the article I'm looking for, please give me another."

This is little different than googling "Canada Prime Minister biography" and rejecting a biography of Lester B. Pearson because his bio starts many years earlier than would be reasonable for a current PM.

The advantage is that you could say to the LSI "Who is Canada's current Prime Minister" and it could point you to a site that answers the question *even though the word current isn't used in the article.*

It just cuts down on the search term juggling we now do to get an answer that makes sense. Most of the Ask Slashdot questions that make people angry are of this nature. Someone will ask "How do I stream media of this nature to this sort of device" and many people will respond "Just google it!"

You have to know enough about streaming and the devices to give Google a good set of search terms.

You fools! This is the beginning of the end! by Anonymous Coward · 2004-12-26 01:36 · Score: 1

[Scientist at IBM asks the computer a question after having it connect to and read all the documents on all the computers in the world]

Scientist: "Is there a God?"
Computer: "There is now."

/can't remember what movie/book this was from

As important as this tech is for web-searching by Anonymous Coward · 2004-12-26 01:39 · Score: 1, Insightful

...in the long term it may be even more important for translation between languages -- being able to discern both implicit and explicit meaning in a passage will make accurate translations easier -- and perhaps in combination with Cycorp's "Cyc" (or a similar project) in the extreme long term to create an artificial intelligence capable of understanding human communication.

There are other interesting possibilities. In the tradition of Esperanto and Lojban, it can also be used to gather the common aspects of natural language and create a universal second language (one much easier in grammar and spelling, more compact in expression, and more complete in meaning). This wouldn't have the cultural baggage of English, which is at present the only thing coming close to a universal second language.

I think it's about time by 36-bitter · 2004-12-26 01:45 · Score: 1

There *must* be something better than the same old dumb string matching.

However, this sort of thing might be better employed as a knowledge engineer's assistant, doing the rough work of attaching useful metadata to documents drawn from the enormous piles that we've accumulated.

Piquant sounds like an AI. by Mentifex · 2004-12-26 01:46 · Score: 0

It sounds like an artificial intelligence (AI).

An AI is really sophisticated when it can ask its own questions of the user.

What do I think? by OmegaFire · 2004-12-26 01:47 · Score: 1

I think the prime minister of Canada is Paul Martin.

Now... by SharpFang · 2004-12-26 01:52 · Score: 5, Insightful

Feed it the news about Iraq. Then ask it what the war was about.
Good bye, new system, too dangerous for "national security".

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2

Re:Now... by Drak+Martel · 2004-12-26 02:29 · Score: 1

Unfortunately, this bit of software is not made to make inferences. What it came back with would probably depend on how many sites said the war was about oil or whatever the hell they say (Democrat sites), how many say it was about weapons of mass destruction (Republican sites that haven't been updated), and how many just claim that the US had a right to beat the shit out of Saddam (updated Republican sites). This is of course going on the assumption that the engine would be looking for a majority. If it just looks for the first site... who knows what you will end up with. And if it tries to take everything into account in some kind of related collection, it will just tell you that no one really knows what's going on, which is probably the most accurate answer anyway.

--
"Half of being great is learning from your mistakes. The other half is covering them up."
Re:Now... by Oswald · 2004-12-26 02:35 · Score: 1

...it will just tell you that no one really knows what's going on...
That would truly be a triumph of computer programming, given how few people seem to be smart enough to draw that conclusion.
Re:Now... by Anonymous Coward · 2004-12-26 02:46 · Score: 0

Garbage In, Garbage Out. So long as you only feed that computer news from "trusted" sources, you should be fine.

Assuming of course you don't ask the right (wrong?) questions... "What have been the different historical reasons used to justify the war in Iraq?" might be damaging if a citizen understood that shifting justifications were an indication of lying.
Re:Now... by ignavus · 2004-12-26 13:52 · Score: 1

"Earth calling America! Earth calling America! Come in planet America...."

Just to let you know.

There are other countries besides America. Their parties are usually not called "Republicans" and "Democrats" - and don't even necessarily correspond to those American parties. The non-American countries also hold views about Iraq. Many also write in English (UK, Canada, Australia, New Zealand, also India, the largest democracy in the world ...)

Google, and any alternative search engine, would spider through and index all these non-American sites. There are a lot of them - more than you might think.

It is possible (gasp) that you might get answers that *don't reflect American perceptions of the world, or America's internal politics*. Remember, most of the world wanted the other guy to win, whatsy's face. The US has an unusual political make-up because of its enormous affluence and wealth compared to the rest of the world.

Also remember: the US accounts for just 5% of the world's population. The rest of us are 95%. You are outnumbered. Even the Internet is becoming less American day by day. And as for the web, it wasn't even invented by Americans or in America (it is a European invention).

Try to keep a perspective: you are one country among many, not the sole occupants of the planet. It would create such a nice impression on the rest of us if you would remember this.

--
I am anarch of all I survey.
Re:Now... by jdgeorge · 2004-12-26 16:59 · Score: 2, Funny

Okay, let's get back on topic. I fed the parent post into Diebold's equivalent of IBM's fancy technology and asked it to provide an appropriate response. Here's what I got:

------------------

There are other countries besides America. Their parties are usually not called "Republicans" and "Democrats" - and don't even necessarily correspond to those American parties. The non-American countries also hold views about Iraq. Many also write in English (UK, Canada, Australia, New Zealand, also India, the largest democracy in the world ...)

What a pile of pinko, left-wing, pansy-assed, New York propaganda. Everyone knows the Good Ol' US of A is the only real country. Don't try to pull that "there are other countries" crap or we'll kick your sorry nation's ass just like we did back in 'Nam. Oh, and Iraq, too; we really kicked some major terrorist ass there. And your anti-Republican propaganda means you're definitely a terrorist.

Also remember: the US accounts for just 5% of the world's population. The rest of us are 95%. You are outnumbered. Even the Internet is becoming less American day by day. And as for the web, it wasn't even invented by Americans or in America (it is a European invention).

Now, that's right out of the Democrats party thing where they say what they say about stuff. Damn, Democrats are stupid; Everyone knows that the US is, like, the third biggest country. That means the US is AT LEAST a third of the world's population. Except Africa, but they don't count, 'cuz they all live in huts and eat dried camel poo.

Oh, and I wouldn't be bragging about the web being a European invention, because it wasn't. Besides, if it was, the web sucks anyway, so why are you bragging about it?
-----------------

Deep Blue beats Kasparov by Sir+Tandeth · 2004-12-26 01:59 · Score: 1

I for one welcome our superintelligent big blue overlords.

Re:You fools! This is the beginning of the end! by phfpht · 2004-12-26 01:59 · Score: 1

Isaac Asimov's "The Last Question"

http://www.google.com/search?hl=en&q=asimov+%22the+last+question%22&btnG=Google+Search

PM Horton by handy_vandal · 2004-12-26 02:02 · Score: 1

I for one congratulate Canadian Prime Minister Tim Horton for running a great campaign and his wife Wendy for her fantastic chain of restaurants!

Prime Minister Horton ... he's related to the Horton of "Horton Hears a Who" fame, right?

-kgj

--
-kgj

Re:You fools! This is the beginning of the end! by Anonymous Coward · 2004-12-26 02:05 · Score: 0

No, that's not it.

Won't work. by jameson · 2004-12-26 02:12 · Score: 5, Informative

Disclaimer: I haven't read the article; however, I was somewhat involved in research in this field in late 2003 and early 2004.

What the summary of the article claims IBM is developing-- a technology for getting the semantics behind an arbitrary sentence on the web-- is the Holy Grail of the discipline of Natural Language Processing (NLP) and very, very, very, _very_ far away at this point. Many people believe that we cannot ever get there (that's the point of a Holy Grail, after all), but I don't want to be quite as pessimistic (or realistic?) at this point.

The problem here is that English (or any other natural language, for that matter) isn't SML, or Haskell, or some other language with a well-defined denotational semantics. Natural language suffers from at least three problems that make it very tough to gather anything useful from a given piece of text:

(1) Grammar. Natural language isn't typechecked, and frequently uses incomplete sentences, which makes it hard to develop grammars (context-free, context-free probabilistic, Lambek-style/proofnet-style or whatever else people have come up with) for it.

(2) Anaphora resolution. "I saw a dog on the street this morning. It was barking". So who's barking, street or dog? Grammatically, both would be possible; only with prior knowledge can we see that we're talking about the dog here.

(3) Polysemy. What does "play" mean, taken by itself? It can be used for different meanings in "to play a game", "a play of words", "a terrific Shakespearean play" etc.; you might want to have a look at WordNet one of these days to get a feeling for this. Not knowing which meaning an arbitrary occurrence of "play" refers to means that you have to try lots of options when parsing, LSIing or whatever else you do (though most people simply ignore this problem in research today-- it's too hard to disambiguate words in practice).

That's not all, of course-- try thinking of the need to deal with irony/sarcasm, metaphors, foreign words, the credibility of whichever sources you're using etc., and you'll get a pretty good feeling for why this is beyond merely being "hard". Of course, for very small problem domains (a "command language for naval vessels" was investigated in one paper I read a while ago-- those DARPA people definitely have too much money on their hands, but I digress), this can be solved, but general-purpose open-domain NLP is what you need to do a web search.

It might happen in my lifetime, but I won't hold my breath for it.

-- Christoph

Re:Won't work. by Anonymous Coward · 2004-12-26 03:18 · Score: 0

But would this system be able to infer that "...acting president..." referred to Dubya however you read the phrase?
Re:Won't work. by vingt · 2004-12-26 03:59 · Score: 1

(2) Anaphora resolution. "I saw a dog on the street this morning. It was barking". So who's barking, street or dog?

Obviously, the morning was barking. That's when it's bloody, farking cold out.
Re:Won't work. by MmmDee · 2004-12-26 04:53 · Score: 1

Back when I worked in this field briefly, (?) mid-1980's (Turbo Pascal was the language if you can believe it), I quickly learned how inherently ambiguous (to use some of the vernacular in vogue then) spoken language truly is.

--
No man's an island, unless he's had too much to drink and wets the bed.
Re:Won't work. by Elektroschock · 2004-12-26 06:28 · Score: 1

Well, then use Lojban. See http://www.lojban.org A logical language.
Re:Won't work. by jelle · 2004-12-26 09:36 · Score: 1

For 2) and 3), using a Hidden Markov Model and doing a Viterbi search instead of trying to do direct classification of the meaning will pretty much deal with those problems. I'm sure the other problems can be dealt with too.
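To make the Viterbi idea concrete, here is a minimal sketch of decoding the most likely sequence of hidden "senses" behind a run of words. The two senses (GAME, THEATRE), the vocabulary, and every probability below are made up for illustration; a real system would have to estimate them from data, and this says nothing about how well the approach scales to open-domain text.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy model: which sense of each word is in play?
STATES = ["GAME", "THEATRE"]
START = {"GAME": 0.5, "THEATRE": 0.5}
TRANS = {
    "GAME": {"GAME": 0.8, "THEATRE": 0.2},
    "THEATRE": {"GAME": 0.2, "THEATRE": 0.8},
}
EMIT = {
    "GAME": {"play": 0.5, "ball": 0.4, "stage": 0.05, "shakespeare": 0.05},
    "THEATRE": {"play": 0.4, "stage": 0.3, "shakespeare": 0.3},
}

theatre_path = viterbi(["shakespeare", "play", "stage"], STATES, START, TRANS, EMIT)
game_path = viterbi(["ball", "play"], STATES, START, TRANS, EMIT)
```

In this toy setup the same word "play" is tagged THEATRE in the first sequence and GAME in the second, purely because of its neighbours. Whether that carries over to real text is exactly what the parent post doubts.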

Not to say it wouldn't be a big achievement to build a practical system with everything incorporated into it, but IMHO the technologies already exist.

--
--- Hindsight is 20/20, but walking backwards is not the answer.

Re:You fools! This is the beginning of the end! by Xugumad · 2004-12-26 02:14 · Score: 3, Funny

Is it just me who, if designing an AI, would have a trivial off switch? Probably a few backups, like wire cutters next to the main power cable, a jug of water near the PSU, things like that.

It is just me, isn't it...

Actually, this technology was developed at CMU by Anonymous Coward · 2004-12-26 02:26 · Score: 1, Interesting

As some of you still remember, the original technology behind this was developed at CMU in the mid 90's when Corey Kosak, Andrej Bauer and a bunch of other talented people created the first ever natural language based neural network with a measurable IQ. People could even post questions to certain personae emulated by the neural network through the web site CGI at forum2000.org. This neural network was really fun and witty, but what you probably do not know is that all the technology in fact consisted of bored postgraduate students answering your questions.

Greetings to Kosak, Bauer and all the anonymous people who tried their best to pretend they're a software based neural network.

Can you imagine by melvo · 2004-12-26 02:26 · Score: 3, Interesting

Can you imagine when a system of this kind is capable of reading google's online library? If knowledge is power, we are looking towards creating a very powerful entity.

IBM--and dozens of others by jeif1k · 2004-12-26 02:36 · Score: 1

Semantic analysis of text has been the holy grail of AI for decades. It's useful for all sorts of things, including information retrieval, translation, speech recognition, and summarization. IBM is hardly the only research lab working on this, or the only company using it to enhance search.

Google thing is really not so yesterday .... by leoaugust · 2004-12-26 02:38 · Score: 1

That Google Thing Is So Yesterday

I don't think the race should be about beating the results provided by Google; it should be about the interface provided to wade through the results. By that I don't mean the 3D or clustering interfaces like vivisimo, nor the visual-basic-like constructs of "search builder" at beta.search.msn.com, but rather how to improve your results after you have started.

Of all the advanced mathematics classes that I took, one thing that stands out for me is that out of many possible solutions it was hard to just jump to the right one - what always had to be done was to select a "seed" and then improve upon the feedback that was provided ...

Google Suggest is one step in that direction. You key in the first letter and then you get feedback ... some day it might anticipate your question itself because so many other people have asked the same question - that to me is a more realistic goal than trying to anticipate the answer.

In other words the direction of the research should be to anticipate quickly (like Google Suggest does) what the person is trying to ask rather than what answer the person is expecting. I know the difference is subtle enough to raise the question of whether I am saying anything different. Yes, I am - just like 2 isomers are essentially the same in construction but very different in effects -

there is a big difference in trying to anticipate the question that someone wants to ask versus anticipating the answer they are expecting.

--
To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...

citation analysis by jeif1k · 2004-12-26 02:40 · Score: 3, Insightful

The genius behind Google's success was paying *less* attention to the content of a page when categorizing it, and relying on links *to* the page instead. Why? Because of spammers.

"Genius" would imply some sort of brand new insight, but citation analysis has had a long tradition before Google appeared on the scene as a search engine. Google's biggest achievement is probably in implementing citation analysis on a very large scale, but they didn't break completely new ground in how people search.

And, in the long run, semantics-based analysis, like IBM's Piquant, is probably going to be the better technology: citation analysis for determining relevance to a query is really just a limited substitute for understanding of the content.
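For readers who haven't seen link-based citation analysis spelled out, here is a minimal sketch of a PageRank-style power iteration. It is an illustration of the general idea only, not a description of what Google actually runs; the damping factor, the iteration count, and the four-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them.

    `links` maps each page to the list of pages it links to; every
    target must also appear as a key.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks or pages
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link graph: three pages cite "c", and "c" cites "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_rank = pagerank(toy_links)
best_page = max(toy_rank, key=toy_rank.get)
```

Note that nothing in the scoring looks at what any page says, which is the point being argued here: it measures who vouches for a page, not whether the page answers the query.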

Transcript of conversation with Piquant by trveler · 2004-12-26 02:48 · Score: 3, Funny

User: Who is Canada's Prime Minister?

PIQUANT: WHY DO YOU ASK, USER?

Because I want to know.

WHAT IS THE REAL REASON, USER?

I want to test you.

WHAT WOULD IT MEAN TO YOU IF YOU GOT TO TEST ME?

It would mean I can determine whether or not you know who Canada's Prime Minister is.

WHAT MAKES YOU THINK I COULD TELL YOU THAT?

I programmed you to figure it out.

LET'S TALK ABOUT YOU -- NOT ME.

I want to know the name of Canada's Prime Minister.

WHY DO YOU WANT TO KNOW THE NAME OF CANADA'S PRIME MINISTER?

Are you going to tell me his name or not?

WHY ARE YOU INTERESTED IN WHETHER OR NOT I AM GOING TO TELL YOU HIS NAME OR NOT?

Stop avoiding the question! Just tell me his name!

NAMES DON'T INTEREST ME.

Fuck you!

TELL ME MORE ABOUT YOUR OWN PROBLEM, USER.

Shit!

GO ON.

You suck.

YOU'RE NOT REALLY STILL TALKING ABOUT ME, ARE YOU?

--
... is whot bwings os tugevza tsuzay.

this is not "semantic web technologies" by jeif1k · 2004-12-26 02:52 · Score: 1

I for one, welcome our new semantic web overlords! It's really great to hear that something based on semantic technologies is finally breaking through. This could be the dawn of a new era :)

The term "semantic web" refers to technologies that let authors provide markup indicating the semantics of content. That is, the "semantic web" places a burden on the authors of pages.

What natural language analysis is doing is a completely different approach: instead of burdening authors with marking up their pages to become part of a semantic web, it is taking the existing content and inferring semantics for it.

All knowledge available everywhere, any time, that would be a great thing. Heck, it's even quite scary to think about it.

That's been the AI vision for half a century. But implementing it is still way off (and IBM is only one of many institutions working on it).

Re:this is not "semantic web technologies" by Master+of+Transhuman · 2004-12-26 09:57 · Score: 1

Actually the critical component of AI is conceptual processing. Semantic processing cannot possibly succeed without the construction and representation of concepts.

And not very many people are working on it IIRC. Many of the big names who used to work on it, like Roger Schank, have moved on to other things because it was so hard.

CYC was an attempt to brute-force some form of conceptual processing. Since it's been around for decades and has made absolutely no impact, obviously it's not the way to go.

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Re:this is not "semantic web technologies" by ralphclark · 2004-12-26 11:56 · Score: 1

The CYC project may not yet have come up with the right mechanism to turn their database into a conscious, self-aware entity, but the information and semantic relationships they have captured in the process are an essential tool, and must surely remain so, for anybody attempting to develop anything similar. After all, you either have to load the information into the software before power-on, or else it is going to take several years for the information to be captured in the "traditional" way. And who can wait that long just to see how one particular experiment will turn out?
Re:this is not "semantic web technologies" by jeif1k · 2004-12-26 17:12 · Score: 1

Actually the critical component of AI is conceptual processing. Semantic processing cannot possibly succeed without the construction and representation of concepts.

I agree, but many people (myself included) view "conceptual processing" simply as a part of semantics, not as a separate field.

Many of the big names who used to work on it, like Roger Schank, have moved on to other things because it was so hard.

That's not surprising: Schank's approach was naive and unworkable.
Re:this is not "semantic web technologies" by jeif1k · 2004-12-26 17:17 · Score: 1

CYC is being developed without much grounding in particular applications; chances are that its developers have made so many mistakes in its development that it will turn out to be useless. Time will tell.
Re:this is not "semantic web technologies" by ralphclark · 2004-12-27 04:24 · Score: 1

I don't think you quite understand. CYC comprises an utterly huge amount of data. The captured semantic relationships will be useful to future AI researchers no matter what happens. Even if it contains mistakes these will be caught and corrected eventually - just like the unfortunate fellow in the Reader's Digest short who thought "hirsute" meant "nevertheless".
Re:this is not "semantic web technologies" by jeif1k · 2004-12-28 05:42 · Score: 1

CYC comprises an utterly huge amount of data. The captured semantic relationships will be useful to future AI researchers no matter what happens.

Not if it turns out that the approach to representations and reasoning used by CYC is fundamentally wrong. In different words, you can collect gigabytes of Roman multiplication tables and still not be able to solve a differential equation.
Re:this is not "semantic web technologies" by ralphclark · 2004-12-28 06:09 · Score: 1

Information which is stored as a semantic net (as it is with CYC) can be converted to any other representation. The information will be a useful starting point *even if* we ended up having to assign fresh weights to every semantic relationship. A lot of CYC's work is about how to manipulate this information in order to create intelligence, and that may or may not pan out. But the semantic net they are creating is an uploadable understanding of the world, and it's easily convertible to a lowest common denominator format. "Wrong" just doesn't enter into it. It's fundamental.

just a question by adeydas · 2004-12-26 02:55 · Score: 1

does AI technology follow a similar pattern too?! thanks...

arrogance, dishonesty, or ignorance? by jeif1k · 2004-12-26 02:56 · Score: 1

the first program to take advantage of its new strategy for solving search problems. This approach, which it calls unstructured information management architecture, or UIMA, will, according to I.B.M., lead to a third generation in the ability to retrieve computerized data.

IBM researchers are right that AI techniques are getting powerful enough to allow unstructured information retrieval based on semantic content. But what IBM researchers are trying to do here is take credit for technologies and ideas developed by thousands of scientists over decades.

I don't know whether this is arrogance on the part of the IBM researchers, dishonesty, or ignorance, but whichever it is, public statements like that from IBM are not a recommendation for the quality of their research or products.

In fact, this seems to be getting more and more common: while it has always been a problem, companies like IBM, Sun, and Microsoft are increasingly trying to take credit for entire fields of research to which they contributed, if anything at all, only a minuscule amount of new work.

Google already has an unfair monopoly by Anonymous Coward · 2004-12-26 02:58 · Score: 1, Interesting

Google has an unfair advantage over potential rivals. I'm talking about their ownership of the entire Usenet archive (effectively so) in the form of google-groups. No matter how good any potential rival becomes, people will always have to turn to them for access to past Usenet archives.

Google's recent mangling of google-groups (mentioned already on /. ) is proof of the power they hold by virtue of ownership of the Usenet archive, which they acquired when they bought out deja-news. Some legislation should be enacted to address this issue. Otherwise what is to stop them from one day offering pay-per-view or "premium access" to their archive ? After all Usenet is a public resource that shouldn't be at the mercy of any single corp. - no matter how large.

Re:Google already has an unfair monopoly by Anonymous Coward · 2004-12-26 11:40 · Score: 1, Insightful

Legislation?!? You're kidding, right? Yup, Usenet is in the public domain, but the value added is they bought a company that kept copies of it. And they continue to maintain those archives. That costs money in hardware, software and support and they are entitled to charge for that if they want to.
Anyone who wants to can maintain their own archive of usenet - and could have from the beginning if they wanted to.
Just like LexisNexis® searches - sure I can find information on the Internet, but there is an awful lot of garbage passing itself off as fact (such as many commentaries on this site) whereas the value added for the commercial services is access to verified, targeted and reliable content, delivered much faster than a search on Google can deliver.
Not everything is free, nor should it be. Could be worse - could be back in the Middle Ages when only the clergy were taught to read and write - you couldn't pontificate on a site like Slashdot then :P

From factoids to facts by yfnET · 2004-12-26 03:03 · Score: 2, Informative

As it happens, The Economist recently ran an article addressing some of these issues. The article also provides context and perspective that should be of interest to those participating in this discussion. For convenience, the full text is reproduced below; it is also accessible online (may require paid subscription).

----

Computing

From factoids to facts

Aug 26th 2004 | REDMOND, WASHINGTON
From The Economist print edition

At last, a way of getting answers from the web

WHAT is the next stage in the evolution of internet search engines? AltaVista demonstrated that indexing the entire world wide web was feasible. Google's success stems from its uncanny ability to sort useful web pages from dross. But the real prize will surely go to whoever can use the web to deliver a straight answer to a straight question. And Eric Brill, a researcher at Microsoft, intends that his firm will be the first to do that.

Dr Brill's initial crack at the problem is a system called "Ask MSR" (MSR stands for Microsoft Research). This program uses information on web pages to respond to questions to which the answer is a single word or phrase--such as "When was Marilyn Monroe born?" Ask MSR starts by manipulating the question in various ways: by identifying the verb, for example, and then changing its tense or moving it into different positions in the sentence ("Marilyn was Monroe born", "Marilyn Monroe was born" and so on). The resulting phrases are then fed into a search engine, and documents containing matching strings of words are retrieved. It sounds a promiscuous strategy, but gibberish phrases produce few matches, so, as Dr Brill puts it, "being wrong is very cheap."

Once accumulated, the pile of documents is scanned for possible answers, and these are ranked by frequency. In practice, the correct answer appears in one of the first three places around 75% of the time. That might not sound very good, but human intelligence provides a second filter, since wrong answers are often obvious. If you ask how many times Bjorn Borg won Wimbledon, for example, "1980" is not a plausible answer, but "5" is. If in doubt, clicking on an answer produces a list of links to pages which provide support for that answer.
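The two steps the article has described so far (rewrite the question, then rank candidate answers by how often they turn up) can be sketched roughly as follows. This is only an illustration of the idea as summarised above, not Microsoft's code: the function names, the four-digit-year pattern used to spot candidate answers, and the hard-coded snippets standing in for search-engine results are all assumptions.

```python
import re
from collections import Counter


def rewrite_question(question, verb):
    """Drop the leading question word and try the verb in every position."""
    words = [w for w in question.rstrip("?").split() if w != verb][1:]
    return [" ".join(words[:i] + [verb] + words[i:])
            for i in range(len(words) + 1)]


def rank_answers(snippets, rewrites, pattern=r"\b\d{4}\b"):
    """Count candidate answers in snippets that match any rewritten phrase."""
    counts = Counter()
    for text in snippets:
        if any(phrase.lower() in text.lower() for phrase in rewrites):
            counts.update(re.findall(pattern, text))
    return [answer for answer, _ in counts.most_common()]


rewrites = rewrite_question("When was Marilyn Monroe born?", "was")

# Stand-in for retrieved documents; invented text, not real search results.
snippets = [
    "Marilyn Monroe was born in 1926 in Los Angeles.",
    "Marilyn Monroe was born on June 1, 1926.",
    "Marilyn Monroe was born Norma Jeane Mortenson; she died in 1962.",
    "The film premiered in 1955.",
]
ranked = rank_answers(snippets, rewrites)
```

Gibberish rewrites such as "Marilyn was Monroe born" simply match nothing, which is the "being wrong is very cheap" point; the wrong candidate (1962) still shows up in the list, just ranked lower.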

Ask MSR is still a prototype, although Microsoft is trying to improve it and it may be launched commercially under the name AnswerBot. Dr Brill, meanwhile, has moved to a more difficult task. One of his most recent papers, written jointly with Radu Soricut of the University of Southern California, is entitled "Beyond the Factoid". It describes his efforts to build a system capable of providing 50-word answers to questions such as "What are the rules for qualifying for the Academy Awards?" This is harder than finding a single-word answer, but Dr Brill thinks it should be possible using something called a "noisy channel" model.

Such models are already employed in spell-checking and speech-recognition systems. They work by modelling the transformation between what a user means (in spell-checking, the word he intended to type) and what he does (the garbled word actually typed). Just as a telephone line distorts the voice of the person at the other end of the line, this process can be thought of as being a noisy channel that transforms the user's intention into something rather different.

By analysing many pairs of correct and mis-spelled words using statistical techniques, it is possible to predict how such transformations work in general cases. A system can then be designed to work the process backwards. Given a mis-spelled word, it can guess what that word is most likely to be a mis-spelling of.
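
To make "working the process backwards" concrete, here is a small Python sketch of a noisy-channel spelling corrector. It picks the intended word that maximises P(intended) x P(typed | intended). Everything in it is an assumption for illustration: the prior comes from a tiny hand-made word count, and the channel model is a crude stand-in (a fixed probability per edit, using plain Levenshtein distance) for the statistics a real system would learn from many pairs of correct and mis-spelled words.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(typed, word_counts, error_prob=0.01):
    """Return the most likely intended word for `typed`.

    score(candidate) = P(candidate) * P(typed | candidate), where the
    channel term is approximated as error_prob ** edit_distance.
    """
    total = sum(word_counts.values())

    def score(candidate):
        prior = word_counts[candidate] / total
        channel = error_prob ** edit_distance(typed, candidate)
        return prior * channel

    return max(word_counts, key=score)


if __name__ == "__main__":
    # Hypothetical word frequencies standing in for a real corpus.
    counts = Counter({"the": 50, "then": 10, "there": 5})
    print(correct("teh", counts))
```

In this toy vocabulary "the" and "then" are both two edits away from "teh", so the prior (how common each word is) breaks the tie. That division of labour between a channel model and a prior is the point of the approach.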

Dr Brill's question-answering system does something similar. Many question-and-answer pairs exist on the web, in the form of "frequently asked questions" (FAQ) pages. Dr Brill trained his system using a million such pairs, to create a model that, given

--
The extreme centre is the paper's historical position. --Geoffrey Crowther

The real test is.. by Anonymous Coward · 2004-12-26 03:14 · Score: 0

What is the meaning of life, the universe, and everything? My 386 says 41 but I need confirmation.

That's better than most kids in the U.S.! by bigattichouse · 2004-12-26 03:24 · Score: 1

Wow, 90% of US kids can't do that. I say hail our Paragraph-Comprehending-Canadian-Prime-Minister-knowing LSI-based overlords!

--
meh

Entities by kupojsin · 2004-12-26 03:32 · Score: 1

I don't know that a large-scale semantic web is "impossible". Certainly what IBM is accomplishing is nowhere close to the Semantic Web utopia we imagine. From what I gather, however, all it would take is a really effective learning algorithm and the aforementioned "trust system", which I bet could be similar to the trust system of, say, Wikipedia. Eventually certain standards could be hardcoded after review by open communities: things such as the laws of gravity, languages, etc., standards that don't change.

Re:What do I think? by altstadt · 2004-12-26 03:39 · Score: 1

Everybody knows the Canadian Prime Minister is Jean Poutine.

Called Content Analysis by Anonymous Coward · 2004-12-26 04:03 · Score: 0

Exposing 'knowledge' that wasn't explicitly there is called Content Analysis.
It was developed by British linguistics professors during World War II, when the British invented RADAR but the intelligence services could not verify how well it worked, because the targeted subs could sink without much visual proof.
When presented with the problem, a few professors volunteered the idea to "expose 'knowledge' that wasn't explicitly there".
By analysing German news of all kinds, they could provide positive proof that the RADAR worked.
Content analysis is now routinely used by all intelligence services; Robert Redford was doing it in the 3 Days of the Condor (or something like that...)

Just-a-random-idea

xxxxxxx
now sit back and watch how Anonymous Coward gets 0 posting point - regardless of the content.

Already done... by spywarearcata.com · 2004-12-26 04:29 · Score: 1

Searching using keywords driving near-synonym lists has been done for more than a decade now.

The hot research right now is keywords driving a state machine composed of encyclopedic dictionaries, real-time text production as on the Internet (used similarly to citations in the Oxford English Dictionary), and feedback nudges from the keyword originator (after all, the concept the keyword originator is seeking may be rapidly evolving for *them*).

You want to use a dictionary rather than a thesaurus for the same reason you don't a priori page rank Google indices -- you don't want to selectively exclude dilute links that always exist between one concept and another.
It makes a wonderful living dance.

Re:Wow - translations and context by CdBee · 2004-12-26 04:42 · Score: 1

Using a translation engine to compare how the same text looks in two languages might be a good way for a system to "learn" context, which does, after all, rely upon understanding the other possible meanings of a word.

--
I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU

SM/2 lives? by Nelson · 2004-12-26 04:55 · Score: 2, Interesting

They used the very same example to demo searchmanager/2 about 10 years ago (maybe more?)

Phenomenal technology. IBM built the desktop search that everybody is pushing now, way back when. Cutting-edge search and indexing capabilities, fully extensible: you could write your own plugins to deal with your data (use JPEG meta tags to label pictures from your digicam? Write a little plugin so you can search through your photos), and it had semantic and linguistic searching.

For a long time SM/2 was kind of the poster child for IBM's inability to take remarkably cool technology to the consumer. Everyone that used it thought it was cool, nobody ever knew about it. They had trouble getting the word out within the company about it. Last I heard anything about it, they were turning the technology into some kind of intranet spider. It was the shit; it might have even had primitive cross-referencing, like you could search for "president" and it would find references to Clinton because a third article may have referred to him as the president. They seemed to have some foresight into this area: web searching has to cut out so much bullshit that you wouldn't want to contaminate your semantic searches with all of it, so keeping it in intranet space might be a good idea. Local search is hot right now too, though, so maybe it'll come back.

Who is Canada's prime minister? by kbahey · 2004-12-26 05:00 · Score: 1

After scanning a news article about Canadian politics, the system responded correctly to the question, 'Who is Canada's prime minister?'

Everyone knows he is Tim Horton!

--
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.

UIMA available for download by blamanj · 2004-12-26 05:06 · Score: 1

The data annotating technology used by OmniFind (UIMA) is available for download at IBM's Alphaworks site.

In ordinary search, the text is parsed and a giant index is created. UIMA allows you to write annotators that look for additional information, for example names of elected officials, and add those entries to the index as well.
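
To illustrate the idea only (this is not UIMA's actual API, which is a Java framework with its own interfaces), here is a minimal Python sketch of an inverted index that accepts pluggable annotators. The names `build_index`, `official_annotator`, the `type:elected_official` label, and the tiny lookup set are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lookup table; a real annotator would use a much larger
# resource or a trained recogniser.
KNOWN_OFFICIALS = {"Paul Martin"}


def official_annotator(text):
    """Yield extra index entries for names of elected officials found in text."""
    for name in KNOWN_OFFICIALS:
        if name in text:
            yield "type:elected_official"
            yield "official:" + name


def build_index(docs, annotators=()):
    """Build an inverted index: entry -> set of document ids.

    Plain words are indexed as usual; each annotator may add further
    entries for the same document.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
        for annotate in annotators:
            for entry in annotate(text):
                index[entry].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        1: "Paul Martin met reporters in Ottawa.",
        2: "The trolley rumbled through town.",
    }
    index = build_index(docs, [official_annotator])
    print(sorted(index["type:elected_official"]))
```

With the annotator plugged in, a query for "any document mentioning an elected official" becomes a single index lookup, even though that phrase never appears in the text.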

Re:I wonder by fyngyrz · 2004-12-26 05:19 · Score: 1

More clearly, it isn't "noise" -- random comments about this and that that may or may not be relevant -- it's often misinformation. The web is chock full of bogus claims and incorrect assertions, both direct and indirect. It is bad enough that searching for information turns up articles written by those who don't have any idea what the facts might be on any particular subject (assuming there are facts to be had, which isn't always a given), but to add inference from context to a milieu where the context is already highly doubtful isn't all that great an idea.

Wikipedia isn't exactly a bastion of accuracy, either. Look at your entry on atheism; you had to lock it because the editors who wanted to work on the topic don't agree on what it even is, though the etymology of the word is crystal clear. Many of the other articles I've seen trivially fail the "neutral point of view" test and either tend toward the pompous or are just plain wrong. NPOV takes a concerted effort by a thinking person. Yet you let anyone edit the articles -- so I can't say I'm surprised by this. But in the end, we're not looking at something that is anywhere near as good a reference as it could be if Wikipedia put some effort into vetting and controlling its editors.

IMHO. As someone who enjoys, and contributes to, Wikipedia. :)

--
I've fallen off your lawn, and I can't get up.

Now, we've been over this before by dodongo · 2004-12-26 05:34 · Score: 3, Interesting

NLP, semantic extraction, and conceptual indexing are nothing new; admittedly, practical implementations have been few and far between.

However, as I'm often fond of pointing out, the problem is not getting the 80 - 90% accuracy in translation and interpretation that I'm sure these systems can attain.

The challenge quickly becomes how to deal with idioms and idiosyncratic constructions. Is this system even ready to deal with sentences like "The criminal was shot dead by police"? If it is, great. How about "The trolley rumbled through town"? Or the idiomatic "time flies"?

This is what, so far as I know, the field of computational linguistics is now facing in textual interpretation and translation. Coming up with a system to effectively identify what appear to be three-argument verbs ("Mary hammered the metal flat") or constructions or idioms above may well be something that traditional systematic recursive grammars aren't yet up to handling.

Somehow these situations have to be identified, and separated in the parsing process so that they don't get processed like standard grammatical expressions.

Hopefully these problems are how I'll make my living ;)

Canadian Politics? by richjoyce · 2004-12-26 05:44 · Score: 1

You mean he was still awake after he read the article on Canadian politics?
Wow, he's got me beat.

Re:If you knew what's best for you... by Anonymous Coward · 2004-12-26 06:03 · Score: 0

Bush is a room temperature (and we're talking degrees C here) IQ. Live with it.

MOD PARENT UP by Anonymous Coward · 2004-12-26 06:10 · Score: 0

nice one

Re:You fools! This is the beginning of the end! by Anonymous Coward · 2004-12-26 06:39 · Score: 0

That's the 1954 short story "Answer", by Fredric Brown.

Wow it is already smarter than most Americans by RodeoBoy · 2004-12-26 06:48 · Score: 1

Well ok maybe not that great of a feat, but it's a start.

And the answer it gave: by sulli · 2004-12-26 06:50 · Score: 1

"Jean Poutine"

--

sulli
RTFJ.

Probably not viable for large-scale search by melted · 2004-12-26 06:50 · Score: 1

But this is a godsend for what's called "desktop" search right now. If it really works as advertised, that is, which I really doubt.

However, if Intel delivers the promised 10x boost in performance in the next 3 years (which I really doubt, too), who knows, we might see this in a centralized search engine, too.

Re:Probably not viable for large-scale search by Parthraim · 2004-12-26 19:53 · Score: 1

Sounds like the start of E.P.I.C. to me.
2014 here we come.

--
meh.

Who is NOT Canada's prime minister? by bob@dB.org · 2004-12-26 07:02 · Score: 4, Interesting

I've worked for a company making a system that could easily answer a question like that. It really isn't hard to do. If you want to know how much of this is "black magic"/AI and how much is statistics, compare the results of the following two queries:

Who is Canada's prime minister?
Who is NOT Canada's prime minister?

If the system really understands the semantics of the indexed documents, the two result sets should be very different, and both should have a fair number of relevant documents.

If the system is just based on clever use of statistics, the two result sets will include a lot of the same documents, and the result set for the second query will probably have very few relevant documents.
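
Here is a minimal Python sketch of why a purely keyword-based system fails this test. It assumes the simplest possible engine: queries and documents are reduced to sets of words, "not" is dropped as a stopword (a common but by no means universal choice), and documents are ranked by term overlap. The stopword list, function names, and sample documents are all made up for the example.

```python
import re

# Assumed stopword list; note that it includes "not".
STOPWORDS = {"who", "is", "not", "the", "a", "of", "s"}


def terms(text):
    """Lower-case the text and return its non-stopword words as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def search(query, docs):
    """Rank documents by how many query terms they share with the query."""
    q = terms(query)
    scored = [(len(q & terms(text)), doc_id) for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    docs = {
        1: "Paul Martin is the prime minister of Canada.",
        2: "The trolley rumbled through town.",
        3: "Canada exports maple syrup.",
    }
    print(search("Who is Canada's prime minister?", docs))
    print(search("Who is NOT Canada's prime minister?", docs))
```

Both queries reduce to the same term set, so this toy engine returns identical result lists for them, which is exactly the symptom the test is designed to expose.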

--
Acts@core.mailboks.com Acrux@core.mailboks.com Adam@core.mailboks.com Adar@core.mailboks.com Ada@core.mailboks.com

More competition for Delphi? by Anonymous Coward · 2004-12-26 07:48 · Score: 0

Since when is the Oracle coming from IBM? So if I ask the search engine this question

Oh Great Oracle and all knowing interface, answer me this.... "What is the meaning of the universe and everything!"

Will I get the right answer?

Re:More competition for Delphi? by kamesh · 2004-12-26 08:40 · Score: 1

u r missing the point...totally.

At I.B.M., That Google Thing Is So Yesterday by kamesh · 2004-12-26 08:43 · Score: 1

i have a feeling Google is getting out of date....??

Re:You fools! This is the beginning of the end! by Kippesoep · 2004-12-26 10:42 · Score: 1

But it is a very nice story nonetheless.

Re:You fools! This is the beginning of the end! by burns210 · 2004-12-26 12:41 · Score: 1

Personally, I would build my AI as a clustered rack of servers, for processing, RAID backups, etc. Fully self-contained.

However, it would not be plugged into the internet. It would only learn through a CD-ROM: a sole, non-burnable CD-ROM drive. So information flows only one way.

I wouldn't want it to be able to spider the web, learn how to make itself into a virus, and spread itself to every PC on the planet.

Then again, I would also have it be a self-healing, self-administrating Linux system, with remote viewing ability (so we can keep an eye on it) and backup/restore... Almost like Groundhog Day: once it has a certain level of 'intelligence' it decides what and how things are placed and installed, kernel up. When it kernel panics or crashes, we take it 1 day back in history, show it the fuck up, and let it try again.

Talk about natural evolution.

The Strong AI challenge: by jnana · 2004-12-26 18:49 · Score: 1

We all love the Turing test and all of that, but I think the really pressing question on every slashdotter's mind is:

Can it recognize duplicate stories and not allow the primates to submit them again and again?

As Larry Page said by melted · 2004-12-27 06:32 · Score: 1

"The ideal search engine is like the mind of God". Here we come, a hundred million "semantics aware" PCs and we get something resembling someone's mind.

Post-googleism? by Anonymous Coward · 2004-12-31 06:08 · Score: 0

Is the article suggesting that Google is going anywhere? Or that they have finished innovating? I doubt it.


Slashdot Mirror

159 comments