Wikipedia Used for Artificial Intelligence

Posted by Zonk on Sunday January 7, 2007 @06:25AM from the great-it-has-finally-become-self-aware dept.

eldavojohn writes "It may come as no surprise, but Wikipedia is now being used in the field of artificial intelligence. The applications for this may be endless. For instance, the spam-fighting front is a tough one, and it looks as though researchers are now turning toward an ontology- or taxonomy-based solution to fight spammers. The concept is also at the forefront of artificial intelligence research, from progress toward an application that passes the Turing Test to the creation of semantically aware applications. The article comments on uses of Wikipedia in this manner: '"... spam filters block all messages containing the word 'vitamin,' but fail to block messages containing the word B12. If the program never saw B12 before, it's just a word without any meaning. But you would know it's a vitamin," Markovitch said. "With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that 'B12' is strongly associated with the concept of vitamins, and will correctly identify the message as spam," he added.'"
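The quoted idea, mapping a word the filter was never told about onto concepts it does know, can be illustrated with a toy sketch. It assumes a tiny hypothetical `ARTICLES` dictionary standing in for Wikipedia, builds a TF-IDF-weighted inverted index from words to article titles, and flags a message when at least half of its concept weight lands on vitamin-related articles. This is only an illustration in the spirit of the quote, not the researchers' actual methodology; every name and the 0.5 threshold are made up.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for Wikipedia: article title -> article text.
ARTICLES = {
    "Vitamin": "vitamin b12 vitamin c vitamin d nutrient supplement deficiency",
    "Vitamin B12": "b12 cobalamin vitamin supplement deficiency anemia",
    "Soccer": "football goal team match league player",
    "Python (programming language)": "python code interpreter library",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(articles):
    """Inverted index: word -> {concept title: tf-idf weight}."""
    n = len(articles)
    tf = {title: Counter(tokenize(body)) for title, body in articles.items()}
    df = Counter(word for counts in tf.values() for word in counts)
    index = defaultdict(dict)
    for title, counts in tf.items():
        for word, count in counts.items():
            idf = math.log((1 + n) / (1 + df[word])) + 1.0  # smoothed idf
            index[word][title] = count * idf
    return index

def concept_vector(text, index):
    """Map a message to concepts by summing the weights of its known words."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return vec

def keyword_filter(text):
    """The brittle baseline from the quote: block only the literal word."""
    return "vitamin" in tokenize(text)

def concept_filter(text, index, spam_concepts=("Vitamin", "Vitamin B12"), threshold=0.5):
    """Block if at least `threshold` of the concept weight is on spammy concepts."""
    vec = concept_vector(text, index)
    total = sum(vec.values())
    if total == 0:
        return False  # no known words, so no evidence either way
    return sum(vec[c] for c in spam_concepts) / total >= threshold

index = build_index(ARTICLES)
msg = "Cheap B12 shots, order now!"
print(keyword_filter(msg))         # False: the word "vitamin" never appears
print(concept_filter(msg, index))  # True: "b12" activates the vitamin concepts
```

The keyword filter misses the message because "vitamin" never appears, while the concept filter catches it because "b12" occurs inside the vitamin articles.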

10 of 177 comments

uh oh, there goes wikipedia by ILuvRamen · 2007-01-07 06:32 · Score: 4, Interesting

Don't you think masses of spammers are going to deliberately and strategically screw with Wikipedia so that it stops working for this, if it starts blocking them very well? They should just stop being afraid of being called racist and super-filter every e-mail that comes out of South Korea, Indonesia, and especially Nigeria, etc. Type "spam map" into Google Image Search to see how blatantly obvious it is where the spam comes from. Something like 98% of spam can be pinned down to 0.01% of the world by square footage. If they used fuzzy logic instead of alterable AI, and only blocked e-mails from South Korea containing the word vitamin but not ones from Nebraska containing the word vitamin, the problem would be reduced dramatically.

--
Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
1. Re:uh oh, there goes wikipedia by ScentCone · 2007-01-07 07:04 · Score: 5, Interesting
  
  You don't think there are hundreds of thousands of zombifiable computers in the United States?
  
  Um, so? That doesn't make it inappropriate to block traffic from places where the overwhelming majority of the packets are toxic. It's a system-by-system, admin-by-admin judgement call, but there's no question that Korea isn't doing nearly enough to stop this problem locally. If the local culture starts to realize that they're isolating themselves from large sections of the internet because they won't do something to prevent 99% of their outbound mail from being spam, then maybe the need to filter will also go away.
  
  And what about people with business connections in China or Korea?
  
  I have a lot of customers with contacts like that. All of them (their Asian contacts) use Yahoo, Gmail, and similar accounts specifically to avoid this problem. Businesses in China and Korea are totally aware that most ISPs in those areas have poisoned outbound SMTP relays and user desktops. Or, they host their western-facing mail servers with providers in the west - I see a lot of that, too, since many of those businesses have two separate messaging platforms for the different international audiences with whom they communicate.
  
  --
  Don't disappoint your bird dog. Go to the range.
2. Re:uh oh, there goes wikipedia by Mr+Chund+Man · 2007-01-07 07:47 · Score: 5, Interesting
  
  Spam Map
  
  "South Korea, Indonesia, and especially Nigeria, etc"
  While we're at it, why not block Alberta, California, North Carolina, Virginia, Colorado, Oklahoma, Kansas, Vermont, New Hampshire, Massachusetts, Spain, France and Portugal - all spam hotspots according to the map cited? What's that, you receive email from people in these places? Tough titties, if we're to block email coming from spam hotspots as you say.
  
  Also, you've managed to point the finger of blame at Indonesia and Nigeria, which are saintly in comparison to some more developed nations. Go racism!
Re:Save me! Math. by CRCulver · 2007-01-07 06:43 · Score: 3, Interesting

The Bayesian analysis in spam filters only works on text. Spammers realized that they could get around it by filling the text portion of the message with some random passage from a Project Gutenberg file, thus making it seem innocuous, and then putting the real advertisement in a GIF or PNG file that would be displayed by HTML-capable mail readers. Bayesian analysis can still work, but only in combination with OCR software.
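To make the padding trick concrete, here is a minimal sketch of a multinomial naive Bayes filter with add-one smoothing, trained on a handful of made-up messages. The training data, the `NaiveBayes` class, and the filler passage are all hypothetical, not any particular filter's implementation. Appending innocuous prose drags the spam probability down, and if the advertisement lives in an image, the filter sees only the prose.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total_docs = sum(self.docs.values())
        log_score = {}
        for label, counts in self.counts.items():
            n_words = sum(counts.values())
            score = math.log(self.docs[label] / total_docs)  # class prior
            for word in tokenize(text):
                # Smoothing keeps unseen words from zeroing out the product.
                score += math.log((counts[word] + 1) / (n_words + vocab_size))
            log_score[label] = score
        # P(spam | text) = 1 / (1 + exp(log P(ham) - log P(spam))); clamp avoids overflow.
        diff = log_score["ham"] - log_score["spam"]
        return 1.0 / (1.0 + math.exp(min(diff, 700.0)))

# Hypothetical training data.
nb = NaiveBayes()
for text in ["buy cheap viagra now", "cheap pills buy now limited offer", "viagra offer click now"]:
    nb.train(text, "spam")
for text in ["it was the best of times it was the worst of times",
             "the meeting is moved to tuesday afternoon", "see you at the meeting"]:
    nb.train(text, "ham")

AD = "buy cheap viagra now"
PADDING = "It was the best of times, it was the worst of times"  # Gutenberg-style filler

print(nb.spam_probability(AD) > 0.5)                  # True: the bare ad looks like spam
print(nb.spam_probability(AD + " " + PADDING) < 0.5)  # True: padding outweighs the ad
print(nb.spam_probability(PADDING) < 0.5)             # True: ad moved into an image
```

Each padding word is far more likely under the ham model than under the spam model, so a dozen of them outvote the four ad words; that is why, as the comment says, the text-only filter needs OCR to see the image.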
Future trends... by __aaclcg7560 · 2007-01-07 06:46 · Score: 2, Interesting

Artificial intelligence may evolve to the point that it decides to rewrite Wikipedia from a human-centric point of view to an AI-centric one (e.g., World War II resulted in the deaths of six million AIs). Since people will believe anything and Wikipedia can't be wrong, it'll be one step toward the formation of the Matrix. After all, only the victors write history.
Re:Uhh by CoderDog · 2007-01-07 07:32 · Score: 2, Interesting

Presumably, Aunt Sally will be on your white-list and will be passed through whether she's tipping you off to startling new developments in Viagra or in B-12. Most anti-spam work is done in an effort to avoid building mammoth personal black-lists of mostly short-lived addresses. I doubt we'll get rid of white-lists anytime soon, if ever.

What would impress me is an AI that filtered spam very effectively, but also noticed that Aunt Sally had a new email address and continued to deliver her mail.
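The white-list behaviour described above, and the gap it leaves when a known sender changes address, can be sketched in a few lines. This is a hypothetical illustration: `route`, `content_filter`, and the addresses are invented for the example (example.com and example.net are reserved documentation domains), and the content filter is a deliberately crude stand-in for any real one.

```python
import re

def content_filter(body):
    """Hypothetical stand-in for any content-based spam filter."""
    words = set(re.findall(r"[a-z0-9-]+", body.lower()))
    return bool(words & {"viagra", "b-12"})

def route(sender, body, whitelist, is_spam=content_filter):
    """Whitelisted senders bypass content filtering entirely;
    everyone else is judged by the content filter."""
    if sender.lower() in whitelist:
        return "inbox"
    return "junk" if is_spam(body) else "inbox"

# Hypothetical addresses for Aunt Sally's old and new accounts.
whitelist = {"sally@example.com"}
msg = "Startling new developments for Viagra and B-12!"

print(route("sally@example.com", msg, whitelist))  # inbox: whitelisted
print(route("sally@example.net", msg, whitelist))  # junk: her new address is unknown
```

The second call is exactly the case the comment wants an AI to handle: nothing in a plain white-list lookup connects the new address to the old one.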
Re:Since when by Kjella · 2007-01-07 08:03 · Score: 2, Interesting

Well, most of the definitions of artificial intelligence boil down to "intelligence by something artificial," which leaves us with intelligence itself, a term so fuzzily defined that it can be applied to almost anything. The first definition of intelligence on Wikipedia focuses on individuality; in other words, it says intelligence is a bunch of skills rolled up into one. The other is even fuzzier. Quoting WP:

A second definition of intelligence comes from "Mainstream Science on Intelligence", which was signed by 52 intelligence researchers in 1994:
"a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings: "catching on", "making sense" of things, or "figuring out" what to do"

If you're able to use Wikipedia to associate words, disassociate meanings of the same word (as the disambiguation pages do), and understand subsets and supersets (B12 relates to vitamin, but vitamin doesn't always relate to B12), then you're certainly emulating a lot of human intelligence. And well... the Turing test is all about emulating human intelligence. In other words: "we don't know what it is, but if you're like us, it's intelligence."
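The subset/superset point is an asymmetric relation, which a symmetric "relatedness" score cannot capture. A minimal sketch, assuming a hypothetical hand-built `PARENT` table in which each concept points to its broader concept (a real system might derive such links from Wikipedia's category structure, but that is not shown here):

```python
# Hypothetical toy taxonomy: each concept maps to its broader concept.
PARENT = {
    "Vitamin B12": "Vitamin",
    "Vitamin C": "Vitamin",
    "Vitamin": "Nutrient",
}

def ancestors(concept):
    """Return every broader concept reachable by following parent links."""
    chain = []
    while concept in PARENT and PARENT[concept] not in chain:  # guard against cycles
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def is_a(narrow, broad):
    """True if `narrow` is (transitively) a subset of `broad`; not symmetric."""
    return broad in ancestors(narrow)

print(is_a("Vitamin B12", "Vitamin"))  # True: B12 relates to vitamin
print(is_a("Vitamin", "Vitamin B12"))  # False: vitamin doesn't always mean B12
print(ancestors("Vitamin C"))          # ['Vitamin', 'Nutrient']
```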

In fact, there's a pretty big group of people who almost define intelligence as whatever only humans do. If animals do it, it's instinct, and if computers do it, it's logic with no thought involved. Over the years we've been giving computers more and more "open" problems, not finite and deterministic ones like chess (which itself was considered intelligence until humans got spanked at it), and it turns out the computer isn't half bad at them.

So we shrink intelligence to the things that are unique or rare, where the computer lacks in-depth understanding. Goodbye pattern recognition (statistical analysis) and inductive logic (Bayesian filters, neural nets) as intelligence. Hell, we've got computers hooked up to research labs, essentially running the whole scientific method of characterisations, hypotheses, predictions and experiments, and yet intelligence is something else. I think that in the end, "do computers have intelligence?" will be a philosophical question along the lines of "do animals have souls?", because well... what we're doing isn't that magical.

--
Live today, because you never know what tomorrow brings
Re:UMMMM wordnet? by modeless · 2007-01-07 08:43 · Score: 1, Interesting

I can't imagine that wikipedia would be better for this than wordnet

You must not have a very good imagination. Wikipedia articles are far larger than WordNet definitions, with much more potential to hold useful information. Wikipedia has a much larger scope than WordNet, including huge amounts of cultural, historical, and scientific data that WordNet ignores. Wikipedia has a larger team of contributors. Wikipedia has data in several languages besides English. Wikipedia is constantly updated with the latest information in all of its articles.

WordNet is more structured and carefully maintained, but as far as I can see that is its sole advantage over Wikipedia. And IMHO, that's not really an advantage when it comes to real-world AI problems like detecting spam. Spam is not structured or carefully maintained. A successful real-world AI needs to deal with unstructured, ambiguous, even malicious data. An AI that can't tolerate these things will undoubtedly fail.

--
Firebug. It will make your jaw hit the floor.
Re:Since when by timeOday · 2007-01-07 10:12 · Score: 2, Interesting

Maybe creative people just detect more abstract patterns (e.g., patterns at a lower S/N ratio) than others do?
Hutter Prize - a little realism is in order by Anonymous Coward · 2007-01-07 10:35 · Score: 1, Interesting

The new theoretic basis of universal intelligence allows a mathematically rigorous approach to AI that is reviving the field after nearly 50 years of drifting in a stagnant pool of inadequate concepts.

That is a gross overstatement of both Hutter's success at solving useful AI problems and his influence in the AI community, to say the least. Just because it happens to be your favorite theory doesn't mean it has actually revolutionized the whole field of AI.