Wikipedia Used for Artificial Intelligence
eldavojohn writes "It may be no surprise but Wikipedia is now being used in the field of artificial intelligence. The applications for this may be endless. For instance, the front of spam fighting is a tough one and it looks as though researchers are now turning towards an ontology or taxonomy based solution to fight spammers. The concept is also on the forefront of artificial intelligence and progress towards an application passing the Turing Test and creating semantically aware applications. The article comments on uses of Wikipedia in this manner: '"... spam filters block all messages containing the word 'vitamin,' but fail to block messages containing the word B12. If the program never saw B12 before, it's just a word without any meaning. But you would know it's a vitamin," Markovitch said. "With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that 'B12' is strongly associated with the concept of vitamins, and will correctly identify the message as spam," he added.'"
don't you think masses of spammers are going to screw with wikipedia strategically on purpose so that it doesn't work properly for that if it starts to work very well to block them? They should just stop being afraid of being called racist and super-filter every e-mail that comes out of South Korea, Indonesia, and especially Nigeria, etc. Type spam map into google image search to see how blatently obvious it is to see where the spam comes from. Something like 98% of spam can be pinned down to 0.01% of the world by square footage. If they added fuzzy logic instead of alterable AI and only block e-mails from south korea with the word vitamin and not block ones from Nebraska with the word vitamin, then the problem would be decreased dramatically.
Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
The Bayesian analysis in spam filters only works on text. Spammers realized that they could get around it by filling the text portion of the message with some random passage from a Project Gutenberg file, thus making it seem innocuous, and then putting the real advertisement in a GIF or PNG file that would be displayed by HTML-capable mail readers. Bayesian analysis can still work, but only in combination with OCR software.