Extracting Meaning From Millions of Pages
freakshowsam writes "Technology Review has an article on a software engine, developed by researchers at the University of Washington, that pulls together facts by combing through more than 500 million Web pages. TextRunner extracts information from billions of lines of text by analyzing basic relationships between words. 'The significance of TextRunner is that it is scalable because it is unsupervised,' says Peter Norvig, director of research at Google, which donated the database of Web pages that TextRunner analyzes. The prototype still has a fairly simple interface and is not meant for public search so much as to demonstrate the automated extraction of information from 500 million Web pages, says Oren Etzioni, a University of Washington computer scientist leading the project." Try the query "Who has Microsoft acquired?"
"Who has dumped Vista?"
If I had an Ass, I'd call it Fanny Bottom, then I could slap my Ass; Fanny Bottom, on the Arse.
I suppose the major problem with this is that it cannot tell the difference between truth and lies or urban legends; it just repeats what other people have said, even if they are conspiracy theorists. The query "Who killed JFK?" suggests the CIA did it.
I'll start stockpiling food and armor piercing rounds for the moment Skynet goes live.
Yet strangely, I get a result of:
TextRunner took 9 seconds.
Retrieved 0 results for what is the airspeed velocity of an unladen swallow?.
Meh, call me when this stuff can answer the really USEFUL questions in life.
Seven puppies were harmed during the making of this post.
I learned that
> smoking (387) causes cancer.
I was also surprised to learn that
> girls and women (11) cause most cases of cervical cancer
This is a great resource if you need to cite a reference for a Wikipedia article.
Who is at Area 51
aliens (3), Carter (2), Colonel Sanders (2), Hi Group (2) is at Area 51
Who bombed WTC
Al Qaeda (5), Bush (5), Clinton (2), 4 more... bombed the WTC
Who built the pyramids (example on site):
Egyptians (298), aliens (73), Pharaohs (40), 77 more... built the pyramids
What contains antioxidants (example on site):
Coffee (17), Recent scientific research (15), food (6), 5 more... contain significant amounts of antioxidants
-- man, I gotta get me some more recent scientific research.
That is how Wikipedia was meant to be. A group of statements about subjects, all of which can be referenced to some original source. So that people can look up something quickly and then look at the sources for more definite information....
Seeing how many people cite Wikipedia directly and use it as the main source for their research, and the number of newspapers reported to have quoted inaccurate facts straight from Wikipedia... I don't think it is working properly. It requires a lot of optimism to believe "People will use that as an initial source and then verify the information."
That's not Wikipedia's failure. Without it, those same people would just be referencing nothing, or a web site with zero public review and commenting.
"I zero-index my hamsters" - Willtor (147206)
"...that pulls together facts by combing through more than 500 million Web pages."
Correction:
"...that pulls together assertions by combing through more than 500 million Web pages."
Whether those assertions are correct or even reasonable is a completely different issue.
It might be interesting to then take those assertions and have some means to validate or invalidate them, but currently that's going to require meat, not metal.
Now, if you could come up with some form of AI^Walgorithm to do that automatically, then you would have something.
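To make the "assertions, not facts" point concrete, here is a minimal sketch of what the counts in those results amount to. It assumes the extraction step has already produced (subject, relation, object) triples; the function names and the toy data are hypothetical, and this is not TextRunner's actual code. All it does is tally how many times each subject was asserted, which is why "aliens" can rank right behind "Egyptians".

```python
from collections import Counter

def tally_assertions(triples, relation, obj):
    """Count how often each subject is asserted for (relation, obj).

    Returns (subject, count) pairs, most frequently asserted first.
    Frequency is the only signal here; nothing checks truth.
    """
    counts = Counter(
        subj for subj, rel, o in triples
        if rel == relation and o == obj
    )
    return counts.most_common()

def format_answer(ranked, relation, obj, top=3):
    """Render the ranking in the 'X (n), Y (m), k more...' style."""
    shown = ", ".join(f"{subj} ({n})" for subj, n in ranked[:top])
    rest = len(ranked) - top
    more = f", {rest} more..." if rest > 0 else ""
    return f"{shown}{more} {relation} {obj}"

# Hypothetical toy triples standing in for extractor output.
triples = [
    ("Egyptians", "built", "the pyramids"),
    ("Egyptians", "built", "the pyramids"),
    ("Egyptians", "built", "the pyramids"),
    ("aliens", "built", "the pyramids"),
    ("aliens", "built", "the pyramids"),
    ("Pharaohs", "built", "the pyramids"),
    ("slaves", "built", "the pyramids"),
    ("Microsoft", "acquired", "Hotmail"),
]

ranked = tally_assertions(triples, "built", "the pyramids")
print(format_answer(ranked, "built", "the pyramids"))
```

Anything that validates the assertions would have to be a separate step on top of this; the tally itself only knows how often something was said.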