Text Mining the New York Times
Roland Piquepaille writes "Text mining is a computer technique to extract useful information from unstructured text. And it's a difficult task. But now, using a relatively new method named topic modeling, computer scientists from the University of California, Irvine (UCI), have analyzed 330,000 stories published by the New York Times between 2000 and 2002 in just a few hours. They were able to automatically isolate topics such as the Tour de France, prices of apartments in Brooklyn, or dinosaur bones. This technique could soon be used not only by homeland security experts or librarians, but also by physicians, lawyers, real estate people, and even by yourself. Read more for additional details and a graph showing how the researchers discovered links between topics and people."
Every time homeland security is mentioned as benefiting from a new technology, you should get a swift kick to the nuts. Goddamn, there is more than just terrorism in this world.
A relatively new method? A difficult task? Sorry, but these are almost laughable, even for a poor Spaniard like me.
You mean they can group data by topic? Like clusty.com does when you search?
I just read the stub of the article... because it seemed like it does exactly what clusty does and I don't care to read anymore.
Not revolutionary. In fact, they're late.
Google's AdSense network has done this for years to serve contextually relevant text ads across thousands of websites. Yahoo does now, too.