Text Mining the New York Times

Posted by ryuzaki0 on Friday July 28, 2006 @10:29PM from the good-place-to-mine dept.

Roland Piquepaille writes "Text mining is a computer technique for extracting useful information from unstructured text, and it's a difficult task. But now, using a relatively new method called topic modeling, computer scientists from the University of California, Irvine (UCI) have analyzed 330,000 stories published by the New York Times between 2000 and 2002 in just a few hours. They were able to automatically isolate topics such as the Tour de France, apartment prices in Brooklyn, and dinosaur bones. This technique could soon be used not only by homeland security experts or librarians, but also by physicians, lawyers, real estate agents, and even by you. Read more for additional details and a graph showing how the researchers discovered links between topics and people."
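For readers curious what "topic modeling" involves: one widely used approach is latent Dirichlet allocation (LDA), which treats each document as a mixture of topics and each topic as a probability distribution over words. What follows is a toy sketch only, assuming LDA fitted by collapsed Gibbs sampling. The summary does not say which model, settings, or code the UCI team used, and the function names `lda_gibbs` and `top_words`, the tiny corpus, and the hyperparameters (`alpha`, `beta`, iteration count) are hypothetical choices for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy LDA fitted with collapsed Gibbs sampling (hypothetical helper).

    docs: list of documents, each a list of word strings.
    Returns (topic_word_counts, doc_topic_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    ndk = [[0] * num_topics for _ in docs]      # words in doc d assigned to topic k
    nkw = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    nk = [0] * num_topics                       # total words assigned to topic k
    z = []                                      # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return nkw, ndk, vocab


def top_words(nkw, vocab, n=3):
    """Return the n most frequent words in each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


# Hypothetical mini-corpus echoing two of the topics named in the story.
docs = [
    "tour france stage cyclist yellow jersey".split(),
    "cyclist stage tour mountain sprint".split(),
    "brooklyn apartment price rent market".split(),
    "apartment rent brooklyn broker price".split(),
]
nkw, ndk, vocab = lda_gibbs(docs, num_topics=2, iterations=500)
print(top_words(nkw, vocab))
```

With a corpus this small, whether the "cycling" and "real estate" words end up cleanly separated can depend on the seed and iteration count. Real systems run the same kind of sampling over hundreds of thousands of articles and many more topics.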

6 of 104 comments

Homeland security by Anonymous Coward · 2006-07-28 22:42 · Score: 4, Insightful

Every time homeland security is mentioned as benefiting from a new technology, someone should get a swift kick in the nuts. Goddam, there is more than just terrorism in this world.
1. Re:Homeland security by mrogers · 2006-07-29 00:21 · Score: 2, Insightful
  
  But the pretty graph clearly shows that some guy called MOHAMMED is the missing link between Religion and Terrorism - without this new technology, homeland security experts might have been kept in the dark about that.
  The graph also shows links between US_Military and AL_QAEDA, and between ARIEL_SHARON and Mid_East_Conflict. If only they'd had this technology when they were trying to justify the invasion of Iraq.
  "Look, Saddam Hussein has links to Al Qaeda! You can see it on the graph!"
  "Uh, Mister Vice-President, this graph is based on press conferences in which you repeatedly mentioned Saddam Hussein and Al Qaeda in the same breath. It may not have any statistical value."
  "Shut up and bring me my war britches, dimwit, the computer never lies!"
2. Re:Homeland security by 1u3hr · 2006-07-29 00:54 · Score: 2, Insightful
  
  The compulsory "Homeland Security" link makes me think of the story about a drunk who was crawling about on the sidewalk under a lamppost late one night. A police officer came up to him and asked, "What are you doing?"
  The drunk replied, "I'm looking for my car keys."
  The officer looked around in the lamplight, then asked the drunk, "I don't see any car keys. Are you sure you lost them here?"
  The drunk replied, "No, I lost them over there", and pointed to an area of the sidewalk deep in shadow.
  The policeman then asked, "Well, if you lost them over there, why are you looking over here?"
  The drunk looked at him and said, "Because the light is better over here."
  Searching for terrorists by datamining from the comfort of your cubicle is about as likely to be successful.
Funny by vllbs · 2006-07-28 22:48 · Score: 1, Insightful

A relatively new method? A difficult task? Sorry, but these claims are almost laughable, even for a poor Spaniard like me.
You mean clusty.com? by SirStanley · 2006-07-28 23:21 · Score: 3, Insightful

You mean they can group data by topic? Like clusty.com does when you search?

I only read the stub of the article, because it seemed to do exactly what Clusty does and I didn't care to read any more.

--
--------========+++Don't Feed The Lab Techs+++========--------
They're late to the game. by alcohollins · 2006-07-29 00:09 · Score: 3, Insightful

Not revolutionary. In fact, they're late.

Google's AdSense network has done this for years, serving contextually relevant text ads across thousands of websites. Now Yahoo does too.