Slashdot Mirror


Text Mining the New York Times

Roland Piquepaille writes "Text mining is a computer technique to extract useful information from unstructured text. And it's a difficult task. But now, using a relatively new method named topic modeling, computer scientists from the University of California, Irvine (UCI), have analyzed 330,000 stories published by the New York Times between 2000 and 2002 in just a few hours. They were able to automatically isolate topics such as the Tour de France, prices of apartments in Brooklyn, or dinosaur bones. This technique could soon be used not only by homeland security experts or librarians, but also by physicians, lawyers, real estate people, and even by yourself. Read more for additional details and a graph showing how the researchers discovered links between topics and people."

7 of 104 comments

  1. Go away Roland by Anonymous Coward · · Score: 0, Funny

    Nobody likes you.

  2. Plus some other words by stimpleton · · Score: 4, Funny

    For example, the model generated a list of words that included "rider," "bike," "race," "Lance Armstrong" and "Jan Ullrich."

    From this, researchers were easily able to identify that topic as the Tour de France.


    I imagine "testosterone", "doping", and "supportive mother" would have found the Tour de France topic even faster.

    --
    In post Patriot Act America, the library books scan you.

  3. Mining? by Eudial · · Score: 5, Funny

    "Home at last after another long day in the salt^H^H^H^Htext mines.

    We lost four more miners today, bless their souls. The foreman kept insisting they'd dig another tunnel between bicycling and Tour de France. They told him it was too dangerous, but no... he never listens. One of these days... They've got us working 20 hour shifts in the abyss that is the text mines, barely pay us enough to afford the rent, I'm telling you, one of these days..."

    --
    GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!

  4. Re:Funny by kfg · · Score: 2, Funny

    You'll have to forgive them, these are computer scientists. Until now they have been completely unaware that natural language has grammar, syntax and that even individual words have structure and meaning; despite the complete absence of a metatag blizzard to inform them that [color]red is a [/color].

    KFG

  5. in other news by tompee · · Score: 4, Funny

    Google buys the University of California computer science school

  6. grep? by muftak · · Score: 2, Funny

    Wow, they figured out how to use grep!

  7. Text mining is... by SlashSquatch · · Score: 5, Funny

    ...a load of grep.

    --
    Autonomous Retard -- Is your camp safe? UnsafeCamp.com