Text-Mining Technique Intelligently Learns Topics

Posted by ryuzaki0 on Wednesday August 2, 2006 @11:20AM from the sound-of-google-knocking-on-your-door dept.

Grv writes "Researchers at the University of California, Irvine have announced a new technique they call 'topic modeling' that can be used to analyze and group massive amounts of text-based information. Unlike typical text indexing, topic modeling attempts to learn what a given section of text is about without clues being fed to it by humans. The researchers used their method to analyze and group 330,000 articles from the New York Times archive. From the article, 'The UCI team managed this by programming their software to find patterns of words which occurred together in New York Times articles published between 2000 and 2002. Once these word patterns were indexed, the software then turned them into topics and was able to construct a map of such topics over time.'"
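
To make the "patterns of words which occurred together" idea concrete, here is a minimal sketch that simply counts how often two words appear in the same document. This is plain co-occurrence counting, a much simpler stand-in and not the UCI team's topic model; the function name `cooccurrence_counts` and the sample headlines are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every pair of distinct words, how many documents contain both.

    Hypothetical helper for illustration only; it is not part of any
    topic-modeling toolbox.
    """
    counts = Counter()
    for text in documents:
        # Use a sorted set so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Made-up sample documents, not data from the New York Times archive.
sample_docs = [
    "senate vote election",
    "election vote",
    "baseball team",
]

for pair, n in cooccurrence_counts(sample_docs).most_common(3):
    print(pair, n)
```

Raw pair counts like these only say which words travel together; a topic model goes further and groups such words into a small number of topics.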

Latent Dirichlet Allocation by Anonymous Coward · 2006-08-02 11:33 · Score: 2, Informative

Here's the source code: Latent Dirichlet Allocation

Re:Latent Dirichlet Allocation code by FleaPlus · 2006-08-02 12:36 · Score: 3, Informative

While that's certainly LDA code, it comes from a different lab than the one discussed in the story, and I think they use slightly different techniques. For topic-modeling code from Mark Steyvers' lab, which produced the paper in question, here's the link:

Matlab Topic Modeling Toolbox
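
For anyone who wants a feel for what LDA code does without downloading either package, below is a rough sketch of LDA fitted by collapsed Gibbs sampling, which is one common way to fit the model. It is not the code from either lab linked above. The function names (`lda_gibbs`, `top_words`), the hyperparameter defaults, and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with a collapsed Gibbs sampler (illustrative sketch).

    alpha and beta are symmetric Dirichlet hyperparameters; the defaults
    here are arbitrary assumptions, not recommended settings.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                               # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Made-up toy documents, already tokenized.
toy_docs = [
    "election senate vote campaign".split(),
    "senate vote election president".split(),
    "baseball pitcher inning team".split(),
    "team inning baseball season".split(),
]

vocab, doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print("topic", k, words)
```

On a corpus this small the topics are not guaranteed to separate cleanly, and a pure-Python loop like this would be far too slow for anything the size of the 330,000-article archive in the story.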