Tracking the Congressional Attention Span
Turismo writes "Ars Technica covers a new research project that uses computers to look at 70 million words from the Congressional Record. The project's goal was to track what our representatives were talking about at any given time, and researchers were able to do it without human training or intervention. From the article: '...researchers found, for instance, that "judicial nominations" have consumed steadily more Congressional attention between 1997 and 2004. In fact, the topic produced the most number of words published in a single "day" of the Congressional Record: 230,000 on November 12, 2003.' It looks like automated topic analysis has truly arrived."
Is this a double dupe by Ars AND /.?
http://arstechnica.com/news.ars/post/20060802-7408.html
http://slashdot.org/article.pl?sid=06/08/02/22122
from the current one:
While text mining 330,000 New York Times articles poses an interesting challenge, it's not as interesting as sifting through 70 million words (from over 70,000 unique documents) found in the Congressional Record. A team of political science researchers has done just that (PDF), and found that their software was able to answer questions too difficult for humans to handle on their own.
from the one posted yesterday:
The discipline of text mining took a step forward recently as a team from the University of California-Irvine used a new technique called "topic modeling" to sift 330,000 articles from the New York Times archive (hardcore geeks can read one of the team's papers [PDF] for more information). The team's goal was to have their computers sort the stories by topic--without requiring any human training or intervention. Computers have trouble understanding large fields of unstructured text without guidance, but the new approach enables them to engage in some unsupervised learning that could soon pay huge dividends for academics, corporations, and government security programs alike.