Slashdot Mirror


Web Log 'Word Bursts' Could Identify New Crazes

Zorgatron writes "New Scientist reports that a researcher from Cornell University has come up with a clever method of identifying what's cool by automatically searching weblogs. Sudden increases or "bursts" in the usage of particular words may reflect a new craze, according to Jon Kleinberg. He has demonstrated the technique by searching through State of the Union addresses given since 1790." I wonder how long it will be before this can be done close enough to real time to be really useful.
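
To make the "burst" idea concrete, here is a minimal sketch in Python. It is not Kleinberg's actual method, which the summary does not describe; it simply flags a word the first time its count in one period is several times its count in the previous period. The function name `find_bursts` and the `min_count` and `ratio` thresholds are assumptions chosen for illustration.

```python
from collections import Counter

def find_bursts(periods, min_count=3, ratio=3.0):
    """Return {word: index of the first period in which the word 'bursts'}.

    periods is a list of strings, one per time period (e.g. one per week).
    A word bursts when it appears at least min_count times and at least
    ratio times as often as in the previous period (with +1 smoothing).
    """
    bursts = {}
    previous = Counter()
    for index, text in enumerate(periods):
        current = Counter(text.lower().split())
        for word, count in current.items():
            # +1 smoothing so a brand-new word does not divide by zero
            if count >= min_count and count / (previous[word] + 1) >= ratio:
                bursts.setdefault(word, index)
        previous = current
    return bursts

weeks = [
    "the cat sat",
    "the cat sat on the mat",
    "blog blog blog blog the cat",
]
print(find_bursts(weeks))
```

A naive ratio like this is noisy on small counts, which is why the sketch requires a minimum count before a word can qualify.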

12 of 239 comments

  1. Applications by benjiboo · · Score: 2, Interesting
    This work has been around for a long time in the data mining literature. For instance, searching the logs of customer service calls to identify common problems etc.

    These techniques could easily be extended to searching weblogs - I imagine the findings could be very interesting for content providers - e.g. a simple measure of what people want to read about.

    --
    Vacancy for signature. Apply within.
  2. Apache Logs too by josephgrossberg · · Score: 3, Interesting

    "Joe Millionaire winner" and "Bubb Rubb" have generated most of my personal blog's hits.

    I, myself, am a distant third.

    Write about enough things and then check your referral logs for Google and Yahoo searches (which include the query in the URL), and you get an imperfect idea of what people are interested in this week.
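
    A rough sketch of that referral-log check, in Python. It assumes the referrer URLs have already been pulled out of the log, and that Google carries the query in a `q` parameter and Yahoo in a `p` parameter; the `search_terms` name and the `QUERY_PARAMS` table are made up for this example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed mapping from search engine host fragment to its query parameter.
QUERY_PARAMS = {"google": "q", "yahoo": "p"}

def search_terms(referrers):
    """Count the search queries found in a list of referrer URLs."""
    counts = Counter()
    for url in referrers:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        for engine, param in QUERY_PARAMS.items():
            if engine in host:
                # parse_qs decodes '+' and %-escapes for us
                for query in parse_qs(parsed.query).get(param, []):
                    counts[query.strip().lower()] += 1
    return counts

referrers = [
    "http://www.google.com/search?q=Bubb+Rubb",
    "http://search.yahoo.com/search?p=bubb+rubb",
    "http://example.com/page?q=ignored",
]
print(search_terms(referrers).most_common(5))
```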

  3. The state of the World from google by MrBlic · · Score: 2, Interesting

    The ultimate way of watching trends on a month-to-month basis has to be Zeitgeist from Google.

    --
    Celebrate Excellence!
  4. A new apache module...? by stroudie · · Score: 2, Interesting

    I can see a nice distributed implementation for burst-searching - a "mod_ephemera" module for apache.

    The module would count words/phrases most commonly served (less tags and the top-n most common words in the language-encoding), then serve out the top-10 as HTTP header messages. That way, the results are unobtrusive and easy to recover.

    Of course, this approach would inevitably be easy to skew/cheat. Anyway, that's my sixpenn'orth :)
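
    The counting step of such a module could look something like the Python sketch below. It is not an Apache module, only the word-counting logic: strip tags, drop common words, keep the top n. The `X-Ephemera` header name, the function names and the tiny stop-word list are all hypothetical.

```python
import re
from collections import Counter

# Assumed stand-in for "the top-n most common words in the language".
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def top_words(html_pages, n=10):
    """Return the n most frequently served words, ignoring tags and stop words."""
    counts = Counter()
    for page in html_pages:
        text = re.sub(r"<[^>]+>", " ", page)  # crude tag stripping
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]

def ephemera_header(words):
    """Format the word list as a single (hypothetical) HTTP header line."""
    return "X-Ephemera: " + ", ".join(words)

pages = ["<p>The burst of the burst</p>", "<b>burst</b> craze craze"]
print(ephemera_header(top_words(pages, n=2)))
```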

  5. Feedback loop and dotcom crash by skillet-thief · · Score: 4, Interesting
    It is kind of like the stock market craze and the theory that "all the information you need to know about a stock is contained in the market itself" (i.e. in the stock's chart). Enough people start believing that theory, and the stocks quit behaving rationally.

    The analysis only works if your tool doesn't start modifying the data you are analyzing. If this thing ever caught on, it would quickly become meaningless, because everybody wants to be part of whatever craze is going on. Every morning you check which words are hip, you put them on your website... etc. etc.

    You are right about feedback: the buzz would become a terrible din. That said, it is a cool idea.

    --

    Congratulations! Now we are the Evil Empire

  6. Interesting use in science research by nfk · · Score: 2, Interesting

    I attended a conference last year, where they proposed a similar method to find trends in scientific fields, and more importantly, link them and predict future connections. For instance, when words from two unrelated fields start showing up associated in many papers, there is possibly a trend for those fields to meet and merge in the near future. Of course Informatics doesn't replace traditional methods, because it needs the input data, but it's a helpful tool.
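
    As a toy illustration of the co-occurrence idea (not the method presented at that conference, whose details are not given here), the Python sketch below counts, per year, the papers that use vocabulary from both of two fields. The function name, the vocabularies and the sample papers are invented for the example.

```python
def cooccurrence_by_year(papers, field_a, field_b):
    """Count, per year, the papers that mention words from both fields.

    papers is an iterable of (year, text) pairs; field_a and field_b are
    sets of lowercase terms that stand in for each field's vocabulary.
    """
    totals = {}
    for year, text in papers:
        words = set(text.lower().split())
        if words & field_a and words & field_b:
            totals[year] = totals.get(year, 0) + 1
    return dict(sorted(totals.items()))

papers = [
    (2001, "protein folding"),
    (2001, "graph theory"),
    (2002, "protein graph"),
    (2002, "gene network graph"),
]
biology = {"protein", "gene"}
graphs = {"graph", "network"}
print(cooccurrence_by_year(papers, biology, graphs))
```

    A count that rises from year to year would be the signal that the two fields are starting to meet.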

  7. strange... by dotgod · · Score: 2, Interesting

    We define "cool" as the output of a computer analysis of weblogs, then sit there wondering why nerds are so unpopular?!?

  8. Amazon.com by DarylBeattie · · Score: 2, Interesting

    Is this news considered "new"? This is exactly what Amazon did in order to forecast which book titles would make the most money. They became the biggest web retailer because of this very same idea -- but many years ago. And now somebody at Cornell copies the idea but uses weblogs instead of IRC and newsgroups and suddenly he's "clever"? I know lots of people are complaining that the information gleaned from this is not useful, but it is! It's an amazing way to forecast what will sell.

  9. Prior Art -- The Economist's "Recession Index" by JPMH · · Score: 2, Interesting
    Since the early '90s, the Economist has occasionally published tongue-in-cheek articles about its "Recession Index", a useful leading indicator of the state of the US economy -- namely, the number of times the 'R-word' appears per month in the New York Times and the Washington Post. This appears to correlate strongly with the future state of the economy...

    eg:

    Dec 10, 1998

    Nov 21, 2002
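
    A count like that is easy to sketch in Python. This is only an illustration of tallying a word per month, not the Economist's actual procedure; it assumes articles arrive as (date, text) pairs with dates in YYYY-MM-DD form, and the sample articles are invented.

```python
import re
from collections import Counter

def monthly_mentions(articles, word="recession"):
    """Return {"YYYY-MM": number of whole-word mentions of word that month}."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = Counter()
    for date, text in articles:
        # date[:7] keeps the "YYYY-MM" prefix of an ISO-style date
        counts[date[:7]] += len(pattern.findall(text))
    return dict(counts)

articles = [
    ("1998-12-01", "Recession fears; recession looms"),
    ("1998-12-10", "growth"),
    ("2002-11-21", "A recession?"),
]
print(monthly_mentions(articles))
```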

  10. Stamp consumer on my forehead... by tazochai · · Score: 4, Interesting

    .... one more time why don't you. And I quote,

    "For example, identifying word bursts in the hundreds of thousands of personal diaries now on the web could help advertisers quickly spot an emerging craze."

    Gonfonit!!! Why does cool new social technology have to be related to ways to help people sell things to Americans! Why is it okay for us to be considered a nation of consumers, otherwise basically useless biological skinsacks?!

    I'll just strap my wallet to my chest with duct tape now and write my social security number in huge numbers on the back of my t-shirt for fast credit checks.

  11. this is so old hat by Anonymous Coward · · Score: 1, Interesting

    I can think of two now defunct internet startups that did this like four years ago. One was a financial analysis tool that looked for stock symbols on particular financial chat boards. The other was based on usenet posts.

    If I wasn't going senile I would remember their names.

  12. Re:Art Exhibit by ACNeal · · Score: 2, Interesting

    Found it, after some digging over my lunch hour.

    The Listening Post is an art exhibit that more or less lives. It monitors certain chat rooms, and posts messages from those chat rooms to a wall of small LCD displays.