Web Log 'Word Bursts' Could Identify New Crazes
Zorgatron writes "New Scientist reports that a researcher from Cornell University has come up with a clever method of identifying what's cool by automatically searching weblogs. Sudden increases or "bursts" in the usage of particular words may reflect a new craze, according to Jon Kleinberg. He has demonstrated the technique by searching through State of the Union addresses given since 1790." I wonder how long before this can be done close enough to real time to make it really useful.
These techniques could easily be expanded to searching weblogs - I imagine the findings could be very interesting for content providers - e.g. a simple measure of what people want to read about.
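To make the "burst" idea concrete, here is a minimal sketch. It is not Kleinberg's actual model, which the summary doesn't describe in detail; it swaps in a much cruder ratio test: a word "bursts" in a period when its share of that period's words is several times its (smoothed) share in all other periods. The function name `find_bursts` and the `ratio` and `min_count` thresholds are made up for illustration.

```python
from collections import Counter


def find_bursts(docs_by_period, ratio=3.0, min_count=3):
    """Map each period to the sorted list of words that burst in it.

    docs_by_period: {period_label: [text, text, ...]}
    A word bursts when its rate inside the period is at least `ratio`
    times its add-one-smoothed rate in all the other periods.
    """
    # Word counts per period (naive whitespace tokenisation).
    period_counts = {
        period: Counter(w for text in texts for w in text.lower().split())
        for period, texts in docs_by_period.items()
    }
    overall = Counter()
    for counts in period_counts.values():
        overall.update(counts)
    grand_total = sum(overall.values())

    bursts = {}
    for period, counts in period_counts.items():
        period_total = sum(counts.values())
        other_total = grand_total - period_total
        hits = []
        for word, n in counts.items():
            if n < min_count:
                continue  # too rare to call a trend
            local_rate = n / period_total
            # Add-one smoothing so brand-new words don't divide by zero.
            baseline_rate = (overall[word] - n + 1) / (other_total + 1)
            if local_rate >= ratio * baseline_rate:
                hits.append(word)
        bursts[period] = sorted(hits)
    return bursts
```

Feeding it one bucket of posts per day or week would give a rough "what's new this period" list; common words like "the" drop out on their own because their rate barely changes between periods.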
Vacancy for signature. Apply within.
"Joe Millionaire winner" and "Bubb Rubb" have generated most of my personal blog's hits.
I, myself, am a distant third.
Write about enough things and then check your referral logs for Google and Yahoo searches (which include the query in the URL), and you get an imperfect idea of what people are interested in this week.
Joe
http://www.joegrossberg.com
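The referral-log trick above can be sketched in a few lines. This assumes the referrer URLs have already been pulled out of the server log, and that Google puts the search terms in a `q` parameter and Yahoo in a `p` parameter; the helper names `search_query` and `top_queries` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumption: which query-string parameter holds the search terms.
SEARCH_PARAMS = {"google": "q", "yahoo": "p"}


def search_query(referrer):
    """Return the lower-cased search phrase in a referrer URL, or None."""
    parts = urlsplit(referrer)
    host_labels = (parts.hostname or "").split(".")
    for engine, param in SEARCH_PARAMS.items():
        if engine in host_labels:
            values = parse_qs(parts.query).get(param)
            if values:
                return values[0].strip().lower()
    return None


def top_queries(referrers, n=10):
    """Count search phrases across referrers and return the n most common."""
    counts = Counter(q for q in map(search_query, referrers) if q)
    return counts.most_common(n)
```

As the comment says, the picture is imperfect: it only shows searches that happened to land on your own pages.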
The ultimate way of watching trends on a month-to-month basis has to be Zeitgeist from Google.
Celebrate Excellence!
I can see a nice distributed implementation for burst-searching - a "mod_ephemera" module for Apache.
:)
The module would count the words/phrases most commonly served (less tags and the top-n most common words in the language-encoding), then serve out the top 10 as HTTP header messages. That way, the results are unobtrusive and easy to recover.
Of course, this approach would inevitably be easy to skew/cheat. Anyway, that's my sixpenn'orth.
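For what it's worth, the counting half of the "mod_ephemera" idea is small enough to sketch. This is plain Python, not an Apache module, and everything named here is an assumption: the `EphemeraCounter` class, the `X-Ephemera` header name, the tiny stop list standing in for "the top-n most common words", and the naive regex used to drop tags.

```python
import re
from collections import Counter

# Assumption: a hand-picked stop list in place of a real top-n word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

TAG_RE = re.compile(r"<[^>]*>")   # naive tag stripper, not a real HTML parser
WORD_RE = re.compile(r"[a-z]+")


class EphemeraCounter:
    """Running tally of the words in every page served."""

    def __init__(self):
        self.counts = Counter()

    def record(self, html):
        """Add one served page's words to the tally, minus tags and stop words."""
        text = TAG_RE.sub(" ", html).lower()
        self.counts.update(
            w for w in WORD_RE.findall(text) if w not in STOP_WORDS
        )

    def header(self, n=10):
        """Return a (name, value) pair listing the n most-served words."""
        top = [word for word, _ in self.counts.most_common(n)]
        return ("X-Ephemera", ", ".join(top))
```

A crawler could then read that one header from many sites and merge the lists, which is also exactly where the skewing/cheating would come in.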
The analysis only works if your tool doesn't start modifying the data you are analyzing. If this thing ever caught on, it would quickly become meaningless, because everybody wants to be part of whatever craze is going on. Every morning you check which words are hip, you put them on your website... etc. etc.
You are right about feedback: the buzz would become a terrible din. That said, it is a cool idea.
Congratulations! Now we are the Evil Empire
I attended a conference last year, where they proposed a similar method to find trends in scientific fields, and more importantly, link them and predict future connections. For instance, when words from two unrelated fields start showing up associated in many papers, there is possibly a trend for those fields to meet and merge in the near future. Of course Informatics doesn't replace traditional methods, because it needs the input data, but it's a helpful tool.
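A toy version of that cross-field signal, as I understood it: count, per year, the papers that mention at least one term from each of two fields, and watch whether the count climbs. This is only a sketch under my own assumptions (abstracts as plain text, hand-made term sets, a hypothetical `cross_field_counts` helper), not the method presented at the conference.

```python
from collections import Counter


def cross_field_counts(papers, field_a, field_b):
    """Count papers per year that use terms from both fields.

    papers: iterable of (year, text) pairs
    field_a, field_b: sets of lower-case terms, one set per field
    """
    counts = Counter()
    for year, text in papers:
        words = set(text.lower().split())
        # The paper must mention at least one term from each field.
        if words & field_a and words & field_b:
            counts[year] += 1
    return dict(sorted(counts.items()))
```

A rising count from year to year would be the hint that the two fields are starting to meet.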
We define "cool" as the output of a computer analysis of weblogs, then sit there wondering why nerds are so unpopular?!?
Is this news considered "new"? This is exactly what Amazon did in order to forecast which book titles would make the most money. They became the biggest web retailer because of this very same idea -- but many years ago. And now somebody at Cornell copies the idea but uses weblogs instead of IRC and newsgroups and suddenly he's "clever"? I know lots of people are complaining that the information gleaned from this is not useful; but it is! It's an amazing way to forecast what will sell.
e.g.:
Dec 10, 1998
Nov 21, 2002
.... one more time why don't you. And I quote,
"For example, identifying word bursts in the hundreds of thousands of personal diaries now on the web could help advertisers quickly spot an emerging craze."
Gonfonit!!! Why does cool new social technology have to be related to ways to help people sell things to Americans! Why is it okay for us to be considered a nation of consumers, otherwise basically useless biological skinsacks?!
I'll just strap my wallet to my chest with duct tape now and write my social security number in huge numbers on the back of my t-shirt for fast credit checks.
I can think of two now-defunct internet startups that did this like four years ago. One was a financial analysis tool that looked for stock symbols on particular financial chat boards. The other was based on Usenet posts.
If I wasn't going senile I would remember their names.
Found it, after some digging over my lunch hour.
The Listening Post is an art exhibit that more or less lives. It monitors certain chat rooms, and posts messages from those chat rooms to a wall of small LCD displays.