Web Log 'Word Bursts' Could Identify New Crazes
Zorgatron writes "New Scientist reports that a researcher from Cornell University has come up with a clever method of identifying what's cool by automatically searching weblogs. Sudden increases or "bursts" in the usage of particular words may reflect a new craze, according to Jon Kleinberg. He has demonstrated the technique by searching through State of the Union addresses given since 1790." I wonder how long before this can be done in real time, or close enough to it to make this really useful.
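For anyone who wants to play with the idea, here is a rough sketch of a "burst" check. It is not Kleinberg's method (the article doesn't spell that out); it is just a naive stand-in that flags a word when its share of a document jumps well above its average share in all earlier documents. The function names, the `ratio` and `min_rate` thresholds, and the toy speeches are all made up for illustration.

```python
from collections import Counter

def word_rates(documents):
    """Return one {word: relative frequency} dict per document."""
    rates = []
    for text in documents:
        words = text.lower().split()
        counts = Counter(words)
        total = len(words) or 1  # avoid dividing by zero on empty text
        rates.append({w: c / total for w, c in counts.items()})
    return rates

def find_bursts(documents, ratio=3.0, min_rate=0.2):
    """Flag (index, word) pairs where a word's rate is at least min_rate
    and more than `ratio` times its average rate in earlier documents.
    Both thresholds are arbitrary assumptions, not tuned values."""
    rates = word_rates(documents)
    bursts = []
    for i in range(1, len(rates)):
        for word, rate in rates[i].items():
            baseline = sum(r.get(word, 0.0) for r in rates[:i]) / i
            if rate >= min_rate and rate > ratio * baseline:
                bursts.append((i, word))
    return bursts

# Toy "speeches", invented for this example.
speeches = [
    "the union is strong and the economy is strong",
    "the union is strong and the people are strong",
    "the war the war and the war on the union",
]
print(find_bursts(speeches))  # [(2, 'war')]
```

Common words like "the" never get flagged because their rate is always high, which is the whole point of comparing against a baseline instead of just counting.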
Could this be what Google wants with Blogger?
They have the capacity to do this; I don't see why they wouldn't.
Pain lasts, kid. It's how you know you're alive. Sometimes I think this growing up thing is just pain management. -TheMaxx
By my definition "cool" is that which most people have not yet discovered. Example: that... ah, but I'm not going to tell you. Perhaps this method can tell you what just became cool, but it's hard to track something that is by definition under the radar. Otherwise, just track Google searches. You'll soon see what's popular.
Sig for sale or rent. One previous user. Inquire within.
I wonder how long before this can be done real time enough to really make this useful.
Yes, I bet the spammers can't wait until they can use it...
There are fewer illiterates than people who can't read.
Imagine the feedback loop that could develop...
Of course, since there is only a very specific socioeconomic subset of the world population weblogging, what real usefulness does this give us? Honestly, even if you did ranking based on the most popular weblogs, that wouldn't help you very much.
:P. Unless this thing actually can find out the things that people are excited about that aren't well-known, it's pretty much just another search tool limited to blogs.
Furthermore, this thing isn't telling me anything I don't know. So it finds the word "Vietnam" during the Vietnam years. Hooray. I bet it finds the word "Iraq" today, or the phrase "Bin Laden" last year.
Whoopdie-do. I'm impressed.
Why wait until it's real time? Historical analysis is very useful, and not just to historians: linguists, anthropologists, social scientists, and others rely on it too. Studying a body of texts like this is called studying a "corpus," and such studies often yield surprising and interesting results (better than "atomic" showing up in the Cold War). A new method like this would be very useful to nearly every discipline in the humanities I can think of.
Not all geeks are computer geeks. Not all nerds care only about the future.
I'm eager to see what will come up next with Google's recent entry in weblog world.
It's just what I thought when someone said "Blogs are like dreams; they're only interesting to the people they belong to".
It's more subtle than that. It's not about what you are searching for; it tracks how you (or society) change the way you express yourself based on current trends, news, etc. That may or may not be related to what you are currently searching for in Google.
:)
In a way, it should even track how languages evolve and how new meanings are given to existing words (e.g., in the past, would anyone have thought that "defensive" and "attack" were not opposites?).
I wonder if this kind of analysis can be affected by people like me who write in English without proper knowledge of it.
He found that particular word "bursts" could indeed be linked to important events at the time the speeches were delivered.
Does anyone else find this painfully obvious? Certainly you wouldn't expect to hear the word "computer" much in FDR's State of the Union addresses, just as you wouldn't expect to hear "icebox" in GWB's addresses.
The idea isn't as revolutionary as the author makes it out to be. People have been searching for terms in literature and using counts as indices of "importance" for a long time. Just to cite one example, researchers commonly use citation indexes to find out which fields are/were "hot".
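To show how little is involved in the plain counting approach, here is a minimal sketch that turns raw counts into a per-period index (occurrences per 1,000 words). The `term_index` name, the punctuation handling, and the two sample sentences are invented for illustration; they are not real speech text.

```python
def term_index(corpus, term):
    """Map each period label to occurrences of `term` per 1,000 words."""
    term = term.lower()
    index = {}
    for label, text in corpus.items():
        words = text.lower().split()
        # Strip common punctuation so "computer." still matches "computer".
        hits = sum(1 for w in words if w.strip('.,;:!?"\'') == term)
        index[label] = 1000 * hits / len(words) if words else 0.0
    return index

# Made-up sentences standing in for speeches from two eras.
corpus = {
    "1935": "The icebox and the radio are in every home.",
    "2003": "The computer and the network are in every home.",
}
print(term_index(corpus, "computer"))
```

Normalizing by document length matters: a raw count would favor whichever era simply produced longer speeches.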
they'll think that goatse.cx is now considered cool.
Which begs the observation: once people know the rules that determine what a "word burst" is and when it's happening, tools will be developed to artificially inflate desired word bursts.
Create a few hundred shill accounts across thousands of blogs, have each account on each blog make a couple of posts with the predetermined phrase, and you have a manufactured word burst.
It's like a few years ago, when people sold the ability to seed search engines so your site would land at the top of the results list for certain keywords.
Google makes that harder now, but it's always a contest between those who develop the rules (or algorithm) and those who seek to manipulate the data or the rules of the game.
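Here's a toy illustration of how easily a naive detector could be gamed this way. It assumes the detector just compares today's rate for a word against its past rate, which is my simplification and not how any real system is known to work. The `burst_score` helper, the sample posts, and the "3x means burst" cutoff are all hypothetical.

```python
def burst_score(history, today, word):
    """Ratio of today's rate for `word` to its rate in past posts."""
    def rate(posts):
        words = " ".join(posts).lower().split()
        return words.count(word) / len(words) if words else 0.0
    past = rate(history)
    # A word never seen before counts as an unbounded burst.
    return rate(today) / past if past else float("inf")

# Invented posts: a little background chatter mentioning "widget" once.
history = ["my cat is cool", "new phone is cool", "widget review soon"]
organic = ["lunch was good", "widget arrived today"]
# Twenty shill posts repeating the predetermined phrase.
shills = ["widget widget rocks"] * 20

print(burst_score(history, organic, "widget"))           # under 3: no burst
print(burst_score(history, organic + shills, "widget"))  # over 3: "burst"
```

Twenty copy-paste posts are enough to push the score past the cutoff, so any real system would presumably need to weight sources or discount near-duplicate posts.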
A manufactured word burst I can remember from before the 2000 election was 'gravitas'. That word came out of nowhere, and was suddenly all over the media, used to describe a quality that Dubya was lacking. There was a talking points memo somewhere that was very widely distributed -- which is the analog version of what I am describing.
Look it up.
Software Wars
once people know the rules that determine what a "word burst" is and when it's happening, tools will be developed to artificially inflate desired word bursts
The Three Theorems of Psychohistorical Quantitivity:
1. The population under scrutiny is oblivious to the existence of the science of Psychohistory.
2. The time periods dealt with are in the region of 3 generations.
3. The population must be in the billions (±75 billions) for a statistical probability to have a psychohistorical validity.