Web Log 'Word Bursts' Could Identify New Crazes
Zorgatron writes "New Scientist reports that a researcher from Cornell University has come up with a clever method of identifying what's cool by automatically searching weblogs. Sudden increases or "bursts" in the usage of particular words may reflect a new craze, according to Jon Kleinberg. He has demonstrated the technique by searching through State of the Union addresses given since 1790." I wonder how long before this can be done real time enough to really make this useful.
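For anyone wondering what a "burst" might look like in code, here is a rough sketch. It is not Kleinberg's actual method (the summary does not describe it); it just flags a word when its share of the text in one period is several times its share across all earlier periods. The function name `word_bursts`, the `min_count` and `ratio` thresholds, and the add-one smoothing are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_bursts(docs_by_period, min_count=3, ratio=3.0):
    """Return (label, word) pairs where a word's relative frequency in a
    period is at least `ratio` times its frequency in all earlier periods.

    docs_by_period: list of (label, text) tuples in chronological order.
    Thresholds are illustrative guesses, not values from the article.
    """
    bursts = []
    history = Counter()   # word counts over all earlier periods
    history_total = 0     # total words seen in earlier periods
    for label, text in docs_by_period:
        words = re.findall(r"[a-z']+", text.lower())
        current = Counter(words)
        total = len(words)
        if history_total > 0 and total > 0:
            for word, count in current.items():
                if count < min_count:
                    continue  # ignore words too rare to call a burst
                rate_now = count / total
                # Add-one smoothing so unseen words do not divide by zero.
                rate_before = (history[word] + 1) / (history_total + 1)
                if rate_now / rate_before >= ratio:
                    bursts.append((label, word))
        history.update(current)
        history_total += total
    return bursts
```

Feed it one text per year and it should pick out words that suddenly dominate a given year. A real-time version would need a sliding window instead of the whole history.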
Could this be what Google wants with Blogger?
They have the capacity to do this, I don't see why they wouldn't.
Pain lasts, kid. It's how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
There's another "what's popular on blogs" webpage at Blogdex. It tracks links, showing which pages are most linked to.
daed si luap
In a simple historical test of the technique, Kleinberg analysed all the annual State of the Union addresses given by US Presidents since 1790. He found that particular word "bursts" could indeed be linked to important events at the time the speeches were delivered.
Has an important increase of the use of the word "nukular" been reported in the last few weeks then?
Google can do much the same thing, on a real-time basis, by examining what phrases are searched for.
...Now we're going to see Pepsi ads slinging "in Soviet Russia, you drink Pepsi", and Nike yelling about "all your sports belong to us"...
I lost my concept of community when my community lost all concept of me.
I can already see the collusion of weblog editors.
"Okay, everyone write about polka dot socks tomorrow. And throw in something about drinking rotten milk. I bet we can start a new fad..."
Scott, Keeper of the Crystal Flame
By my definition "cool" is that which most people have not yet discovered. Example: that... ah, but I'm not going to tell you. Perhaps this method can tell you what just became cool, but it's hard to track something that is by definition under the radar. Otherwise, just track Google searches. You'll soon see what's popular.
Sig for sale or rent. One previous user. Inquire within.
These techniques could easily be expanded to searching weblogs - I imagine the findings could be very interesting for content providers - eg a simple measure of what people want to read about.
Vacancy for signature. Apply within.
"Joe Millionaire winner" and "Bubb Rubb" have generated most of my personal blog's hits.
I, myself, am a distant third.
Write about enough things and then check your referral logs for Google and Yahoo searches (which include the query in the URL), and you get an imperfect idea of what people are interested in this week.
Joe
http://www.joegrossberg.com
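A minimal sketch of the referral-log trick described above, assuming you already have a list of referrer URLs pulled from your logs. The parameter names (`q` for Google, `p` for Yahoo) and the helper name `search_terms` are assumptions for illustration; check your own logs for the exact URL shape.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed query-string parameter per search engine (not from the post).
QUERY_PARAMS = {"google": "q", "yahoo": "p"}

def search_terms(referrers):
    """Count the search phrases found in a list of referrer URLs."""
    counts = Counter()
    for ref in referrers:
        url = urlparse(ref)
        host = url.netloc.lower()
        for engine, param in QUERY_PARAMS.items():
            if engine in host:
                # parse_qs decodes "+" and %-escapes in the query string.
                for query in parse_qs(url.query).get(param, []):
                    counts[query.strip().lower()] += 1
    return counts
```

Calling `search_terms(refs).most_common(10)` gives the same imperfect weekly picture, just without reading the logs by eye.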
I wonder how long before this can be done real time enough to really make this useful.
Yes, I bet the spammers can't wait until they can use it...
There are fewer illiterates than people who can't read.
Imagine the feedback loop that could develop...
And think, the DMCA will become the most popular piece of legislation in existence - at least on Slashdot.
And CowboyNeal is the most popular man alive!
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
http://www.daypop.com
It's got the top 40 every day. Doing it some other way would only catch memes sooner. And if the system doesn't catch it until it's popular, it really doesn't help. What we need is a large and complete database of all meme type things.
The GeekNights podcast is going strong. Listen!
Whoopeee. The marketers will start using this to identify trends, and next thing you know, we'll have some fast food named "Cheese-Eating Surrender Monkeys."
Not In Our Brand Name, say I.
Of course, since there is only a very specific socioeconomic subset of the world population weblogging, what real usefulness does this give us? Honestly, even if you did ranking based on the most popular weblogs, that wouldn't help you very much.
:P. Unless this thing actually can find out the things that people are excited about that aren't well-known, it's pretty much just another search tool limited to blogs.
Furthermore, this thing isn't telling me anything I don't know. So it finds the word "Vietnam" during the Vietnam years. Hooray. I bet it finds the word Iraq today, or the phrase "Bin Ladin" last year.
Whoopdie-do. I'm impressed
Why have to wait until it's realtime? Historical analysis is very useful, and not just to historians. Linguists, anthropologists, social scientists, etc. Taking such a body of texts is called studying a "corpus," and such studies often yield surprising and interesting results (better than "atomic" showing up in the Cold War). A new method like this would be very useful to nearly every discipline in the humanities I can think of.
Not all geeks are computer geeks. Not all nerds care only about the future.
Seriously, just read /. if you want to know the important stuff of the day. :)
Twice usually.
The ultimate way of watching trends on a month-to-month basis has to be Zeitgeist from Google.
Celebrate Excellence!
Did anyone read the article? Amazingly enough this wonderful software with its POWERFUL algorithms proved a true point of "no shit". While running this gem of coding genius, the authors managed to find recurring references to "Depression" while scanning texts from the 1930s. Imagine that, finding the word depression from a time period that's been nicknamed "The Great Depression". I would have never linked the word "Depression" with "The Great Depression". Have we really reached the point where we can just do the same shit over and over again and it's magically a new invention?
MS is bringing out 3 Degrees which is reinventing IRC, this guy is telling us the painfully obvious, and I've been working on this little trick that's gonna really change the way we think of food, get this guys: I take two pieces of bread, a piece of cheese, and a piece of meat and stack them together. I call this wonderful new life shaping discovery "The meat-and-cheese-on-bread". I really think it's gonna change how we eat!
I can see a nice distributed implementation for burst-searching - a "mod_ephemera" module for apache.
:)
The module would count the words/phrases most commonly served (less tags and the top-n most common words in the language-encoding), then serve out the top-10 as HTTP header messages. That way, the results are unobtrusive and easy to recover.
Of course, this approach would inevitably be easy to skew/cheat. Anyway, that's my sixpenn'orth.
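The counting part of that idea can be sketched in a few lines. This is plain Python, not an actual Apache module; the header name `X-Ephemera-Top`, the tiny `STOP_WORDS` set, and both function names are made up for illustration.

```python
import re
from collections import Counter

# Stand-in for "the top-n most common words in the language"; a real
# deployment would need a proper stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def record_page(html, counts):
    """Add the words of one served page to the running counts."""
    text = re.sub(r"<[^>]*>", " ", html)            # drop markup tags
    words = re.findall(r"[a-z']+", text.lower())
    counts.update(w for w in words if w not in STOP_WORDS)

def top_words_header(counts, n=10):
    """Format the n most-served words as a (hypothetical) HTTP header line."""
    top = [word for word, _ in counts.most_common(n)]
    return "X-Ephemera-Top: " + ", ".join(top)
```

The server would call `record_page` for each response and attach `top_words_header(counts)` on the way out. As noted above, anyone serving junk pages full of a chosen word could skew the result.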
...Yahoo, today, was accused of seeding 2.5 million user blogs with keywords designed to influence/fool/skew robots that attempt to identify what's cool by automatically searching weblogs for so called 'word bursts'.
I guess this pretty much lays to rest the article about how nerds don't work to be popular. We automate it!
They have a realtime search mechanism that can also search within chat rooms, and TV and radio streams (Kevin Kelly is on the board). There used to be a downloadable personal edition; there is a free trial. Not a plug! They became a corporate (financial and others) company, turning their back on their "Free Information Now" roots, but at least it works :)
http://www.relegence.com
Sounds like a combination of Google's Zeitgeist and LiveJournal's MemeTracker. In other words, nothing that new.
It's also the basis for Computational Lexicography. Doing analysis on large corpora. One of the interests people have in this field is introduction of new words in society. The field used to use corpora such as the British National Corpus, but since the explosion of the Web, sites such as Google can far exceed that size. Weblogs are simply a good example of a more natural form of language. The interesting thing would be not so much to find new trends through words... but if we can truly solve the whole natural language parsing problem and use such information to extract higher-level knowledge
The analysis only works if your tool doesn't start modifying the data you are analyzing. If this thing ever caught on, it would quickly become meaningless, because everybody wants to be part of whatever craze is going on. Every morning you check which words are hip, you put them on your website... etc. etc.
You are right about feedback: the buzz would become a terrible din. That said, it is a cool idea.
Congratulations! Now we are the Evil Empire
Should have been entitled "Nerds Find Automatic Method to Enable Them to Talk to Other People." I have this picture in my head of some poor guy who is a social outcast that wants to figure out a way to be able to talk to a girl about things she might be interested in.
In Republican America phones tap you.
The approach could also be applied to sifting through other types of information. Identifying word bursts within email messages sent to a company's customer support address might help maintenance staff spot a major new problem.
I'm sure customer support employees are going to love this idea... This way you can keep up an appearance of actually having read the customer emails, while really just redirecting to /dev/null (through the filter of course).
"I don't know that Atheists should be considered as citizens, nor should they be considered patriots." -George H.W. Bush
I attended a conference last year, where they proposed a similar method to find trends in scientific fields, and more importantly, link them and predict future connections. For instance, when words from two unrelated fields start showing up associated in many papers, there is possibly a trend for those fields to meet and merge in the near future. Of course Informatics doesn't replace traditional methods, because it needs the input data, but it's a helpful tool.
Oh great. Just what we need. "Well, after careful computer analysis with my powerful algorithms, I have concluded that break-dancing is now cool. I will be the first nerd in history to be atop this new trend."
We define "cool" as the output of a computer analysis of weblogs, then sit there wondering why nerds are so unpopular?!?
Is this news considered "new"? This is exactly what Amazon did in order to forecast which book titles would make the most money. They became the biggest web retailer because of this very same idea -- but many years ago. And now somebody at Cornell copies the idea but uses weblogs instead of IRC and newsgroups and suddenly he's "clever"? I know lots of people are complaining that the information gleaned from this is not useful; but it is! It's an amazing way to forecast what will sell.
Except now popularity will last about 6 hours, tops, before some new wave of pop culture replaces it. By the time craze "X" hits the craze detector, all the really cool people will already be onto craze "Y", which will be detected a few hours later.
It's like the whole "avant-garde/in-style/out-of-style/retro/back-in-style" cycle managed by a Perl script in an infinite loop.
Congratulations! Now we are the Evil Empire
eg:
Dec 10, 1998
Nov 21, 2002
.... one more time why don't you. And I quote,
"For example, identifying word bursts in the hundreds of thousands of personal diaries now on the web could help advertisers quickly spot an emerging craze."
Gonfonit!!! Why does cool new social technology have to be related to ways to help people sell things to Americans! Why is it okay for us to be considered a nation of consumers, otherwise basically useless biological skinsacks?!
I'll just strap my wallet to my chest with duct tape now and write my social security number in huge numbers on the back of my t-shirt for fast credit checks.
they'll think that goatse.cx is now considered cool.
Which begs the observation: once people know the rules that determine what a "word burst" is and when it's happening, then tools will be developed to artificially inflate desired word bursts.
Create a few hundred shill accounts across thousands of blogs, then have each account on each blog make a couple of posts with the pre-determined phrase, and you have a manufactured word burst.
Like a few years ago, when people sold the ability to seed search engines so your site is at the top of the results list based on certain keywords.
Google makes that harder now, but it's always a contest between those who develop the rules (or algorithm) and those who seek to manipulate the data or the rules of the game.
A manufactured word burst I can remember from before the 2000 election was 'gravitas'. That word came out of nowhere, and was suddenly all over the media, used to describe a quality that Dubya was lacking. There was a talking points memo somewhere that was very widely distributed -- which is the analog version of what I am describing.
Look it up.
Software Wars
If Slashdot were used these would be the words that burst:
"Natalie" "Portman" "Soviet" "Russia" "1337" and "Dell"
Just a guy with an opinion
Data from state of the union addresses here.
Out of the six billion people on the planet, only 3 percent can afford one. Of those that can afford one, half decide they actually want one. Combine that half with the lonely few in cyber cafes and markets and you have the world's top spenders in one place, perfect for advertisers.
I Browse at +4 Flamebait
Open Source Sysadmin
What will future searchers make of Slashdot (and by extension, the net as a whole), what with the waxing and waning in the popularity of Natalie Portman, Hot Grits, Soviet Russia, All your base, gonads and strife, MEEEEEEPT!, and the ever-present FIST PROST.
This is a significant tool for the post-information age. It could reliably gauge the effectiveness of viral marketing. It could also intercept sub-culture developments before they become popular, and introduce them to the general population in association with a corporate brand.
Imagine if Nike or Pepsi, or *shudder* Microsoft, had caught the "All Your Base" thing on the upswing. They'd have a better slogan than the top down "Dude, you're gettin a Dell".
The REAL jabber has the user id: 13196
What you do today will cost you a day of your life
once people know the rules that determine what a "word burst" is and when it's happening, then tools will be developed to artificially inflate desired word bursts
The Three Theorems of Psychohistorical Quantitivity:
1. The population under scrutiny is oblivious to the existence of the science of Psychohistory.
2. The time periods dealt with are in the region of 3 generations.
3. The population must be in the billions (±75 billions) for a statistical probability to have a psychohistorical validity.
Found it, after some digging over my lunch hour.
The Listening Post is an art exhibit that more or less lives. It monitors certain chat rooms, and posts messages from those chat rooms to a wall of small LCD displays.
What a great opportunity for culture jamming! We just need a few thousand webloggers to start using weird words designed to repel "normal" people.
Obviously this could backfire and we could actually start a real trend. So, I propose that the first words we need to put out are ( geek || nerd ) && sexy. (And if you understood that, you must be hot stuff.) I'm willing to take this risk if you are.