Slashdot Mirror


Web Log 'Word Bursts' Could Identify New Crazes

Zorgatron writes "New Scientist reports that a researcher from Cornell University has come up with a clever method of identifying what's cool by automatically searching weblogs. Sudden increases or "bursts" in the usage of particular words may reflect a new craze, according to Jon Kleinberg. He has demonstrated the technique by searching through State of the Union addresses given since 1790." I wonder how long before this can be done real time enough to really make this useful.

23 of 239 comments

  1. Google? by irc.goatse.cx+troll · · Score: 4, Insightful

    Could this be what Google wants with Blogger?
    They have the capacity to do this, I don't see why they wouldn't.

    --
    Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
    1. Re:Google? by tmark · · Score: 3, Insightful

      Except that Google already has the de-facto capability of rapidly searching as many weblogs as they care to. Sure, it takes long to spider them across the web, but it takes a long time to spider damn well near every single page in the world.

      As for how long it will be before we can do this in "real time", this all depends on what your definition of "real time" is. If you're happy with doing a few thousand blogs and getting results back in a few minutes, since at most only a few pages change on the average blog a day, I'd say any decent Perl guy could do that for you now.

  2. "What's cool"? by ites · · Score: 4, Insightful

    By my definition "cool" is that which most people have not yet discovered. Example: that... ah, but I'm not going to tell you. Perhaps this method can tell you what just became cool, but it's hard to track something that is by definition under the radar. Otherwise, just track Google searches. You'll soon see what's popular.

    --
    Sig for sale or rent. One previous user. Inquire within.
    1. Re:"What's cool"? by Anonymous Coward · · Score: 1, Insightful

      Actually, for most Americans, cool is not
      what they "discover" but what television
      and the media tell them is cool.

    2. Re:"What's cool"? by deanc · · Score: 4, Insightful

      That's what the researchers seem to track. Not the commonality of a phrase, but the "burstiness" of a certain word or phrase, i.e., the delta of the word use over time. High delta values indicate something is starting to take off, though it may not yet have become popular or mainstream. That's a decent metric of "coolness."
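      The "delta of the word use over time" described above can be sketched roughly as follows. This is only a naive illustration, not the researcher's actual method (which the summary does not spell out); the function names and the toy posts are made up for the example.

```python
from collections import Counter

def relative_frequencies(docs_by_period):
    """Map each period to {word: share of all words seen in that period}."""
    freqs = {}
    for period, docs in docs_by_period.items():
        counts = Counter(word for doc in docs for word in doc.lower().split())
        total = sum(counts.values())
        freqs[period] = {w: c / total for w, c in counts.items()} if total else {}
    return freqs

def burstiness(docs_by_period, previous, current):
    """Return {word: change in relative frequency from `previous` to `current`}."""
    freqs = relative_frequencies(docs_by_period)
    before, after = freqs[previous], freqs[current]
    words = set(before) | set(after)
    return {w: after.get(w, 0.0) - before.get(w, 0.0) for w in words}

# Made-up toy posts, purely for illustration.
posts = {
    "week1": ["my cat is great", "the weather is great"],
    "week2": ["potato gun is great", "built a potato gun today"],
}
deltas = burstiness(posts, "week1", "week2")
# The two words with the largest positive delta are the "burst" candidates.
top = sorted(deltas, key=deltas.get, reverse=True)[:2]
```

      Using shares rather than raw counts keeps a busy week from making every word look like a burst; a real system would presumably also need smoothing and a minimum-count cutoff.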

    3. Re:"What's cool"? by tekunokurato · · Score: 2, Insightful

      But Marketable is what is becoming cool, and things stay marketable for longer than a few instants. This is not really about cool, it's about marketably cool.

    4. Re:"What's cool"? by Fishstick · · Score: 4, Insightful

      Ever see Merchants of Cool on Frontline?

      A Report on the Creators & Marketers of Popular Culture for Teenagers

      Yeah, that's right. Popular Culture is manufactured -- everything the teenies think is "cool" or "hot" is identified months in advance by a highly sophisticated machine that probes the minds of kids to predict what will be the next trend so that the marketing establishment can gear up to take advantage of the short window where the "thing" is "cool" and can be sold to teens in such a way that they don't even realize what is going on.

      --

      There is much cruelty in the universe, John.
      Yeah, we seem to have the tour map.

  3. Useful? by Longjmp · · Score: 4, Insightful

    I wonder how long before this can be done real time enough to really make this useful.

    Yes, I bet the spammers can't wait until they can use it...

    --
    There are fewer illiterates than people who can't read.
  4. Imagine by jos091 · · Score: 3, Insightful

    Imagine the feedback loop that could develop...

  5. Let the webloggers determine what's cool? Heh. by rubberpaw · · Score: 4, Insightful

    Of course, since there is only a very specific socioeconomic subset of the world population weblogging, what real usefulness does this give us? Honestly, even if you did ranking based on the most popular weblogs, that wouldn't help you very much.

    Furthermore, this thing isn't telling me anything I don't know. So it finds the word "Vietnam" during the Vietnam years. Hooray. I bet it finds the word Iraq today, or the phrase "Bin Ladin" last year.

    Whoopdie-do. I'm impressed :P. Unless this thing actually can find out the things that people are excited about that aren't well-known, it's pretty much just another search tool limited to blogs.

    1. Re:Let the webloggers determine what's cool? Heh. by barnaclebarnes · · Score: 4, Insightful
      Unless this thing actually can find out the things that people are excited about that aren't well-known, it's pretty much just another search tool limited to blogs.


      That's the whole point. Weblogs are not the mainstream media, so he is betting that a new craze (or a refresh of an old one) will show up there before the mainstream sites get hold of it. Face it, once it has hit CNN it is already past its sell-by date.


      Take the whole potato gun thing for instance. If this was appearing on people's weblogs 6 months ago and an underground following had started, then it would pick this up. Could be a perfect time for one of the toy companies to start producing a parent-friendly version (Not sure how...but hey!). By the time the craze hits CNN, Toys 'R' Us is stocked with a version that fires water balloons, only uses compressed air and comes in 10 different plastic colours. Then they would have the advantage before the other companies jump on the bandwagon.


      Of course, since there is only a very specific socioeconomic subset of the world population weblogging, what real usefulness does this give us?


      A lot! Let me see, I have a large group of people who are rich, computer-owning, and probably middle or upper middle class, all saying they want X. Now who is your target audience again? Not low-income, no-disposable-cash types.

      /b

      --
      [Please type your sig here.]
  6. It's useful *now* by backlonthethird · · Score: 3, Insightful
    I wonder how long before this can be done real time enough to really make this useful.

    Why wait until it's real time? Historical analysis is very useful, and not just to historians: linguists, anthropologists, social scientists, and so on. Studying such a body of texts is called studying a "corpus," and such studies often yield surprising and interesting results (better than "atomic" showing up in the Cold War). A new method like this would be very useful to nearly every discipline in the humanities I can think of.

    Not all geeks are computer geeks. Not all nerds care only about the future.

  7. No Kidding? by BuBu_ · · Score: 2, Insightful

    Did anyone read the article? Amazingly enough, this wonderful software with its POWERFUL algorithms proved a true point of "no shit". While running this gem of coding genius, the authors managed to find recurring references to "Depression" while scanning texts from the 1930s. Imagine that: finding the word depression in a time period that's been nicknamed "The Great Depression". I would never have linked the word "Depression" with "The Great Depression". Have we really reached the point where we can just do the same shit over and over again and it's magically a new invention?

    MS is bringing out 3 Degrees, which is reinventing IRC, this guy is telling us the painfully obvious, and I've been working on this little trick that's gonna really change the way we think of food. Get this, guys: I take two pieces of bread, a piece of cheese, and a piece of meat and stack them together. I call this wonderful new life-shaping discovery "The meat-and-cheese-on-bread". I really think it's gonna change how we eat!

    1. Re:No Kidding? by Anonymous Coward · · Score: 1, Insightful

      Don't be silly.

      He thought he'd invented a method to show emerging trends, by analysing how a word's frequency of use increases. So, you want to test that method. So you pick something like the great depression, and hey, sure enough, the word "depression" is used more as the great depression starts. It's entirely possible that it might not have been (this is science, you know), in which case you'd think that the method didn't work. But it passes the test, so you can be more confident about using it to spot trends you didn't previously know about.

      It's like when Newton invented his laws of motion and gravity: they predict that the Moon should stay in the sky, and that's important (and divots who say "no shit!" should be pilloried and ignored) because if they said otherwise, they'd be wrong.
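      The sanity check described above (run the method on a case where the answer is already known, and see whether it agrees) could look something like this. It is a minimal sketch under loud assumptions: the function names are hypothetical, "largest period-over-period rise" stands in for whatever the real method does, and the numbers are synthetic, not taken from any speeches.

```python
def largest_jump(share_by_period):
    """Return the period whose word share rose most over the previous period."""
    periods = sorted(share_by_period)
    best_period, best_rise = None, float("-inf")
    for prev, cur in zip(periods, periods[1:]):
        rise = share_by_period[cur] - share_by_period[prev]
        if rise > best_rise:
            best_period, best_rise = cur, rise
    return best_period

def passes_known_event(share_by_period, start, end):
    """True if the detected jump falls inside the window of the known event."""
    period = largest_jump(share_by_period)
    return period is not None and start <= period <= end

# Synthetic shares of one word over six numbered periods (invented numbers).
synthetic = {1: 0.0001, 2: 0.0001, 3: 0.0002, 4: 0.0011, 5: 0.0014, 6: 0.0015}
detected = largest_jump(synthetic)
ok = passes_known_event(synthetic, start=3, end=5)
```

      If the detected period had landed outside the known window, that would be the "method didn't work" outcome the comment mentions.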

  8. Re:Google by XCondE · · Score: 3, Insightful

    I'm eager to see what will come up next with Google's recent entry in weblog world.

    It's just what I thought when someone said "Blogs are like dreams; they're only interesting to the people they belong to".

  9. New article title by Samus · · Score: 2, Insightful

    Should have been entitled "Nerds Find Automatic Method to Enable Them to Talk to Other People." I have this picture in my head of some poor guy who is a social outcast that wants to figure out a way to be able to talk to a girl about things she might be interested in.

    --
    In Republican America phones tap you.
  10. Re:Google by gmuslera · · Score: 3, Insightful

    It's more subtle than that: it's not about what you are searching for, but it tracks how you (or society) change the way you express yourself based on current trends, news, etc. That may or may not be related to what you are currently searching for in Google.

    In a way, it should track even how languages evolve, how new meanings are given to existing words (i.e. in the past would anyone think that defensive attack were not opposite words? :)

    I wonder if this kind of analysis can be affected by people like me that without proper knowledge of english write in it :)

  11. This is great for customer support! by G.+W.+Bush+Junior · · Score: 2, Insightful

    The approach could also be applied to sifting through other types of information. Identifying word bursts within email messages sent to a company's customer support address might help maintenance staff spot a major new problem.

    I'm sure customer support employees are going to love this idea... This way you can keep up an appearance of actually having read the customer emails, while really just redirecting to /dev/null (through the filter of course).

    --
    "I don't know that Atheists should be considered as citizens, nor should they be considered patriots." -George H.W. Bush
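    The customer-support use quoted above (spotting a new problem from word bursts in incoming email) might be sketched like this. It is a rough illustration only: the `ratio` and `min_count` thresholds, the function names, and the sample messages are all assumptions made up for the example.

```python
from collections import Counter

def word_counts(messages):
    """Count each word once per message so one long email cannot dominate."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts

def flag_bursts(history, today, ratio=3.0, min_count=3):
    """history: list of past days (each a list of messages); today: list of messages."""
    days = max(len(history), 1)
    baseline = Counter()
    for day in history:
        baseline.update(word_counts(day))
    flagged = []
    for word, count in word_counts(today).items():
        expected = baseline[word] / days  # average messages per day mentioning the word
        # +1 smoothing so words never seen before do not divide by zero.
        if count >= min_count and count / (expected + 1.0) >= ratio:
            flagged.append(word)
    return sorted(flagged)

# Made-up support inbox, purely for illustration.
history = [
    ["password reset please", "invoice question"],
    ["password reset again", "where is my invoice"],
]
today = ["login timeout error", "timeout on checkout", "timeout again please", "password reset"]
alerts = flag_bursts(history, today)
```

    Note that this only raises a flag; somebody still has to read the emails behind it.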
  12. Re:Nukular weapons by tmark · · Score: 3, Insightful

    He found that particular word "bursts" could indeed be linked to important events at the time the speeches were delivered.

    Does anyone else find this painfully obvious? Certainly you wouldn't expect to hear the word "computer" much in FDR's State of the Union addresses, just as you wouldn't expect to hear "icebox" in GWB's addresses.

    The idea isn't as revolutionary as the author makes it out to be. People have been searching for terms in literature and using counts as indices of "importance" for a long time. Just to cite one example, researchers commonly use citation indexes to find out which fields are/were "hot".

  13. Re:Hopefully they don't read slashdot for this by mshiltonj · · Score: 4, Insightful

    they'll think that goatse.cx is now considered cool.

    Which begs the observation: once people know the rules that determine what a "word burst" is and when it's happening, then tools will be developed to artificially inflate desired word bursts.

    Create a few hundred shill accounts across thousands of blogs, have each account on each blog make a couple of posts with the pre-determined phrase, and you have a manufactured word burst.

    Like a few years ago, when people sold the ability to seed search engines so your site is at the top of the results list based on certain keywords.

    Google makes that harder now, but it's always a contest between those who develop the rules (or algorithm) and those who seek to manipulate the data or the rules of the game.

    A manufactured word burst I can remember from before the 2000 election was 'gravitas'. That word came out of nowhere, and was suddenly all over the media, used to describe a quality that Dubya was lacking. There was a talking points memo somewhere that was very widely distributed -- which is the analog version of what I am describing.

    Look it up.

  14. To paraphrase Stevenson by xenocide2 · · Score: 2, Insightful

    Out of the six billion people on the planet, only 3 percent can afford one. Of those that can afford one, half decide they actually want one. Combine that half with the lonely few in cyber cafes and markets and you have the world's top spenders in one place, perfect for advertisers.

    --
    I Browse at +4 Flamebait

    Open Source Sysadmin

  15. But just think! by jabber01 · · Score: 2, Insightful

    What will future searchers make of Slashdot (and by extension, the net as a whole), what with the waxing and waning in the popularity of Natalie Portman, Hot Grits, Soviet Russia, All your base, gonads and strife, MEEEEEEPT!, and the ever-present FIST PROST.

    This is a significant tool for the post-information age. It could reliably gauge the effectiveness of viral marketing. It could also intercept sub-culture developments before they become popular, and introduce them to the general population in association with a corporate brand.

    Imagine if Nike or Pepsi, or *shudder* Microsoft, had caught the "All Your Base" thing on the upswing. They'd have a better slogan than the top down "Dude, you're gettin a Dell".

    --

    The REAL jabber has the user id: 13196
    What you do today will cost you a day of your life

  16. Reminds me of Asimov's Foundation by AntonyBartlett · · Score: 3, Insightful

    once people know the rules that determine what a "word burst" is and when it's happening, then tools will be developed to artificially inflate desired word bursts

    The Three Theorems of Psychohistorical Quantitivity:

    1. The population under scrutiny is oblivious to the existence of the science of Psychohistory.

    2. The time periods dealt with are in the region of 3 generations.

    3. The population must be in the billions (±75 billions) for a statistical probability to have a psychohistorical validity.