Slashdot Mirror


Mining Neologisms from Wikipedia

holy_calamity writes "Natural Language Processing researchers have developed a tool called Zeitgeist that can discover the meaning of new words for itself using Wikipedia. It looks for entries on words that are not in the WordNet database and works out their meaning by looking at the known words linked to them. Development of the tool is focusing on using it to understand what bloggers (who use slang and neologisms) are saying about companies' products."
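The summary describes the approach only loosely: flag Wikipedia entries for words missing from WordNet, then infer each word's meaning from the known words its entry links to. Below is a rough sketch of that idea. It is a hypothetical illustration, not Zeitgeist's actual code. The function `infer_neologisms`, the small `known_words` set standing in for WordNet, and the `article_links` dict standing in for Wikipedia's link graph are all assumptions.

```python
from collections import Counter


def infer_neologisms(known_words, article_links, top_n=3):
    """Hypothetical sketch: for each article title that is NOT in known_words
    (a stand-in for WordNet), return the known words it links to most often,
    as a crude proxy for the title's meaning."""
    results = {}
    for title, links in article_links.items():
        if title.lower() in known_words:
            continue  # already in the lexicon, nothing new to learn
        # Count only links that point at words we already understand.
        counts = Counter(
            link.lower() for link in links if link.lower() in known_words
        )
        # Most frequently linked known words first; ties keep first-seen order.
        results[title] = [word for word, _ in counts.most_common(top_n)]
    return results


if __name__ == "__main__":
    known = {"blog", "web", "journal", "internet", "community"}
    links = {
        "Blogosphere": ["Blog", "Internet", "Community", "Blog", "Podcast"],
        "Blog": ["Web", "Journal"],
    }
    print(infer_neologisms(known, links))
```

Run on the toy data, only "Blogosphere" is treated as a new word, and it is described by "blog", "internet" and "community" ("Podcast" is ignored because it is not in the known set). Ranking by raw link frequency is just the simplest choice here; the real tool may weight links quite differently.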

11 of 93 comments

  1. I'd love to see what it came up with... by mdhoover · · Score: 5, Funny

    if they pointed it at Slashdot...
    "ass-hat" and "tard" could take on a whole new meaning.

    1. Re:I'd love to see what it came up with... by Harmonious+Botch · · Score: 4, Funny

      You're assuming that it would be spelled correctly.

  2. Just imagine... by packetmon · · Score: 3, Funny

    Imagine the chaos and reboots as the program analyzes a George W. Bush speech.

  3. Marketing research on the net by perkr · · Score: 4, Insightful

    Figuring out what people on the net say about your products is apparently the "new" thing. IBM has its own engine for the task, too. It kind of makes you wonder how much influence the net community will actually have on day-to-day decision making in corporate headquarters' marketing strategy departments.

  4. say hello to dictionary bombing by brunascle · · Score: 4, Funny

    George W. Bush
    n.
    1. 43rd president of the United States.
    2. miserable failure.

  5. But Wikipedia seeks to avoid Neologisms! by sbaker · · Score: 4, Informative

    The trouble is that Wikipedia has a policy of not writing about (or using) neologisms:

        http://en.wikipedia.org/wiki/WP:Neologism

    Many articles about neologisms *do* get created in violation of this policy - but they are generally put up for deletion via the Wikipedia process for deleting inappropriate material - so they only exist briefly.

    So, for example, the article entitled "Windows Rot" is being debated today. Although it looks like this one will be merged into an existing article, it won't survive as the name of an article, so Zeitgeist presumably won't be able to find it.

    It may be that enough of these kinds of articles slip through the system to be useful to Zeitgeist, but that is not by design, so coverage will be patchy at best.

    A further consequence of this is that the articles that Zeitgeist does find will most likely be so new that only one person will have worked on them - which will make for poor quality.

    Also, it is very common for people such as bloggers who come up with what they consider to be clever new words to try to wedge them into common usage by writing about the word in Wikipedia. This 'vanity word' problem is one of the main reasons that Wikipedia seeks to avoid articles on neologisms.

    --
    www.sjbaker.org
  6. What if it went into a loop by clickclickdrone · · Score: 5, Funny

    and started creating its own gazornaplatting words that no-one but the program itself could middlybundy? It could eat up bibblys of disk space as all the new words chimmdudlied in a grawn.

    --
    I want a list of atrocities done in your name - Recoil
  7. Step One is Complete by Hoplite3 · · Score: 4, Funny

    Time for step two: deliver a mild electric shock to neologism users. Then I won't have to hear "blogosphere" ever again.

    --
    Use the Firehose to mod down Second Life stories!
  8. Santorum! by mr_stinky_britches · · Score: 4, Funny

    One of my personal favorites is the word Santorum.

    --
    Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
  9. Hello? by MarkusQ · · Score: 4, Interesting
    "Development of the tool is focusing on using it to understand what bloggers (using slang and neologisms) are saying about companies' products."

    You do not need a fancy program to do this. I can do it for you, without even reading the blogs in question.

    Watch.

    They are saying your products suck, and that your customer support is worthless.

    See how easy that was? Now, you might be wondering how I know this. Simple. They don't use made up words to say good things about you. I'm not sure why (maybe they aren't worried about being sued for saying good things?), but the pattern is very consistent. If somebody goes to the trouble of writing about you in their blog using made up words, they don't like you or the horse you rode in on.

    Likewise, if you are a journalist, they call you funny names (Steno Sue, Laura Dildo, Kneepads Miller, "Dollar a Word" Armstrong, etc.) because they've noticed that you consistently write to favour a certain party, position, politician, company, or lifestyle, even when this requires ignoring a pile of facts the size of Paraguay, any one of which would shred your position.

    And if you're a politician, it means that someone noticed that what you say in speeches is so unconnected to what you do with the office you hold that the only link between them is the way in which they combine to mollify your nominal constituents while maximizing the benefit to your corporate sponsors.

    If you are an industry association, they are saying they hate you, period, and that you are evil incarnate.

    See how easy this is? If you still don't get it, I am willing to come out of retirement as a consultant to explain it to you, provided the price is right.

    --MarkusQ

    1. Re:Hello? by Anonymous Coward · · Score: 3, Funny

      Mod parent doubleplusspiffy.