Slashdot Mirror


Mining Neologisms from Wikipedia

holy_calamity writes "Natural Language Processing researchers have developed a tool called Zeitgeist that can discover the meaning of new words for itself using Wikipedia. It looks for entries for words that are not in the WordNet database and works out their meaning by looking at the known words linked to them. Development of the tool is focusing on using it to understand what bloggers (using slang and neologisms) are saying about companies' products."
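As a rough illustration of the idea in the summary (not Zeitgeist's actual code), the lookup could be sketched in Python as below. The `KNOWN_WORDS` set and the `LINKS` table are made-up stand-ins for WordNet and for Wikipedia's link structure, and `gloss_unknown_terms` is a hypothetical helper name.

```python
# Hypothetical stand-in for the WordNet vocabulary (assumption, toy data).
KNOWN_WORDS = {"website", "traffic", "server", "software", "computer"}

# Hypothetical stand-in for Wikipedia: article title -> titles it links to.
LINKS = {
    "slashdotting": ["website", "traffic", "server", "digg effect"],
    "windows rot": ["software", "computer"],
    "website": ["server", "traffic"],
}


def gloss_unknown_terms(links, known_words):
    """Map each title missing from known_words to the known words it links to."""
    glosses = {}
    for title, targets in links.items():
        if title in known_words:
            continue  # already has a dictionary entry, so it is not a neologism
        related = sorted({t for t in targets if t in known_words})
        if related:
            glosses[title] = related
    return glosses


if __name__ == "__main__":
    for term, related in gloss_unknown_terms(LINKS, KNOWN_WORDS).items():
        print(f"{term}: related to {', '.join(related)}")
```

A real system would need far more than this, such as weighting links and handling articles with no usable neighbours; the sketch only shows the "unknown title, known neighbours" step.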

3 of 93 comments

  1. slashdotting (n., neolog.) by ettlz · · Score: 2, Informative

    The Slashdot, Digg, or Fark effect is the term given to the phenomenon of a popular website linking to a smaller site, causing the smaller site to slow down or even temporarily close due to the increased traffic. The name comes from the huge influx of web traffic that often results from sites being mentioned on Slashdot, Digg, or Fark.com, popular user-submitted news and information sites. Typically, less robust sites are unable to cope with the huge increase in traffic and become unavailable - either their bandwidth is consumed or their servers cannot handle the high number of requests.
  2. But Wikipedia seeks to avoid Neologisms! by sbaker · · Score: 4, Informative

    The trouble is that Wikipedia has a policy of not writing about (or using) Neologisms:

        http://en.wikipedia.org/wiki/WP:Neologism

    Many articles about neologisms *do* get created in violation of this policy - but they are generally put up for deletion via the Wikipedia process for deleting inappropriate material - so they only exist briefly.

    So, for example, the article entitled "Windows Rot" is being debated today. Although it looks like this one will be merged into an existing article, it won't survive as the name of an article - so Zeitgeist presumably won't be able to find it.

    It may be that enough of these kinds of articles slip through the system to be useful to Zeitgeist but that is not by design - so coverage will be patchy at best.

    A further consequence of this is that the articles that Zeitgeist does find will most likely be so new that only one person will have worked on them - which will make for poor quality.

    Also, it is very common for people such as bloggers who come up with what they consider to be clever new words to try to wedge them into common usage by writing about the word in Wikipedia. This 'vanity word' problem is one of the main reasons that Wikipedia seeks to avoid articles on neologisms.

    --
    www.sjbaker.org
  3. For slang, it is useless without a context by aadvancedGIR · · Score: 2, Informative

    For example, in French slang, the same person could use the word "batard" as either an insult or a display of respect, and neither of these meanings is related to the target's father.

    I wish them good luck...