Mining Neologisms from Wikipedia
holy_calamity writes "Natural Language Processing researchers have developed a tool called Zeitgeist that can discover the meaning of new words for itself using Wikipedia. It looks for entries for words not in the WordNet database and works out their meaning by looking for known words linked to them. Development of the tool is focused on using it to understand what bloggers (using slang and neologisms) are saying about companies' products."
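Roughly, the approach in the summary could look like the sketch below. It is a minimal illustration under stated assumptions, not Zeitgeist's actual code: a small hand-made word set stands in for WordNet, and a dict of made-up Wikipedia entries (title mapped to outgoing link titles) stands in for the real pages. The names `find_neologisms` and `gloss` are hypothetical.

```python
from collections import Counter

# Assumption: a toy vocabulary standing in for the WordNet database.
LEXICON = {"blog", "website", "journal", "internet", "community", "web"}

# Assumption: made-up Wikipedia entries, each title mapped to the titles it links to.
ENTRIES = {
    "Blogosphere": ["Blog", "Community", "Internet", "Blog", "Podcast"],
    "Blog": ["Website", "Journal"],
}

def find_neologisms(entries, lexicon):
    """Return entry titles whose headword is missing from the lexicon."""
    return [title for title in entries if title.lower() not in lexicon]

def gloss(term, entries, lexicon, top_n=3):
    """Approximate a term's meaning by the known words its entry links to most often."""
    links = entries.get(term, [])
    known = Counter(link.lower() for link in links if link.lower() in lexicon)
    return [word for word, _ in known.most_common(top_n)]

new_words = find_neologisms(ENTRIES, LEXICON)
glosses = {word: gloss(word, ENTRIES, LEXICON) for word in new_words}
print(glosses)
```

Here "Blogosphere" is flagged as unknown and glossed by its most-linked known words ("blog" first, then "community" and "internet"), while "Podcast" is ignored because the toy lexicon does not contain it. A real system would need sense disambiguation and far more than raw link counts.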
If they pointed it at Slashdot...
"ass-hat" and "tard" could take on a whole new meaning
Imagine the chaos and reboots as the program analyzes a George W. Bush speech
Infiltrated dot Net
Figuring out what people on the net say about your products is apparently the "new" thing. IBM has its own engine for the task too. It kind of makes you wonder how much power the net community will in fact have in day-to-day decision making in the marketing strategy departments at corporate headquarters.
George W. Bush
n.
1. 43rd president of the United States.
2. miserable failure.
The trouble is that Wikipedia has a policy of not writing about (or using) neologisms:
http://en.wikipedia.org/wiki/WP:Neologism
Many articles about neologisms *do* get created in violation of this policy - but they are generally put up for deletion via the Wikipedia process for deleting inappropriate material - so they only exist briefly.
So, for example, the article entitled "Windows Rot" is being debated today. Although it looks like this one will be merged into an existing article, it won't survive as the name of an article, so Zeitgeist presumably won't be able to find it.
It may be that enough of these kinds of articles slip through the system to be useful to Zeitgeist, but that is not by design - so coverage will be patchy at best.
A further consequence of this is that the articles that Zeitgeist does find will most likely be so new that only one person will have worked on them - which will make for poor quality.
Also, it is very common for people such as bloggers who come up with what they consider to be clever new words to try to wedge them into common usage by writing about the word in Wikipedia. This 'vanity word' problem is one of the main reasons that Wikipedia seeks to avoid articles on neologisms.
www.sjbaker.org
For example, in French slang, the same person could use the word "batard" either as an insult or as a display of respect, and neither of these meanings is related to the target's father.
I wish them good luck...
31g 3r0+her iz wa+ch1ng U!
and started creating its own gazornaplatting words that no-one but the program itself could middlybundy? It could eat up bibblys of disk space as all the new words chimmdudlied in a grawn.
I want a list of atrocities done in your name - Recoil
Sounds like an excellent chance to inject some new perfectly cromulent words into wide use.
-- 3 events that reshaped the world in the 20th century: WW1, WW2, and WWW
Time for step two: deliver a mild electric shock to neologism users. Then I won't have to hear "blogosphere" ever again.
Use the Firehose to mod down Second Life stories!
One of my personal favorites is the word Santorum.
Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
You do not need a fancy program to do this. I can do it for you, without even reading the blogs in question.
Watch.
They are saying your products suck, and that your customer support is worthless.
See how easy that was? Now, you might be wondering how I know this. Simple. They don't use made up words to say good things about you. I'm not sure why (maybe they aren't worried about being sued for saying good things?), but the pattern is very consistent. If somebody goes to the trouble of writing about you in their blog using made up words, they don't like you or the horse you rode in on.
Likewise, if you are a journalist, they call you funny names (Steno Sue, Laura Dildo, Kneepads Miller, "Dollar a Word" Armstrong, etc.) because they've noticed that you consistently write to favour a certain party, position, politician, company, or lifestyle, even when this requires ignoring a pile of facts the size of Paraguay, any one of which would shred your position.
And if you're a politician, it means that someone noticed that what you say in speeches is so unconnected to what you do with the office you hold that the only link between them is the way in which they combine to mollify your nominal constituents while maximizing the benefit to your corporate sponsors.
If you are an industry association, they are saying they hate you, period, and that you are evil incarnate.
See how easy this is? If you still don't get it, I am willing to come out of retirement as a consultant to explain it to you, provided the price is right.
--MarkusQ