Slashdot Mirror


New Online Dictionaries Automate Away the Linguistic Middleman

An article in The New York Times highlights two growing collections of words online that effectively bypass the traditional dictionary publishing system of slow aggregation and curation. Wordnik is a private venture that has already raised more than $12 million in capital, while the Corpus of Contemporary American English is a project started by Brigham Young University professor Mark Davies. These sources differ from both conventional dictionary publishers and crowd-sourced efforts like the excellent Wiktionary in their emphasis on avoiding human intervention rather than fostering it. Says Wordnik founder Erin McKean in the linked article, 'Language changes every day, and the lexicographer should get out of the way. ... You can type in anything, and we'll show you what data we have.'

10 of 60 comments

  1. Isn't that called Googling? by hawks5999 · · Score: 3, Insightful

    "You can type in anything and we'll show you the data we have" sounds a lot like a Google search.

    1. Re:Isn't that called Googling? by Samantha Wright · · Score: 4, Insightful

      Here are the results for 'magic'.

      Gee, it sure looks like they're returning random search engine results next to—oh look, a list of opinions as proffered by so-called "linguistic middlemen."

      I like how the top example for how 'magic' is used in English isn't even purely English, but a bullet point about features in the Zend framework. I'll make a habit of saying "__magic()" in everyday speech more often!

      I think the worst outcome of this is that PHP now somehow has influence on the evolution of a natural language. I do not believe I am alone in feeling terrified by this prospect.

      --
      Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
  2. Re:Wikitionary? by Trepidity · · Score: 4, Informative

    "The Free Dictionary" appears to be just a spammy repackaging of Wikipedia content. Lots of their articles even have a footer saying they're licensed under the GFDL from Wikipedia.

  3. Re:Good idea? by vlm · · Score: 4, Funny

    Though I still cringe when people say they "could care less."

    That begs the question if inappropriate use of "begs the question" is like, worse, like, than like using the word like, like in as the first like word after every like lung inhalation. I think that is a full 360 degree reversal from your suggestion.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  4. Lexicographers out of the way by Compaqt · · Score: 3, Informative

    Obviously, I'd suppose you still needed a few lexicographers to come up with the system.

    And to maintain it, right?

    The problem seems to be when you've put 95% of lexicographers out of a job, who's going to train the next bunch, and will it be cost-effective at a university level to have a graduate program in such for 1 or 2 individuals?

    --
    I'm not a lawyer, but I play one on the Internet. Blog
    1. Re:Lexicographers out of the way by VortexCortex · · Score: 4, Funny

      Obviously, I'd suppose you still needed a few lexicographers to come up with the system.

      And to maintain it, right?

      The problem seems to be when you've put 95% of lexicographers out of a job, who's going to train the next bunch, and will it be cost-effective at a university level to have a graduate program in such for 1 or 2 individuals?

      Syntax error on line(s): 1 thru 1
      Ambiguous contraction in "I'd".

      Syntax error on line(s): 1 thru 1
      Mixed tense in "still needed".
      Note: Root word "need" satisfies the expression.

      Syntax error on line(s): 3 thru 3
      Incomplete sentence.

      Syntax error on line(s): 5 thru 5
      Expected colon after "be" in "to be when".

      Syntax error on line(s): 5 thru 5
      Expected capitalization of "when" in "to be when".

      Syntax error on line(s): 5 thru 5
      Extraneous comma.
      Note: This message is generated only once for multiple errors.

      Point taken: Screw the Lexicographers!

  5. Re:Good idea? by bigstrat2003 · · Score: 3, Interesting

    This post proves that there should be a "made my brain explode" moderation option.

    --
    "16MB (fuck off, MiB fascists)" - The Mighty Buzzard
  6. Wordnik is a dictionary aggregator by NaCh0 · · Score: 3, Funny

    I wonder what kind of sales pitch it takes to get $12 million for a free web dictionary.

    'Just imagine if we could provide 100 definitions from other people for the word "butt". How much is that worth to you?'

  7. Telivision by aembleton · · Score: 4, Insightful

    It doesn't detect that telivision is an incorrect spelling because there are so many authoritative examples of that spelling: http://www.wordnik.com/words/telivision

    Google seems to do a good job of detecting spelling errors and automatically updating its dictionary, and of course it also shows you websites where that word is used. I don't really see what Wordnik provides.

  8. Re:Good idea? by Samantha Wright · · Score: 5, Insightful

    Oh, that's purely typographical. When moving blocks of metal type around, a full-stop/period or comma is more delicate than a quotation mark, since it's only x-height and not capital letter height. Typographers got in the habit of putting them on the inside to keep them safe. That's also why certain ligatures of f and the long s were preserved from scribal writing: those letters were designed to hook over others, and if the next letter was tall then it would create a structural instability (an x-height hole). If modern punctuation had evolved before the invention of movable type, we would probably put the quotation mark directly above the other punctuation mark, and use logical punctuation for ? and !. However, it didn't, so it was all put inside to stay consistent.

    To be honest, I find it visually more pleasant. After looking at code that passes strings around as arguments in C-style imperative languages all day, it's nice to see something without a big gap on the baseline (this "is," an "example", for you.) Since the quotation mark is already floating up and away from the letters, it's less jarring to see it separated from the word than a comma or period. (This is more or less the modern aesthetic justification for keeping it the traditional way. However, modern typographers don't always agree with traditionalists: watch what happens when you point out that the "single" space used to separate sentences prior to the invention of the typewriter was actually larger than a standard double space.)

    --
    Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!