Slashdot Mirror


Distributed Translation Project

moon unit beta writes "New Scientist has this story about a new plan to build a multi-language translation database called the World Wide Lexicon, using a distributed community of volunteers. The designer compares it to a distributed computing project and believes it could make it easier to translate more obscure languages."

14 of 216 comments

  1. Universal Translator by lxmeister · · Score: 3, Funny

    The Universal Translator is finally here! But will they ever release it in fish form?

  2. i wonder by runtimeerror7 · · Score: 3, Insightful

    "This will automatically detect when the computer user is less busy and ask them to translate a word or phrase."

    i wonder how it's gonna detect when the user is not busy. this software can never be installed on something like my home computer, where i leave my DSL on so it can work on SETI.

  3. How is this sustainable? by food-n-bev · · Score: 3, Insightful
    ...believes it could provide a free way to translate the many languages not included in existing online translators...

    What's in it for the volunteers? Seems that novelty might bring experts in to volunteer short term, but when businesses, academics, etc. begin using the service in volume, it really will cry out for commercialization. The volunteers won't stick around performing translations gratis forever. At some point you have to pay them per translation or provide some other compensation (perhaps a /. like karma system?)

    The related bigger question will be whether this model ultimately proves to deliver quality translations at a lower cost than a traditional translation service. I don't see how this could happen if you have to still have a language expert look at the full translation as a whole to ensure that contextual subtleties are not lost.

  4. Deterioration of the whole language by Liora · · Score: 3, Insightful

    Great! Now we'll have Engrish resulting not just from terrible Japanese->English translation, but from all kinds of other languages too. Eventually the web will be so filled with bad grammar that the next generation will have no idea how to string a simple sentence together. Looks like we will have to start compiling our correspondence after all... for coherence.

    --
    Liora
  5. very cool.. but only for hobby use by soap.xml · · Score: 5, Insightful

    [snip]"One of the main problems is quality assurance," says Ramesh Krishnamurthy, a linguistics expert at the University of Wolverhampton, in the UK. "Translation is a highly developed skill." [snip] But Paul Rayson, a research fellow at Lancaster University, adds that unskilled translators may confuse the meaning of individual words. "The problem is you generally need the context to get a good translation," he says.[snip]

    This looks like it will be a very cool project, but for corporate/business use I don't think it would ever fly.

    If you have ever played in the area of i18n then you will quickly understand why this probably won't work perfectly. There are so many caveats to each language, tone, context etc... This might be a useful starting point for translation services, but for the final cut, it would still need to be checked and double checked by a translation service.

    I still think it's very cool though ;)

    -ryan
  6. It's not going to work... by carm$y$ · · Score: 3, Insightful

    It's a matter of days until someone will request a log of people connecting to the server during work-hours... Here is the beauty of the seti@home client: computers can have spare cycles, people don't.

    --
    -- No sig today
  7. This must be the smartest software ever by Control+Group · · Score: 4, Interesting

    If it's going to detect when I'm "less busy." Is this going to pop up a window in my face every time I spend more than a couple minutes mentally composing prose or code? The potential for user annoyance here seems incredibly high to me...

    Distributed computing is an elegant and efficient use of otherwise untapped resources--cycles that are literally "going to waste" (in one sense). By hitting up the users, though, you're attempting to use a resource that is anything but untapped: that user's time. It might work, but let's not bill this as anything other than what it is--asking for volunteer work from people.

    Which isn't really that new an idea.

    --

    Reality has a conservative bias: it conserves mass, energy, momentum...
  8. Could work, but.... by ThinkingGuy · · Score: 4, Insightful

    One of the big issues with translating between human languages is context. While many words have more or less direct equivalents in other languages ("dog"(en) "perro"(es)), you're always going to run into slang, cultural references, and especially, jargon, where the particular usage will not be in a standard dictionary, and only by the context can the actual meaning be inferred (Example: the word "anchor" in the context of sailing versus the context of webpage design).
    Not that this can't be overcome with the distributed model the article discusses, but I still think it will be a while before we see computer translation that doesn't require at least some degree of human assistance.

  9. Some basic information omitted in NS article by brianmsf · · Score: 5, Informative

    Hello,

    I am the lead developer working on the WWL project. Overall, the NS article did a good job of explaining it, but it was based on a phone interview so some material got lost in translation, no pun intended.

    There are two components to the project.

    1. One is a simple SOAP based protocol (WWLP) that will be published soon, in early May. This protocol creates a standard set of methods for discovering and communicating with existing dictionary and semantic network servers (of which there are many).

    Think of this as GNUtella for dictionaries. A WWLP aware program starts up, invokes a SOAP method to a supernode to locate Russian-Spanish dictionaries. Then, it contacts one or more of these dictionaries to search for words, synonyms, etc.

    The basic goal is to standardize the client/server interface for dictionaries. They all provide the same basic services, but have slightly different front ends. So just doing this will make it easy to incorporate dictionary functions into many types of apps (and also make existing dictionaries more visible to internet users).

    The idea is similar to an older TCP based protocol called DICT, except that it is easy to implement in high level languages, SOAP aware scripting languages, etc. It also provides a discovery mechanism so you can automate the process of finding an Urdu-English dictionary for example.

    2. The distributed computing (or distributed human computing) project. The NS article mainly focused on this. The idea here is to enlist a large number of internet users to help build and maintain a dictionary (which will also be visible through the WWLP interface).

    The goal here is to create a mechanism for collecting definitions and translations for words and phrases in less common language pairs (as well as for slang terms that are not covered by most formal dictionaries).

    ....

    The goal in both cases is to make it easy to find and use dictionary services throughout the web, and create an incentive for people to build their own dictionaries. This is NOT a translation system, although it can be incorporated into translation software (for example, to extend the number of words covered).
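
    To make the discovery flow concrete, here is a rough sketch in Python of the two steps described above: ask a supernode for dictionaries covering a language pair, then query each one. The WWLP spec is not published yet, so every name here (Supernode, DictionaryServer, locate, lookup, translate) is hypothetical, and plain in-memory objects stand in for the SOAP calls.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryServer:
    """Hypothetical stand-in for one dictionary server (no real SOAP here)."""
    source: str                      # source language code, e.g. "ru"
    target: str                      # target language code, e.g. "es"
    entries: dict = field(default_factory=dict)

    def lookup(self, word):
        """Return the list of known translations for a word (may be empty)."""
        return self.entries.get(word.lower(), [])


class Supernode:
    """Hypothetical registry that knows which dictionaries exist."""

    def __init__(self):
        self._servers = []

    def register(self, server):
        self._servers.append(server)

    def locate(self, source, target):
        """Discovery step: find every dictionary for a language pair."""
        return [s for s in self._servers
                if (s.source, s.target) == (source, target)]


def translate(supernode, word, source, target):
    """Discover matching dictionaries, query each, merge without duplicates."""
    results = []
    for server in supernode.locate(source, target):
        for candidate in server.lookup(word):
            if candidate not in results:
                results.append(candidate)
    return results


# Illustrative data only: two Russian-Spanish dictionaries and one Urdu-English.
node = Supernode()
node.register(DictionaryServer("ru", "es", {"собака": ["perro"]}))
node.register(DictionaryServer("ru", "es", {"собака": ["perro", "can"]}))
node.register(DictionaryServer("ur", "en", {"کتا": ["dog"]}))

print(translate(node, "собака", "ru", "es"))
```

    The point of the sketch is only that the client never needs to know in advance which servers exist; it asks the supernode first and treats every dictionary it gets back the same way.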

    Thanks for your time.

    Brian McConnell

    PS - if you want more information, check out www.worldwidelexicon.org

    1. Re:Some basic information omitted in NS article by dvdeug · · Score: 3, Informative

      I've looked at DICT previously. Too bad it's defunct.

      Why do you think it's defunct? The dict protocol works fine, and there are many dictionaries out there for it. dict.org is up and working, if not terribly well maintained. Debian has many packages, mostly named dict-*, that are dictionaries for dict, including a full English dictionary, the Jargon file, a Biblical dictionary and a Russian dictionary. www.freedict.de has a wide variety of bilingual dictionaries for dict.
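
      For anyone who hasn't used it, here is a minimal sketch of a DICT (RFC 2229) lookup in Python. It assumes dict.org is listening on the standard port 2628; the helper names (build_define, parse_definitions, define) are made up for illustration, and the parser only handles the response to a DEFINE command.

```python
import socket


def build_define(word, database="*"):
    """Format a DEFINE command; '*' asks the server to search all databases."""
    return f'DEFINE {database} "{word}"\r\n'


def parse_definitions(lines):
    """Collect definition bodies from response lines (line endings stripped)."""
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):      # 151 precedes each definition body
                current = []
        elif line == ".":                    # a lone dot terminates the body
            definitions.append("\n".join(current))
            current = None
        else:
            # Undo dot-stuffing: a leading ".." stands for a single "."
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def define(word, database="*", host="dict.org", port=2628):
    """Connect, send one DEFINE, and return the definitions as strings."""
    with socket.create_connection((host, port), timeout=10) as sock:
        reader = sock.makefile("r", encoding="utf-8", newline="")
        reader.readline()                    # discard the 220 greeting banner
        sock.sendall(build_define(word, database).encode("utf-8"))
        lines, in_body = [], False
        for raw in reader:
            line = raw.rstrip("\r\n")
            lines.append(line)
            if in_body:
                in_body = line != "."
            elif line.startswith("151 "):
                in_body = True
            elif line[:1] in ("2", "4", "5"):  # final status, e.g. 250 or 552
                break
        sock.sendall(b"QUIT\r\n")
    return parse_definitions(lines)


# Offline check against a canned response, so no network access is needed.
canned = [
    "150 1 definitions retrieved",
    '151 "anchor" wn "WordNet"',
    "anchor",
    "  n 1: a device that holds a vessel in place",
    ".",
    "250 ok",
]
print(parse_definitions(canned))
```

      Calling define("anchor") would do the same thing against a live server, assuming it is reachable from your network.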

  10. If you actually want to sign up by prizzznecious · · Score: 4, Informative

    then you should go to their site, which was completely unmentioned in the article: wwl page

    --

    visit the hwky website for a lyrical genius infusion.
  11. HOW to GET really BAD translations by maggard · · Score: 4, Insightful
    First off I'm going to guess that 90% of the folks who will be posting gung-ho comments on this will be unilingual Americans. The folks posting against it will be those who're bilingual and have ever read the "same" document in both languages.

    It doesn't work. If translating were so simple for machines to do they'd be doing a fine job. However good translation requires context, insight, emotional inflection, etc. Even then each and every one ends up different; sometimes subtly sometimes blatantly.

    Just as machine translation sux at these so will distributed translation. Reading a paragraph or a page doesn't tell enough about the feel, flow, or tone of a document. There are numerous words and phrases that can be interpreted multiple ways between any two languages and will be, each time differently by each interpreter.

    If you don't know this already then go and look up any document (books and short stories are easy to find, so is poetry) that has been translated more than once. Take a look at the different translations and ask yourself - "Are these really from the same source document?"

    Now imagine trying to read something composed of alternating paragraphs or pages from each translation: Incoherence.

    Distributed problem solving works for subjects with clearly defined data sets, methodologies, and standards; not human language.

    --
    I don't read ACs: If a post isn't worth so much as a nom de plume to its author then I won't bother either.
  12. Re:Let's get started right now by susano_otter · · Score: 3, Informative
    Do you mean the verb "to fuck", or the multipurpose expletive "fuck"?

    In Portuguese, the translation of the first would be "foder", while the second might be "c'os pariu" (but I'm not up on current slang, so that may be outdated).

    NOTE: The multipurpose expletive in Portuguese would be a totally different word, not a cognate of the English version.

    --

    Any sufficiently well-organized community is indistinguishable from Government.

  13. www.logos.it by MS · · Score: 3, Interesting
    Something related was already done about 6 years ago by Logos. It's not a network like Seti@Home, but it involves lots of people distributed all over the world. It still works - check it out!

    ms