Slashdot Mirror


Open-Source Language Translator Opens For Beta

mind21_98 writes "A new machine translator has officially opened for public testing. GPLTrans is a translator similar to Babelfish. Pre-alpha testing has shown that it is the most accurate of the major Web-based machine translators. More information can be found here. "

155 comments

  1. /.'ed? by Tarnar · · Score: 1

    Or is my net connection typically slow?

    1. Re:/.'ed? by pb · · Score: 1

      It's pretty dead.

      I can ping it, but port 80 is pretty non-responsive...

      --
      pb Reply or e-mail; don't vaguely moderate.
    2. Re:/.'ed? by Anonymous Coward · · Score: 1

      I swear, the moderation is just getting worse. How can the first post be redundant? Restating what has been said is redundant. Restating the obvious is the other. The article doesn't state the site was /.'ed, so it's not redundant from there. And as the first post, nothing has been said, period. So redundant is just DUMB. Offtopic maybe. God, idiot moderators.

  2. Cool! by pb · · Score: 2

    Hopefully we'll see some better translators, because the current ones suck.

    And maybe we'll be able to add on some custom vocabulary, that would be really nice for computer journals (or chemistry, medicine, whatever...)

    ...at least the article wasn't in German, or something. :)

    --
    pb Reply or e-mail; don't vaguely moderate.
    1. Re:Cool! by mong · · Score: 2

      Woohoo!

      Now I can write back to the Mexicana Chica who works here and explain CLEARLY and CONCISELY, that whilst very attractive and nice, I can't respond to her advances because I am already "with woman".

      The last time I tried, Babelfish somehow made me inform her that I'd love to " kiss her making angry other woman".

      Muy Bien!

      Mong.

      * Paul Madley ...Student, Artist, Techie - Geek *

      --

      *...Slacker, Artist, Techie - Geek *
      Remember: Nothing is Cool.
  3. Premier Stick in ground... by T.Hobbes · · Score: 1

    ... is how babelfish might translate "First Post"

    1. Re:Premier Stick in ground... by ffatTony · · Score: 1

      This was funny. It deserves a little more than a rating of 'flamebait'

    2. Re:Premier Stick in ground... by matthewg · · Score: 2
      "First Post"->English->Italian->English->German->English- >Portuguese->English->French->English:

      "first wave of the pallet of the beginning"

    3. Re:Premier Stick in ground... by derobert · · Score: 1
      Uncover the true meaning of words...

      Try it with "slash dot":

      nonpersonal of the opening

      There went my karma...

  4. Swear Words by Greyfox · · Score: 2

    We need a web translator that accurately translates swear words, or that will at least handle "Will you please fondle my buttocks?" correctly. My nipples explode with delight!

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  5. Forward Progress by jstepka · · Score: 1

    It's really good to see that there is work going on to advance these types of programs. Part of the problem with Babelfish is that it doesn't quite get the job done. Several of my classmates have tried to cheat when writing a paper in a different language. Someone in Germany said this to me once in response to my translation...

    "I know what you say, but I don't know what you say. You funny American!"

    --
    Justen Stepka
  6. How do you say.. by bmetz · · Score: 2

    "My hovercraft is full of eels" in foreign languages? Now I can find out! :)

    --
    What did you eat today? http://www.atetoday.com/
    1. Re:How do you say.. by ToastyKen · · Score: 1

      Just curious: Is that a reference to something?

    2. Re:How do you say.. by realkiwi · · Score: 1

      Mon aeroglisseur est rempli d'anguilles...

      My bad spelling...

      --
      realkiwi
    3. Re:How do you say.. by _Marvin_ · · Score: 2

      It's from an episode of "Monty Python's Flying
      Circus". It was about a Hungarian phrasebook
      translating (if I recall it correctly) a
      Hungarian phrase with the meaning "How can I
      get to the train station" to "My hovercraft
      is full of eels" (and other such nonsense).

      --
      "We won't use guns, we won't use bombs, we'll use the one thing we've got more of and that's our minds" - Pulp
    4. Re:How do you say.. by Pig+Hogger · · Score: 1

      A very famous English learning method published in France (that's how you say "french english-learning method") actually starts with "My tailor is rich" and other such useless phrases, to the point that an old cliché of a Frenchman who speaks English is one who can only say "My tailor is rich"...
      -- ----------------------------------------------
      Vive le logiciel... Libre!!!

  7. I was just thinking about this.... by friedo · · Score: 1

    I was actually just thinking about a practical way to interface to some translation software to write a real-time IRC bot to translate conversations as they happen. The only free translation software I knew of to do this was Babelfish, and writing an interface to that would be slow as hell for a real-time app, but this thing might be the answer. :)

    1. Re:I was just thinking about this.... by GoRK · · Score: 2

      I had an ircII script (OK, well, it was for BitchX, but it probably would have worked on ircII) that worked with Babelfish to translate stuff. I will try to dig it up and post a URL (or you could e-mail me). It was called gtrans.bx, and I don't know if anyone out there kept it up to date. Anyway, it wasn't even that bad in real time, since it made a separate connection for each translation it did. There was some latency, sure, but it could do 10 or 20 translations at once. I suppose it could have worked on a queue system with HTTP 1.1 to further expedite things, but I really didn't get that deep with it. Anyway, it worked like this:

      /mylang
      Sets your default language (put in your startup)

      /de, /en, /es, /pt, /fr, /it
      Translates your typing into the language of choice and funnels it to the current dialog. With all of the translation commands, if $mylang is set to a non-English language, translation to English is done before translating to another language, due to Babelfish.

      /mde, /men, /mes, /mpt, /mfr, /mit
      Sends a message to someone in the specified language

      /flag
      Sets autotranslation of a person or a channel. This was really the coolest command. If some spanish-speaking person came in, you can just /flag juan es and everything you /msg juan was translated to spanish and everything juan said was translated from spanish. Also if you addressed juan in the channel.. eg. said "juan:" it would print in both spanish and your language.

      /trans
      Self-explanatory. Output for your eyes only.
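
      The command list above boils down to two ideas: pivot through English when neither language is English, and keep a per-nick table for /flag. Below is a minimal sketch of both in Python, not the original gtrans.bx code. The names `pivot_translate`, `FlagTable` and `toy_translate` are made up for illustration, and the toy word table stands in for whatever backend (Babelfish or otherwise) would really do the work.

```python
from typing import Callable, Dict

# (text, source_lang, target_lang) -> translated text; stands in for a real backend.
Translator = Callable[[str, str, str], str]


def pivot_translate(text: str, src: str, dst: str, translate: Translator) -> str:
    """Translate src -> dst, hopping through English when neither side is English."""
    if src == dst:
        return text
    if "en" in (src, dst):
        return translate(text, src, dst)
    return translate(translate(text, src, "en"), "en", dst)


class FlagTable:
    """Hypothetical stand-in for /flag: remembers a language per nick or channel."""

    def __init__(self, my_lang: str, translate: Translator) -> None:
        self.my_lang = my_lang
        self.translate = translate
        self.flags: Dict[str, str] = {}

    def flag(self, target: str, lang: str) -> None:
        self.flags[target.lower()] = lang

    def outgoing(self, target: str, text: str) -> str:
        """Text sent to a flagged target goes out in the target's language."""
        lang = self.flags.get(target.lower())
        if lang is None:
            return text
        return pivot_translate(text, self.my_lang, lang, self.translate)

    def incoming(self, sender: str, text: str) -> str:
        """Text from a flagged sender comes back in your own language."""
        lang = self.flags.get(sender.lower())
        if lang is None:
            return text
        return pivot_translate(text, lang, self.my_lang, self.translate)


# Toy word table so the sketch runs without any network access.
TOY = {
    ("en", "es", "hello"): "hola",
    ("es", "en", "hola"): "hello",
    ("de", "en", "hallo"): "hello",
}


def toy_translate(text: str, src: str, dst: str) -> str:
    return TOY.get((src, dst, text), text)


table = FlagTable("en", toy_translate)
table.flag("juan", "es")
print(table.outgoing("juan", "hello"))                      # hola
print(table.incoming("juan", "hola"))                       # hello
print(pivot_translate("hallo", "de", "es", toy_translate))  # hola
```

      Passing the translator in as a plain callable keeps the routing logic separate from the backend, so swapping Babelfish for something else would not touch the flag table.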

      Additionally, there were some new functions that people could use to implement their own fun foreign language commands..

      I have heard there is a babelfish library out there that provides a standard way to interface a program with babelfish. Plus, only one thing has to be updated for all of your babelfish-ized programs to work. With a client like BitchX this would be very easy to simply load and use!

      This GPLTrans thing sounds very exciting and i'd very much like to see about building a new (better) irc script on top of it!

      ~GoRK

      PS. Since the site is slashdotted, could anyone who knows please tell us a little more about it? Can we do the translation on our own hardware or is it central-server based? Can it directly translate between languages where english is neither the source nor the target? Does it provide a standard (e.g. .so loadable lib) interface for other programs to call?

      I would very much like some day to see all of my basic network communications apps (mail client, newsreader, web browser, instant messenger, irc client, etc) have the ability to machine-translate both incoming and outgoing stuff. Everyone seems to be so bent on how "good" the translation is. If a machine can translate something so that I have a basic grasp of what is going on, then the translation has been a monumental success! I would like to see machine translation people focus on getting the technology more widespread before they go trying to make their software translate everything perfectly!!

      ~GoRK (again)

    2. Re:I was just thinking about this.... by Binder · · Score: 1

      I am looking forward to the day when you can take speech recognition, speech synthesis, and translation software so that we have realtime translation of audio. Some of the recognition software is getting good enough but I do not think there is any that is opensource.

      What is the state of speech recognition software for linux? Especially continuous recognition.

      Binder

    3. Re:I was just thinking about this.... by JensR · · Score: 1

      I think Origin licensed Babelfish for Ultima Online for this... Interesting idea, it's quite obvious. I'd really like to see it.

  8. skript kiddie by mistabobdobalina · · Score: 1

    okay its gpl'd...they're using linux...i'd think that 'skript kiddie' should definitely be a supported language!

    --
    -- your knees hurt, don't they?
    1. Re:skript kiddie by MonkeyPaw · · Score: 1

      At one point I was going to write a program to decrypt 5kr1pt k1dd13 5p3ak. But,. figured,. no. I don't really care what they have to say,. heh heh

      --
      My studio - www.graylands.ca
    2. Re:skript kiddie by Anonymous Coward · · Score: 0

      But there are so many dialects of script kiddie =)

  9. Machine translators by theSheep · · Score: 3

    While machine translation is very practical, it can also provide entertainment. I remember a story about scientists testing an English-Russian-English translator by translating phrases to Russian and back. Input: "The spirit is willing, but the flesh is weak." Output: "The vodka is good, but the meat is rotten."

    --
    -- The Sheep --
    1. Re:Machine translators by Kris_J · · Score: 1

      Yeah, but the people aren't much better. I've got a book at home that claims; When Pepsi got their slogan "Come Alive with Pepsi" translated into Mandarin Chinese it translated as "Pepsi brings your ancestors back from the grave" - which I actually think is a pretty good slogan...

      The problem is that we're all expecting a "Universal Translator" à la Star Trek.

    2. Re:Machine translators by Kris_J · · Score: 1

      Ooo, ooo. What about "Out of sight, out of mind" translating to "Blind Idiot"? Think about it. (I think it was an English-to-Japanese real-time voice translator that managed that effort.)

    3. Re:Machine translators by large · · Score: 1

      Actually, I heard "Out of sight, out of mind" translated as "invisible idiot".

    4. Re:Machine translators by Vidar+Hokstad · · Score: 1

      That's an old story about what is said to have happened with one of the first machine translation systems back in the '60s. Don't know if it's true, or just another urban legend.

    5. Re:Machine translators by adolf · · Score: 1

      I heard it as the same idea, different phrase:

      "Out of sight, out of mind" [English-Russian-English] = "Invisible; insane"

    6. Re:Machine translators by Arjen · · Score: 4
      This is an urban legend. According to MACHINE TRANSLATION: An Introductory Guide:

      The `spirit is willing' story is amusing, and it really is a pity that it is not true. However, like most MT `howlers' it is a fabrication. In fact, for the most part, they were in circulation long before any MT system could have produced them (variants of the `spirit is willing' example can be found in the American press as early as 1956, but sadly, there does not seem to have been an MT system in America which could translate from English into Russian until much more recently --- for sound strategic reasons, work in the USA had concentrated on the translation of Russian into English, not the other way round). Of course, there are real MT howlers. Two of the nicest are the translation of French avocat (`advocate', `lawyer' or `barrister') as avocado, and the translation of Les soldats sont dans le café as The soldiers are in the coffee. However, they are not as easy to find as the reader might think, and they certainly do not show that MT is useless.

      BTW, since this book is no longer available in stores, the whole contents have been placed online. I recommend this book to anyone who is interested in the subject of MT. It really is a nice introduction to the subject.

    7. Re:Machine translators by Jonas+Öberg · · Score: 2

      There are several examples like that. I don't know the origin, but I think it was a newspaper article quite a few years ago (I have it around here.. somewhere).
      For example, when Nova (the car) was brought to Spain, it didn't sell very well since Nova (no va) translates into "doesn't go". Ford Pinto didn't fare much better; who would drive a car named "small male appendage"? Nike caught on fire (literally!) when an angry mob informed them that "air" on one of their products was strikingly similar to the Arabic "Allah". Braniff translated its airline slogan, "Fly in leather", into Spanish as "Fly naked", and the most horrible error was probably some random baby food manufacturer who began selling their product in South Africa. What they didn't think of was that most products in South Africa are labeled with a picture of the food inside the container (due to illiteracy). Their product was of course labeled with a baby, since that was whom the product was meant for. Imagine the horror -- tinned babies!?

    8. Re:Machine translators by Anonymous Coward · · Score: 0
      > For example, when Nova (the car) was brought to Spain, it didn't sell very well...

      No wonder it did not sell very well, considering that they never attempted to do it in any serious fashion... I think you have Spain confused with Mexico or some other place.

    9. Re:Machine translators by Kris_J · · Score: 1
      Nike caught on fire (literally!) when an angry mob informed them that "air" on one of their products was strikingly similar to the Arabic "Allah".
      I actually believe that this one wasn't an innocent mistake, but either an amazing marketing ploy, or a single designer with their own agenda.
  10. I need to try this at work by MonkeyPaw · · Score: 3

    I'll give this a test at the office,. because half the time I don't understand half of what the customers are saying.

    Perhaps I can use it to translate my words to the customer,. so when I say "Ok,. click on My Computer" they don't hear "restart the computer and click on the first icon you see while hitting the esc key and pulling on the power cord".

    --
    My studio - www.graylands.ca
    1. Re:I need to try this at work by nmos · · Score: 1

      The reverse would be a lot harder. How do you deal with :

      "I clicked on the thing but it didn't work so I clicked on the other thing and it gave me some message"

      Would it be smart enough to translate "I didn't do anything" to "I didn't do anything except replace half the software on the system in a lame attempt at fixing it."

  11. AI&Babelfish by T.Hobbes · · Score: 3

    I'm not sure if it has been done yet, but it would be quite helpful if an AI could 'evolve' along with the language (because, as we all know, language changes all the time) based on monitoring of user-editing of the post-process text. For example, if at time 'a' it was programmed to translate 'Cool' to 'Froid' in French, it would (after monitoring the changes made by users) learn to translate 'Cool' to the French equivalent of 'hip'. or something. 'cause, dammit, i can't wait until the AIs take over ;)
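
    One simple way to picture "learning from user edits" is plain vote counting: keep the programmed translation until enough users have corrected it to the same alternative. The sketch below assumes exactly that and nothing smarter. `AdaptiveLexicon`, `record_edit` and the threshold value are all hypothetical, and the replacement word "cool" is only a placeholder for whatever users actually type.

```python
from collections import Counter, defaultdict
from typing import Dict


class AdaptiveLexicon:
    """Word table that switches to a user-supplied translation once it is popular."""

    def __init__(self, defaults: Dict[str, str], threshold: int = 3) -> None:
        self.defaults = dict(defaults)
        self.threshold = threshold
        # source word -> how often each corrected translation was submitted
        self.corrections: Dict[str, Counter] = defaultdict(Counter)

    def record_edit(self, source_word: str, corrected: str) -> None:
        """Call this whenever a user rewrites the machine's output for a word."""
        self.corrections[source_word][corrected] += 1

    def translate(self, source_word: str) -> str:
        votes = self.corrections.get(source_word)
        if votes:
            best, count = votes.most_common(1)[0]
            if count >= self.threshold:
                return best
        # Not enough evidence yet: fall back to the programmed default.
        return self.defaults.get(source_word, source_word)


lexicon = AdaptiveLexicon({"cool": "froid"}, threshold=2)
print(lexicon.translate("cool"))   # froid
lexicon.record_edit("cool", "cool")
lexicon.record_edit("cool", "cool")
print(lexicon.translate("cool"))   # cool
```

    A per-word vote count ignores context entirely, so it would trade one wrong sense for another; it only shows how the feedback loop could be wired.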

    1. Re:AI&Babelfish by reptilian · · Score: 2
      In the context you speak of, in France, they say "cool." Not much translation work there, but it'd be pretty hard for any translator to figure out which context you're talking about. For example, saying "Liquid Nitrogen is cool" is either an understatement or it was from someone who enjoys pouring it over soft solids and shattering them with a hammer, yet it's a perfectly valid statement in either context. But, if you say something more obvious like "Molten Lead is cool" it's pretty easy to assume which version of cool you mean.

      I wonder how current translators solve this problem, or if they even bother. That is, where one word means different things in different contexts, but in another language, there are two different words for it, when the context can be so ambiguous that both contexts can be the same statement.

      Man's unique agony as a species consists in his perpetual conflict between the desire to stand out and the need to blend in.

      --

      72656B636148206C72655020726568746F6E41207473754A

    2. Re:AI&Babelfish by K8Fan · · Score: 2

      Babelfish yields some really funny stuff when English creeps into other languages. For instance, the English word "teenager" has crept into German; Babelfish translates it as "tea rodent". Reading this in a movie review, a room full of my friends nearly died laughing.

      --
      "How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
    3. Re:AI&Babelfish by ToastyKen · · Score: 1

      But, if you say something more obvious like "Molten Lead is cool" it's pretty easy to assume which version of cool you mean.

      Even then, you couldn't be sure, because it could easily just be sarcasm.

    4. Re:AI&Babelfish by Uncle_Al · · Score: 1
      • Babelfish yields some really funny stuff when English creeps into other languages. For instance, the English word "teenager" has crept into German; Babelfish translates it as "tea rodent". Reading this in a movie review, a room full of my friends nearly died laughing.

      Well, even if this word has crept into the German language (we do use it), this mistake has a different origin. If you take the word "teenager" apart you have "Tee" = tea and "Nager" (short form of "Nagetier") = rodent.
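
      The "tea rodent" failure is what a naive compound splitter would produce. The sketch below is a guess at the mechanism, not Babelfish's actual code: it tries every split point and accepts the first one where both halves are dictionary words. The two-entry lexicon and the optional loanword list are assumptions added for the example.

```python
from typing import Dict, FrozenSet, List, Optional

# Toy German -> English lexicon, just enough to reproduce the mistake.
TOY_LEXICON: Dict[str, str] = {"tee": "tea", "nager": "rodent"}


def split_compound(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Return two lexicon entries that concatenate to `word`, or None."""
    lower = word.lower()
    for i in range(1, len(lower)):
        head, tail = lower[:i], lower[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return None


def translate_word(
    word: str,
    lexicon: Dict[str, str],
    loanwords: FrozenSet[str] = frozenset(),
) -> str:
    lower = word.lower()
    if lower in loanwords:
        return lower                      # known borrowing: pass it through
    if lower in lexicon:
        return lexicon[lower]
    parts = split_compound(word, lexicon)
    if parts is None:
        return word                       # give up and leave the word alone
    return " ".join(lexicon[part] for part in parts)


print(translate_word("Teenager", TOY_LEXICON))                           # tea rodent
print(translate_word("Teenager", TOY_LEXICON, frozenset({"teenager"})))  # teenager
```

      Checking a loanword list before splitting is the cheapest fix; it only works for borrowings someone has already entered.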
    5. Re:AI&Babelfish by lordhades · · Score: 1

      Not just sarcasm, but simple fallacy. Context (and non-sensicality) has to play a major role in refining things like that.

    6. Re:AI&Babelfish by Anonymous Coward · · Score: 0

      you do realize "Molten Lead" is the new Led Zeppelin cover band don't you... they're very cool.

    7. Re:AI&Babelfish by Wodin · · Score: 2

      I came across another funny Babelfish translation a while ago while reading a German article that mentioned Microsoft. The translation was pretty good until I came across the phrase "talking moon" in the middle of a sentence. It didn't make any sense at all, so I looked at the original and realised Babelfish had translated "Redmond" as "talking moon."

      Of course, after I saw this, I remembered from my high school German that "to speak" is "reden" and "moon" is "Mond," so I can understand how Babelfish got confused ;)

      --
      -- Wodin
  12. IRC? Never! by ghoti · · Score: 1

    Don't you think IRC is one of the most difficult translation jobs there is? I mean, with all the abbreviations, misspellings and stuff. And few people use complete sentences at all. You would need an immense amount of knowledge gathered from following the conversation (and several at once!) to be able to get anything useful.

    Sorry, but I don't believe it's possible, even if a perfect translator for normal speech existed.

    --
    EagerEyes.org: Visualization and Visual Communication
    1. Re:IRC? Never! by friedo · · Score: 1

      Right - when I was getting the idea, it wasn't for regular IRC conversations, rather the "lectures" or "meetings" that happen on IRC sometimes which are conducted with a degree of formality. In those, usually people speak in full sentences. The bot (if and when I get around to writing it) would allow someone to "listen" in any language it supported. Obviously, it would be far from perfect; it would be the equivalent of reading a babelfished web page: nowhere near perfect, but you could get the gist of what was going on.

    2. Re:IRC? Never! by Rob+Kaper · · Score: 1
      On the other hand, there is also a benefit that IRC has for translations: nobody would notice incomplete sentences or 'odd' words (words that didn't translate too well). Native talk on IRC is already so full of bullshit that even Babelfish could not make it worse.

      I've used a script in bitchX and I've spoken spanish and never ever got the feeling the other person knew I do not speak spanish at all. In fact, he wanted to visit me when he would come to Madrid.

      IRC is the definition of low quality: thus translators make a lot of sense there!

    3. Re:IRC? Never! by Anonymous Coward · · Score: 0

      Well, I tend to use full sentences on IRC... I've found most others do too, with the exception of a few phonetic abbreviations like 'u' for you, l8r etc... you can just replace these with their full equivalents.
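
      Replacing abbreviations before translation can be a single lookup pass. A minimal sketch, assuming a hand-maintained table (the two entries come from the examples above; a real table would be much longer):

```python
import re
from typing import Dict

# Hypothetical abbreviation table; extend as needed.
ABBREVIATIONS: Dict[str, str] = {"u": "you", "l8r": "later"}

# A "word" here is any run of letters, digits or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']+")


def expand_abbreviations(line: str, table: Dict[str, str] = ABBREVIATIONS) -> str:
    """Replace known abbreviations, leaving punctuation and other words untouched."""
    def replace(match: "re.Match") -> str:
        word = match.group(0)
        return table.get(word.lower(), word)

    return WORD.sub(replace, line)


print(expand_abbreviations("see u l8r!"))   # see you later!
```

      Matching whole words avoids mangling the "u" inside "but" or "you".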

      There have been a few translators on IRC for various clients. Even Infobots can do translations, and insult people in different languages (and do Slashdot headlines ;)

      Granted, translation isn't likely to be perfect with typos etc, but at least you can have a hope of understanding what people are saying.

      On the other hand, is there really much of a use for translation over IRC more than "speak xyz or go somewhere else"? I can't see anyone wanting to spend much time on a channel they have to have translated for them...

      Oh, and speaking of IRC...

      uk1.arcnet.vapor.com #linux (replace uk with a tld of your choice ;)

      irc.shagged.org #worms, or any other channel you want to create ;)

      --
      Fweeky - can't remember his passwd...

  13. Make it a standard desktop component! by Anonymous Coward · · Score: 4

    It would be nice if someone were to make a CORBA translation service and add this to one or more of the linux desktops. Then it could be used for email, documentation, irc, coding, etc, not just for the occasional web page. It would also be good if the data at gpltrans was snapshotted regularly and pushed around, ideally so that everyone would have their own copy.

  14. It's the Stamp Collector syndrome by SurfsUp · · Score: 5

    It's common to hear the pundits opine that "open source may be good at improving 30-year-old operating systems, but the open-source model just doesn't work when it comes to large scale applications." Various reasons are given, for example: "open source programmers only do what is fun and interesting, and applications aren't interesting". But here we see yet another large-scale application falling to the barbarian hordes.

    Those pundits are wrong: there is no genre of software that the open-source model will never absorb. Simply because the open-source model results in better software, for reasons that are well-known. And no, there is no software application that is so uninteresting that no volunteer anywhere in the world will touch it. On the contrary: the more an application area remains untouched, the more interesting it becomes to open-source programmers, simply because it's virgin territory.

    This is the "stamp collector" syndrome: when you already have a goodly number of stamps in your collection, adding the missing ones becomes an obsession.

    --
    Life's a bitch but somebody's gotta do it.
    1. Re:It's the Stamp Collector syndrome by anatoli · · Score: 2
      Those pundits are wrong: there is no genre of software that the open-source model will never absorb.
      I have a few F16s in my backyard, and I want their control software to be OpenSource(tm)d, 'coz I won't trust Lockheed Martin. They probably use cookies to track my flight patterns! Let's start coding! NetBSD people will port it to every piece of hardware in existence, from F117s to RC helicopters.

      Moderate this down, citizen.
      --
      Industrial space for lease in Flatlandia.
    2. Re:It's the Stamp Collector syndrome by jackmott · · Score: 1

      umm hmmm

      so linux is BETTER than windows?

      I see it as different. More stable, less ram hungry, much harder to use, many missing features.

      these things are improving, sure, but its taking a damn long time too.

      remember not everyone is techie, and the goodness of software cant be evaluated by techies alone.

      My guess is that whether an open-source translator is written better or not depends not so much on whether it's open source, but on how talented the main contributors and/or designer(s) are.

      --
      -I go to Rice, so figure out my email address
    3. Re:It's the Stamp Collector syndrome by Jose · · Score: 1

      I see it as different. More stable, less ram hungry, much harder to use, many missing features.
      "less ram hungry"???
      If you are comparing windows and Linux..then you are probably comparing the desktopedness of linux to windows (running X, wm, a net browser, etc), and Linux is very RAM hungry...netscape/mozilla both devour RAM, so does X itself.
      Now, if you are comparing Linux the server to windows the server, then ya sure Linux doesn't use half as much ram..(assuming that you aren't running X)

      sorry to pick nits.

      PS try out Corel Linux...it should be pretty nice for non-techie people.

      --
      The basic sleazeware produced in a drunken fury by a bunch of UCBerkeley grad students was still the core of BIND. --PV
    4. Re:It's the Stamp Collector syndrome by Cironian · · Score: 1

      Well, actually if I could really get that chance (and if the "OS" there wouldnt be hardcoded) I would *love* to toy with that; which only proves the point... :-)

    5. Re:It's the Stamp Collector syndrome by anatoli · · Score: 1
      What I have right now is v0.01-a-pre. Wanna test?

      11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

      Moderate this down, citizen.
      --
      Industrial space for lease in Flatlandia.
    6. Re:It's the Stamp Collector syndrome by nhowie · · Score: 1

      Any application can be fun and interesting, because of the related challenge to produce it. The reason that larger applications are not as proliferant (sp?) is because they require much more time and skill to produce -- this is where the 'bazaar' phenomenon works perfectly, since the time and skill can be shared to produce something that is greater than the sum of its parts (see the GIMP for a perfect example), but the big challenge is getting the application 'off the ground', i.e. writing the framework and organising CVS, the mailing list, etc.

      Machine translation is in fact a fascinating area of AI, and very difficult to achieve -- hence why this is the first GPL'd application.

      I agree with the stamp collecting analogy, what's the point in trying to rewrite something that's already been done well? Why write yet another window manager, when you can write the first ever GPL'd goat/sponge simulator (this needs to be done, btw - anyone?).
      --

    7. Re:It's the Stamp Collector syndrome by CConkle · · Score: 1

      Maybe even incorporate it into UOX, eh? :)

      You might remember me as 'CM-Gandalf' :) Nice to see you've popped up around here. :)

    8. Re:It's the Stamp Collector syndrome by PurpleBob · · Score: 2

      Okay... consider that Windows (starting from 1.0) has been around longer than Linux (starting from 0.whatever), has been developed as a desktop OS for a MUCH longer time, and has consistently been able to stifle other OSes. Considering what it's working against, Linux as a desktop OS is progressing very quickly. Maybe it hasn't surpassed Windows for the average user yet, but it will.
      --
      Win dain a lotica, en vai tu ri silota
    9. Re:It's the Stamp Collector syndrome by Q*bert · · Score: 2
      I need NetBSD ported to a tank gun (a.k.a. a bazooka) so I can deal with all the obnoxious space-hogging SUVs here in the Silicon Valley. Does anyone want to pool coding resources? I'll set up a project here at http://www.bsdvssuv.org/.

      Vovida, OS VoIP
      Beer recipe: free! #Source
      Cold pints: $2 #Product

  15. User submissions by ffatTony · · Score: 2

    Will users be able to add/update/correct translations or modify dictionaries ala the APT bot in #debian on irc.openprojects.net?

    It seems to me the growth would be incredible if users could modify the dictionary (or at least add suggestions that could later be added by someone with the appropriate power).

  16. Distributed AI...? by eries · · Score: 2

    I wonder if the open-source model for something like this could extend to the program's users as well. The idea would be that, as people used the program, it could learn from their input. Thus, every time someone inputs a new word into their local copy, this information could be replicated at some central repository and made available to other users. In fact, you could even ask the user to categorize, define, and give usage examples for each new thing.

    For that matter, you could even have the users refine the system's grammar.

    How hard would that be to implement? Is it totally far-fetched?

  17. Better Context Analysis by Pingster · · Score: 5

    What most of these language translation programs need is a better understanding of context. I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all). For example, when translating from English to French, "run" always translates to "exécute". For a sentence like
    The computer ran the program.
    you get
    L'ordinateur a exécuté le programme.
    ("The computer executed the program.")
    which is reasonable, but if you translate
    I ran home.
    you get
    J'ai exécuté à la maison.
    ("I executed at the house.")
    which doesn't make any sense. More incredibly, "store" always translates to "mémoire". You would think that, if they were going to force every word to be interpreted in one sense, they would choose the most common meaning. But this choice leads to insanity where
    Tom ran to the store.
    translates to
    Tom a exécuté à la mémoire.
    ("Tom executed to the memory.")

    With knowledge of context, a more advanced system could notice situations in which it was more reasonable for "run" to have a particular meaning. In the last example, "run" is followed by a prepositional phrase indicating a direction, which would imply that the meaning involving physical movement is appropriate, and so on.

    Even more revealing is the fact that the confusion of meaning happens differently for different languages. If you translate

    Tom ran to the store.
    into Spanish, you get the hilarious result:
    Tom se ejecutó al almacén.
    ("Tom executed himself to the warehouse.")
    For translation software that has multiple language targets, i would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French. The above would be evidence that the Systran software has no such representation -- or at least that their representation is too weak to indicate the difference between "store" as in "memory" and "store" as in a warehouse.
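
    The prepositional-phrase cue described above can be written as a one-word lookahead. This is only a sketch of that single heuristic, not a description of how Systran or GPLTrans work; the cue list, the sense labels and the function names are assumptions made for the example.

```python
import re
from typing import List

RUN_FORMS = {"run", "runs", "ran", "running"}

# Words that, right after "run", suggest physical movement (assumed, incomplete).
MOTION_CUES = {"to", "into", "toward", "towards", "from", "away", "home"}

# French infinitive for each sense of "run".
FRENCH_LEMMA = {"move": "courir", "execute": "exécuter"}


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[A-Za-z']+", sentence)


def sense_of_run(tokens: List[str], index: int) -> str:
    """Return 'move' or 'execute' for the form of 'run' at tokens[index]."""
    if tokens[index].lower() not in RUN_FORMS:
        raise ValueError(f"{tokens[index]!r} is not a form of 'run'")
    following = tokens[index + 1].lower() if index + 1 < len(tokens) else ""
    return "move" if following in MOTION_CUES else "execute"


for sentence in ("The computer ran the program.", "I ran home.", "Tom ran to the store."):
    tokens = tokenize(sentence)
    position = next(i for i, tok in enumerate(tokens) if tok.lower() in RUN_FORMS)
    sense = sense_of_run(tokens, position)
    print(sentence, "->", sense, "/", FRENCH_LEMMA[sense])
```

    A single lookahead word is brittle ("ran to completion" would be misread), which is part of why a real semantic representation is the harder but better answer.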


    -- ?!ng

    1. Re:Better Context Analysis by jackmott · · Score: 1

      There was something in Scientific American or some similar magazine about the intermediate representation idea. Someone has thought that out and actually developed such a representation; I don't know if any implementations using it exist or not. Definitely the right way to go.

      --
      -I go to Rice, so figure out my email address
    2. Re:Better Context Analysis by PGillingwater · · Score: 1

      There has been some work (apparently) done in the area of an intermediate representation language for machine translation of human languages, by a team of people associated with the United Nations University in Tokyo. They claim to have had a conference on the topic on November 18th, but there's no indication of progress or announcements since then on their Web page.

      --
      Paul Gillingwater
      MBA, CISSP, CISM
    3. Re:Better Context Analysis by moore · · Score: 3

      The problime is that most if not all of
      these systomes know nothing about meaning at all.
      All that do is try to match one set of strings to
      a difrent set of strings.
      GPL Trans works by the substuation methoud.

      >from: Mooneer Salem
      >
      > It is a system where words in a phrase that
      > can be substituted are
      > marked by %phrase%
      > For example:
      >
      > English: My name is %phrase1%.
      > Spanish: Me llamo %phrase1%.
      >
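
      The %phrase% substitution scheme quoted above can be sketched in a few lines of Python. This is only a guess at the mechanism based on the description; the function names are hypothetical and this is not GPLTrans code.

```python
import re

# One phrase pair, taken from the example quoted above.
PAIRS = [("My name is %phrase1%.", "Me llamo %phrase1%.")]


def compile_pattern(template):
    """Turn 'My name is %phrase1%.' into a regex with one named group per slot."""
    # re.split with a capturing group alternates: literal, slot name, literal, ...
    parts = re.split(r"%(\w+)%", template)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
    return re.compile(regex + r"\Z")


def translate(sentence):
    """Return the target phrase with slots filled in, or None if nothing matches."""
    for source, target in PAIRS:
        match = compile_pattern(source).match(sentence)
        if match:
            # Slot contents are copied over untranslated, as a proper noun would be.
            return re.sub(r"%(\w+)%", lambda slot: match.group(slot.group(1)), target)
    return None


print(translate("My name is Tom."))
print(translate("Where is the store?"))
```

      Anything that does not match a stored phrase simply falls through, which is why the size of the phrase database matters so much for this approach.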

      This genreal systome can be extended in to a
      phrase sturcture grammer with pares of rules for
      each language. ex:
      english: S -> NP1 V NP2
      irish: S -> V NP1 NP2

      these rules would modal sentences like:
      english: the cat chased the dog.
      irish: chased the cat the dog.
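
      As a small Python illustration of the paired rules above, assuming the sentence has already been split into labelled constituents (the hard part, which this sketch skips entirely):

```python
# Paired rules from the post: the same constituents, ordered per language.
RULES = {
    "english": ["NP1", "V", "NP2"],
    "irish": ["V", "NP1", "NP2"],
}


def reorder(constituents, target):
    """Lay out already-parsed constituents in the order the target rule gives."""
    return " ".join(constituents[label] for label in RULES[target])


parsed = {"NP1": "the cat", "V": "chased", "NP2": "the dog"}
print(reorder(parsed, "english"))  # the cat chased the dog
print(reorder(parsed, "irish"))    # chased the cat the dog
```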

      All this is oversimplifyed but you get the poin.
      The real problime is that you need to be trained
      as a linguist to understand what the structer of
      many seantences are and even linguestes aruge a
      LOT. The phrase structal aprouch is probly what
      altavista a such do. All thoe I rilly like the
      idea to GPL Trans I do not thik there aproch will
      get them to far; but it will be fun to see what
      thay can do.

    4. Re:Better Context Analysis by pb · · Score: 1

      You're right, this is a big problem, and one which Cyc will hopefully solve. Don't expect *that* to be Open Source anytime soon; it requires a huge amount of tedious work to make something like Cyc, so they're pretty careful about holding onto it...
      ---
      pb Reply or e-mail rather than vaguely moderate.

      --
      pb Reply or e-mail; don't vaguely moderate.
    5. Re:Better Context Analysis by radish · · Score: 1


      Agreed - but that's what makes language difficult - particularly English!

      It gets better - I originally thought that the problem was because you were using the American "store" (which can mean "memory") rather than the English "shop" (which is pretty definite!). Unfortunately ...

      "Tom ran to the shop"
      becomes
      "Tom a exécuté au système"
      which when translated back again becomes
      "Tom carried out with the system".

      Now that I can't explain....

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    6. Re:Better Context Analysis by ralphclark · · Score: 2

      For Pete's sake, get a spell checker, will you? Spelling is *not* supposed to be made up as you go along. I almost needed Babelfish just to read what you wrote!

      The real problime is that you need to be trained as a linguist to understand what the structer of many seantences are and even linguestes aruge a LOT.

      IMO, linguistics is just as woolly as psychology. That's why they argue; because many of the more subtle assertions about grammar that have been published aren't much more than unsubstantiated opinion.

      The human brain uses grammar up to a point, and then dispenses with it. There is no reason to expect that the grammar that has evolved in every language has to be completely regular. So you can formulate a consistent set of grammatical rules to deal with basic usage, but the more complex things get the more often the rules will be broken.

      The difference between linguistics and zoology or botany is that the latter subjects only attempt to catalogue a finite number of real living species. But when grammatical rules are flexible or disposable, the number of potential structures is almost as limitless as the number of potential utterances (which Chomsky put a number to, I seem to remember).

      In this case, beyond a small core of prescriptive grammar everything else is purely descriptive. To catalogue the resulting infinity of possible verbal blunders and call this zoo a formal grammar is pointless.

      Also, even with simple phrases you can have two different interpretations (and two complete but mutually exclusive superimposed structures) whose meaning cannot be resolved without context.

      Because of all this, a phrase structural approach, or any other rule-based method, is ultimately doomed. However, insofar as the linguistics community utilises Artificial Intelligence concepts (as in natural language processing studies), they are, it appears, still dominated by those who swear by symbolic logic.

      I'm inclined to believe that the most effective natural language parsers will always be connectionist rather than rule-based. Connection machines (such as neural nets) can encompass rule-based logic but also have the flexibility to make an "educated guess". Thus they are much more capable of parsing ungrammatical language.

      After all, our brains work the very same way when we speak or listen.

      Consciousness is not what it thinks it is
      Thought exists only as an abstraction

    7. Re:Better Context Analysis by Anonymous Coward · · Score: 0

      OK, this depends on whether you're talking strictly about translation or about linguistics in general. Of course translation is doomed without context, but that doesn't mean phrase structure is bad. Grammar doesn't break down as you seem to imply; if it did, how could you communicate? There's still an underlying system arranging the elements. Elementary school English teachers don't know anything about real grammar. It's innate and everyone obeys the rules. (The rules vary from person to person and language to language, of course.) I've seen English grammar written just as formally as C's or Pascal's. I admit the fine points are fuzzy, but it's not psychology run amok.

    8. Re:Better Context Analysis by ralphclark · · Score: 2

      You misunderstand me. Of course grammar exists, but it is not a complete, consistent logical system like mathematics (is meant to be); it is completely invented, mostly by accident.

      The result is that there are some phrase structures which you want to add to in order to complete the sentence but you can't do it without breaking the rules or generating a sentence of incomprehensible drivel.

      Most people prefer to break the rules than spout drivel, so for complex sentences in the real world, grammar often breaks down.

      BTW, It's obvious that there is an innate potential for grammar in the human brain but I don't agree with Chomsky that we are all born with the same basic grammatical structures hardwired. If you wonder how it is that so many of us end up sharing a similar meta-grammar (to coin a phrase) then you ought to read William H Calvin's book The Cerebral Code (yes, the whole thing is online, thanks Prof!). He shows at the end precisely how neural structures to support basic grammar could form spontaneously to enable thoughts about who did what to whom, and with what. The same structures are probably used to generate the word order when the thought is spoken.

      You may have noticed that the higher apes (principally chimps and gorillas) used in language experiments have demonstrated the ability to form simple grammatical structures too. There were also reputedly some experiments with an African Grey parrot which demonstrated similar ability (but I've not often heard the work cited and don't know how reliable it is).

      PS. If you like Calvin's book, his latest one Lingua ex Machina is all about the evolutionary development of language. Like all of his books this one's online too.


      Consciousness is not what it thinks it is
      Thought exists only as an abstraction

  18. Someone doing something right! by m0e · · Score: 1

    Finally, a project that has been needing to come around. A translator that's fast AND accurate. Best of all, it lets you correct phrases! Babelfish better stick around though... I always get a kick out of doing things like translating

    'I like to soak my feet in gallons of whipped vanilla pudding'

    and having it finally come out as

    'I appreciate to impregnate my feet in the gallons of the pudding that I have exposed to the flash of the vaniglia.'

  19. Translation methods by Y · · Score: 4

    Although the site has been slashdotted, it would be interesting to see what sort of algorithms it uses to perform the translations. Mmm, open source.

    I would be inclined to say that if it is based on grammar rules, the project won't make much headway - machine translation has been butting its head against this brick wall for forty years. The problem with hard-and-fast grammar rules, e.g.,

    S = NP VP
    NP = Det (Adj)* N
    VP = V (Adv)

    is that they don't account for rapid linguistic change, and people have this nasty habit of twisting grammar to express themselves in new and creative ways. :) In addition to this, it's very difficult to write simple, lucid grammar rules that also count for the myriad exceptions found in language.
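
    To show how brittle such rules are, here is a minimal Python recognizer for exactly the three rules above. The lexicon is a made-up toy; the functions only check whether a sentence fits the rules and build no parse tree.

```python
# Toy lexicon: word -> part-of-speech tag (illustrative entries only).
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "old": "Adj",
    "dog": "N", "cat": "N",
    "runs": "V", "sleeps": "V",
    "quickly": "Adv",
}


def parse_np(tags, i):
    """NP = Det (Adj)* N. Return the index just past the NP, or None."""
    if i < len(tags) and tags[i] == "Det":
        i += 1
        while i < len(tags) and tags[i] == "Adj":
            i += 1
        if i < len(tags) and tags[i] == "N":
            return i + 1
    return None


def parse_vp(tags, i):
    """VP = V (Adv). Return the index just past the VP, or None."""
    if i < len(tags) and tags[i] == "V":
        i += 1
        if i < len(tags) and tags[i] == "Adv":
            i += 1
        return i
    return None


def accepts(sentence):
    """S = NP VP, and the whole input must be consumed."""
    tags = [LEXICON.get(word) for word in sentence.lower().split()]
    i = parse_np(tags, 0)
    if i is None:
        return False
    return parse_vp(tags, i) == len(tags)


print(accepts("the big old dog runs quickly"))
print(accepts("the dog quickly runs"))
```

    The second sentence is perfectly good English, yet it is rejected because the adverb is not where the VP rule expects it; every such case needs another rule or another exception.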

    I imagine GPLTrans would probably be using some sort of probability frame of phrases and words occurring together, but one can't be sure without looking at the source. I think the best way to do translation software would be to convert the text into syntax, then into a more abstract semantic form, and from the semantic form, translate back into the target language's syntax, and then into the target language's text. Of course, the trick is to figure out just exactly how to do this. :) The parsing itself is a hefty (and not terribly exciting) task. I attempted to make a term project of a fairly basic English parser and ended up changing the project.

    My 2 cents/Pfennig/lire/pesos,
    Y

    --
    "There is no culture in computer science, only cults." - M. Felleisen
    1. Re:Translation methods by shub · · Score: 1

      I had a senior-level research project that I did on the subject of comparing a variety of language parsing systems. It was supposed to be a straight comparison of augmented transition node networks (ATNs) as compared to something else (I can't remember what).

      However, my conclusion was that each method (and there are more than two) had both its strengths and weaknesses, and no one of them was "better" than any other in general.


      I then went on to propose that the best solution would be to have a "blackboard" system, whereby you allow each parsing methodology to do what it does best and you don't try to twist each of them to handle everything, and they each contribute their own part to the mapping and parsing of the input.

      The result being that you can have multiple feedback loops, and the total output should be better than the sum of individual outputs of the various subsystems.
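
      A bare-bones Python sketch of the blackboard idea, assuming three made-up "knowledge sources" that each read a shared dictionary and post whatever they can contribute. The names and entries are hypothetical and much simpler than anything from the paper described above.

```python
def phrase_table(board):
    """Knowledge source: whole-phrase substitution (hypothetical entry)."""
    table = {"good morning": "bonjour"}
    text = board["source"].lower()
    return {"translation": table[text]} if text in table else {}


def word_tagger(board):
    """Knowledge source: tag the sense of 'store' from nearby words."""
    words = board["source"].lower().split()
    if "store" in words and "store_sense" not in board:
        return {"store_sense": "memory" if "computer" in words else "shop"}
    return {}


def sense_substituter(board):
    """Knowledge source: needs the tagger's output, so it fires on a later pass."""
    glosses = {"shop": "magasin", "memory": "mémoire"}
    if "store_sense" in board and "store_gloss" not in board:
        return {"store_gloss": glosses[board["store_sense"]]}
    return {}


def run_blackboard(source, knowledge_sources):
    """Keep offering the board to every source until none of them changes it."""
    board = {"source": source}
    changed = True
    while changed:
        changed = False
        for source_fn in knowledge_sources:
            for key, value in source_fn(board).items():
                if board.get(key) != value:
                    board[key] = value
                    changed = True
    return board


print(run_blackboard("Tom ran to the store", [sense_substituter, word_tagger, phrase_table]))
```

      The substituter is deliberately listed first: it has nothing to do on the first pass and only contributes once the tagger has posted its result, which is the feedback loop in miniature.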


      It wasn't exactly the paper that had originally been envisioned, and my adviser only gave me a "B" for it. I wish I had a copy of it online, so that I could provide an URL to it. Hopefully, I've still got a floppy disk around somewhere that I could pull up that has a copy of it. If I ever manage to get a copy and put it up, I'll let you folks know.


      Anyway, it seems to me that the sort of systems that Systran and GPLTrans have created would be ideal applications of this methodology -- take what they have now (strict sentence/phrase/word substitution, or whatever), and combine that with a system that could tag and direct the substitution based on contextual clues.

      Implemented properly, you should be able to continue to extend and improve this sort of a system pretty much indefinitely.

      --
      Brad Knowles
      http://daily.daemonnews.org/ -- if you're not
    2. Re:Translation methods by MarkH · · Score: 1

      Just had a quick look at the tarball. It is a 22 MB substitution phrase based DB read by a reasonably simple PHP script.

      It seems to have a basic ability to correctly position proper nouns using wildcard characters within phrases.

      No clever grammar rules etc., which is probably a good thing. Sticking on a 'did this translate properly' button and letting users add to the vocabulary is probably a better long-term approach, with enough users, than a clever grammatical algorithm.




    3. Re:Translation methods by forthy · · Score: 1

      Grammar+vocabulary based translation is doomed. The main problem is that natural languages have neither a strict grammar nor real vocables. I.e. what you have is a somewhat loose grammar, and ambiguous words that often have a lot of different meanings, depending on context.

      The problem of translation is that the classes of meanings are different in different languages. If people would only write unambiguous sentences, translation would be easy, but often enough, people use ambiguities deliberately (jokes, oracles, poems, etc.). So it isn't enough to deduce the meaning of a sentence/paragraph/text by looking at the context; you also have to carry the remaining ambiguity into the target language. And that's one of the harder parts of translation.

      I'm talking about translation by humans, not by machines here, that's what is difficult enough for humans. I don't expect a good machine translation any time soon, as often enough human translations are too bad. Literature, with rhymes and tone has much more problems than just meaning, and here good translations are often impossible.

      So IMHO translation can't be done using string-based rules. You have to collect "meanings" from the words, the grammar rules and the vocabulary files can help you, but after all, you have to search for a sentence in the target language that matches the remaining set of meanings of the source language sentence best.

      --
      "If you want it done right, you have to do it yourself"
    4. Re:Translation methods by moore · · Score: 1

      Well, the truth about translation is that
      the state-of-the-art "working" systems do use a
      form of phrase structure grammar called HPSG
      (Head-driven Phrase Structure Grammar). There
      are older systems which use GPSG
      (Generalized Phrase Structure Grammar), which is the
      predecessor to HPSG. The trick is that the grammars
      are neither simple nor lucid and have thousands
      of rules. There are other ways of building a
      natural language parser using theories such as
      purely syntactic approaches or GB (Government and Binding),
      but syntactic attempts have not worked well,
      and while GB has good theory and is simple, so far
      no one has been able to write one which can run
      at a good speed. (I believe that the best ones have
      taken up to an hour to parse one sentence.)

      There is also the problem that there are still
      many kinds of phrases for which no one is sure
      what their structure is, and even more for which the
      structure is hotly debated.

      Anyway, check my other post on this story to learn how GPL Trans actually works.

    5. Re:Translation methods by Luis+Casillas · · Score: 1
      The problem with hard-and-fast grammar rules, [...] is that they don't account for rapid linguistic change

      The issue is far more complicated than this. You cannot make such a statement without making it relative to some theory of what grammar rules are like and what depends on them, and what depends on other stuff, like, say, properties of lexical items.

      I'd say that the overall scheme of grammatical rules for a language can stay on a relatively firm ground for a while, especially with an international language literarily used. Hey, after all, people can read 16th century English and Spanish still.

      and people have this nasty habit of twisting grammar to express themselves in new and creative ways. :)

      Yeah. Actual language use is really fun, isn't it?

      In addition to this, it's very difficult to write simple, lucid grammar rules that also count for the myriad exceptions found in language.

      To hell with the exceptions. It's difficult enough to write simple, lucid grammar rules that count for the myriad generalities found in language :).

      Anyway, there is no natural concept of "exception" you can apply here; it is always theoretically loaded to call something an exception. How do you make a principled decision about what is a "genuine exception", and what is something your grammar should cover but doesn't?

      ---

  20. Can Open Source improve the design of this thing? by jquiroga · · Score: 2
    It is taken for granted that the Open Source process would take out all the bugs in this, if enough people look at the code and contribute.

    GPLTrans can be quite good, but imagine it's not (I still can't access). Let's suppose that its translation strategy is not very sophisticated and this system ends up being only marginally better than the others. Now, if somebody comes up with a great idea to improve the design of a machine translation system and wants it to be free, what is (s)he supposed to do?
    1. post it here and hope for the best ?
    2. report it as a bug fix ?
    3. do the coding and contribute a patch ?
    4. fork ?
    5. start from scratch ?
    6. try the first five options, in that order ?
    Does the outcome depend on the people running the original project?

    If they are closed to design improvements contributed by others, is their project truly Open?
  21. The _real_ question is ... by urgleburgle · · Score: 1

    ... how good is it at translating the GPL? Urgleburgle

  22. More ambiguity by gargle · · Score: 1

    Back in the 80s, a company produced software which they advertised with the tagline: "Finally, a machine that understands you like your mother."

    The great irony, of course, was that no machine natural language system in the world - even today - can deal with the sentence "Finally, a machine that understands you as well as your mother." (think about the possible shades of meaning)

    1. Re:More ambiguity by Dasein · · Score: 1

      My favorite example of ambiguity is from a book called "Natural Language Processing" (I think). Anyway the sentence is:

      Rice flies like sand.

      This could mean that the noun "Rice flies" enjoy sand or that the noun "Rice" flies in the same manner as sand.

      --
      You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
    2. Re:More ambiguity by Anonymous Coward · · Score: 0

      Even better:
      Fruit flies like a banana.
      Time flies like an arrow.
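
      A short Python sketch of why these two are hard: with a toy tag dictionary and two sentence shapes (both invented for illustration), each sentence gets both readings, and nothing in the syntax says which one to keep.

```python
from itertools import product

# Toy dictionary: word -> possible part-of-speech tags.
TAGS = {
    "fruit": ["N"], "time": ["N"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "a": ["Det"], "an": ["Det"],
    "banana": ["N"], "arrow": ["N"],
}

# Two sentence shapes, written as tag sequences (a deliberately tiny grammar).
READINGS = {
    ("N", "V", "P", "Det", "N"): "the subject flies in the manner of something",
    ("N", "N", "V", "Det", "N"): "a kind of fly is fond of something",
}


def readings(sentence):
    """Try every combination of tags and keep the ones the grammar accepts."""
    words = sentence.lower().strip(".").split()
    found = []
    for tagging in product(*(TAGS[word] for word in words)):
        if tagging in READINGS:
            found.append((tagging, READINGS[tagging]))
    return found


for sentence in ["Fruit flies like a banana.", "Time flies like an arrow."]:
    print(sentence)
    for tagging, gloss in readings(sentence):
        print("   ", " ".join(tagging), "->", gloss)
```

      Picking "fond of" for the fruit flies and "in the manner of" for time takes knowledge about the world, not more grammar.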

    3. Re:More ambiguity by Type-R · · Score: 2
      In conclusion, a machine which includes/understands you love your mother.

      English to French to English on Babelfish. Not as bad as it could have been... :)

  23. The book (was Re:Machine translators) by K8Fan · · Score: 2

    Do you mean Philip K. Dick's novel "Galactic Pot-Healer"? (Stupid title, I know). In it, bored office workers send a book title or folk saying through multiple translator machines, and challenge their friends to guess the original title.

    Some of the examples:
    • The Cliche is Inexperienced - The Corn is Green
    • The Chesspiece made Insolvent - The Pawnbroker

    It's just called "The Game" in the book.

    --
    "How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
  24. Mirror, Please? by Bruce+Perens · · Score: 2
    Someone please mirror this software. No surprise their site is down if a slashdot-sized audience is trying a new translation program on one server :-) I can't get at it.

    I hope the word databases and algorithm are easily separable from the implementation. I'm sure they can't have bound it too tightly to PHP and MySQL - the presentation layer should be determined by the user, and use of other databases should be possible.

    Bruce

    1. Re:Mirror, Please? by WilliamX · · Score: 2
      If anyone wants to mirror this, please email me. I've had to point the domain to an unused IP at the moment, absolutely couldn't handle the load any longer (especially for a freely hosted user).

      Of course, it would of helped had the author (who had hours of advance notice apparently) had emailed with I or my associate that agreed to host his site letting us know he was going to be on slashdot, then arrangements could of been made much earlier. He posted a notice on his site that it was happening, but failed to notify either one of us. (Can you tell I'm not real happy with him right now?).

      So if anyone has the resources to mirror this, contact me and I'll arrange it with the author, or contact him directly and arrange it. Either way works.

      --
      William X. Walsh
      william@dso.net

    2. Re:Mirror, Please? by WilliamX · · Score: 1
      Man I can't type very well after being up most of the night.......

      Forgive the gross errors above.

      --
      William X. Walsh
      william@dso.net

    3. Re:Mirror, Please? by Anonymous Coward · · Score: 1

      Slashdot should contribute temporary mirrors to all the sites it features... at least /.'s accessible most of the time, the sites it points to aren't...

    4. Re:Mirror, Please? by arafel · · Score: 1

      And you think that slashdot would still be available if it was hosting mirrors for other sites? I don't think so... :-)

  25. According to Bill Gates, this isn't possible! by Anonymous Coward · · Score: 1

    Either Bill Gates or one of his henchmen was once quoted as saying something to the effect of "yeah, open source is great and all, but there are certain things that simply REQUIRE corporate backing, such as automatically translating an email message into another language." While this obviously isn't the exact same thing, it's pretty darn close. Anybody remember the mention of HTTP-DAV in the Halloween documents... the saga continues. If anybody can find the URL of the quote, please post it... I'm sure I saw it on Linux Today, but I can't find it readily in the search.

  26. Finally by Egorn · · Score: 2

    Maybe I can stop sending letters to my French relatives that say: "I am ambiguously gay" instead of "I love my brothers" etc...


    --

    Movie News - "Entertainment news, bitch!"
  27. hmm... language translation... hmm... by Anonymous Coward · · Score: 0

    i need something to translate my sloppy java to ultraoptimized C... and if it can convert the comments to Mandarin as well, all the better.

  28. TurboTax: The Final Frontier! by Bruce+Perens · · Score: 2
    Forgive me if this sounds off-topic. It's nice to see another new problem set covered by Free Software. In thinking about what can't be covered by free software, the application I focus on is TurboTax. It's the laborious product of accountants and auditors building an expert system, not really the work of programmers. It needs to be accurate enough to persuade IRS not to audit in a tremendous number of situations. It can't ever be optimal, but it shouldn't be too much worse.

    I don't think it's tenable under the Open Source paradigm. I'm sure there are other, similar examples. So, there's room for proprietary software, coexisting with free software and running on a free infrastructure. I'd just rather keep the proprietary stuff in the leaf nodes of the software "tree", where nothing else depends on it.

    Bruce

    1. Re:TurboTax: The Final Frontier! by adolf · · Score: 1

      Bruce, while I typically agree with your ideology, I take issue with the message you attempt to convey in your above comment:

      In America, at least, everything depends on taxes. Thus, what you wish for is impossible.

    2. Re:TurboTax: The Final Frontier! by wnissen · · Score: 1

      I think the distinction that you're looking for here is the difference between applications that are fairly open-ended with respect to the feature set versus those where the specification is as complex as the program. In the TurboTax example writing the spec from the tax code is 90% of the work, and the spec cannot be written except by experts. 90% of the work would have to be done beforehand by experts, who presumably would want to be paid. Heck, the more popular TurboTax is, the *less* work for the accountants who helped write it. Compare this to apache, where the specified behavior is fairly loose and subject to modification in many different directions at once. You might only need one person who really knows HTTP to spec that part, the rest is determined by what people want to do.

      It's probably safe to say that most systems that require more domain knowledge than programming knowledge will remain difficult to open source. Can anyone come up with an example of such a system that is an open source success?

      Walt

    3. Re:TurboTax: The Final Frontier! by Splork · · Score: 2

      Ah yes, but do you really consider TurboTax a "software application"? Its value is not in its ability to do computation based on questions it asks you. Instead it is more of a service; you're paying for their expertise in preparing the expert system correctly for this year's tax laws.

      I'd argue that we already have created this software as opensource: the web browser or other UI toolkits.

      Service will always sell.

    4. Re:TurboTax: The Final Frontier! by nmos · · Score: 2

      There are lots of companies that make tax software, it wouldn't surprise me if one of them decided to release the core program free (maybe even open source) and just sell "form modules" for various tax situations.

    5. Re:TurboTax: The Final Frontier! by Submarine · · Score: 1

      Perhaps it would be simpler to change the US tax code. :-)

      When I worked in the US, I couldn't believe that employees would need to pay an accountant to file their taxes. I mean, I know of no other country like that... In all the European countries I know, you fill in some numbers in a form, you sign and that's it!

    6. Re:TurboTax: The Final Frontier! by Rilke · · Score: 1

      If you really want the final frontier, think about kid's software. One of the best-selling software packages last year was a Barbie dress-up program. It's really hard to imagine Gnu 'Rugrats at the beach'.

      And you couldn't even get started if you wanted to. Trademarks are so tightly entwined with the software in that field, that it's just about impossible to Open Source anything.

      So, yes, there's plenty of room for proprietary software in the leaf nodes. It's funny, folks talk about the "desktop" as if the home market and the business workplace were similar markets. They're very different in many ways, but luckily much of the traditional home apps are moving to the web, where we can use them on decent operating systems.

      As far as TurboTax goes, an open sourced Tax program would be a great thing, since stability and lack of error is one of the major goals. I don't think it will happen though. Accountants don't rush home after work to work on personal accounting projects in the way many programmers do.

    7. Re:TurboTax: The Final Frontier! by cabbey · · Score: 2

      Actually the difficulties here wouldn't be technical at all. You're correct in that the lion's share of the work is in translating the tax code into directions simple enough a computer would understand, but in the US at least this is already done for you by the IRS... actually they take it a step farther and translate the tax code into directions any idiot with a GED can understand. When I was in college I translated the 1040EZ and 540E (State of California) forms into Pascal in under a week for crying out loud.

      No... the hard part about tax software isn't the code... it's the legalese... who is Joe SixPack going to sue when GnuTax-1040 causes him to be audited? Can we get an addendum to the GPL that says if you use the results without verifying them then you use them at your own risk (oh... wait the wording on that reminded me... isn't there already a "use at your own risk" clause in the GPL?)

      Another sticky situation is the trust aspect... are people going to trust us to not collect their personal info? Lately I'm not so sure they're going to trust anyone... OSS or not. ('cause even if they *can* read the source it doesn't mean they'll understand it.)

      Being OSS also brings up another point... let's say you and I put out GnuTax and have correctly translated all the tables and formulary... then some 'leet haxor goes and patches it for something (say performance... or "privacy") and breaks the math... who's to blame? (I hate to think this way... but with something like this the blame game is going to be important... just ask Intuit's legal department).

      Well... just to be on topic I was going to translate this to French or something... but the poor server is slashdotted....

  29. Learning translators by ViGe · · Score: 1

    I don't know what's the case with Babelfish etc. but I know that at least one Finnish->English translator site has used its logs to improve its translations. Of course the changes have been made manually, but I see it as a good thing to see it translating something totally wrong, and after some time it translates the same sentence correctly.
    --
    It has to work - rfc1925
  30. Re:Can Open Source improve the design of this thin by wnissen · · Score: 1

    You've just enumerated most of the options that are always open for any open source project. Obviously the best thing is to get involved, with code if possible, with the existing project and hope that the coordinator(s) are smart enough to recognize your contribution as valuable. If not, then you can fork or start from scratch, although at some later date the original project might choose to incorporate your changes anyway. This is precisely what happened with libc and glibc.

    Does anyone have an URL they can send that explains these issues in more detail? The question is just too broad to answer in a /. post.

  31. How does it work? by KillBot · · Score: 2

    I've studied compiler design, and I've wondered about how human languages compare to programming languages. I would think the biggest hurdle is interpreting ambiguous phrases like, 'fruit flies like a banana'. And all the implied words seem like typecasting, but are also ambiguous. '(you/I/they) Come here, dammit'. But I wonder if the entire thing is more than just a really complex language description (in BNF or something) with a big database and a few enumerated phrases.

    1. Re:How does it work? by Anonymous Coward · · Score: 1

      Look here. Cool stuff. You just have to associate more info with each word (its semantic type(s), in addition to its syntactic type(s)).

    2. Re:How does it work? by rp · · Score: 1
      (This is what I learnt in college many years ago and it may not be entirely accurate.)

      In the 50s, American linguists tried to determine the grammar of language by statistical methods. Their hope was to 'discover' grammatical structures simply by examining large samples of spoken or written language and counting the distribution of words.

      In principle, this allows word categories, sentence structures, etc., to be discovered. In practice, it took too long, especially for the impatient Noam Chomsky, who simply postulated that grammatical structure is the result of a sentence generating capacity within humans, an innate part of the human intellect, and attracted so many followers (statistics is boring) that the 50s method was soon considered ineffective and obsolete by a majority of linguists.

      It is interesting to see that mechanical translation and mechanical language recognition based on this 'generative' notion of grammar have largely failed, while statistical methods are much more successful. It is much more practical to apply the methods of the 50s today, and I wonder to what extent the statistical methods used today are actually using the same principle of trying to 'discover' grammar as they go.

  32. The effect on Slashdot by ajlitt · · Score: 1

    Now all of us German-impaired Slashdotters can
    read the c't articles.

  33. does it work at all? by Submarine · · Score: 1

    When I checked it during the week-end, it looked like GPLTrans computed the identity function in all directions. I mean, when you fed it a text x in English and told it to do English->French, it'd output the same text, without any translation.

    And now their server looks like it's down...

    1. Re:does it work at all? by pb · · Score: 1

      All that means is that they've perfected English->English, French->French, Spanish->Spanish, etc., etc. :)

      I wish I could look at the source. If anyone has it, post a link or something.

      Someone moderate this up, along with the (real) first post unfairly marked as redundant, and then spank the moderators for me.

      --
      pb Reply or e-mail; don't vaguely moderate.
  34. More babelfish fun. by Anonymous Coward · · Score: 0

    Translate "Sorry, dude!" to French and then back to English -- you will get "Afflicted, standard!" (???!!!)
    "Die young" ends up as "the young people of matrix" (!!!)
    "Die hard" is "hard matrix", ",die hard" (with a comma) is translated correctly, "live fast, die hard" is again about some silly matrix.
    Better yet, try several iterations of english->french->english->... until it settles down, then have your friends guess the initial phrase. "I carry out with biscuits except the function", anyone?
    It's oh so easy to make fun of them.

  35. This stuff is hard by moore · · Score: 3

    I posted this a reply to a comment but then thought maby it should be its own thread.

    The problime is that most if not all of
    these systomes know nothing about meaning at all.
    All that do is try to match one set of strings to
    a difrent set of strings.
    GPL Trans works by the substuation methoud.

    >from: Mooneer Salem
    >
    > It is a system where words in a phrase that
    > can be substituted are
    > marked by %phrase%
    > For example:
    >
    > English: My name is %phrase1%.
    > Spanish: Me llamo %phrase1%.
    >

    This genreal systome can be extended in to a
    phrase sturcture grammer with pares of rules for
    each language. ex:
    english: S -> NP1 V NP2
    irish: S -> V NP1 NP2

    these rules would modal sentences like:
    english: the cat chased the dog.
    irish: chased the cat the dog.

    All this is oversimplifyed but you get the poin.
    The real problime is that you need to be trained
    as a linguist to understand what the structer of
    many seantences are and even linguestes aruge a
    LOT. The phrase structal aprouch is probly what
    altavista a such do. All thoe I rilly like the
    idea to GPL Trans I do not thik there aproch will
    get them to far; but it will be fun to see what
    thay can do.
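
    The substitution method quoted above can be sketched in a few lines. This is only an illustration of the %phrase% idea under stated assumptions: the rule table, the function names and the fall-through behaviour are hypothetical and are not taken from the GPLTrans source.

```python
import re

# Placeholders look like %phrase1%; the name between the percent signs is reused
# as a regex group name, so it must be a valid identifier.
PLACEHOLDER = re.compile(r"%([A-Za-z_]\w*)%")

# Hypothetical rule pairs: (source template, target template).
RULES = [
    ("My name is %phrase1%.", "Me llamo %phrase1%."),
    # Word-order change in the spirit of S -> NP1 V NP2 versus S -> V NP1 NP2.
    ("%np1% chased %np2%.", "chased %np1% %np2%."),
]


def compile_pattern(template):
    """Turn a source template into a regex with one named group per placeholder."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        parts.append("(?P<%s>.+?)" % m.group(1))
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def translate(text, rules=RULES):
    """Apply the first matching rule; return the text unchanged if none matches."""
    for source, target in rules:
        match = compile_pattern(source).match(text.strip())
        if match:
            # Copy each captured phrase into the same-named slot of the target.
            return PLACEHOLDER.sub(lambda m: match.group(m.group(1)), target)
    return text


print(translate("My name is Alice."))         # Me llamo Alice.
print(translate("the cat chased the dog."))   # chased the cat the dog.
```

    Note that the captured phrase is copied over untranslated, and any sentence that matches no rule comes back exactly as it went in.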

    1. Re:This stuff is hard by PurpleBob · · Score: 2
      Okay. Here it is: the Moore-ish -> English translator! And it's open source!

      aproch -> approach
      problime -> problem
      systome -> system
      difrent -> different
      all thoe -> although
      linguestes -> linguists
      aruge -> argue
      substuation -> substitution
      rilly -> really

      This is obviously not complete, but hey, it's the first version :)

      The interesting thing about Moore's spelling is that he's consistent. More consistent than, (to bring it back on topic) translating from German to English.
      --
      Win dain a lotica, en vai tu ri silota
    2. Re:This stuff is hard by Michael+Woodhams · · Score: 1
      "these rules would modal sentences like:
      english: the cat chased the dog.
      irish: chased the cat the dog."

      So English uses infix notation and Irish uses prefix notation. I hope the Polish use Polish Notation. Anyone use postfix/RPN? German?

      I had to modify a stellar evolution code written in Fortran by Poles once. I was worried I'd have to read the comments backwards before I could understand them.

      --
      Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
  36. Attn: Hemos by Runna^Muck · · Score: 0

    Did you double check, triple check, email, call, write a letter to make sure it was ok to post this story? Don't want people getting upset now do we?
    Sarcasm not only implied but required. Yeah off topic, tired of Hemos bashing.

  37. More Babelfish abuse! by pb · · Score: 1

    In order to come creature who lives with voltages he on the Pepsi-voltages!



    Woo-ee, babelfish is smoking crack tonight. It's starting to sound like a religious prophet. The Bible, by Babelfish, anyone?

    --
    pb Reply or e-mail; don't vaguely moderate.
    1. Re:More Babelfish abuse! by Anonymous Coward · · Score: 0
      Actually, not half-bad.

      Genesis, chapter 1
      1: In starting God created the skies and the ground.
      2: The ground was without form and vacuum, and the darkness was on the face of the deep one; and the spirit of God moved above the face of water.
      3: And said God, " leave there is light "; and there was light.
      4: And God saw that the light was good; and God separated the light from the darkness.
      5: God called the light day, and the darkness which it called Night. And there was evening and morning ago, one day.
      6: And said God, " left there is one firmament in the medium of water, and let it separate water from water "
      7: And God made firmament and separated water which was under firmament water which was above firmament. And it was thus.
      8: And God called the sky of firmament. And there was evening and morning ago, a second day.

    2. Re:More Babelfish abuse! by cabbey · · Score: 1

      was this one pass from english to something and back to english? or the full degenerative case of repeating until stability?

    3. Re:More Babelfish abuse! by Anonymous Coward · · Score: 0

      one pass e->french->e.

    4. Re:More Babelfish abuse! by pb · · Score: 1

      History of emersione, chapter 1
      1: When one to begin with the God to produce to skies and the movement.
      2: The track was without form and esvazía and the density was in the consideration with of the deep one; and ragia of the water of the God was moved in the consideration with of water.
      3: And the visualized God, " is he here light of the left "; light
      4 of and had one: This God of and has the light of buoa; and the God separated the light of the density.
      5: The God visualized the system to ignite in the day and the density, of that had indicated the night. Irradiates one and had had one night and one morning, a day
      6: And the visualized God, " was implied is firmament with the average of the water and the f4ez here he with her, who separated, to innaffiare of...

      --
      pb Reply or e-mail; don't vaguely moderate.
  38. Context and internal semantic representations by Arjen · · Score: 3
    I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all).

    While contextual knowledge can increase the quality of a translation, the amount of world knowledge necessary to translate a typical web page is simply astounding. Most users of a translation system simply do not want to wait for hours to translate a simple sentence.

    And, there is the problem of linguistic knowledge. Most web pages are not written in "proper" English, but in some Web-speak-lingo. This requires the system to be very robust.

    The most successful uses of MT in corporations today are in situations where a very simple grammar and lexicon are used, and very little world knowledge is required. For instance, the Xerox corporation has its own translation system that translates component manuals. The technical writers that write the original version of the manual are required to use very simple English only, without any ambiguities and with very simple constructions.

    For translation software that has multiple language targets, I would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French.

    This "internal semantic representation" is called an Interlingua. It has been used in various MT systems, with varied amounts of success.

    The most important advantage of an Interlingua-based MT system is that it does not require a translation engine for each language pair. For instance, if you create a system for English, French, Dutch and German texts, you only need to create four analysis engines:

    1. English -> interlingua
    2. French -> interlingua
    3. German -> interlingua
    4. Dutch -> interlingua
    And four generation engines:
    1. interlingua -> English
    2. interlingua -> French
    3. interlingua -> German
    4. interlingua -> Dutch
    With a non-interlingua system (which is called a Transfer system), you'd have to create 3^2=9 engines:
    1. English -> French
    2. English -> German
    3. English -> Dutch
    1. French -> English
    2. French -> German
    3. French -> Dutch
    etc..

    Clearly, it is easier to integrate new languages into an interlingua system than into a transfer system.

    1. Re:Context and internal semantic representations by mlc · · Score: 1
      With a non-interlingua system (which is called a Transfer system), you'd have to create 3^2=9 engines:

      Just to pick nits, I think you'd need 4P2 = 4*3 = 12 engines. In general, [and this is all speculation from the above post, but I can't see where I'd be wrong], to translate among n languages, you'd need 2n engines with interlingua and nP2 = (n)(n-1) engines with transfer. Of course, as babelfish can only translate to/from English, they only need 2(n-1) engines.
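
      The three counts in this reply are easy to check by brute force. The sketch below is only an illustration; the function names are made up for the example, and the transfer count is verified by enumerating ordered language pairs.

```python
from itertools import permutations


def interlingua_engines(n):
    """One analysis engine and one generation engine per language."""
    return 2 * n


def transfer_engines(n):
    """One engine per ordered (source, target) pair of distinct languages."""
    return n * (n - 1)


def english_hub_engines(n):
    """Only to/from English: two engines for each of the other n - 1 languages."""
    return 2 * (n - 1)


languages = ["English", "French", "German", "Dutch"]
n = len(languages)

# Enumerate the ordered pairs explicitly to confirm the n * (n - 1) formula.
pairs = list(permutations(languages, 2))
assert len(pairs) == transfer_engines(n)

print(interlingua_engines(n), transfer_engines(n), english_hub_engines(n))  # 8 12 6
```

      For ten languages the same formulas give 20, 90 and 18 engines, which is where the interlingua and hub designs start to pay off.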

  39. this will fail ! by Rock-n-Rolf · · Score: 1

    Hello,

    I've a master's degree in computational linguistics, and I predict this effort will totally fail. Research on automatic translation has been going on for about 40 years now and a lot of money has been spent.

    However, there are still no working solutions, as the problems are still far too big. I'd suggest everybody participating in this discussion should read a good book on linguistics.

    --
    In Korea, all your base are Only For Old People
    1. Re:this will fail ! by Anonymous Coward · · Score: 0

      Apropos translations: it's bad form in English to start anything off by telling everyone about your degree. Better to say "I studied" or "worked in the field", "through my experience", etc. :-)

  40. It only does English<->Spanish so far! by pelrun · · Score: 1

    If you'd read the update text at the top of the page, you'd have realised that it says "French, German and Portuguese have been added, but they currently don't do anything"!

  41. POP3 by Bassthang · · Score: 1

    Anyone else worried by the fact that they ask for your POP3 password and send it to their server?

    I ran the English->Spanish translation on my homepage and, although I don't speak Spanish, it is quite clear that it sucked! Much development work to be done I think. A VERY good idea in principle though.

    --
    "What I look forward to is continued immaturity followed by death."
  42. Weak translation, funny note by Emil+Brink · · Score: 2

    Now this is what I call a powerful demonstration of the quality of open source software:
    English: "I am a small fish who wants to live in your ear."
    German: "Ich bin a small fish who wants to live in your ear."
    Astounding. I couldn't have done it better myself, and it was 6 years since I last took a German class... Wow. Also, I find this part of the Note at the bottom of each page particularly qualitative, too:
    Note: this computer-automated translation is not guranteed. It'll screw up with some text. If it does in fact screw up, first make sure you spelt everything properely.
    My note: I have mucho respect and understanding for alpha releases. It's just that I'm a nitpicking bastard, and this was quite funny. ;^)

    --
    main(O){10<putchar(4^--O?77-(15&5128 >>4*O):10)&&main(2+O);}
    1. Re:Weak translation, funny note by Anonymous Coward · · Score: 0

      Well, spelt is the British spelling of the past tense of the verb spell. Spelled is the Am form. Maybe give them the benefit of the doubt on that (I've no idea whence they hail), but there is no defending "properely".

    2. Re:Weak translation, funny note by Anonymous Coward · · Score: 0

      ah but it was intentional... like the sign in typesetters' cubes that reads "Thimk!"

  43. Here's a source mirror by Anonymous Coward · · Score: 0

    Since the provider appears to have pulled the page: Here's a mirror of the source (uuencoded to protect it from geocities)... http://www.geocities.com/SiliconValley/Foothills/7223/gpltrans.txt It's fairly uncomplex...

  44. Anything _CAN_ be open sourced. by kapplepc · · Score: 1

    >>>> I want their control software to be OpenSource(tm)d, 'coz I won't trust Lockheed Martin.

    I still agree with the original statement. Even control systems for military hardware could be open sourced. A system that guides a missile to its target could be similar to what might guide some self-driven transportation device of the future. The open source model might allow for reuse and faster development.

    You might think that there are dangers to giving Terrorist group X the software. I argue the materials and mechanical designs would still be secret and difficult to access. I don't think there is any greater risk than we already have today.

    Language software, like any and all monstrous and small software projects, is perfect for the open source model. Since the marginal cost of copying software is zero to the writer, there is no good reason for him to charge the second person who wants the software.

    However, the closed model is the reason we have some of the technology we have today. Sometimes we are willing to share the cost of the first writing but no one will do it for society. In the case of this software, lots of people are willing to do it for society.

    Anything _CAN_ be open sourced.

    1. Re:Anything _CAN_ be open sourced. by anatoli · · Score: 1
      How about opensourcing some blueprints too? Modern CAD systems represent objects in a way that is very similar to a special-purpose programming language. So blueprints are software for all intents and purposes.

      Now, I didn't mean specifically military applications -- merely dangerous ones. You know, if a nuclear powerplant goes Chernobyl mode because of faulty software, somebody ought to be held accountable. And if I was that person, I'd hesitate to accept submissions from general public, even if I decide to make my software free for all to view and copy. This is not quite OSS development model.

      And on a completely unrelated note: I typed this, hit "preview", hit "back" -- and my typing was gone. Bad, bad Slashdot.

      Moderate this down, citizen.
      --
      Industrial space for lease in Flatlandia.
  45. Cool? Yes and no. by Anonymous Coward · · Score: 0

    I mean, this is really a good thing and everything, but it is, after all, a web-based translator. Hey, not everyone spends all their time on the net; maybe they even have to pay for their online time rather than a flat monthly fee.

    So the question is: when do we get a translator that works on your own machine? Console or X, it doesn't matter to me, as long as it doesn't require connecting to anything. Something like euroword etc. (but better, of course).

  46. Re:This stuff is hard (especially to read) by Anonymous Coward · · Score: 0

    I think I get your point. If everybody writes in their own little language, a translator will not work!

    I also suspect that your use of the English language was a joke.

    Idea: A syntax checker for languages. Make an alias CE (Compile English) that pipes your text through unix spell. Then someone might actually use it!

    >CE myslashdotposting.html > correctpost.txt
    >----------------------
    >--Compiling ----
    >--FATAL ERROR - there is no such word "THIER"
    >--ABORTED.
    >
    --
    An AC who wants a syntax "compiler" for languages.
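
    A rough sketch of the "Compile English" idea, assuming a tiny in-memory word list rather than the real unix spell dictionary. The names `compile_english`, `CompileError` and `KNOWN_WORDS` are hypothetical and exist only for this example.

```python
import re

# Hypothetical stand-in for a real dictionary file.
KNOWN_WORDS = {"i", "think", "their", "translator", "will", "not", "work"}


class CompileError(Exception):
    """Raised on the first word that is not in the word list."""


def compile_english(text, known_words=KNOWN_WORDS):
    """Return the text unchanged if every word is known, else abort like a compiler."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            if word.lower() not in known_words:
                raise CompileError(
                    'line %d: there is no such word "%s"' % (line_no, word.upper())
                )
    return text


try:
    compile_english("I think thier translator will not work")
except CompileError as err:
    print("FATAL ERROR -", err)
```

    Stopping at the first unknown word mirrors the joke transcript above; a more forgiving version would collect all unknown words and report them together.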

  47. All Links Forbidden! by Anonymous Coward · · Score: 0

    At around 13.00 GMT 29-Nov-99 I can only get 'Forbidden'.

  48. Forbidden? by QZS4 · · Score: 1

    Forbidden
    You don't have permission to access / on this server.
    Apache/1.3.9 Server at gpltrans.zzweb.com Port 80

    Same result with the other link... This makes it a wee bit hard to check out the site. Are there any mirrors out there?

  49. It didn't work very well... by Anonymous Coward · · Score: 0

    I stumbled across this site a couple of days ago. I was going to submit a story about it, but I decided I'd better try it first. So I typed in one sentence (from a news story about the MS FoF) in English and asked to have it translated into French. It returned the same sentence with one word translated into French. I don't think that they are ready for prime time.

    1. Re:It didn't work very well... by Anonymous Coward · · Score: 0

      It cannot work very well: if its sources are what they claim to be, then gpltrans is a trivial string-substitution system with a few hundred words in its dictionary. And that just cannot work too well. (I actually downloaded the uuencoded version from geocities, but I guess it is not a hoax.)

  50. Insightful my arse by Hugo+Graffiti · · Score: 1

    Dear oh dear, what is this "score 5, insightful" nonsense? How come any old "Open Source is rilly cool" comment gets moderated up, regardless of the evidence? Slashdot is beginning to resemble some wacky fundamentalist cult. The only way something as complex as natural language translation could become Open Source is if an academic institution just gave away their source. The last time I checked about a year ago, the only decent software out there was either commercial or it was released by universities as binary only. Suddenly here's a story about an Open Source translator. So you go check on google to learn more about the history of gpltrans. No hits. Same story on DejaNews. A large-scale Open Source development that nobody's ever talked about before? Yeah right.

  51. Bruce Perens offtopic? by Anonymous Coward · · Score: 0

    Who else thinks he'll get a 5 even though this post was completely unrelated to what was being discussed?

    Hey Bruce, you have technocrat.net. Keep the mindless ranting there.

    (Yes, this was trolling. But I couldn't resist.)

  52. Be sceptical by vlax · · Score: 2

    Alas, the website has been /.'ed, so I can't look at the translator, but there are some serious questions to ask.

    1 - testing: They claim to be the most accurate of the web-based translators. Based on what corpus and measured in what way? This isn't a trivial question; there are no benchmarks for translation programmes.

    2 - parsing. If this program uses American style phrase grammar, it will inevitably break down. Phrase grammar is counterintuitive and for AI purposes pretty unproductive. It is computationally simple - see Charniak's last book for good parsing algorithms - but almost certainly isn't the way humans process language.

    All of the most successful natural language translation systems are, in one way or another, dependency grammar based. Dependency based systems are also generally more portable to other languages.

    3 - morphology. English is very morphology poor. If morphology is only minimally accounted for (as in a lot of poorly thought out, English based NLP systems), I don't see how it can hope to work in Russian, or Turkish or dozens of other major languages with rich morphology. Furthermore, what kinds of morphological rules can it accept? There are languages that use prefix, postfix and infix morphology. The kinds of simple rules that can account for English will not go very far with other languages.

    I haven't seen this program, and I don't know how seriously these issues have been considered, but they are the kinds of things to keep in mind when looking at machine translation programs.

  53. The FAQ holds the answers... by gregstoll · · Score: 1

    Read the FAQ for reasons why slashdot doesn't do this...

  54. A bunch more info by Luis+Casillas · · Score: 1
    I just thought I should mention that GPSG, HPSG and GB are not parsing technologies per se. They are serious linguistic theories of syntax.

    GB stands for "Government and Binding" theory; it is the outgrowth of Noam Chomsky's model of Universal Grammar from the beginning of the 80's, and possibly the theory on which most theoretical syntax has been done.

    GPSG stands for "Generalized Phrase Structure Grammar"; it was developed in the late 70's, initially by Gerald Gazdar. Basically, it is an enhanced form of context-free grammar, that is more suitable for description of natural language syntax.

    HPSG was derived from GPSG in the mid-80's at CSLI in Stanford, by Pollard and Sag. It incorporates ideas from other theories of syntax like LFG and GB. HPSG, in comparison to GB, is concerned with making its grammars as useful as possible for computational linguistics. Therefore, many HPSG researchers work in projects like LinGO, trying to apply HPSG to computational projects.

    LFG, which I mention above, is another theory of syntax (if you have guessed by now that theoretical linguists are an unagreeing bunch, add 100 points to your total). It is also used in computational projects, like the Xerox NLTT.

    I hope people find this info useful.

  55. Disclaimer by Luis+Casillas · · Score: 1
    I just thought I should add that my list is in no way exclusive. In fact, it has an obvious Stanford bias :).

    ---

  56. forgot to warn the web guys by cabbey · · Score: 1

    Does anyone else have visions of the IBM TV ad about the guy in the support group who says they had this great idea... they got all kinds of publicity (substitute /. for the Super Bowl commercial), they were going to be huge... but they forgot to warn the web guys... and the site crashed.

    bump-ba-dee-dumm-dup

    "that was stupid dave...."

    that old lady at the end just cracks me up....

  57. Not real yet. by Bruce+Perens · · Score: 2
    I just got a look at the source code. I think in a few years they might have a real translation database, but right now they only have a few hundred Spanish words and a few dozen German, French, and Portuguese. It's a toy program. Not a bad place to start, but hardly worth the press release.

    Bruce

  58. Context of translation (& meta-moderation on /.) by timothy · · Score: 2

    Some respondents have pointed out the difficulty in making translations contextually sensible ... whether 'run' should be translated as 'execute,' rather than 'quick bipedal motion.'

    I don't see an easy way to get out of this -- the needed 'world knowledge' that people have pointed out as necessary for this really is huge.

    But (and this is why I mention slashdot's metamoderation), there is a certain amount of brute-forcing which could serve as a useful basis for creating improved context interpretation. For instance, let's say you visit this translation engine and choose some text for it to translate ("Mein Hund ist in dein Aktentasche," say). At the same time, there might be a few selections of recent translations requested by others, and the resultant translations, which could be shown to you based on the languages you know. (Not telepathically ;) -- based on your own self-declaration, perhaps followed by a quiz to establish competency.)

    The resultant translations could be joined with alternate translations / permutations, and each reader could (say) rank-order them, or choose the best one, as far as they can determine by context, etc.

    And hopefully, the program can then be taught (wrong word, but I'm being figurative) something like this, anthropomorphically speaking: "OK, if there are several computer-related terms in the translated text, like megabyte and power-supply, 'run' is likely to mean 'execute.' If 'run' however appears in a context which does not indicate computer use, and/or directly before the paired words 'away from,' it should probably be the bipedal-movement one. And if it's in front of a business-type name, like 'bank,' 'lemonade stand' or 'brothel,' then it is likely to mean 'manage' or 'administer.'"
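
    Those three rules for 'run' can be written down directly. The sketch below is a hand-coded illustration, not a learned model: the term lists, the rule order (the 'away from' check goes first) and the name `disambiguate_run` are assumptions made for the example, and a single computer term is treated as enough evidence.

```python
import re

# Hypothetical cue lists; a real system would accumulate these from reader feedback.
COMPUTER_TERMS = {"megabyte", "power-supply", "program"}
BUSINESS_TERMS = {"bank", "lemonade stand", "brothel"}


def disambiguate_run(sentence):
    """Pick a sense for 'run' from surface cues in the surrounding sentence."""
    text = sentence.lower()

    # Rule 1: 'run away from' is the bipedal-movement sense.
    if re.search(r"\brun(s|ning)?\s+away\s+from\b", text):
        return "move quickly on foot"

    # Rule 2: computer vocabulary nearby suggests 'execute'.
    if any(term in text for term in COMPUTER_TERMS):
        return "execute"

    # Rule 3: 'run a/an/the <business>' suggests 'manage'.
    m = re.search(r"\brun(s|ning)?\s+(?:a|an|the)\s+(.+)", text)
    if m and any(m.group(2).startswith(b) for b in BUSINESS_TERMS):
        return "manage"

    # Default when no cue fires.
    return "move quickly on foot"


print(disambiguate_run("Run the program on a machine with one megabyte"))  # execute
print(disambiguate_run("She runs a lemonade stand"))                       # manage
print(disambiguate_run("They run away from the dog"))                      # move quickly on foot
```

    The weakness is visible right away: every new sense or cue means another hand-written rule, which is why the post suggests accumulating them gradually from many readers.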

    In my (interested but ignorant layman's) understanding of AI translators, this is the kind of discrimination that they try to make, nothing out of the ordinary. But, because words can fit into so many categories, I think this sort of gradual, piecemeal accumulation holds hope of making it work better over the long haul. It would take too many linguists to account for all the wacky ways that words get used.

    Just thoughts,

    timothy


    --
    jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
  59. Inside your what? by Anonymous Coward · · Score: 0

    "The last time I checked about a year ago,
    the only decent software out there was
    either commercial or it was released by
    universities as binary only."

    So check back more often, 'K?

  60. Rice flies like sand. by Luis+Casillas · · Score: 1
    Rice flies like sand.

    There is a third interpretation, in which this is a noun phrase. You know, that kind of "rice fly" which is "like sand".

    Of course, one can make some even stranger sentences, like All black english literature professors know some rice flies like most sand. Hell, this one must be ambiguous in well over a hundred ways :-).

  61. The Matrix has us. by Q*bert · · Score: 2
    This just goes to show, the Matrix has us. It surrounds you. Everything you see, hear, feel, or taste is part of it. ;)

    I'll take a stab at your puzzle: "I toss my cookies down the toilet." Just a guess, highly dependent on humorous context. ;)

    Vovida, OS VoIP
    Beer recipe: free! #Source
    Cold pints: $2 #Product

  62. Re:Context of translation (& meta-moderation on /. by Anonymous Coward · · Score: 0

    Maybe the translator could consult a search engine ... count the hits for each attempted translation (e.g. "execute away from" should generate much fewer hits than "run away from") and base its translation on these counts (so choose "run away from"), i.e. use the internet as your "world knowledge" database? Just a silly idea ...
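
    The hit-counting idea can be tried without a search engine by counting phrase occurrences in a local text collection. In this sketch the three-sentence corpus stands in for the web, and `hit_count` and `pick_translation` are hypothetical names, not a real search API.

```python
# Hypothetical miniature corpus standing in for search-engine results.
TOY_CORPUS = [
    "The kids run away from the dog.",
    "You cannot run away from your problems.",
    "Execute the script, then run the tests.",
]


def hit_count(phrase, corpus):
    """Count case-insensitive occurrences of the phrase across all documents."""
    needle = phrase.lower()
    return sum(doc.lower().count(needle) for doc in corpus)


def pick_translation(candidates, corpus):
    """Choose the candidate phrase with the most hits (first one wins on a tie)."""
    return max(candidates, key=lambda c: hit_count(c, corpus))


candidates = ["execute away from", "run away from"]
print(pick_translation(candidates, TOY_CORPUS))  # run away from
```

    With a real index the counts would be noisy and the choice of phrase length matters a great deal, which is the chunk-size problem raised in the next comment.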

  63. Using search engines to determine context by timothy · · Score: 2

    Yeah, I think this is also a good idea. The problem with it is that search engines themselves can only supply answers based on statistics, not judgement. It would be useful to do a search engine search like you say, but the translator engine would have to have a good idea of what size chunks to divide the original text into.

    Anyhow, no conflict here -- I think translation engines are going to have to use a number of strategies on every input text and see which ones make the most sense in the end, then applying the information that for text-chunk X, translation X-prime (or whichever) was the best translation. That way when phrasings similar / identical to ones in text-chunk X appear again, there is at least a reference to check against.

    timothy

    --
    jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
  64. Data data data! (and when will the site be back?) by LarryTheCucumber · · Score: 1

    The key to making progress with any natural language processing system is lots of quality, annotated data. My M.A. long paper project involved adapting a natural language parser to identify errors made by Japanese language students. The hardest, most time consuming part was getting examples of errors that real students made and then getting a Japanese teacher to diagnose the errors. For another project, I wrote a program that automatically deduced rules for identifying proper names, places, times etc. from sentences in which these entities were already tagged.

    There are lots of ways to do statistical analyses that result in better NLP systems, but the key is having lots and lots of quality data. For developing translation systems, having lots of translated sentence pairs done by a good human translator is almost crucial.

    Bruce Perens just pointed out that gpltrans is a toy system at this point; an engine plus a small vocabulary. Developing the lexicon (words + definitions) and grammars will probably be the part of this project that will require the most effort. Kind of like all of the device drivers needed to make Linux a really useful system.

    Does anyone know if there are free (speech) annotated corpora/lexicons/grammars/translation pairs out there that could be used in this and other NLP projects? Does anyone want to contribute some?

    And does anyone know when the site is coming back up (or a mirror)? I'm dying to have a look at the source!

    -jimbo

    --
    "Hold me Bob!" "I would if I could man!" -Larry and Bob in VeggieTales
  65. New Mirrors Coming Soon by mind21_98 · · Score: 1

    http://gpltrans.grmbl.com/ (should be up, but the database is still messed up)
    http://gpltrans.sourceforge.net/ (will be up by tomorrow)

    Sorry for the inconvenience. And thanks William X Walsh for forwarding those mirror requests.

  66. antonyms by Hard_Code · · Score: 2

    "But, if you say something more obvious like "Molten Lead is cool" it's pretty easy to assume which version of cool you mean."

    Couldn't one also use antonyms in this case? I.e. a word/phrase can be a replacement if it is synonymous, and /not/ antonymous, in the context. For example, molten describes the noun. Molten is probably also partially synonymous with "hot". Since "hot" is the antonym of "cool" in the temperature sense, one would not use "froid" to describe it in French, but instead the appropriate term for "cool" ("cool" itself I guess), which would not be antonymous with "hot".
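
    A minimal sketch of that antonym filter, using small hand-made tables. Every table entry and the function name `translate_adjective` are assumptions for illustration; the French renderings simply follow the post.

```python
# Hypothetical sense inventory: each sense of a word has a label and a French rendering.
SENSES = {
    "cool": [("temperature", "froid"), ("approval", "cool")],
}

# Hypothetical antonyms per sense label.
ANTONYMS = {
    "temperature": {"hot"},
    "approval": {"lame"},
}

# Hypothetical properties implied by context words.
IMPLIES = {
    "molten": "hot",
}


def translate_adjective(word, context_words):
    """Return the first sense whose antonyms are not implied by the context."""
    implied = {IMPLIES[w] for w in context_words if w in IMPLIES}
    for sense, french in SENSES[word]:
        if not (ANTONYMS.get(sense, set()) & implied):
            return french
    return None  # every sense contradicts the context


print(translate_adjective("cool", ["molten", "lead"]))  # cool
print(translate_adjective("cool", ["the", "water"]))    # froid
```

    The filter only rules senses out; when nothing in the context contradicts any sense, it falls back to whichever sense is listed first.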

    --

    It's 10 PM. Do you know if you're un-American?
  67. Some thoughts... by WorldMaker · · Score: 1

    Who's going to sponsor this project (like Microsoft and TerraServer)? It seems to me like a major server should be set up, but that it would need to be big, close to the backbone, and quick because it would get a lot of translation work (if it did well). As a fan of Artificial Linguistics, I have to ask... Should the "main" server support adding new languages? What about artificial languages like Esperanto, Lojban, Klingon, or even languages that are less well known? Should a Language Suggestion function be present? Or even a language addition utility?
    WorldMaker