Open-Source Language Translator Opens For Beta
mind21_98 writes "A new machine-translator designed for language translation has officially opened for public testing. GPLTrans is a translator similar to Babelfish. Pre-alpha testing has shown that it is the most accurate of the major Web-based machine translators. More information can be found here. "
Or is my net connection typically slow?
Hopefully we'll see some better translators, because the current ones suck.
:)
And maybe we'll be able to add on some custom vocabulary, that would be really nice for computer journals (or chemistry, medicine, whatever...)
...at least the article wasn't in German, or something.
---
pb Reply or e-mail rather than vaguely moderate.
pb Reply or e-mail; don't vaguely moderate.
... is how babelfish might translate "First Post"
We need a web translator that accurately translates swear words, or that will at least handle "Will you please fondle my buttocks?" correctly. My nipples explode with delight!
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
It's really good to see that there is work going on to advance these types of programs. Part of the problem with Babelfish is that it doesn't quite get the job done. Several of my classmates have tried to cheat when writing a paper in a different language. Someone in Germany said this to me once in response to my translation...
"I know what you say, but I don't know what you say. You funny American!"
Justen Stepka
"My hovercraft is full of eels" in foreign languages? Now I can find out! :)
What did you eat today? http://www.atetoday.com/
I was actually just thinking about a practical way to interface to some translation software to write a real-time IRC bot to translate conversations as they happen. The only free translation software I knew of to do this was Babelfish, and writing an interface to that would be slow as hell for a real-time app, but this thing might be the answer. :)
okay its gpl'd...they're using linux...i'd think that 'skript kiddie' should definitely be a supported language!
-- your knees hurt, don't they?
While machine translation is very practical, it can also provide entertainment. I remember a story about scientists testing an English-Russian-English translator by translating phrases to Russian and back. Input: "The spirit is willing, but the flesh is weak." Output: "The vodka is good, but the meat is rotten."
-- The Sheep --
I'll give this a test at the office, because half the time I don't understand half of what the customers are saying.
Perhaps I can use it to translate my words to the customer, so when I say "Ok, click on My Computer" they don't hear "restart the computer and click on the first icon you see while hitting the esc key and pulling on the power cord".
My studio - www.graylands.ca
I'm not sure if it has been done yet, but it would be quite helpful if an AI could 'evolve' along with the language (because, as we all know, language changes all the time) based on monitoring of user-editing of the post-process text. For example, if at time 'a' it was programmed to translate 'Cool' to 'Froid' in French, it would (after monitoring the changes made by users) learn to translate 'Cool' to the French equivalent of 'hip'. or something. 'cause, dammit, i can't wait until the AIs take over ;)
Don't you think IRC is one of the most difficult translation jobs there is? I mean, with all the abbreviations, misspellings and stuff. And few people use complete sentences at all. You would need an immense amount of knowledge gathered from following the conversation (and several at once!) to be able to get anything useful.
Sorry, but I don't believe it's possible, even if a perfect translator for normal speech existed.
EagerEyes.org: Visualization and Visual Communication
It would be nice if someone were to make a CORBA translation service and add this to one or more of the linux desktops. Then it could be used for email, documentation, irc, coding, etc, not just for the occasional web page. It would also be good if the data at gpltrans was snapshotted regularly and pushed around, ideally so that everyone would have their own copy.
It's common to hear the pundits opine that "open source may be good at improving 30-year-old operating systems, but the open-source model just doesn't work when it comes to large scale applications." Various reasons are given, for example: "open source programmers only do what is fun and interesting, and applications aren't interesting". But here we see yet another large-scale application falling to the barbarian hordes.
Those pundits are wrong: there is no genre of software that the open-source model will never absorb, simply because the open-source model results in better software, for reasons that are well-known. And no, there is no software application that is so uninteresting that no volunteer anywhere in the world will touch it. On the contrary: the more an application area remains untouched, the more interesting it becomes to open-source programmers, simply because it's virgin territory.
This is the "stamp collector" syndrome: when you already have a goodly number of stamps in your collection, adding the missing ones becomes an obsession.
Life's a bitch but somebody's gotta do it.
Will users be able to add/update/correct translations or modify dictionaries ala the APT bot in #debian on irc.openprojects.net?
It seems to me the growth would be incredible if users could modify the dictionary (or at least add suggestions that could later be added by someone with the appropriate power).
I wonder if the open-source model for something like this could extend to the program's users as well. The idea would be that, as people used the program, it could learn from their input. Thus, every time someone inputs a new word into their local copy, this information could be replicated at some central repository and made available to other users. In fact, you could even ask the user to categorize, define, and give usage examples for each new thing.
For that matter, you could even have the users refine the system's grammar.
How hard would that be to implement? Is it totally far-fetched?
Can your IM do this?
What most of these language translation programs need is a better understanding of context. I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all). For example, when translating from English to French, "run" always translates to "exécute". For a sentence like you get which is reasonable, but if you translate you get which doesn't make any sense. More incredibly, "store" always translates to "mémoire". You would think that, if they were going to force every word to be interpreted in one sense, they would choose the most common meaning. But this choice leads to insanity where translates to
With knowledge of context, a more advanced system could notice situations in which it was more reasonable for "run" to have a particular meaning. In the last example, "run" is followed by a prepositional phrase indicating a direction, which would imply that the meaning involving physical movement is appropriate, and so on.
Even more revealing is the fact that the confusion of meaning happens differently for different languages. If you translate
into Spanish, you get the hilarious result: For translation software that has multiple language targets, i would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French. The above would be evidence that the Systran software has no such representation -- or at least that their representation is too weak to indicate the difference between "store" as in "memory" and "store" as in a warehouse.
-- ?!ng
Finally, a project that has been needing to come around. A translator that's fast AND accurate. Best of all, it lets you correct phrases! Babelfish better stick around though.. i always get a kick out of doing things like translating
'I like to soak my feet in gallons of whipped vanilla pudding'
and having it finally come out as
'I appreciate to impregnate my feet in the gallons of the pudding that I have exposed to the flash of the vaniglia.'
Although the site has been slashdotted, it would be interesting to see what sort of algorithms it uses to perform the translations. Mmm, open source.
:) In addition to this, it's very difficult to write simple, lucid grammar rules that also count for the myriad exceptions found in language.
:) The parsing itself is a hefty (and not terribly exciting) task. I attempted to make a term project of a fairly basic English parser and ended up changing the project.
I would be inclined to say that if it is based on grammar rules, the project won't make much headway - machine translation has been butting its head against this brick wall for forty years. The problem with hard-and-fast grammar rules, e.g.,
S = NP VP
NP = Det (Adj)* N
VP = V (Adv)
is that they don't account for rapid linguistic change, and people have this nasty habit of twisting grammar to express themselves in new and creative ways.
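To make the brittleness concrete, here is a minimal sketch of a recognizer for exactly those three rules. The lexicon and the function names are made up for illustration; nothing here is taken from GPLTrans or any real parser.

```python
# Toy lexicon (hypothetical): word -> part of speech.
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "lazy": "Adj",
    "dog": "N", "cat": "N",
    "sleeps": "V", "runs": "V",
    "quickly": "Adv",
}

def parse_np(tags, i):
    """NP = Det (Adj)* N. Return the index after the NP, or None."""
    if i < len(tags) and tags[i] == "Det":
        i += 1
        while i < len(tags) and tags[i] == "Adj":
            i += 1
        if i < len(tags) and tags[i] == "N":
            return i + 1
    return None

def parse_vp(tags, i):
    """VP = V (Adv). Return the index after the VP, or None."""
    if i < len(tags) and tags[i] == "V":
        i += 1
        if i < len(tags) and tags[i] == "Adv":
            i += 1
        return i
    return None

def accepts(sentence):
    """S = NP VP: True only if the whole sentence matches the rules."""
    tags = [LEXICON.get(word) for word in sentence.lower().split()]
    after_np = parse_np(tags, 0)
    if after_np is None:
        return False
    return parse_vp(tags, after_np) == len(tags)
```

With this lexicon, accepts("the big lazy dog sleeps quickly") holds, while a perfectly understandable reordering like "quickly runs the dog" is rejected outright, which is the hard-and-fast behaviour complained about above.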
I imagine GPLTrans would probably be using some sort of probability frame of phrases and words occurring together, but one can't be sure without looking at the source. I think the best way to do translation software would be to convert the text into syntax, then into a more abstract semantic form, and from the semantic form, translate back into the target language's syntax, and then into the target language's text. Of course, the trick is to figure out just exactly how to do this.
My 2 cents/Pfennig/lire/pesos,
Y
"There is no culture in computer science, only cults." - M. Felleisen
GPLTrans can be quite good, but imagine it's not (I still can't access). Let's suppose that its translation strategy is not very sophisticated and this system ends up being only marginally better than the others. Now, if somebody comes up with a great idea to improve the design of a machine translation system and wants it to be free, what is (s)he supposed to do?
- post it here and hope for the best ?
- report it as a bug fix ?
- do the coding and contribute a patch ?
- fork ?
- start from scratch ?
- try the first five options, in that order ?
Does the outcome depend on the people running the original project? If they are closed to design improvements contributed by others, is their project truly Open?
... how good is it at translating the GPL? Urgleburgle
Back in the 80s, a company produced software which they advertised with the tagline: "Finally, a machine that understands you like your mother."
The great irony, of course, was that no machine natural language system in the world - even today - can deal with the sentence "Finally, a machine that understands you as well as your mother." (think about the possible shades of meaning)
Do you mean Philip K. Dick's novel "Galactic Pot-Healer"? (Stupid title, I know). In it, bored office workers send a book title or folk saying through multiple translator machines and challenge their friends to guess the original title.
It's just called "The Game" in the book.
"How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
I hope the word databases and algorithm are easily separable from the implementation. I'm sure they can't have bound it too tightly to PHP and MySQL - the presentation layer should be determined by the user, and use of other databases should be possible.
Bruce
Bruce Perens.
Either Bill Gates or one of his henchmen was once quoted as saying something to the effect of "yeah, open source is great and all, but there are certain things that simply REQUIRE corporate backing, such as automatically translating an email message into another language." While this obviously isn't the exact same thing, it's pretty darn close. Anybody remember the mention of HTTP-DAV in the Halloween documents... the saga continues. If anybody can find the URL of the quote, please post it... I'm sure I saw it on Linux Today, but I can't find it readily in the search.
Maybe I can stop sending letters to my French relatives that say: "I am ambiguously gay" instead of "I love my brothers" etc...
Movie News - "Entertainment news, bitch!"
i need something to translate my sloppy java to ultraoptimized C... and if it can convert the comments to Mandarin as well, all the better.
I don't think it's tenable under the Open Source paradigm. I'm sure there are other, similar examples. So, there's room for proprietary software, coexisting with free software and running on a free infrastructure. I'd just rather keep the proprietary stuff in the leaf nodes of the software "tree", where nothing else depends on it.
Bruce
Bruce Perens.
I don't know what the case is with Babelfish etc., but I know that at least one Finnish -> English translator site has used its logs to improve its translations. Of course the changes have been made manually, but I see it as a good thing to see it translating something totally wrong, and after some time it translates the same sentence correctly.
--
It has to work - rfc1925
You've just enumerated most of the options that are always open for any open source project. Obviously the best thing is to get involved, with code if possible, with the existing project and hope that the coordinator(s) are smart enough to recognize your contribution as valuable. If not, then you can fork or start from scratch, although at some later date the original project might choose to incorporate your changes anyway. This is precisely what happened with libc and glibc.
Does anyone have a URL they can send that explains these issues in more detail? The question is just too broad to answer in a /. post.
I've studied compiler design, and I've wondered about how human languages compare to programming languages. I would think the biggest hurdle is interpreting ambiguous phrases like, 'fruit flies like a banana'. And all the implied words seem like typecasting, but are also ambiguous. '(you/I/they) Come here, dammit'. But I wonder if the entire thing is more than just a really complex language description (in BNF or something) with a big database and a few enumerated phrases.
Now all of us German-impaired Slashdotters can
read the c't articles.
When I checked it during the week-end, it looked like GPLTrans computed the identity function in all directions. I mean, when you fed it a text x in English and told it to do English->French, it'd output the same text, without any translation.
And now their server looks like it's down...
Translate "Sorry, dude!" to French and then back to English -- you will get "Afflicted, standard!" (???!!!)
"Die young" ends up as "the young people of matrix" (!!!)
"Die hard" is "hard matrix", ",die hard" (with a comma) is translated correctly, "live fast, die hard" is again about some silly matrix.
Better yet, try several iterations of english->french->english->... until it settles down, then have your friends guess the initial phrase. "I carry out with biscuits except the function", anyone?
It's oh so easy to make fun of them.
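For anyone who wants to automate the "iterate until it settles down" game, here is a rough sketch. The two word tables are tiny made-up stand-ins for a real translator (they are not Babelfish output), and all function names are hypothetical.

```python
def word_translator(table):
    """Build a word-by-word translator from a lookup table (toy stand-in)."""
    return lambda text: " ".join(table.get(w, w) for w in text.split())

def round_trip_until_stable(text, forward, backward, max_rounds=20):
    """Apply backward(forward(text)) until the text stops changing.

    Returns every distinct phrase seen, starting with the input.
    """
    history = [text]
    for _ in range(max_rounds):
        nxt = backward(forward(history[-1]))
        if nxt == history[-1]:
            break
        history.append(nxt)
    return history

# Made-up, deliberately lossy tables: three English words collapse into one.
EN_TO_FR = {"big": "grand", "tall": "grand", "great": "grand"}
FR_TO_EN = {"grand": "great"}

history = round_trip_until_stable(
    "a tall man", word_translator(EN_TO_FR), word_translator(FR_TO_EN)
)
```

Here history ends up as ["a tall man", "a great man"]: one round trip loses the distinction, and the second changes nothing, so the loop stops. The max_rounds cap is there because a real translator pair is not guaranteed to settle at all.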
I posted this as a reply to a comment but then thought maybe it should be its own thread.
The problem is that most if not all of
these systems know nothing about meaning at all.
All they do is try to match one set of strings to
a different set of strings.
GPLTrans works by the substitution method.
>from: Mooneer Salem
>
> It is a system where words in a phrase that
> can be substituted are
> marked by %phrase%
> For example:
>
> English: My name is %phrase1%.
> Spanish: Me llamo %phrase1%.
>
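As a rough illustration of the substitution idea in that quote (not GPLTrans's actual code, which I haven't seen), one could turn each source phrase into a regular expression and copy whatever fills the slot into the target phrase. The names PAIRS, to_regex and translate are made up for this sketch.

```python
import re

# One phrase pair from the quoted example; a real table would hold many.
PAIRS = [("My name is %phrase1%.", "Me llamo %phrase1%.")]

SLOT = re.compile(r"%(\w+)%")

def to_regex(template):
    """Turn 'My name is %phrase1%.' into a regex with one named group per slot."""
    parts = SLOT.split(template)  # literal, slot name, literal, ...
    pattern = ""
    for k, part in enumerate(parts):
        if k % 2 == 0:
            pattern += re.escape(part)        # literal text must match exactly
        else:
            pattern += "(?P<%s>.+?)" % part   # slot captures arbitrary text
    return re.compile("^" + pattern + "$")

def translate(text, pairs=PAIRS):
    """Return the first matching target phrase with slots filled in."""
    for source, target in pairs:
        m = to_regex(source).match(text)
        if m:
            return SLOT.sub(lambda s: m.group(s.group(1)), target)
    return text  # no phrase matched: the input comes back untranslated
```

So translate("My name is Mooneer.") gives "Me llamo Mooneer.", and anything that matches no stored phrase passes through unchanged. Note that the slot content is copied as-is, which is exactly the "strings to strings" limitation described above.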
This general system can be extended into a
phrase structure grammar with pairs of rules for
each language, e.g.:
English: S -> NP1 V NP2
Irish: S -> V NP1 NP2
These rules would model sentences like:
English: the cat chased the dog.
Irish: chased the cat the dog.
All this is oversimplified but you get the point.
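A paired rule like that boils down to reordering labelled constituents. Here is a minimal sketch, assuming the sentence has already been split into NP1, V and NP2 (which is the hard part); RULES and reorder are names invented for the example.

```python
# Paired rules from the post: the same constituents, in a different order.
RULES = {
    "english": ["NP1", "V", "NP2"],
    "irish": ["V", "NP1", "NP2"],
}

def reorder(constituents, source, target, rules=RULES):
    """Map constituents from the source rule's order to the target rule's order."""
    by_role = dict(zip(rules[source], constituents))
    return [by_role[role] for role in rules[target]]

irish_order = reorder(["the cat", "chased", "the dog"], "english", "irish")
```

This yields ["chased", "the cat", "the dog"], matching the example above. The words themselves are left untranslated; a real system would still need a lexicon and a parser to find the constituents in the first place.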
The real problem is that you need to be trained
as a linguist to understand what the structure of
many sentences is, and even linguists argue a
LOT. The phrase structure approach is probably what
AltaVista and such do. Although I really like the
idea of GPLTrans, I do not think their approach will
get them too far; but it will be fun to see what
they can do.
Did you double check, triple check, email, call, write a letter to make sure it was ok to post this story? Don't want people getting upset now do we?
Sarcasm not only implied but required. Yeah off topic, tired of Hemos bashing.
In order to come creature who lives with voltages he on the Pepsi-voltages!
Woo-ee, babelfish is smoking crack tonight. It's starting to sound like a religious prophet. The Bible, by Babelfish, anyone?
---
pb Reply or e-mail rather than vaguely moderate.
pb Reply or e-mail; don't vaguely moderate.
While contextual knowledge can increase the quality of a translation, the amount of world knowledge necessary to translate a typical web page is simply astounding. Most users of a translation system simply do not want to wait for hours to translate a simple sentence.
And, there is the problem of linguistic knowledge. Most web pages are not written in "proper" English, but in some Web-speak-lingo. This requires the system to be very robust.
The most successful uses of MT in corporations today are situations where a very simple grammar and lexicon are used, and very little world knowledge is required. For instance, the Xerox corporation has its own translation system that translates component manuals. The technical writers that write the original version of the manual are required to use very simple English only, without any ambiguities and with very simple constructions.
For translation software that has multiple language targets, i would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French.
This "internal semantic representation" is called an Interlingua. It has been used in various MT systems, with varied amounts of success.
The most important advantage of an Interlingua-based MT system is that it does not require a translation engine for each language pair. For instance, if you create a system for English, French, Dutch and German texts, you only need to create four analysis engines:
- English -> interlingua
- French -> interlingua
- German -> interlingua
- Dutch -> interlingua
And four generation engines:
- interlingua -> English
- interlingua -> French
- interlingua -> German
- interlingua -> Dutch
With a non-interlingua system (which is called a Transfer system), you'd have to create one engine per ordered language pair, 4 x 3 = 12 engines:
- French -> English
- French -> German
- French -> Dutch
etc.
Clearly, it is easier to integrate new languages into an interlingua system than into a transfer system.
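The engine counts can be checked by simply enumerating them. This is a small sketch, counting one transfer engine per ordered language pair; the function names are made up.

```python
from itertools import permutations

LANGUAGES = ["English", "French", "German", "Dutch"]

def interlingua_engines(langs):
    """One analysis and one generation engine per language: 2n in total."""
    analysis = [(lang, "interlingua") for lang in langs]
    generation = [("interlingua", lang) for lang in langs]
    return analysis + generation

def transfer_engines(langs):
    """One engine per ordered language pair: n * (n - 1) in total."""
    return list(permutations(langs, 2))

n_interlingua = len(interlingua_engines(LANGUAGES))
n_transfer = len(transfer_engines(LANGUAGES))
```

For the four languages above that is 8 interlingua engines against 12 transfer engines. Adding a fifth language costs 2 more engines in the first design and 8 more in the second, which is where the gap really opens up.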
Hello,
I've a master's degree in computational linguistics, and I predict this effort will totally fail. Research on automatic translation has been going on for about 40 years now and a lot of money has been spent.
However, there are still no working solutions, as the problems are still far too big. I'd suggest everybody participating in the discussion read a good book on linguistics.
In Korea, all your base are Only For Old People
If you'd read the update text at the top of the page, you'd have realised that it says "French, German and Portuguese have been added, but they currently don't do anything"!
Anyone else worried by the fact that they ask for your POP3 password and send it to their server?
I ran the English->Spanish translation on my homepage and, although I don't speak Spanish, it is quite clear that it sucked! Much development work to be done I think. A VERY good idea in principle though.
"What I look forward to is continued immaturity followed by death."
Now this is what I call a powerful demonstration of the quality of open source software: ;^)
English: "I am a small fish who wants to live in your ear."
German: "Ich bin a small fish who wants to live in your ear."
Astounding. I couldn't have done it better myself, and it was 6 years since I last took a German class... Wow. Also, I find this part of the Note at the bottom of each page particularly qualitative, too:
Note: this computer-automated translation is not guranteed. It'll screw up with some text. If it does in fact screw up, first make sure you spelt everything properely.
My note: I have mucho respect and understanding for alpha releases. It's just that I'm a nitpicking bastard, and this was quite funny.
main(O){10<putchar(4^--O?77-(15&5128 >>4*O):10)&&main(2+O);}
Since the provider appears to have pulled the page: Here's a mirror of the source (uuencoded to protect it from geocities)... http://www.geocities.com/SiliconValley/Foothills/7223/gpltrans.txt It's fairly uncomplex...
>>>> I want their control software to be OpenSource(tm)d, 'coz I won't trust Lockheed Martin.
I still agree with the original statement. Even control systems for military hardware could be open sourced. A system that guides a missile to its target could be similar to what might guide some self-driven transportation device of the future. The open source model might allow for reuse and faster development.
You might think that there are dangers to giving Terrorist group X the software. I argue the materials and mechanical designs would still be secret and difficult to access. I don't think there is any greater risk than we already have today.
Language software, like any and all monstrous and small software projects, is perfect for the open source model. Since the marginal cost of copying software is zero to the writer, there is no good reason for him to charge the second person who wants the software.
However, the closed model is the reason we have some of the technology we have today. Sometimes we are willing to share the cost of the first writing but no one will do it for society. In the case of this software, lots of people are willing to do it for society.
Anything _CAN_ be open sourced.
I mean, this is really a good thing and everything, but it is, after all, a web based translator, hey, everyone doesn't spend all their time in the net, maybe they must even pay for their online time instead of some monthly fee.
So the question is, when do we get a translator that works on your own machine, console or X, doesn't matter to me, as long as it doesn't require connecting to anywhere. Something like euroword etc. (but better, of course)
I think I get your point. If everybody writes in their own little language, a translator will not work!
I also suspect that your use of the English Language were a joke.
Idea: A syntax Checker for languages. Make an alias CE (Compile English) that pipes your text through unix spell. Then someone might actually use it !
>CE myslashdotposting.html > correctpost.txt
>----------------------
>--Compiling ----
>--FATAL ERROR - there is no such word "THIER"
>--ABORTED.
>
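A rough Python stand-in for that "CE" idea, for the curious. It does not call the real unix spell; it just checks each word against a word set and aborts at the first miss. The tiny WORDS set and all the names are made up for the sketch.

```python
import re

class CompileError(Exception):
    """Raised at the first word that is not in the dictionary."""

def compile_english(text, dictionary):
    """Return the text unchanged, or abort at the first unknown word."""
    for word in re.findall(r"[A-Za-z']+", text):
        if word.lower() not in dictionary:
            raise CompileError('there is no such word "%s"' % word.upper())
    return text

# Tiny stand-in dictionary; a real one would be a full word list.
WORDS = {"their", "post", "is", "here"}
```

compile_english("Their post is here", WORDS) passes the text through, while "Thier post is here" raises a CompileError naming THIER, much like the mock session above.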
--
An AC who wants a syntax "compiler" for languages.
At around 13.00 GMT 29-Nov-99 I can only get 'Forbidden'.
Forbidden
You don't have permission to access / on this server.
Apache/1.3.9 Server at gpltrans.zzweb.com Port 80
Same result with the other link... This makes it a wee bit hard to check out the site. Are there any mirrors out there?
I stumbled across this site a couple of days ago. I was going to submit a story about it, but I decided I'd better try it first. So I typed in one sentence (from a news story about the MS FoF) in English and asked to have it translated into French. It returned the same sentence with one word translated into French. I don't think that they are ready for prime time.
Dear oh dear, what is this "score 5, insightful" nonsense? How come any old "Open Source is rilly cool" comment gets moderated up, regardless of the evidence. Slashdot is beginning to resemble some wacky fundamentalist cult. The only way something as complex as natural language translation could become Open Source is if an academic institution just gave away their source. The last time I checked about a year ago, the only decent software out there was either commercial or it was released by universities as binary only. Suddenly here's a story about an Open Source translator. So you go check on google to learn more about the history of gpltrans. No hits. Same story on DejaNews. A large-scale Open Source development that nobody's ever talked about before? Yeah right.
Who else thinks he'll get a 5 even though this post was completely unrelated to what was being discussed?
Hey Bruce, you have technocrat.net. Keep the mindless ranting there.
(Yes, this was trolling. But I couldn't resist.)
Alas, the website has been /.'ed, so I can't look at the translator, but there are some serious questions to ask.
1 - testing: They claim to be the most accurate of the web-based translators. Based on what corpus and measured in what way? This isn't a trivial question, there are no benchmarks for translation programmes.
2 - parsing. If this program uses American style phrase grammar, it will inevitably break down. Phrase grammar is counterintuitive and for AI purposes pretty unproductive. It is computationally simple - see Charniak's last book for good parsing algorithms - but almost certainly isn't the way humans process language.
All of the most successful natural language translation systems are, in one way or another, dependency grammar based. Dependency based systems are also generally more portable to other languages.
3 - morphology. English is very morphology poor. If morphology is only minimally accounted for (as a lot of poorly thought out, English based NLP systems are), I don't see how it can hope to work in Russian, or Turkish or dozens of other major languages with rich morphology. Furthermore, what kinds of morphological rules can it accept? There are languages that use prefix, postfix and infix morphology. The kinds of simple rules that can account for English will not go very far with other languages.
I haven't seen this program, and I don't know how seriously these issues have been considered, but they are the kinds of things to keep in mind when looking at machine translation programs.
Read the FAQ for reasons why slashdot doesn't do this...
GB stands for "Government and Binding" theory; it is the outgrowth of Noam Chomsky's model of Universal Grammar from the beginning of the 80's, and possibly the theory on which most theoretical syntax has been done.
GPSG stands for "Generalized Phrase Structure Grammar"; it was developed in the late 70's, initially by Gerald Gazdar. Basically, it is an enhanced form of context-free grammar, that is more suitable for description of natural language syntax.
HPSG was derived from GPSG in the mid-80's at CSLI in Stanford, by Pollard and Sag. It incorporates ideas from other theories of syntax like LFG and GB. HPSG, in comparison to GB, is concerned with making its grammars as useful as possible for computational linguistics. Therefore, many HPSG researchers work in projects like LinGO, trying to apply HPSG to computational projects.
LFG, which I mention above, is another theory of syntax (if you have guessed by now that theoretical linguists are an unagreeing bunch, add 100 points to your total). It is also used in computational projects, like the Xerox NLTT.
I hope people find this info useful.
---
---
does anyone else have visions of the IBM tv ad about the guy in the support group that says they had this great idea... they got all kinds of publicity (substitute /. for the superbowl commercial) they were going to be huge... but they forgot to warn the web guys... and the site crashed.
bump-ba-dee-dumm-dup
"that was stupid dave...."
that old lady at the end just cracks me up....
Bruce
Bruce Perens.
Some respondents have pointed out the difficulty in making translations contextually sensible ... whether 'run' should be translated as 'execute,' rather than 'quick bipedal motion.'
I don't see an easy way to get out of this -- the needed 'world knowledge' that people have pointed out as necessary for this really is huge.
But (and this is why I mention slashdot's metamoderation), there is a certain amount of brute-forcing which could serve as a useful basis for creating improved context interpretation. For instance, let's say you visit this translation engine and choose some text for it to translate ("Mein Hund ist in dein Aktentasche," say). At the same time, there might be a few selections of recent translations requested by others, and the resultant translations, which could be shown to you based on the languages you know. (Not telepathically ;) -- based on your own self-declaration, perhaps followed by a quiz to establish competency.)
The resultant translations could be joined with alternate translations / permutations, and each reader could (say), rank-order them, or choose the best one, as far as they can determine by context, etc.
And hopefully, the program can then be taught (wrong word, but I'm being figurative) that (anthropomorphically), something like "OK, if there are several computer-related terms in the translated text, like megabyte and power-supply, 'run' is likely to mean 'execute.' If 'run' however appears in a context which does not indicate computer use, and / or directly before the paired words 'away from,' it should probably be the bipedal-movement one. And if it's in front of a business-type name, like 'bank,' 'lemonade stand' or 'brothel,' then it is likely to mean 'manage' or 'administer.'"
In my (interested but ignorant layman's) understanding of AI translators, this is the kind of discrimination that they try to make, nothing out of the ordinary. But, because words can fit into so many categories, I think this sort of gradual, piecemeal accumulation holds hope of making it work better over the long haul. It would take too many linguists to account for all the wacky ways that words get used.
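Written down as code, the three hand-made rules for 'run' might look something like the sketch below. The word lists, the order in which the rules are tried, and the three-word window for the business noun are all my own assumptions, and the whole point of the proposal is that such lists would be accumulated from reader votes rather than typed in by hand.

```python
# Hypothetical cue lists; in the scheme above these would grow from user rankings.
COMPUTER_TERMS = {"megabyte", "power-supply", "program", "server"}
BUSINESS_NOUNS = {"bank", "brothel", "stand"}

def sense_of_run(sentence):
    """Guess which sense of 'run' is meant, using the three rules from the post."""
    words = sentence.lower().replace(".", "").split()
    if "run" not in words:
        return None
    after = words[words.index("run") + 1:]
    if after[:2] == ["away", "from"]:
        return "move"      # 'run away from' -> bipedal movement
    if any(w in BUSINESS_NOUNS for w in after[:3]):
        return "manage"    # 'run a bank' -> manage / administer
    if any(w in COMPUTER_TERMS for w in words):
        return "execute"   # computer vocabulary nearby -> execute
    return "unknown"
```

"I run away from the dog" comes out as move, "They run a bank" as manage, and "Run the program on the server" as execute. Anything the cue lists don't cover falls through to unknown, which is where the accumulated votes would have to earn their keep.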
Just thoughts,
timothy
jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
"The last time I checked about a year ago,
the only decent software out there was
either commercial or it was released by
universities as binary only."
So check back more often, 'K?
There is a third interpretation, in which this is a noun phrase. You know, that kind of "rice fly" which is "like sand".
Of course, one can make some even stranger sentences, like All black english literature professors know some rice flies like most sand. Hell, this one must be ambiguous in well over a hundred ways :-).
---
I'll take a stab at your puzzle: "I toss my cookies down the toilet." Just a guess, highly dependent on humorous context. ;)
Vovida, OS VoIP
Beer recipe: free! #Source
Cold pints: $2 #Product
Maybe the translator could consult a search engine ... count the hits for each attempted translation (e.g. "execute away from" should generate much fewer hits than "run away from") and base its translation on these counts (so choose "run away from"), i.e. use the internet as your "world knowledge" database? Just a silly idea ...
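The selection step of that idea is tiny once you have a hit counter. In this sketch the counter is a lookup into a made-up table (the numbers are invented, not real search results), and pick_by_hits is a hypothetical name; a real version would have to query an actual search engine.

```python
def pick_by_hits(candidates, hit_count):
    """Choose the candidate phrase for which hit_count reports the most hits."""
    return max(candidates, key=hit_count)

# Invented counts standing in for search-engine results.
TOY_HITS = {"run away from": 1200, "execute away from": 3}

best = pick_by_hits(
    ["execute away from", "run away from"],
    lambda phrase: TOY_HITS.get(phrase, 0),
)
```

With these toy numbers best is "run away from". Passing the counter in as a function keeps the choice logic separate from wherever the counts come from.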
Yeah, I think this is also a good idea. The problem with it is that search engines themselves can only supply answers based on statistics, not judgement. It would be useful to do a search engine search like you say, but the translator engine would have to have a good idea of what size chunks to divide the original text into.
Anyhow, no conflict here -- I think translation engines are going to have to use a number of strategies on every input text and see which ones make the most sense in the end, then applying the information that for text-chunk X, translation X-prime (or whichever) was the best translation. That way when phrasings similar / identical to ones in text-chunk X appear again, there is at least a reference to check against.
timothy
jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
The key to making progress with any natural language processing system is lots of quality, annotated data. My M.A. long paper project involved adapting a natural language parser to identify errors made by Japanese language students. The hardest, most time consuming part was getting examples of errors that real students made and then getting a Japanese teacher to diagnose the errors. For another project, I wrote a program that automatically deduced rules for identifying proper names, places, times etc. from sentences in which these entities were already tagged.
There are lots of ways to do statistical analyses that result in better NLP systems, but the key is having lots and lots of quality data. For developing translation systems, having lots of translated sentence pairs done by a good human translator is almost crucial.
Bruce Perens just pointed out that gpltrans is a toy system at this point; an engine plus a small vocabulary. Developing the lexicon (words + definitions) and grammars will probably be the part of this project that will require the most effort. Kind of like all of the device drivers needed to make Linux a really useful system.
Does anyone know if there are free (speech) annotated corpora/lexicons/grammars/translation pairs out there that could be used in this and other NLP projects? Does anyone want to contribute some?
And does anyone know when the site is coming back up (or a mirror)? I'm dying to have a look at the source!
-jimbo
"Hold me Bob!" "I would if I could man!" -Larry and Bob in VeggieTales
http://gpltrans.grmbl.com/ (should be up, but the database is still messed up)
http://gpltrans.sourceforge.net/ (will be up by tomorrow)
Sorry for the inconvenience. And thanks William X Walsh for forwarding those mirror requests.
US businesses that currently accept chip and PIN/signature
"But, if you say something more obvious like "Molten Lead is cool" it's pretty easy to assume which version of cool you mean."
Couldn't one also use antonyms in this case? I.e. a word/phrase can be a replacement, if it is synonymous, and /not/ antonymous in the context. For example, molten describes the noun. Molten is probably also partially synonymous with "hot". Since "hot" is the antonym of "cool", in the temperature sense, then one would not use "froid" to describe it in French, but instead the appropriate term for "cool" ("cool" itself I guess), which would not be antonymous with "hot".
It's 10 PM. Do you know if you're un-American?
Who's going to sponsor this project (like Microsoft and TerraServer)? It seems to me like a major server should be set up, but that it would need to be big, close to the backbone, and quick because it would get a lot of translation work (if it did well). As a fan of Artificial Linguistics, I have to ask... Should the "main" server support adding new languages? What about artificial languages like Esperanto, Lojban, Klingon, or even languages that are less well known? Should a Language Suggestion function be present? Or even a language addition utility?
WorldMaker