The key to making progress with any natural language processing system is lots of high-quality, annotated data. My M.A. long paper project involved adapting a natural language parser to identify errors made by Japanese language students. The hardest, most time-consuming part was getting examples of errors that real students made and then getting a Japanese teacher to diagnose them. For another project, I wrote a program that automatically deduced rules for identifying proper names, places, times, etc. from sentences in which these entities were already tagged.
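To give a flavor of what deducing rules from tagged sentences can look like, here is a minimal Python sketch. It is not the program I wrote; it assumes a made-up input format (lists of word/tag pairs, with "O" meaning "not an entity") and learns only one kind of rule, "the word before an entity predicts its tag". The names learn_context_rules, apply_rules, TRAIN, and the thresholds are all hypothetical.

```python
from collections import Counter, defaultdict

def learn_context_rules(tagged_sentences, min_count=2, min_precision=0.8):
    """Learn 'previous word -> entity tag' rules from already-tagged sentences."""
    # For each preceding word, count which tags follow it.
    context_tags = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical start-of-sentence marker
        for word, tag in sentence:
            context_tags[prev.lower()][tag] += 1
            prev = word

    # Keep a rule only if it is seen often enough and is reliable enough.
    rules = {}
    for prev, tags in context_tags.items():
        tag, count = tags.most_common(1)[0]
        total = sum(tags.values())
        if tag != "O" and count >= min_count and count / total >= min_precision:
            rules[prev] = tag
    return rules

def apply_rules(words, rules):
    """Tag untagged words using the learned rules; default tag is 'O'."""
    tagged = []
    prev = "<s>"
    for word in words:
        tagged.append((word, rules.get(prev.lower(), "O")))
        prev = word
    return tagged

# Toy training data, invented for illustration only.
TRAIN = [
    [("Mr.", "O"), ("Perens", "PERSON"), ("spoke", "O"), ("in", "O"), ("Boston", "PLACE")],
    [("Mr.", "O"), ("Smith", "PERSON"), ("lives", "O"), ("in", "O"), ("Tokyo", "PLACE")],
    [("She", "O"), ("met", "O"), ("Mr.", "O"), ("Tanaka", "PERSON"), ("at", "O"), ("noon", "TIME")],
]

rules = learn_context_rules(TRAIN)
print(rules)
print(apply_rules(["Mr.", "Jones", "works", "in", "Osaka"], rules))
```

With only three sentences the rules are obviously crude ("in" does not always precede a place), which is exactly why the amount of tagged data matters so much.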
There are lots of ways to do statistical analyses that result in better NLP systems, but the key is having lots and lots of quality data. For developing translation systems, having lots of translated sentence pairs done by a good human translator is crucial.
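As a rough illustration of why sentence pairs are so valuable, here is a small Python sketch that guesses word translations from co-occurrence counts alone, using the Dice coefficient. This is one simple statistical technique I picked for the example, not anything taken from gpltrans, and the function name dice_lexicon and the toy English/Spanish pairs are made up.

```python
from collections import Counter
from itertools import product

def dice_lexicon(sentence_pairs):
    """Guess a translation for each source word from aligned sentence pairs.

    Scores each (source, target) word pair with the Dice coefficient:
    2 * (sentence pairs containing both) / (pairs with source + pairs with target).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    # For each source word, keep the highest-scoring target word.
    best = {}
    for (s, t), both in pair_count.items():
        score = 2 * both / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best

# Toy sentence pairs, invented for illustration only.
PAIRS = [
    ("the cat", "el gato"),
    ("the dog", "el perro"),
    ("a cat", "un gato"),
]

lexicon = dice_lexicon(PAIRS)
for word, (translation, score) in sorted(lexicon.items()):
    print(word, "->", translation, round(score, 2))
```

Three pairs happen to be enough for this toy vocabulary, but with real text the counts only become trustworthy once there are many thousands of well-translated pairs.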
Bruce Perens just pointed out that gpltrans is a toy system at this point: an engine plus a small vocabulary. Developing the lexicon (words + definitions) and grammars will probably be the part of this project that requires the most effort. It's kind of like all of the device drivers needed to make Linux a really useful system.
Does anyone know if there are free (speech) annotated corpora/lexicons/grammars/translation pairs out there that could be used in this and other NLP projects? Does anyone want to contribute some?
And does anyone know when the site is coming back up (or a mirror)? I'm dying to have a look at the source!
-jimbo
Basically every group has some history of it, so don't attribute innocence of your group (or apply a martyr attitude) by ignoring past crimes against personal beliefs.
I didn't. In fact, my point was the same as yours: every group is unfairly persecuted at one time or another, some more than others.
-jimbo
If Dell decides to ship some of these machines with Windows, and some without, the ones using Linux/BSD/BeOS or whatever they use will undoubtedly be cheaper. If I were a clue-free end user, that would imply to me that they were inferior OSes, since the hardware was identical.
Most probably won't know what OS the cheaper box runs on at all. (Ask someone with WebTV what OS it runs. I bet most would give you a blank stare.)
-jimbo
Of course, that only applies to AMERICAN Christians of the past ~two hundred years. Christians have faced a lot of persecution throughout history and in many (most?) parts of the world today. We American Christians are just extremely fortunate to be living here.
-jimbo