PDA Speech Translator
jlowery writes "Not quite as good as a babelfish, but a PDA that does translation is probably better than resorting to hand gestures alone. I could see this as a boon to the tourist who travels to places where English speakers are uncommon."
The problem with every piece of software I have used that tries to decipher human language (like Zork or the game included with emacs for X) is that you have to know what words the software understands and in what context.
I have seen the same problems with automated phone systems that are supposed to recognize a generic voice and I can see the same thing happening here.
The main difference here, though, is that when entering text, you know exactly what you input before pressing enter. With voice recognition software, how do you know that the software "hears" exactly what you say? If you say something like "What are my appointments for the thirteenth?" and it hears, "What are my appointments for the thirtieth?" you would be receiving the wrong information.
I hope this is a success but I don't have my hopes up.
--
7329756
The linux hacker
"All your base are belong to us!"
Spoken like someone who has never taken a foreign language class. Suppose that thing is going to get the accent right? Emphasis on the right syllable? Not likely, mostly good for translating some text message into the PDA holder's tongue (and doing an Engrish job of it anyway.)
A feeling of having made the same mistake before: Deja Foobar
I thought that you only had to speak English slowly and loudly enough for anyone to understand. Silly me!
According to the article, it only works for medical terms so far, and is only 80% accurate. I don't know about the rest of you, but I don't think I'd want to trust any of my medical treatment to such a translation!
Doctor: "Well, we thought he said penicillin, not amoxicillin! I'm afraid the infection has run amok!"
Yeah, I could really use one of these when I go from Fort Lauderdale to Miami...
How come Slashdot never gets Slashdotted?
"It also works only when the speakers are talking about medical information, and it's only about 80 percent accurate in the lab."
Forgive my immediate misgivings, and you can call me chicken if you want, but I'm really not that keen on walking into a hospital and asking to have a medical procedure done with a 1 in 5 chance that instead of removing my appendix, they might remove my "appendage"...
"It is the prerogative of fools (or noobs) to utter truths that no one else will speak."
...Stephen Hawking in Arabic.
Technology is at the point where all the software needed for a translator has been written: a person speaks into a microphone, the speech is converted to text, the text is translated into a different language, and the result is played back in the same person's voice. The problem is that this cannot be done in real time. Four years ago I worked on a project for AT&T to create an application that would train on a user's voice, break down their voice patterns, and rearrange those patterns to create other sounds that seem to come from that person. The problem is that with current processors, training and processing take about 10 hours. So we can do voice recognition in real time, we can translate text in real time, and in 10 hours we can reproduce a person's voice nearly flawlessly. Think of the possibilities!
There is or can be built a machine that can simulate any physical object. -Church-Turing principle
I realize that this software is supposed to be somewhat more powerful, but what I am saying is that even limited translation programs are useful for tourists.
As speech recognition technology gets better, and as handheld computers get more powerful, audio translators are becoming a more practical proposition.
Researchers from Carnegie Mellon University, Cepstral, LLC, Multimodal Technologies Inc. and Mobile Technologies Inc. have put together a two-way speech-to-speech system that translates medical information from Arabic to English and English to Arabic and runs on an iPaq handheld computer.
The prototype falls short of Star Trek's fictional universal translator in several ways. The system is not transparent -- it must be switched between Arabic-to-English and English-to-Arabic modes. It also works only when the speakers are talking about medical information, and it's only about 80 percent accurate in the lab.
The device shows that it's becoming possible, however, to provide automatic translation using a portable device. "It's good enough to make yourself understood," said Alex Waibel, a professor of computer science at Carnegie Mellon University and a founder of Mobile Technologies Inc.
The effort is one of a series of projects aimed at providing the armed forces with automatic translation for medical and force protection situations and making automatic translation in a wider set of subject areas available for tourists during the 2008 Olympics in Beijing, said Waibel.
The Speechalator prototype uses a built-in microphone and a language-selection button. "You push on the button on the iPaq and speak a sentence and then the translation comes out... in the other language," said Waibel. "You can switch it into the opposite mode when the other person answers and it translates back into your own language."
The software consists of three components: a speech recognizer, a translator, and a speech synthesis engine. "Each one of these components have slight twists to them... in order to work properly for speech translation," said Waibel.
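For anyone wondering what "three components" looks like when chained together, here is a minimal sketch of one translation direction. Everything in it is an assumption made for illustration: the function names, the toy phrase table, and the trick of treating "audio" as UTF-8 bytes are all made up, and none of it reflects how the Speechalator is actually built.

```python
from typing import Callable

# Hypothetical stage signatures; the article does not describe the real interfaces.
Recognizer = Callable[[bytes], str]    # audio in, recognized text out
Translator = Callable[[str], str]      # source-language text in, target-language text out
Synthesizer = Callable[[str], bytes]   # target-language text in, audio out


def make_pipeline(recognize: Recognizer,
                  translate: Translator,
                  synthesize: Synthesizer) -> Callable[[bytes], bytes]:
    """Chain the three stages into a single one-direction translator."""
    def run(audio: bytes) -> bytes:
        text = recognize(audio)
        translated = translate(text)
        return synthesize(translated)
    return run


# Toy stand-ins so the sketch runs without any speech hardware.
def fake_recognize(audio: bytes) -> str:
    # Pretend the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


# Placeholder phrase table; a real system would not be a lookup table.
PHRASES = {"where does it hurt": "<arabic: where does it hurt>"}


def fake_translate(text: str) -> str:
    return PHRASES.get(text.lower(), "<unknown>")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


english_to_arabic = make_pipeline(fake_recognize, fake_translate, fake_synthesize)
spoken = english_to_arabic(b"Where does it hurt")
```

A second pipeline built from a different set of three stages would cover the opposite direction.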
The researchers modified the speech recognition engine to optimize it for handling spontaneous speech.
The translation system has the biggest twist. It extracts the key meaning from the input sentence and translates it to an interlingual, or intermediate representation, and the process depends on the speech being contained in a certain domain, or context, like medical information. "It's just certain nuggets in the phrase that... you need to extract," said Waibel.
The process is akin to constructing a medical-context template that fits the key information, then filling in the template, said Waibel. This process makes it possible for the system to handle spontaneous speech. "We go fishing for the nuggets," he said. But it is also a limitation -- the system must know what domain a speaker is talking about.
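The "template plus nuggets" idea can be sketched roughly as below. This is only a guess at the general shape, assuming a hand-written template with three slots and regular expressions to fill them; the slot names and word lists are hypothetical, and the real system presumably uses something far more robust.

```python
import re

# Hypothetical medical template: each slot has a pattern that "fishes" for one nugget.
TEMPLATE_PATTERNS = {
    "body_part": r"\b(head|chest|stomach|arm|leg|back)\b",
    "symptom": r"\b(hurts?|pain|aches?|bleeding|swollen)\b",
    "duration": r"\b(\d+\s+(?:hours?|days?|weeks?))\b",
}


def extract_nuggets(sentence: str) -> dict:
    """Fill the template from a spontaneous sentence, ignoring everything else."""
    lowered = sentence.lower()
    frame = {}
    for slot, pattern in TEMPLATE_PATTERNS.items():
        match = re.search(pattern, lowered)
        if match:
            frame[slot] = match.group(1)
    return frame


# Filler words ("um", "well", "like") simply fail to match any slot.
frame = extract_nuggets("Um, well, my stomach hurts, it has been like 3 days now")
```

The filled frame plays the role of the intermediate representation: a generator for the other language would only need to know how to say each slot value, not how to parse the original sentence. It also shows the limitation the article mentions, since a sentence about hotel rooms would leave every slot empty.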
The researchers are working on a system that can handle multiple contexts and automatically switch between them, said Waibel. "It can, for example, recognize 'now you're in the hotel reservation domain', or 'now you're in the conference registration mode', or 'now you're talking about medical problem'," he said.
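One naive way to picture automatic domain switching is keyword overlap: score each domain by how many of its words appear in the sentence and stay in the current domain when nothing matches. This is purely an illustrative assumption, with made-up keyword sets and domain names; the article does not say how the researchers actually detect the domain.

```python
import re

# Hypothetical keyword sets, one per domain mentioned in the article.
DOMAIN_KEYWORDS = {
    "medical": {"pain", "doctor", "medicine", "hurts", "fever"},
    "hotel_reservation": {"room", "night", "reservation", "checkout", "bed"},
    "conference_registration": {"badge", "session", "register", "conference", "talk"},
}


def guess_domain(sentence, current=None):
    """Return the domain with the most keyword hits, or `current` if none match."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {domain: len(words & keywords)
              for domain, keywords in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else current


domain = guess_domain("I need a room for one night")
domain = guess_domain("Hello there", current=domain)  # no keywords, so the domain is kept
```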
To come up with templates that handle different domains, the researchers collect a lot of data from people talking in those domains, said Waibel. "The more data we collect the better coverage of all the possible ways you could be saying [these things] becomes," he said.
The difficult part was fitting the software required to do two-way translation in the 64 megabytes of memory contained in the handheld computer, said Waibel. "You need two recognizers, two synthesizers and two translators to make [it] happen in both directions," he said.
The prototype also has a camera attachment that translates text like that on street signs, said Waibel. Snap a picture of a sign with the camera and it automatically extracts the text region, puts the text through a character recognition program, then translates it, he said. "What you then see on the screen is the picture of the scene with a sign and then underneath an English subtitle," he said.
"Are you speaking the english?"
"I speak to the English, it's the Americans I won't talk to..."
-Adam
First, can we have a PDA that does decent text-to-speech or speech-to-text, preferably both?
A hardware babelfish will revolutionise human communication later this century, but right now you need both of the above before you can begin to contemplate speech-to-speech. I can't imagine any serious algorithm at this time would attempt direct translation, without an intermediate text translation phase.
Bit OT: Considering the interest in E-Books, I don't know why music players and PDAs force users to download wave forms when we could just download text and convert using a cheap text-to-speech synth.
Outstanding. This thing will finally make the common Ugly American practice of yelling actually useful:
*hold PDA to face* Ahem! "WHERE IS THE BATHROOM?!" *hold PDA to foreigner's ear*
RW
Text on screen: In 2004, the World Trade Center lay in ruins, and foreign nationals frequented the streets - many of them Arabs (not the streets - the foreign nationals). Anyway, many of these Arabs went into tobacconist's shops to buy cigarettes....
An Arab tourist approaches the shop clerk. The tourist is talking haltingly into a PDA.
Arab: I will not buy this record, it is scratched.
Clerk: Sorry?
Arab: I will not buy this record, it is scratched.
Clerk: Uh, no, no, no. This is a tobacconist's.
Arab: Ah! I will not buy this *tobacconist's*, it is scratched.
Clerk: No, no, no, no. Tobacco...um...cigarettes (holds up a pack).
Arab: Ya! See-gar-ets! Ya! Uh...My hovercraft is full of eels.
Clerk: Sorry?
Arab: My hovercraft (pantomimes puffing a cigarette)...is full of eels (pretends to strike a match).
Clerk: Ahh, matches!
Arab: Ya! Ya! Ya! Ya! Do you waaaaant...do you waaaaaant...to come back to my place, bouncy bouncy?
Clerk: Here, I don't think you're using that thing right.
Arab: You great poof.
Clerk: That'll be six and six, please.
Arab: If I said you had a beautiful body, would you hold it against me? I...I am no longer infected.
Clerk: Uh, may I, uh...(takes PDA, talks to it)...Costs six and six...ah, here we are. (speaks weird Arabic-sounding words)
Arab punches the clerk.
Meanwhile, a cop on a quiet street cups his ear as if hearing a cry of distress. He sprints for many blocks and finally enters the tobacconist's.
Cop: What's up?
Arab: Ah. You have beautiful thighs.
Cop: (looks down at himself) WHAT?!?
Clerk: He hit me!
Arab: Drop your panties, Sir William; I cannot wait 'til lunchtime. (points at clerk)
Cop: RIGHT!!! (drags Arab away by the arm)
Arab: (indignantly) My nipples explode with delight!
I believe PDAs are going to be tremendously transformed over the next few years.
1. Convergence is going to happen with a vengeance. The Treo 600 is just the start. More and more apps will make it to the PDA. Speech recognition is one, and that sets up another dynamic...
PDAs don't really need screens and keyboards if you can talk to them and they can talk to you. If they don't need those components, they can get a whole lot smaller. The next generation PDAs will be like a hearing aid, and the ones after that will be built into your glasses or an implant. That means less power, so less battery. Besides, it will be able to run on your body heat if not tap into your own body's electrical system, so it won't need a battery. Every improvement along these lines shrinks the size even more. A heads-up display, made transparent or opaque, ought to handle those times when you need to really observe rather than consult.
A combination of AI and connectivity will mean your PDA is your first line of defense in many of life's situations. Get pulled over by a cop and it will tell you what to do, what NOT to do, and contact your lawyer. Need a cop and it will call them and know just how long it's going to take to get there.
Medicine: It will have a complete medical history of you, remind you to take your meds, and monitor your blood pressure and other vital signs. If you have a heart attack it will call 911 with your location and be the first thing the medics consult when they get to you.
Personality: You'll be able to choose its level of humor and sarcasm. Although clearly a machine, people will develop meaningful relationships with them, at least they'll think so.
Connectivity: Everything you can think of, including your own house, which you'll call up to turn the heat up since you're coming home early. All the Wi-Fi/cell connectivity you want will be built in.
Finances: It will know everything you do and provide access to your dough. If you get overdrawn it will be intentional because it will have real time access. It will have all the ATM/debit/credit stuff all on-hand. It will also be able to shop for you and tell you where the best deal is.
It will know all your friends and business associates and help remind you, "This is Joe. He's a Cougar. He knows you're a Husky, but don't rub it in. His kid just joined the Navy. He thinks LOTR sucks, and Rush is Right, so be careful. He drinks Guinness. His budget is 250K and he's looking to upgrade the Ciscos."
You'd never think of leaving home without this. Indeed, since it very well may be built-in, you won't have to worry about it. Just keep up the subscription.
How about a moderation of -1 pedantic.
My experience with voice recognition (yes, even your beloved ViaVoice) is that it blows and will for some time. We probably need better speech recognition before we get speech-to-speech.
Well. I have been to quite a few places where English was not exactly lingua franca. In most of these places semi-right pronunciation of foreign words would not have had a big impact. Hand gestures and my favourite dictionary (which contains pictures of just about anything one would ever need 'on the road') have always been sufficient to find a hotel, a train or bus ticket out and some food. For the latter: Just walking into a restaurant's kitchen and pointing at the visible ingredients (dead or alive ;) ) suffices, and can generate a lot of fun in the process :)
Silly foreigner, don't you know everyone speaks English?
Business \Busi"ness\, n.;
A scam that all people involved perceive as beneficial...
Like Miami???
So all you need is a mobile phone. You phone up the number for the language you need translated to, tell the translator what you want to say and hand the phone over to the person you want to talk to. Quite expensive per minute, but cheaper than a PDA and very very handy in an emergency.
Course, you could learn another language, it isn't remotely as difficult as school makes it out to be. English is one of the more difficult languages to learn. If you learn one of Italian, French, Spanish, or Portuguese, you should be able to pick the others up fairly quickly. English is based on a Germanic language with a lot of French and Roman influences chucked in on top; it's a real mishmash.
Government of the people, by corporate executives, for corporate profits.
How about the opposite sex? Parents? Now those would be Nobel-prize-worthy accomplishments.
"This is not a sig." -- R.