eDigital MXP100 with Voice Control

Posted by michael on Sunday February 10, 2002 @08:46AM from the speak-your-mind dept.

An anonymous reader writes: "Here is a lengthy review of eDigital's 1GB flash MP3 portable that is as much a review of Lucent's remarkable speech recognition technology VoiceNav as it is of the player. VoiceNav offers speaker-independent recognition, meaning it doesn't have to learn each individual user's particular speech patterns, as IBM's ViaVoice does. Just say the name of a music track into the player's microphone and VoiceNav pulls up and plays that song. In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail. This included titles with 'non-real word' band names like Sum41 and U2. Neat technology that could make its way into PDAs soon. The player is a pretty good one too, using IBM's Microdrive for storage."

8 of 150 comments

Re:Filters by d5w · 2002-02-10 09:06 · Score: 3, Informative

Does the voice recognition filter itself out? When U2 sings "one" I don't necessarily want it switching to Aimee Mann's "one" and vice versa.

From the review:
Navigation using VoiceNav only operates when a song is not playing (manual controls will allow navigation when a tune is pumping), therefore there is no "Stop" or "Pause" command.
So they punted on that problem.
On another front, it looks like "one" isn't likely to produce useful responses from the speech recognition in any case. The only times the reviewer seems to have gotten acceptable recognition of track names were when saying the entire artist and title.
Even worse than cell phone by Ezubaric · 2002-02-10 09:13 · Score: 4, Funny

When I can't get voice rec to work, I usually end up speaking louder because the frustration is just too much. It's bad enough listening to people yapping down the street or in stores with those little embedded mikes and earphones. Can you imagine hordes of people walking down the street screaming:

"Uncle Fucker"
"Baby Got Back"
"Cocaine"
"Copacabana"

The last is probably worst of all. We know Barry exists, but it's horrible to be reminded that people actually listen to him.

--

----------
I am an expert in electricity. My father held the chair of applied electricity at the state prison.
1. Re:Even worse than cell phone by Bilestoad · 2002-02-10 09:20 · Score: 3, Funny
  
  Even better - imagine being able to sneak up on people with one of these, and saying
  
  "Kenny G"
Re:Voice Recognition by d5w · 2002-02-10 09:15 · Score: 4, Insightful

Voice recognition on computers has been around for a while now with products like Dragon, Via Voice, etc. All of these programs are clunky, somewhat bloated, and need to be trained to individual speakers. A truly speaker-independent voice recognition system could be just what the doctor ordered for Lucent.
This kind of thing comes up every time speech recognition is mentioned here, and it's largely missing the point. Desktop speech recognition, as handled by Dragon NaturallySpeaking, is a very different problem from simple commands and list selection, and it has very different solutions. If you have to recognize and transcribe arbitrary sentences in a given language you have to handle a much larger search space in basically every dimension -- so much larger that the optimal search techniques can be very different, and (as in your comment) the resources required to implement those techniques will be incomparable.
I won't say the problems are fundamentally different, because the fundamentals are much the same between the two domains; but nearly every detail of the implementation of those fundamentals is likely to be different.
Just to clear things up by Anonymous Coward · 2002-02-10 09:16 · Score: 3, Insightful

IBM's voice recognition line extends past ViaVoice. We offer several products, including an embedded product, that do not require any training. Only the highest end dictation product requires training because of the demands on it to understand what you just said from tens of thousands of words. If all you can say is a hundred or so phrases like "play", "stop", "rewind", "livin' la vida loca", etc. then it's a lot easier to make a prediction and training is a waste of time. At that point it's just a matter of microphone quality and filtering out the background noise. We can even do untrained natural language voice recognition in situations like this with the proper processor power. Since we know what you're by and large going to say, we can pick out enough from the whole free-form sentence to get the gist of what you meant without any training.

And believe me we're getting to the point where training isn't needed for dictation either :)
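
The closed-vocabulary point above can be sketched roughly in code. This is only an illustration, not IBM's or Lucent's method: it assumes the audio has already been turned into a rough text guess, and it uses spelling similarity from Python's standard `difflib` as a stand-in for acoustic matching. The `COMMANDS` list and the `recognize` function are hypothetical names.

```python
import difflib

# Hypothetical closed vocabulary: the only phrases the device accepts.
COMMANDS = ["play", "stop", "rewind", "livin' la vida loca"]

def recognize(heard, vocabulary=COMMANDS, cutoff=0.6):
    """Return the vocabulary phrase closest to `heard`, or None if nothing is close.

    Spelling similarity is an assumption standing in for acoustic similarity.
    """
    matches = difflib.get_close_matches(heard.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A slightly garbled guess still lands on the only phrase it resembles.
print(recognize("living la vida loca"))
# Something unlike every phrase in the vocabulary is rejected.
print(recognize("xyzzy"))
```

With only a handful of allowed phrases, a rough guess is usually enough to pick the right one, which is why no per-speaker training is needed in this kind of setup.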
Playlist with 14 entries isn't enough by hovik · 2002-02-10 09:20 · Score: 5, Informative

In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail.

This doesn't mean much. Picking the correct title out of only 14 possibilities is quite easy. The reviewer should rather have tried a playlist with more than 3000 entries. The error rate will grow exponentially with the number of songs, because statistically, the more songs you add, the more of them will sound phonetically alike. (bad way to say it, but you prob get the point)
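
One rough way to see the effect described above is to count how many pairs of titles in a playlist are close enough to be confused. The sketch below is an assumption-heavy illustration: it compares spellings with `difflib.SequenceMatcher` rather than real phonetics, the 0.8 threshold is arbitrary, and `confusable_pairs` is a hypothetical helper. Note that a list of n titles has n(n-1)/2 pairs to check, so the room for collisions grows much faster than the list itself.

```python
from difflib import SequenceMatcher
from itertools import combinations

def confusable_pairs(titles, threshold=0.8):
    """Return pairs of titles whose spelling similarity is at least `threshold`.

    Spelling similarity is only a stand-in for phonetic similarity here.
    """
    pairs = []
    for a, b in combinations(titles, 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs

# Illustrative playlists; "Fat Lips" is a made-up near-duplicate title.
small = ["One", "Fat Lip", "Copacabana"]
larger = small + ["Fat Lips"]

print(confusable_pairs(small))   # no close pairs in the short list
print(confusable_pairs(larger))  # the added title collides with "Fat Lip"
```

Running the same check on a real 3000-entry playlist would give a better idea of how often the recognizer has to choose between near-identical candidates.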
Hype Company by Anonymous Coward · 2002-02-10 09:40 · Score: 4, Informative

eDigital has a long history of using hype and grossly misleading tactics to, IMO, defraud investors. So far they've lost tens of millions of dollars, and recently had to resort to taking a loan at a 49% interest rate just to stay in business. Even the CEO has referred to the investors as a "cult".

As for their history with their products, their much-hyped Treo barely sold any units in stores, and is now being sold by liquidators on eBay. A lot of customers were a bit pissed that their players didn't come with any storage media!

This wasn't intended as flamebait, but eDigital has a long history of using hype and misleading tactics to pursue little more than an infusion of investment money from gullible public investors. I didn't lose any money to them, but a lot of people did, and will continue to.

In fact, they recently registered 20 million more shares so they can stay in business a while longer. They really don't deserve this kind of attention from Slashdot.

For those considering investing in them, I'd say stay away. For those considering a product purchase, I'd recommend the same.
I was wondering when they'd come out with this by flacco · 2002-02-10 09:54 · Score: 3, Funny

...only I pictured it with the ability to retrieve a song by just singing a bit of it or speaking some lyrics.

--
pr0n - keeping monitor glass spotless since 1981.