eDigital MXP100 with Voice Control
An anonymous reader writes: "Here is a lengthy review of eDigital's 1GB flash MP3 portable that is as much a review on Lucent's remarkable speech recognition technology VoiceNav as it is on the player. VoiceNav offers speaker-independent recognition, meaning it doesn't have to learn each individual user's particular speech patterns like IBM's ViaVoice. Just say the name of a music track into the player's microphone and VoiceNav pulls up and plays that song. In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail. This included titles with "non-real word" band names like Sum41 and U2. Neat technology that could make its way into PDAs soon. The player is a pretty good one too, using IBM's Microdrive for storage."
Have you seen any hardware player that supports the Ogg Vorbis format?
~shiny
WILL HACK FOR $$$
I think I'm feeding the trolls on this one, but I can't understand why you think a company would spend money on adding support for that format unless it would be a selling point. I grant that mp3 is worse than ogg, but can you honestly say that ogg is big enough in the "real world" for a company to go to the trouble of supporting it? The vast majority of my linux using friends still use mp3, and you can bet almost no one in the windows world uses ogg.
Slashdot 's editors are dickheads
wonder how well it would work on, say, the side of a highway. if it worked well this would be a nice little toy for those of us who run (or bike) around.
Kith Kaddith Lizard Man Extraordinaire
I guess I have too many obscure mp3s, but how can the voice control differentiate:
Daydream Boat.mp3
Day Dreamboat.mp3
Alpha Betray.mp3
Alphabet Ray.mp3
Mont Anagram.mp3
Montana Gram.mp3
...
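The ambiguity in the list above can be made concrete with a small sketch. It assumes, purely for illustration, that two titles "sound the same" when they are identical once spaces, case, and the file extension are dropped. The helper names `sound_key` and `ambiguous_groups` are made up for this example and have nothing to do with how VoiceNav actually works.

```python
from collections import defaultdict

def sound_key(filename: str) -> str:
    """Crude stand-in for pronunciation: drop the extension, spaces, and case."""
    stem = filename.rsplit(".", 1)[0]
    return "".join(ch for ch in stem.lower() if ch.isalnum())

def ambiguous_groups(filenames):
    """Group filenames that collapse to the same key; keep groups of 2 or more."""
    groups = defaultdict(list)
    for name in filenames:
        groups[sound_key(name)].append(name)
    return [names for names in groups.values() if len(names) > 1]

titles = [
    "Daydream Boat.mp3", "Day Dreamboat.mp3",
    "Alpha Betray.mp3", "Alphabet Ray.mp3",
    "Mont Anagram.mp3", "Montana Gram.mp3",
]

clashes = ambiguous_groups(titles)
for group in clashes:
    print(" / ".join(group))
```

Each pair collapses to a single key, so a recognizer working from sound alone would need something extra (artist name, track number, a confirmation prompt) to tell them apart.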
From the review:
So they punted on that problem. On another front, it looks like "one" isn't likely to produce useful responses from the speech recognition in any case. The only times the reviewer seems to have gotten acceptable recognition of track names were when saying the entire artist and title.
. . . otherwise, there'll be a special broadcast on radio, cable, and embedded in trojan MP3s one day. It'll be Jack Valenti's voice saying "Don't play non-SDMI compliant content anymore." :).
One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
You can get a domain-independent NL parser from www.sil.org, the PC PATR II parser. You may have to write a few grammar rules...
She sat at the window watching the evening invade the avenue.
When I can't get voice rec to work, I usually end up speaking louder because the frustration is just too much. It's bad enough listening to people yapping down the street or in stores with those little embedded mikes and earphones. Can you imagine hordes of people walking down the street screaming:
"Uncle Fucker"
"Baby Got Back"
"Cocaine"
"Cocacabana"
The last is probably worst of all. We know Barry exists, but it's horrible to be reminded that people actually listen to him.
----------
I am an expert in electricity. My father held the chair of applied electricity at the state prison.
I won't say the problems are fundamentally different, because the fundamentals are much the same between the two domains; but nearly every detail of the implementation of those fundamentals is likely to be different.
IBM's voice recognition line extends past ViaVoice. We offer several products, including an embedded product, that do not require any training. Only the highest end dictation product requires training because of the demands on it to understand what you just said from tens of thousands of words. If all you can say is a hundred or so phrases like "play", "stop", "rewind", "livin' la vida loca", etc. then it's a lot easier to make a prediction and training is a waste of time. At that point it's just a matter of microphone quality and filtering out the background noise. We can even do untrained natural language voice recognition in situations like this with the proper processor power. Since we know what you're by and large going to say, we can pick out enough from the whole free-form sentence to get the gist of what you meant without any training.
:)
And believe me we're getting to the point where training isn't needed for dictation either
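A rough sketch of why a small, known phrase list makes prediction easy. It works on text rather than audio: the "heard" string stands in for whatever noisy guess the acoustic front end produces, and the standard-library `difflib.get_close_matches` stands in for real acoustic scoring. The phrase list, the `best_match` helper, and the 0.6 cutoff are illustrative assumptions, not anything from an IBM product.

```python
import difflib

# The closed vocabulary: everything the user is allowed to say.
PHRASES = ["play", "stop", "rewind", "livin' la vida loca"]

def best_match(heard: str, phrases=PHRASES, cutoff: float = 0.6):
    """Return the closest known phrase, or None if nothing is close enough."""
    matches = difflib.get_close_matches(heard.lower(), phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_match("living la vida loca"))
print(best_match("plai"))
print(best_match("xyzzy"))
```

With only a handful of candidates, even a sloppy guess usually has one clear nearest neighbour, which is the intuition behind skipping per-user training.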
This doesn't mean much. Picking the correct one out of only 14 possibilities is quite easy. The reviewer should rather have tried a playlist with more than 3000 entries. The error rate will grow exponentially with the number of songs, because statistically, the more songs you add, the more of them will be phonetically similar. (Bad way to say it, but you probably get the point.)
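One way to put numbers on the playlist-size point is to count how many distinct pairs of titles the recognizer has to keep apart. The sketch below only counts pairs, n(n-1)/2; it is not a measured error rate, and the 200-song figure is just an extra data point picked for illustration.

```python
from math import comb

def confusable_pairs(n_titles: int) -> int:
    """Number of distinct title pairs that could, in principle, be mixed up."""
    return comb(n_titles, 2)

for n in (14, 200, 3000):
    print(f"{n:>5} titles -> {confusable_pairs(n):>9} pairs")
```

Going from 14 titles to 3000 takes the pair count from 91 to roughly 4.5 million, so a test on 14 titles says little about a full collection.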
Ogg is just the name of the, uh, 'group' doing the work. The actual audio format is called Ogg Vorbis, in contrast with Ogg Tarkin, their proposed video codec.
:P
So your syllable count is really incorrect
autopr0n is like, down and stuff.
For me, the biggest attraction of MP3 players is the ability to have no moving parts. This makes it truly portable and useful in more situations than what we had previously. So, my question is, how reliable is this IBM microdrive? How robust is it? If I'm training to run a marathon, is it going to survive all of the pounding?
Hrm, the thing doesn't look quite as cool as the iPod. Not that I don't hate Apple or anything, but there don't seem to be a lot of players out there that have both a high capacity and aesthetic styling approaching or surpassing the iPod. There are some cool looking mp3 players, and there are some that are better technically than the iPod. But unfortunately, they don't seem to be in the same group. (Of course, given the price you could just get a real PDA that can play mp3s for about $100 more...)
Personally, I doubt the voice nav in the current system is really that great, especially since you have to manually stop the music in order to use it. Of course with 200 or so songs it might come in handy (if it scales that well).
autopr0n is like, down and stuff.
edigital has a long history of using hype and grossly misleading tactics to, IMO, defraud investors. So far they've lost tens of millions of dollars, and recently had to resort to taking a loan at a 49% interest rate just to stay in business. Even the CEO has referred to the investors as a "cult".
As for their history with their products, their much-hyped Treo barely sold any units in stores, and is now being sold by liquidators on ebay. A lot of customers were a bit pissed that their players didn't come with any storage media!
This wasn't intended as flamebait, but E.digital has a long history of using hype and misleading tactics to pursue little more than an incursion of investment money from gullible public investors. I didn't lose any money to them, but a lot of people did, and will continue to.
In fact, they recently registered 20 million more shares so they can stay in business a while longer. They really don't deserve this kind of attention from Slashdot.
For those considering investing in them, I'd say stay away. For those considering a product purchase, I'd recommend the same.
I have a lot of friends who have Sprint phones with voice nav. They all used it for the first week because it was "cool" but after a while, they went back to traditional methods. Another example is my father; he got the '02 Infiniti Q45 which has loads of tech toys built in. The voice nav is really cool but it's not nearly as fast as clicking a button.
...only I pictured it with the ability to retrieve a song by just singing a bit of it or speaking some lyrics.
pr0n - keeping monitor glass spotless since 1981.
You're telling me that you read Slashdot and you've never heard of the iPod?!?
"Reality is just a convenient measure of complexity" -Alvy Ray Smith
If this thing ran off CDs and supported Ogg Vorbis I would buy this in an instant. As it is I'm forced to drool over the spiffy voice recognition and keep waiting...
It's tempting, but I won't go for it. I'm too much of a They Might Be Giants fan. I can see it now, sitting there in a public area with some weird looking device in my hand:
"PUT YOUR HAND INSIDE THE PUPPET HEAD!"
"...NO!" Someone speaks to me "Are you OK?"
"Yeah Yeah," Yeh Yeh starts playing. "Ahh!"
"DIG MY GRAVE"
"Sir, are you sure you're alright?" [stopping]
"Yeah, fine." suddenly person A asks person B for a light. "I've got a match."
The thing starts playing again. Just then a Dirt Bike whizzes by and someone says "Man, that's a fast Dirt Bike." Guess what song starts playing. Then I stop it so I can play "I AM A HUMAN HEAD!" again getting more stares.
Then what if I want to hear Chuck Berry? "MY DINGALING" *SMACK*
No, for me this is nothing but trouble...
--Josh
There are exactly 42,935,718 letter sized sheets in a square mile.
you might be interested in the fact that this has already been done
The only reason we haven't seen OGG Vorbis support on solid state players is that they would only lose money by doing so, at least for now. This is coming from someone who encodes all of his own CD's as .ogg's.
Alas, I wish there were some incentive for player manufacturers to add the support. There are two ways I can see for this to happen:
(a) Make adding it as trivial as possible. If adding .ogg support required only a few days of extra development time, you'd see it.
(b) Increase the market share that OGG Vorbis has. This one is trickier, mainly because of the slim market that a good, lossy codec serves. What do I mean? Well, audiophiles aren't going to want to listen to any compressed format (though these dinosaurs claim their hissy records are better-sounding than Super Audio CD), and Joe Sixpack isn't going to notice any difference at all between .mp3 and .ogg.
Having done numerous sound quality tests of OGG Vorbis and MP3 on my own equipment, I can say without a doubt that were all things considered equal, OGG would win out. Unfortunately, OGG has had a very late start, and is up against lots of other competitors who are all "good enough" for the average person, so its supporters will have to reduce the barriers to its use before anyone will care.
[ home ]
What if someone tries queuing up their favorite track from The Faint's Danse Macabre?
The only reason we haven't seen OGG Vorbis support on solid state players is that they would only lose money by doing so, at least for now. This is coming from someone who encodes all of his own CD's as .ogg's.
Actually I think that the only thing stopping OGG Vorbis on hardware players is the lack of a free fixed point decoding library. Right now you can find free floating point decoding libraries, but not fixed point. Most of the processors used in hardware players do not support floating point operations. The CPUs have only an integer unit. When a fixed point library is released, I think that you will find Ogg supported everywhere that MP3 is, since it should be trivial to add, and will only take up a little more ROM.
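For anyone unfamiliar with the term, here is a minimal sketch of what "fixed point" means: fractions are stored as scaled integers, so a multiply needs only an integer multiply and a shift. The Q15 format and the helper names below are assumptions chosen for illustration; this is not code from any Vorbis decoder. Python integers never overflow, so the sketch shows the arithmetic only, not the register-width juggling a real decoder has to do.

```python
Q = 15            # number of fractional bits (Q15 format)
ONE = 1 << Q      # the integer that represents 1.0

def to_fixed(x: float) -> int:
    """Convert a float to its nearest Q15 integer representation."""
    return int(round(x * ONE))

def to_float(a: int) -> float:
    """Convert a Q15 integer back to a float (for inspection only)."""
    return a / ONE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using integer ops only, rounding to nearest."""
    return (a * b + (1 << (Q - 1))) >> Q

a = to_fixed(0.5)
b = to_fixed(0.25)
product = fixed_mul(a, b)
print(product, to_float(product))
```

The conversions to and from float are only there so the result can be checked; on an integer-only CPU the audio samples and coefficients would stay in scaled-integer form the whole way through.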
Portable MP3 players of all things get the voice tech first. Why? Same with phones. The cell phones have the voice recognition, but if there are POTS phones that have it, they aren't exactly making commercials about it (not that I watch TV anyways)
:)
This feature would be no less useful on a desktop. It's definitely ideal for a small portable unit where working with a tiny display screen and buttons to switch between a large selection of songs can be tedious. However, being able to swap songs by simply speaking to your computer without forcing yourself to do a task switch could be helpful as well. Certainly, the 10-20 seconds you spend doing so isn't significant by itself, but this does add up over time. It's all about productivity, people!
MP3 players are pioneering the way in other areas as well. Other than perhaps digital cameras, they provide a market for flash memory. And getting realtime playback, and hopefully soon widespread use of unrestricted realtime mp3 encoding for these units, will enhance their use beyond the simple playback of music. And of course, don't forget, anything that pisses off the RIAA is a good thing.
-Restil
Play with my webcams and lights here
Because of the number of songs MP3 allows us to carry around, indexing the songs we have with us is a tricky thing. There are numerous indexing methods on MP3 players at the moment.. playlists on the iPod, simple numeric 'album' jumps on MP3-CD players, search facilities on in-car units etc.. but voice definitely simplifies matters.
However, I spy a problem. Even if it doesn't require training to recognise a voice, I bet it's still limited to a subset of accents.
You notice it with voice-recognition computer programs here in the UK. You speak normally and it rarely works.. put on the dullest most monotone American-style accent you can, and hey presto, up and running!
So, to get one of these, is a prerequisite that I practice my 'dull American drone'?
mogorific carpentry experiments
I might get modded down for this, but eDigital has just left a bad taste in my mouth..... And I wanted to share... ;)
I personally see this as being *on* topic because before you buy something from eDigital let me tell you what you *might* just be in for.
I'll do a condensed version of my story and just say "don't let this happen to you". I got a Treo 10 MP3 Jukebox from http://www.treoplayer.com for an xmas present. I'll be looking for a new xmas present.
My Treo 10 was basically D.O.A.; the unit's hard drive would lock up during playback.
It took me *one month* to get an RMA number.
When I *finally did get* the RMA number and sent the unit back I was to "promptly have a new unit sent" to me.
This didn't happen. The Treo 10 is on back order and no replacements will be sent out until *APRIL*. Like I'm going to wait three months for a replacement.
SO, I demanded a full refund. Their main support center said 'OK'. I got my credit email today and was told they were going to keep 15% for a "restocking fee" (?!?!?).
So, I called -- again -- raised hell, and am finally getting a full refund.
During this time, I went back to doing realtime recording of MP3's using my Sony MZR-900 (minidisc Walkman) and my digital soundcard. What I found was that the sound quality of my MP3's coming off my computer and onto my MD Walkman was *better* sounding than anything coming out of the Treo 10. I guess there's something to be said for Sony's D/A chips. I also re-discovered how convenient the MD Walkman is. It and 3 Minidiscs easily fit in my coat pocket. I also have more than enough battery power to get through the day at the office.... And MDLP 4 mode is certainly livable enough for my needs. Hell, it *still* sounds better than a cassette tape walkman if you ask me, and I can 'boost' highs and lows to compensate for the sound loss during compression via WinAMP if I need to.
So that's it. No more MP3 jukebox BS for me. I'll stick to what works. And if you *do* decide to get an MP3 jukebox - avoid eDigital like the PLAGUE! Their customer service is horrible and their product, when it *does* work, is only of passable sound quality.
Polymorphism -- It's what you make of it.
In one form or another, speech recognition is going to be used more and more in the future, perhaps especially with handheld devices and tablet PC's. So, in light of this, who is working on Open Source speech recognition? I'm aware of CMU's Sphinx project, but last I saw it was quite obsolete technologically compared to commercial offerings. Is there any other Open Source'd work being done with cutting edge SR techniques?
Nowhere in the actual article does it say that it uses "1GB flash" cards. However, the IBM microdrive does store that much data (340 MB, 512 MB or 1GB).
As far as I know the "SanDisk-compatible CompactFlash(TM) Cards" max out at 128 MB.
They might want to update the article seeing how it may get some people's hopes up.
"A plan fiendishly clever in its intricacies"- Homer Simpson
I hope voice recog is better than the last time I used it!
Trying to load stairway to heaven:
"Stairway...delete that...Stairway...delete that...no! Delete that!...Shit...delete that...delete that...delete that... Stairway...to...delete that...to...delete that...to...delete that...to...heaven...delete that...heaven...delete that...heaven...delete that...heaven...play...delete that...play...delete that...play...delete that...play...delete that..."
:)
I hate voice recognition.
It's been a long time.
Actually, I work in this field.
Dragon, ViaVoice, etc. are dictation recognizers. They work by analyzing the speech data, and attempting to do phoneme matching to generate words, from a huge dictionary, and then do word matching.
This isn't an overly exciting model for different reasons. Large vocabulary recognizers have been around for 8-10 years. Nuance, SpeechWorks, Philips, and Temic end up being the big four in this market, although there is also a large vocabulary implementation of ViaVoice and others.
These products take a fixed grammar set, compile it in a speaker-independent manner, and can be used to recognize the compiled grammar. Without getting overly technical, it is a very different speech recognition method than the dictation recognizers, as they aren't trying to recognize everything out of a dictionary, but simply out of what the known grammar is. The flexibility in how the user can phrase the requests is small, but for relatively simple tasks, it's a fine trade-off.
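To illustrate the fixed-grammar idea in miniature: the grammar is expanded ("compiled") up front into the complete set of utterances it allows, and recognition then only has to decide which member of that set, if any, was spoken. The sketch below works on text, not audio, and the command list, track list, and function names are hypothetical, not taken from any of the products named above.

```python
from itertools import product

COMMANDS = ["play", "queue"]                        # hypothetical verbs
TRACKS = ["stairway to heaven", "baby got back"]    # hypothetical playlist

def compile_grammar(commands, tracks):
    """Expand '<command> <track>' plus a few bare commands into a phrase set."""
    phrases = {f"{c} {t}" for c, t in product(commands, tracks)}
    phrases.update(["stop", "rewind"])
    return phrases

GRAMMAR = compile_grammar(COMMANDS, TRACKS)

def recognize(utterance: str):
    """Accept only utterances that are in the compiled grammar."""
    text = " ".join(utterance.lower().split())   # normalise case and spacing
    return text if text in GRAMMAR else None

print(recognize("Play  Stairway to Heaven"))
print(recognize("please play something"))
```

The second call returns None, which is the trade-off described above: anything phrased outside the grammar is simply not understood.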
Look at SprintPCS's VoiceCommand for example. (I was one of the writers of the product -- not the handset based recognition, but the serverside voice activated dialing solution). The idea is very similar, but we handle the concept a little differently.
This type of device is just waiting to happen. With VoiceXML, designing tools like this will be standardized, but it's not anything new, just a use of existing technology.