eDigital MXP100 with Voice Control
An anonymous reader writes: "Here is a lengthy review of eDigital's 1GB flash MP3 portable that is as much a review on Lucent's remarkable speech recognition technology VoiceNav as it is on the player. VoiceNav offers speaker-independent recognition, meaning it doesn't have to learn each individual user's particular speech patterns like IBM's ViaVoice. Just say the name of a music track into the player's microphone and VoiceNav pulls up and plays that song. In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail. This included titles with "non-real word" band names like Sum41 and U2. Neat technology that could make its way into PDAs soon. The player is a pretty good one too, using IBM's Microdrive for storage."
Have you seen any hardware player that supports the Ogg Vorbis format?
~shiny
WILL HACK FOR $$$
This technology is just cool, with some pretty serious applications.
I remember sitting around with some voice recognition program (can't remember what one) about 5 years ago, running through all of the little training things to get it to learn my speech patterns.
I find it kind of strange that it's first appearing in an MP3 player, but I suppose that's the kind of market where a lot of innovation is going to be. I just wonder how long it's going to be until we start seeing this in more practical applications, instead of just being a convenience thing.
Dark Nexus
"Sanity is calming, but madness is more interesting."
Now all somebody has to do is link this with a domain-specific natural language parser, and put it in my car stereo (along with a decent amount of storage) and I'll pay any amount for it.
Does the voice recognition filter itself out? When U2 sings "one" I don't necessarily want it switching to Aimee Mann's "one" and vice versa.
---
Oregon
Anyone who has their song names in Japanese character format might be SOL for the voice part, unless it can read kanji/hiragana/katakana.
Call on God, but row AWAY from the rocks!
I think I'm feeding the trolls on this one, but I can't understand why you think a company would spend money on adding support for that format unless it would be a selling point. I grant that MP3 is worse than Ogg, but can you honestly say that Ogg is big enough in the "real world" for a company to go to the trouble of supporting it? The vast majority of my Linux-using friends still use MP3, and you can bet almost no one in the Windows world uses Ogg.
Slashdot 's editors are dickheads
I wonder how well it would work on, say, the side of a highway. If it worked well, this would be a nice little toy for those of us who run (or bike) around.
Kith Kaddith Lizard Man Extraordinaire
We tried and found that the background din of music, talking, and slamming weights was too much for VoiceNav. Once in a blue moon we got the track to shift, but not until speaking loud enough to draw the gaze of a few patrons who wondered why we were yelling at our MP3 player.
heh, I loved that part
Sig you!
It seems really nice now, but when you have the machine in hand, the VC really sucks. As stated in the review, it's really picky with order. The best thing though would be when you use it in a really crowded place. I'd get it just to get looks from people when I'm yelling at the player. Interesting stuff. "Play Bach damnit! I want to hear something soothing!"
I guess I have too many obscure mp3s, but how can the voice control differentiate:
Daydream Boat.mp3
Day Dreamboat.mp3
Alpha Betray.mp3
Alphabet Ray.mp3
Mont Anagram.mp3
Montana Gram.mp3
...
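To put the worry above in concrete terms, here is a minimal sketch showing that each pair of titles is the exact same run of letters once the spaces are dropped. Letter equality is only a crude stand-in for sounding alike, and `letters_only` is a made-up helper for illustration, not anything VoiceNav is known to do.

```python
def letters_only(title: str) -> str:
    """Lowercase a title and keep only its letters (spaces and punctuation dropped)."""
    return "".join(ch for ch in title.lower() if ch.isalpha())

# The title pairs from the comment above, without the .mp3 extension.
pairs = [
    ("Daydream Boat", "Day Dreamboat"),
    ("Alpha Betray", "Alphabet Ray"),
    ("Mont Anagram", "Montana Gram"),
]

for a, b in pairs:
    same = letters_only(a) == letters_only(b)
    print(f"{a!r} vs {b!r}: same letter sequence = {same}")
```

All three pairs come out identical, so any difference a recognizer could use would have to come from pauses or stress, not from the sounds themselves.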
Lucent - formerly Bell Labs - really needs a shot in the arm. Its stock price has been battered big-time lately for reasons unconnected to the dot-bomb phenomenon. Voice recognition on computers has been around for a while now with products like Dragon, Via Voice, etc. All of these programs are clunky, somewhat bloated, and need to be trained to individual speakers. A truly speaker-independent voice recognition system could be just what the doctor ordered for Lucent.
I searched Google for "VoiceNav" and the only references that came back were those connected to the MXP-100. I wonder if this is brand new. On the down side, if this does represent a breakthrough of sorts, Lucent probably holds patents on the technology that they will milk for all they're worth. The old Bell Labs used to have fairly liberal licensing policies for some of their stuff (UNIX anyone?) but now they're profit-driven. Shareholders might not look favorably on giving away a possible golden goose. I would love to see the magic behind this technology in an Open Source form.
Ogg is _NOT_ better than MP3 from a market standpoint. Every 6 months there will be some new format that improves the compression and sound quality. Many times, geeks are too focused on the technical aspect, and not the market aspect. Ubiquity is the key. MP3 is good enough, and it is here to stay. WMA is also catching on, but you'll notice that even that took years to happen.
I want something under $100 with under 64 MB flash but it has to be small so I can take it jogging.
. . . otherwise, there'll be a special broadcast on radio, cable, and embedded in trojan MP3s one day. It'll be Jack Valenti's voice saying "Don't play non-SDMI compliant content anymore." :).
One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
Granted, I don't use Ogg Vorbis. I think I looked into it a while back, but I spent too long ripping all my CDs to switch. That's the real issue. Even a batch mp3>ov converter wouldn't work. I don't want to recompress an already lossy compression.
As for the name, I think ogg would be better to say than mp3. Ogg= 1 syllable, mp3 = 3. Plus, instead of ripping CDs, you can ogg them. Ogg players. No, in terms of names, ogg has mp3 beat.
The problem is mp3 is "good enough", and already entrenched.
jred
I'm not a mechanic but I play one in my garage...
When I can't get voice rec to work, I usually end up speaking louder because the frustration is just too much. It's bad enough listening to people yapping down the street or in stores with those little embedded mikes and earphones. Can you imagine hordes of people walking down the street screaming:
"Uncle Fucker"
"Baby Got Back"
"Cocaine"
"Copacabana"
The last is probably worst of all. We know Barry exists, but it's horrible to be reminded that people actually listen to him.
----------
I am an expert in electricity. My father held the chair of applied electricity at the state prison.
IBM's voice recognition line extends past ViaVoice. We offer several products, including an embedded product, that do not require any training. Only the highest end dictation product requires training because of the demands on it to understand what you just said from tens of thousands of words. If all you can say is a hundred or so phrases like "play", "stop", "rewind", "livin' la vida loca", etc. then it's a lot easier to make a prediction and training is a waste of time. At that point it's just a matter of microphone quality and filtering out the background noise. We can even do untrained natural language voice recognition in situations like this with the proper processor power. Since we know what you're by and large going to say, we can pick out enough from the whole free-form sentence to get the gist of what you meant without any training.
:)
And believe me we're getting to the point where training isn't needed for dictation either
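A toy illustration of why a small, known phrase list makes the guessing easy. This is only a sketch: it uses plain string similarity from Python's standard `difflib` as a stand-in for acoustic matching, and the `COMMANDS` list and `best_command` helper are invented for the example. It is not how ViaVoice or VoiceNav work internally.

```python
import difflib

# A small closed vocabulary, using the example phrases from the comment above.
COMMANDS = ["play", "stop", "rewind", "livin' la vida loca"]

def best_command(heard: str, cutoff: float = 0.6):
    """Return the closest known command to a noisy guess, or None if nothing is close."""
    matches = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_command("plai"))    # a slightly garbled "play"
print(best_command("rewnd"))   # a slightly garbled "rewind"
print(best_command("xyzzy"))   # nothing in the list is close
```

With only a handful of candidates, even a sloppy guess lands on the right entry. The same trick gets much harder once the list grows to tens of thousands of words, which is the dictation case.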
This doesn't mean much. Picking the correct one out of only 14 possibilities is quite easy. The reviewer should rather have tried a playlist with more than 3000 entries. The error rate will grow exponentially with the number of songs, because statistically, the more songs you add, the more of them will sound alike phonetically. (Bad way to say it, but you probably get the point.)
Ogg is just the name of the, uh, 'group' doing the work. The actual audio format is called Ogg Vorbis, in contrast with Ogg Tarkin, their proposed video codec.
:P
So your syllable count is really incorrect
autopr0n is like, down and stuff.
As for the test, I would like to know if anyone here has fewer than 15 MP3s that they would like to store on one of these. I want to see how it reacts when you have a few hundred songs and try to use the name recognition system. I have the odd feeling that it might not work so well.
Information wants to be free like speech wants to be free, not like we want beer to be free.
For me, the biggest attraction of MP3 players is the ability to have no moving parts. This makes them truly portable and useful in more situations than what we had previously. So, my question is, how reliable is this IBM Microdrive? How robust is it? If I'm training to run a marathon, is it going to survive all of the pounding?
Hrm, the thing doesn't look quite as cool as the iPod. Not that I don't hate Apple or anything, but there don't seem to be a lot of players out there that have both a high capacity and esthetic styling approaching or surpassing the iPod. There are some cool-looking MP3 players, and there are some that are better technically than the iPod. But unfortunately, they don't seem to be in the same group. (Of course, given the price you could just get a real PDA that can play MP3s for about $100 more...)
Personally, I doubt the voice nav in the current system is really that great, especially since you have to manually stop the music in order to use it. Of course with 200 or so songs it might come in handy (if it scales that well).
autopr0n is like, down and stuff.
edigital has a long history of using hype and grossly misleading tactics to, IMO, defraud investors. So far they've lost tens of millions of dollars, and recently had to resort to taking a loan at a 49% interest rate just to stay in business. Even the CEO has referred to the investors as a "cult".
As for their history with their products, their much-hyped Treo barely sold any units in stores, and is now being sold by liquidators on ebay. A lot of customers were a bit pissed that their players didn't come with any storage media!
This wasn't intended as flamebait, but E.digital has a long history of using hype and misleading tactics to pursue little more than an incursion of investment money from gullible public investors. I didn't lose any money to them, but a lot of people did, and will continue to.
In fact, they recently registered 20 million more shares so they can stay in business a while longer. They really don't deserve this kind of attention from Slashdot.
For those considering investing in them, I'd say stay away. For those considering a product purchase, I'd recommend the same.
I have a lot of friends who have Sprint phones with voice nav. They all used it for the first week because it was "cool", but after a while they went back to traditional methods. Another example is my father; he got the '02 Infiniti Q45, which has loads of tech toys built in. The voice nav is really cool, but it's not nearly as fast as clicking a button.
...only I pictured it with the ability to retrieve a song by just singing a bit of it or speaking some lyrics.
pr0n - keeping monitor glass spotless since 1981.
Picture this:
Stuck in traffic watching your favorite music video then saying "Pause" and "Dial Home" to tell sweety that you'll be late coming home because of traffic.
It takes too long to copy files to the thing. My iPod has spoiled me.
And for all you OSS zealots, OGG is not mature and has no place in hardware yet, deal with it. If you judge all hardware based on something that no hardware worth owning has, then you will never be happy. Open source does not always mean it's better anyway.
I guess there was really no reason not to add voice nav to the system. The DSP architecture they use for decoding is also pretty ideal for voice recognition apps. It's just a matter of adding some software they probably already own and want to test.
I figure this gives them a cheap opportunity to test their voice rec. system where it won't cause too many problems if it doesn't work (you can still play MP3s) and no one will be too pissed.
Thank you, thank you. I'm here all week.
Just say the name of a music track into the player's microphone and VoiceNav pulls out a rabbit...
I know this has nothing to do with anything but I want at least someone to read this.
IT IS NOT YOUR JOB TO RE-WRITE THE NATIONAL ANTHEM.
Anyone watching the basketball all-star game knows what I'm talking about. A Canadian singer, don't know his name, DESTROYED the Canadian anthem. Patti LaBelle KILLED the Star Spangled Banner. I didn't even recognize either song.
Don't buy the "it-worked-for-me" argument. Especially with speech-recognition technology. A selective test is not a benchmark.
This speaker-independent technology is based on recognition of phonemes. To be able to perform recognition, you first need to translate written entries into sequences of phonemes. For example, "Genesis" will become "JH EH1 N AH0 S AH0 S". Usually, this conversion is done by looking the word up in a phonetic dictionary. When an entry is missing, a fallback strategy is to perform automatic grapheme-to-phoneme conversion, i.e. create phoneme strings based on the lexical structure of a word. This yields poor results for many languages, such as English, which have unpredictable grapheme-to-phoneme correspondence. So, either this technology uses a dictionary (within the PC application) or it uses a grapheme-to-phoneme engine. The problem with a dictionary is that it may be HUGE, with all the music authors and titles available, and it evolves rapidly.
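The lookup-then-fallback idea described above can be sketched roughly as follows. This is a minimal illustration, not VoiceNav's actual pipeline: only the "genesis" entry comes from the comment, while `PHONETIC_DICT`, `LETTER_RULES` and `to_phonemes` are hypothetical names, and the one-letter-per-sound fallback is deliberately naive.

```python
# Tiny stand-in for a phonetic dictionary (ARPAbet-style symbols).
# Only the "genesis" entry is taken from the comment; a real dictionary is far larger.
PHONETIC_DICT = {
    "genesis": ["JH", "EH1", "N", "AH0", "S", "AH0", "S"],
}

# Hypothetical, deliberately crude letter-to-sound table for the fallback path.
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}

def to_phonemes(word: str):
    """Return (phonemes, source): dictionary lookup first, letter rules as fallback."""
    key = word.lower()
    if key in PHONETIC_DICT:
        return PHONETIC_DICT[key], "dictionary"
    phonemes = []
    for ch in key:
        # Characters with no rule (digits, punctuation) are silently dropped.
        if ch in LETTER_RULES:
            phonemes.extend(LETTER_RULES[ch].split())
    return phonemes, "fallback"

for name in ["Genesis", "Sum41", "U2"]:
    print(name, to_phonemes(name))
```

The fallback shows the weakness: "Sum41" loses its digits entirely and "U2" collapses to a single vowel, which is nowhere near how anyone would say either name.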
Also, the training is usually done for only one language (sometimes two). This is called acoustic model training. Each phoneme of a given language is trained as an HMM (Hidden Markov Model). You can only achieve limited results when using words made out of foreign phonemes. "Björk", for instance, will be phonetized "B Y AO1 R K" for English-speaking persons. If you happen to pronounce it correctly (i.e. in Icelandic), the engine won't be able to figure it out because the acoustic data is not modeled properly.
I have strong doubts about this gadget because it requires dynamic dictionaries and multi-lingual support. I listen to a lot of foreign music. I don't think this toy will work OK for me.
All humans are mortal. Socrates is a human. Socrates is dead.
1) it's too big.
2) they claim to have not many competitors. that line is a complete farce. they have plenty. some whose players are smaller and hold more.
3) i see some complaining for ogg vorbis support. um do you not get it, mp3 is here to stay. regardless of which technology is better think of the names.. "mp3".. "ogg vorbis".. MP3 wins hands down, totally no contest.
4) most people won't even bother with portable mp3 players like this. maybe if it were smaller than a wallet, held over two weeks of music, and retailed for under $100.. then people might start buying these newfangled gadgets. Until then, you're losin money baybay.
If this thing ran off CDs and supported Ogg Vorbis I would buy this in an instant. As it is I'm forced to drool over the spiffy voice recognition and keep waiting...
It's easy to configure grip to automatically rip and sort any CD you put in the drive. That way you just spend some time swapping CDs around. You don't need to do it all at once.
anta baka?
Li-Ion rechargeable battery (3.7V/1200mAh) for over 12 hours of playback
you can't really get a gig worth of MP3s out of that..
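A quick back-of-the-envelope check of that claim. The 128 kbit/s constant bitrate is my assumption (the review doesn't give one), and 1 GB is taken as 10^9 bytes; the 12 hours is the battery figure quoted above.

```python
CAPACITY_BYTES = 1_000_000_000   # 1 GB Microdrive, taken as 10**9 bytes
BITRATE_BPS = 128_000            # assumed MP3 bitrate: 128 kbit/s, constant
BATTERY_HOURS = 12               # battery life quoted in the spec line above

def playback_hours(capacity_bytes: int, bitrate_bps: int) -> float:
    """Hours of audio that fit in the given capacity at the given bitrate."""
    seconds = capacity_bytes * 8 / bitrate_bps
    return seconds / 3600

hours = playback_hours(CAPACITY_BYTES, BITRATE_BPS)
print(f"{hours:.1f} hours of music vs {BATTERY_HOURS} hours of battery")
# prints: 17.4 hours of music vs 12 hours of battery
```

So under those assumptions a full drive holds roughly five hours more music than one charge can play. At higher bitrates the gap shrinks: at 192 kbit/s the same drive holds about 11.6 hours.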
It's tempting, but I won't go for it. I'm too much of a They Might Be Giants fan. I can see it now, sitting there in a public area with some weird looking device in my hand:
"PUT YOUR HAND INSIDE THE PUPPET HEAD!"
"...NO!" Someone speaks to me "Are you OK?"
"Yeah Yeah," Yeh Yeh starts playing. "Ahh!"
"DIG MY GRAVE"
"Sir, are you sure you're alright?" [stopping]
"Yeah, fine." suddenly person A asks person B for a light. "I've got a match."
The thing starts playing again. Just then a Dirt Bike whizzes by and someone says "Man, that's a fast Dirt Bike." Guess what song starts playing. Then I stop it so I can play "I AM A HUMAN HEAD!" again getting more stares.
Then what if I want to hear Chuck Berry? "MY DINGALING" *SMACK*
No, for me this is nothing but trouble...
--Josh
There are exactly 42,935,718 letter sized sheets in a square mile.
you might be interested in the fact that this has already been done
The only reason we haven't seen OGG Vorbis support on solid state players is that they would only lose money by doing so, at least for now. This is coming from someone who encodes all of his own CD's as .ogg's.
Alas, I wish there were some incentive for player manufacturers to add the support. There are two ways I can see for this to happen:
(a) Make adding it as trivial as possible. If adding .ogg support required only a few days of extra development time, you'd see it.
(b) Increase the market share that OGG Vorbis has. This one is trickier, mainly because of the slim market that a good, lossy codec serves. What do I mean? Well, audiophiles aren't going to want to listen to any compressed format (though these dinosaurs claim their hissy records are better-sounding than Super Audio CD), and Joe Sixpack isn't going to notice any difference at all between .mp3 and .ogg.
Having done numerous sound quality tests of OGG Vorbis and MP3 on my own equipment, I can say without a doubt that were all things considered equal, OGG would win out. Unfortunately, OGG has had a very late start, and is up against lots of other competitors who are all "good enough" for the average person, so its supporters will have to reduce the barriers to its use before anyone will care.
[ home ]
It is very impressive when you shout the word "Folder" talking like Apu Nahasapeemapetilon and it still works.
Not really. I'd be impressed, though, if it picked up on the guy's name instead of his accent.
I Browse at +4 Flamebait
Open Source Sysadmin
What if someone tries queuing up their favorite track from The Faint's Danse Macabre?
The only reason we haven't seen OGG Vorbis support on solid state players is that they would only lose money by doing so, at least for now. This is coming from someone who encodes all of his own CD's as .ogg's.
Actually I think that the only thing stopping OGG Vorbis on hardware players is the lack of a free fixed point decoding library. Right now you can find free floating point decoding libraries, but not fixed point. Most of the processors used in hardware players do not support floating point operations. The CPU's only have an integer unit. When a fixed point library is released, I think that you will find Ogg supported everywhere that MP3 is, since it should be trivial to add, and will only take up a little more ROM.
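For anyone wondering what "fixed point" means in practice, here is a minimal sketch of integer-only multiplication in Q15 format (15 fractional bits). The format choice and the helper names are mine, purely for illustration; this is not taken from any Vorbis decoder.

```python
Q = 15              # number of fractional bits (Q15 format, chosen for illustration)
ONE = 1 << Q        # the integer that represents 1.0

def to_fixed(x: float) -> int:
    """Convert a float to its Q15 integer representation."""
    return int(round(x * ONE))

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using only integer multiply and shift."""
    return (a * b) >> Q

def to_float(a: int) -> float:
    """Convert a Q15 integer back to a float (only needed for display)."""
    return a / ONE

a = to_fixed(0.5)
b = to_fixed(0.25)
product = fixed_mul(a, b)
print(a, b, product, to_float(product))   # prints: 16384 8192 4096 0.125
```

The multiply itself never touches a floating point unit, which is the whole point on an integer-only CPU. The hard part of a real decoder is keeping enough precision and avoiding overflow across long chains of such operations.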
Portable MP3 players of all things get the voice tech first. Why? Same with phones. The cell phones have the voice recognition, but if there are POTS phones that have it, they aren't exactly making commercials about it (not that I watch TV anyways)
:)
This feature would be no less useful on a desktop. It's definitely ideal for a small portable unit where working with a tiny display screen and buttons to switch between a large selection of songs can be tedious. However, being able to swap songs by simply speaking to your computer without forcing yourself to do a task switch could be helpful as well. Certainly, the 10-20 seconds you spend doing so isn't significant by itself, but this does add up over time. It's all about productivity, people!
MP3 players are pioneering the way in other areas as well. Other than perhaps digital cameras, they provide a market for flash memory. And getting realtime playback, and hopefully soon widespread use of unrestricted realtime mp3 encoding for these units, will enhance their use beyond the simple playback of music. And of course, don't forget, anything that pisses off the RIAA is a good thing.
-Restil
Play with my webcams and lights here
Because of the amount of songs mp3 allows us to carry around, indexing the songs we have with us is a tricky thing. There are numerous indexing methods on MP3 players at the moment.. playlists on the iPod, simple numeric 'album' jumps on MP3-CD players, search facilities on in-car units etc.. but voice definitely simplifies matters.
However, I spy a problem. Even if it doesn't require training to recognise a voice, I bet it's still limited to a subset of accents.
You notice it with voice-recognition computer programs here in the UK. You speak normally and it rarely works.. put on the dullest most monotone American-style accent you can, and hey presto, up and running!
So, to get one of these, is a prerequisite that I practice my 'dull American drone'?
mogorific carpentry experiments
Why is everyone surprised that an MP3 player has voice control? Cellphones here in the UK have had that technology for ages, but now they are moving beyond "call pizza" to built-in MP3 players and radios. At the current speed of development, it's going to be this year that they merge these technologies and we end up with a voice-controlled MP3 player with PDA and cellphone with built-in cameras! Yay, the end is near for all these fragmented devices, and we will soon have that device we all want that fits in our pockets and does everything in one single device.
One of the differences: CF (including the IBM Microdrive) is removable. The iPod uses a non-removable 1.8'' Toshiba drive. The fact that it is removable may make the difference, for instance if you also have a digital camera which uses CF cards. Also, with CF you have a choice between a Microdrive and solid state. As to capacity, a 1.8'' 10GB drive just became available (double the capacity of the one used in the iPod), so even if IBM comes up with a 6GB Microdrive next year, the 1.8'' format is still the higher capacity. Finally, FireWire is way faster than USB. We need at least USB 2.
Looks like it has flaws.
I'll stick with the iPod thanks.
That was exactly my point. Right now, it would be more than "a few days' development time" to integrate OGG support. Thanks for being redundant. =)
[ home ]
I might get modded down for this, but eDigital has just left a bad taste in my mouth..... And I wanted to share... ;)
I personally see this as being *on* topic because before you buy something from eDigital let me tell you what you *might* just be in for.
I'll do a condensed version of my story and just say "don't let this happen to you". I got a Treo 10 MP3 Jukebox from http://www.treoplayer.com for an xmas present. I'll be looking for a new xmas present.
My Treo 10 was basically D.O.A.: the unit's hard drive would lock up during playback.
It took me *one month* to get an RMA number.
When I *finally did get* the RMA number and sent the unit back I was to "promptly have a new unit sent" to me.
This didn't happen. The Treo 10 is on back order and no replacements will be sent out until *APRIL*. Like I'm going to wait three months for a replacement.
SO, I demanded a full refund. Their main support center said 'OK'. I got my credit email today and was told they were going to keep 15% for a "restocking fee" (?!?!?).
So, I called -- again -- raised hell, and am finally getting a full refund.
During this time, I went back to doing realtime recording of MP3s using my Sony MZR-900 (MiniDisc Walkman) and my digital soundcard. What I found was that the sound quality of my MP3s coming off my computer and onto my MD Walkman was *better* than anything coming out of the Treo 10. I guess there's something to be said for Sony's D/A chips. I also re-discovered how convenient the MD Walkman is. It and 3 MiniDiscs easily fit in my coat pocket. I also have more than enough battery power to get through the day at the office.... And MDLP 4 mode is certainly livable enough for my needs. Hell, it *still* sounds better than a cassette tape Walkman if you ask me, and I can 'boost' highs and lows to compensate for the sound loss during compression via WinAMP if I need to.
So that's it. No more MP3 jukebox BS for me. I'll stick to what works. And if you *do* decide to get an MP3 jukebox - avoid eDigital like the PLAGUE! Their customer service is horrible and their product, when it *does* work, is only of passable sound quality.
Polymorphism -- It's what you make of it.
In the article they don't mention the battery life on this thing. The Microdrive is GREAT, but it consumes a lot of battery power compared to a flash memory card.
Does anyone have any info on this?
Thanks!
In one form or another, speech recognition is going to be used more and more in the future, perhaps especially with handheld devices and tablet PCs. So, in light of this, who is working on Open Source speech recognition? I'm aware of CMU's Sphinx project, but last I saw it was quite obsolete technologically compared to commercial offerings. Is there any other Open Source work being done with cutting-edge SR techniques?
For years I thought that track was called "strange formula", but then I found out the original ID3 tag typist had been lazy :)
As for the iBook laptop keyboard, use "Keycaps" (in the Apple menu if you're using OS9, or the Apps:Utilities Folder in OS X)...
Nowhere in the actual article does it say that it uses "1GB flash" cards. However, the IBM Microdrive does store that much data (340 MB, 512 MB or 1GB).
As far as I know the "SanDisk-compatible CompactFlash(TM) Cards" max out at 128 MB.
They might want to update the article seeing how it may get some people's hopes up.
"A plan fiendishly clever in its intricacies"- Homer Simpson
I hope voice recog is better than the last time I used it!
Trying to load stairway to heaven:
"Stairway...delete that...Stairway...delete that...no! Delete that!...Shit...delete that...delete that...delete that... Stairway...to...delete that...to...delete that...to...delete that...to...heaven...delete that...heaven...delete that...heaven...delete that...heaven...play...delete that...play...delete that...play...delete that...play...delete that..."
:)
I hate voice recognition.
It's been a long time.
They'll never get it to play track 2 from Windowlicker. (Although, I do love Amazon's attempt.)
The writing standards in this review by Richard Menta are amongst the worst I have seen. He repeats almost every piece of information at least once (so you say it supports MP3 and WMA?) and fails to mention some pretty crucial features of any MP3 player. For example, he mentions the lithium-ion batteries "had no trouble handling the power hungry Microdrives" - but how long did they actually last?!
Also, testing the VoiceNav feature with 14 songs is laughable. You basically had to know the full artist+track name anyway, so why not just memorise the 14 tracks and refer to them by number? My mp3 player has over 3000 tracks at the moment and I have no confidence that VoiceNav could handle them, despite reading this review which gave it 4 stars. And seriously, on a crowded and noisy subway train who is going to yell "Achey breaky heart" into their shirt pocket?
Not I, for more reasons than one.
Karma police, I've given all I can, it's not enough, I've given all I can, but we're still on the payroll.
Since I work on Via Voice products, I can tell you first hand that Lucent isn't the only game in town that offers speaker-independent recognition, especially within the embedded domain where this product exists. And of course, our product is the market leader for a reason.
Since you apparently never received a working Treo, how do you know what one sounds like? Your testimony is suspect. BTW, mine surpasses anything else like it for sound quality.
Coming from an AC, why do I even bother.. but what the heck...
The unit wasn't *completely* DOA. It did work for a while. But the sound quality is NOTHING like a Minidisc player. Sony's D/A processing chips are beyond anything anyone else does. If you believe otherwise, then I can see the *reason* you post as an AC.
Polymorphism -- It's what you make of it.
I would have tested it only using the Comic Book Store guy's voice. It seems like the type of thing he would use. "Worst playlist, ever!"
I really hate signatures, but go to my website.