Sounds good if they make the corpus freely available. Having lots of free high quality audio...
I agree, but from a quick look at their page, I see a lot of problems with reaching that goal.
1: Most computers I've seen have pretty wretched audio inputs: tiny microphones near the screen, nowhere near the speaker's mouth. So we can expect lots of noise, echo, and other stuff. Good for simulating the real world (because it basically is the real world), but not what I would call high quality. Some gamers and others probably use good quality headsets, but I doubt they will make up the majority of the database. Audio might be pretty good if the speakers use cell phones.
2: People reading written text don't talk the same way as in natural conversation. That's going to be a limitation for some developers.
3: They seem to be depending on the generosity/curiosity of people to generate and validate the samples. That's a hard way to get thousands to enroll. If they had some kind of game or other system that provides a psychic reward/incentive to the users I'd be more confident of a good response.
And a final comment: I hope they're sampling at 16 kHz instead of 8 kHz. To explain: Nyquist's Theorem says the sampling rate needs to be more than twice the highest frequency component in the analog signal. Speech typically contains components up to about 6 or 7 kHz, so 16 kHz is a good number. Unfortunately, the carbon microphones that phones used for the first 100 years or so only go up to about 4 kHz, so Ma Bell (remember her?) settled on an 8 kHz rate in the middle of last century, and most everybody else has accepted that ever since.
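To make the sampling-rate point concrete, here's a small Python sketch (the function names are made up for illustration) that checks the Nyquist criterion and computes where a tone lands after sampling. A 6 kHz speech component sampled at 8 kHz shows up as a 2 kHz tone and produces exactly the same samples, while at 16 kHz it comes through as-is. In practice phone systems filter out the upper band before sampling, so what 8 kHz really costs you is the missing high frequencies rather than aliasing; the sketch shows what would happen without that filter.

```python
import math


def nyquist_ok(sample_rate_hz, max_freq_hz):
    """Nyquist criterion: the sample rate must exceed twice the highest frequency."""
    return sample_rate_hz > 2 * max_freq_hz


def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling, folded into 0..fs/2."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))


def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Sample cos(2*pi*f*t) at t = n / fs for n = 0 .. n_samples-1."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]


if __name__ == "__main__":
    print(nyquist_ok(8000, 7000))        # False: 8 kHz is too slow for 7 kHz content
    print(nyquist_ok(16000, 7000))       # True: 16 kHz > 2 * 7 kHz
    print(alias_frequency(6000, 8000))   # 2000: a 6 kHz component masquerades as 2 kHz
    print(alias_frequency(6000, 16000))  # 6000: captured correctly

    # At 8 kHz, a 6 kHz tone and a 2 kHz tone yield identical samples.
    a = sample_tone(6000, 8000, 16)
    b = sample_tone(2000, 8000, 16)
    print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))  # True
```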
I think any near-state-of-the-art recognizer is going to be pretty complicated, because the algorithms are not simple. On top of that, you're talking about complicated math turned into code by people who are scientists instead of professional programmers.
At one extreme, TiESR https://gforge.ti.com/gf/proje... is fairly simple to use. Not state of the art, but it does use Hidden Markov Models (HMMs) and has some noise compensation built in. It comes with word and language models, so it works out of the box, for US English at least. I haven't been ambitious enough to figure out how to build new models.
At the other extreme, Kaldi http://kaldi-asr.org/ is the most advanced open-source recognizer that I'm aware of: neural nets and all the other goodies researchers have been working on over the last few years. Definitely not easy to compile or use, though. And don't even think about trying to train a neural net without a graphics card to use as a math accelerator: one of the examples ran for days and wasn't even close to finishing when I gave up.
Anybody else have suggestions for another toolkit?
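For anyone who hasn't met HMMs: an HMM-based recognizer like TiESR scores possible sequences of hidden states (phones, words) against a sequence of acoustic observations, and Viterbi decoding finds the single most likely sequence. The following is a generic textbook Viterbi decoder in Python, not TiESR's code or API. The two-state silence/speech model, its probabilities, and the "quiet"/"loud" observations (standing in for quantized frame energies) are hypothetical toy values.

```python
import math

# Hypothetical toy model: two hidden states, observations are quantized frame energies.
STATES = ["silence", "speech"]
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT_P = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}


def _log(p):
    """Log probability, with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log probability) for an HMM."""
    # scores[t][s]: best log probability of any path ending in state s at time t
    scores = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor state that gives the best score into s.
            prev, best = max(
                ((p, scores[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[t][s] = best + _log(emit_p[s][observations[t]])
            backptr[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


if __name__ == "__main__":
    frames = ["quiet", "quiet", "loud", "loud", "quiet"]
    path, logp = viterbi(frames, STATES, START_P, TRANS_P, EMIT_P)
    print(path)  # ['silence', 'silence', 'speech', 'speech', 'silence']
```

Real recognizers run the same dynamic programming over thousands of states with continuous emission models (Gaussian mixtures in classic HMM systems, neural-net outputs in hybrid systems), which is a big part of why they get complicated.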
I've been trying to get New Hampshire information (should be simple because we only have one provider in the exchange). Being self-employed I have mediocre individual insurance, but would like to see if ObamaCare* is better and compare costs. Hints in the local news indicate that costs are pretty good but their network has a limited set of hospitals and doctors, so I'd like to get information in order to figure out whether I even want to sign up or try to keep what I have.
Tuesday I did the signup process, filled in all the information 3 times. Then I figured out that I could just hit the "back" button to go back to the security questions page and hit submit again. Finally got registered about 9PM, then got the validation email and clicked on that several times until it was finally accepted at 10:30PM.
And I've been trying and failing to login ever since.
So why should I have to go through all that just to get prices and find out which doctors are in their plan? On eBay, Amazon, or just about any e-commerce site I can get the product description and price straight from a Google search. I only have to go through the registration/login hassle if I actually want to buy something.
If they would just provide the plan information with a simple static HTML page I could get the information I want, stop hammering on their servers, decide what to do, and come back next month if I decide I want to buy.
* Off-topic: If the program is even moderately successful, I suspect certain politicians will regret working so hard to ensure that Obama's name is forever attached to it.
About using the Digilent USB for data transfers from Linux, you might want to check out the project I recently started: http://mhz100q.sourceforge.net/. It includes VHDL and firmware for the Cypress USB chip to support transfers using libusb.
I think your best hope is to get a crude hardware prototype with your software running on it, and let an actual mass-market company buy it off of you (or hire you).
I'll second the idea of starting with a prototype. If you don't have hardware experience, there's a good chance you are overlooking something critical. An example from experience: I did some work for a small company that wanted to build a USB remote control for digital cameras. Neat concept, but after some breadboarding we found there were fundamental reasons why it was impossible at a reasonable price and performance point: for example, many digital cameras don't support a "take-picture" command via USB.
Depending on what you're trying to do, I would suggest looking at a place like microcontrollershop.com to see if you can get some existing boards that you can wire together to implement your idea with a low up-front investment. If your idea still looks reasonable at that point, you have something workable to demonstrate, you have a starting point for a hardware design, and you have a platform for software development while the real hardware is being developed.