Slashdot Mirror


Neural Net Outperforms Human in Speech Recognition

orac2 writes "Here's a press release (with a real video clip) on a neural net that can recognise speech better than humans - even in noisy environments. The network uses just 11 neurons. They did it by incorporating an aspect of biological neural networks normally ignored by artificial networks: the timing of signals between neurons. Beyond the immediate application to speech recognition, the wider implications for all neural networks are obvious." Neurons. Mmm.

2 of 203 comments

  1. All this and a counting horse by An+El+Haqq · · Score: 5

    It's difficult to evaluate this system given the sparse amount of information available. I, for one, am incredibly skeptical at this point.

    a) There is no statement of the train/test procedure for the neural net. It's fairly easy to get good performance if you test your system on the same dataset you trained it on. Without this information, you cannot make a reasonable judgement.

    b) If you listen to the audio samples in the video at
    http://www.usc.edu/ext-relations/news_service/real/real_video.html
    you can notice a significant difference in the lengths of the samples (e.g. "stop" is shorter than "yes"). A fairly unsophisticated NN can pick up on the length of a sound sample and generalize from there. I didn't hear any statement saying that in the official training and testing all sound samples were of the same length.
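    To make points (a) and (b) concrete, here is a minimal sketch of the kind of check I'd want to see: a "classifier" that looks at nothing but clip duration, scored on clips it was not fitted on. The durations below are made up for illustration and are not taken from the USC demo; the function names are hypothetical too. If a baseline this dumb scores well on held-out clips, the length of the sample is leaking the answer.

```python
# Hypothetical clip durations in seconds. These numbers are invented for
# illustration and are NOT measurements from the USC demo.
TRAIN = [("stop", 0.31), ("stop", 0.29), ("stop", 0.33),
         ("yes", 0.46), ("yes", 0.44), ("yes", 0.49)]
# Held-out clips: never used when fitting, only when scoring.
TEST = [("stop", 0.30), ("stop", 0.34), ("yes", 0.45), ("yes", 0.47)]


def fit_mean_durations(samples):
    """Return the mean duration for each word label."""
    totals = {}
    for label, duration in samples:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + duration, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


def predict(means, duration):
    """Pick the label whose mean duration is closest to this clip's."""
    return min(means, key=lambda label: abs(means[label] - duration))


def accuracy(means, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(means, duration) == label for label, duration in samples)
    return hits / len(samples)


means = fit_mean_durations(TRAIN)
held_out_accuracy = accuracy(means, TEST)
print(f"duration-only baseline, held-out accuracy: {held_out_accuracy:.2f}")
```

    With these invented numbers the duration-only baseline gets every held-out clip right, which is exactly the Clever Hans scenario: the score looks impressive and says nothing about speech.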


    It's really a mess. If someone has a journal article or other piece of reliable information on this research, a pointer would be appreciated. Until then, I'll be feeding Clever Hans.


  2. I get the impression by konstant · · Score: 5

    I get the impression that this net did not perform better "even" under noisy conditions, but "only" under noisy conditions.

    Here's the original link
    http://www.usc.edu/ext-relations/news_service/releases/stories/36013.html

    If I'm right about that, then this development (while still insanely cool - don't get me wrong) might not be so surprising. As I recall from college brain-and-mind psych courses, humans use a variety of factors when singling out a lone voice or conversation in a noisy environment. These include spatial orientation, visual cues, etc. My prof called this the "cocktail party effect". Rob them of these cues, and it isn't surprising that they are hobbled.

    Also, computers have the mixed blessing of ignoring information patterns unless they are instructed to do otherwise. A person, listening to white noise, would subconsciously attempt to find meaning in every bleep and scratch. A computer, listening only for certain cues, can disregard the majority of the signal.

    I would be interested in learning what rate of word recognition this system achieves. Current technology manages about 90%, which means one in every ten words is heard incorrectly. If they could improve that to 99.9% or even just 99%, we might actually get some speech-processors in Office desktop products.
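    To put rough numbers on why the jump from 90% to 99% matters, here is a back-of-the-envelope sketch. It assumes every word is recognized independently with the same accuracy, which is a simplification of mine and not a claim about how any real recognizer behaves; the function names are just for illustration.

```python
def expected_errors(word_accuracy, n_words):
    """Expected number of misheard words in a passage of n_words."""
    return (1.0 - word_accuracy) * n_words


def clean_sentence_probability(word_accuracy, sentence_length):
    """Chance that every word in a sentence is heard correctly.

    Assumes word errors are independent (a simplifying assumption).
    """
    return word_accuracy ** sentence_length


for acc in (0.90, 0.99, 0.999):
    errors = expected_errors(acc, 1000)
    clean = clean_sentence_probability(acc, 10)
    print(f"accuracy {acc:.1%}: ~{errors:.0f} errors per 1000 words, "
          f"{clean:.1%} of 10-word sentences fully correct")
```

    Under that assumption, a ten-word sentence comes through with no mistakes only about 35% of the time at 90% accuracy, versus about 90% of the time at 99%.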



    -konstant

    --
    -konstant
    Yes! We are all individuals! I'm not!