Slashdot Mirror


US Intelligence Unit Launches $50k Speech Recognition Competition

coondoggie writes The $50,000 challenge comes from researchers at the Intelligence Advanced Research Projects Activity (IARPA), within the Office of the Director of National Intelligence. The competition, known as Automatic Speech recognition in Reverberant Environments (ASpIRE), hopes to get the industry, universities or other researchers to build automatic speech recognition technology that can handle a variety of acoustic environments and recording scenarios on natural conversational speech.

62 comments

  1. How about this one? by korbulon · · Score: 4, Insightful

    "Go fuck yourself."

    1. Re: How about this one? by Anonymous Coward · · Score: 0

      Voice pattern recognized. You are under arrest George Carlin.

    2. Re:How about this one? by Anonymous Coward · · Score: 0

      "Go fuck yourself."

      got it! "Call for oar elves?"

    3. Re:How about this one? by Anonymous Coward · · Score: 0

      In this case, it should be more like "Go fuck yourself... self... elf... elf..."

    4. Re:How about this one? by Anonymous Coward · · Score: 0

      Something tells me their next contest would be about cracking voice encryption.

    5. Re:How about this one? by bigfoottoo · · Score: 1
    6. Re:How about this one? by dean.collins · · Score: 1

      My exact thoughts... I hope developers heed this and they get a total of 0 entries.

    7. Re:How about this one? by penguinoid · · Score: 1

      "Go fuck yourself."

      Better idea. Enter the competition, use already well-developed commercial software (or write a program to average the results of several commercial programs), and easily win the competition. It's not like anyone is going to create software worth millions and give it away for a tiny prize.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    8. Re:How about this one? by Kirth · · Score: 1

      They had better start recognizing people's right to free speech.
      Which includes the right of not being pestered by the government (as in: put under surveillance) for it.

      --
      "The more prohibitions there are, The poorer the people will be" -- Lao Tse
  2. Eh arent they trying? by Roodvlees · · Score: 3, Interesting

    Haven't Microsoft, Apple and Google already spent billions of dollars on this?
    Seems they are appealing to any random developer who might have an idea.

    --
    Thank you, Bradley Manning, Edward Snowden and so many others, for courageously defending humanity, my freedom and more!
    1. Re: Eh arent they trying? by Anonymous Coward · · Score: 0

      We are in desperate need of systemd-voicerecognd

    2. Re:Eh arent they trying? by bouldin · · Score: 2

      Haven't Microsoft, Apple and Google already spent billions of dollars on this?

      All the speech recognition software I've used has relied on a controlled environment (e.g. yelling directly into your phone with almost no reverberation, no competing conversations, very little background noise).

      Reverberation *should* be the easiest kind of noise to remove, because it has a simple mathematical model:

      S(t) = signal(t) + f(signal(t - delay))

      Where f() is a pretty simple function that may attenuate some frequencies more than others.

      Modelling all the other kinds of background noise is much, much harder.
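
      To make the model above concrete, here is a minimal Python sketch. It assumes f() is a plain gain applied to a single delayed copy (the simplest case; the comment allows f() to be frequency-dependent) and that the delay and gain are known exactly. The names add_echo and remove_echo and the sample values are illustrative, not taken from any library.

```python
def add_echo(signal, delay, gain):
    """Return S[t] = signal[t] + gain * signal[t - delay] (one reflection)."""
    out = []
    for t, x in enumerate(signal):
        echoed = gain * signal[t - delay] if t >= delay else 0.0
        out.append(x + echoed)
    return out


def remove_echo(observed, delay, gain):
    """Invert add_echo, assuming delay and gain are known exactly."""
    dry = []
    for t, s in enumerate(observed):
        # Subtract the echo of the already-recovered earlier sample.
        earlier = gain * dry[t - delay] if t >= delay else 0.0
        dry.append(s - earlier)
    return dry


# Made-up sample values, chosen only for illustration.
dry = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.0, 0.0]
wet = add_echo(dry, delay=3, gain=0.5)
recovered = remove_echo(wet, delay=3, gain=0.5)
```

      With a known delay and gain the echo subtracts out cleanly. In this sketch nothing has to be estimated, which is the part a real recording does not give you.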

    3. Re:Eh arent they trying? by ranton · · Score: 2

      All the speech recognition software I've used has relied on a controlled environment (e.g. yelling directly into your phone with almost no reverberation, no competing conversations, very little background noise).

      ...

      Modelling all the other kinds of background noise is much, much harder.

      I agree, but the issue is this problem is harder than those that industry leaders are putting billions of dollars of R&D money into. What is $50k really going to accomplish? There are Kaggle competitions that pay out more than that for far more trivial problems (like a marginal increase in CTR prediction).

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    4. Re:Eh arent they trying? by iluvcapra · · Score: 1

      Reverberation *should* be the easiest kind of noise to remove, because it has a simple mathematical model:

      S(t) = signal(t) + f(signal(t - delay))

      It's not that simple. A reverberant space can have dozens of different discrete delay taps; add secondary (and tertiary, etc.) reflections and the resulting spectral envelope is just a fog with an effectively continuous system of delay. Also keep in mind that all "functions that attenuate frequencies" are themselves just delays whose length is a function of a particular wavelength of interest. The spectral changes a reverberant space imparts -- attenuating and resonating -- are a function of cavities in the space, modes, and surface diffractions that have the effect of filtering the signal due to multipath interference.

      In practice, reverb removal is impossible to do perfectly. Techniques for doing it do things like modelling the reverberant space as a linear time-invariant (LTI) system and then inverse-convolving the recorded signal. This is sort of what you described, but getting the LTI model in the first place is the difficult nut to crack. Some systems simply do blind deconvolution, where the spectrum of the dry signal is guessed or taken to be some kind of average in the spectral domain. And once you have the model, it can change the moment a source moves in the space or the space changes configuration, say by opening a door. Good systems for speech often involve psychoacoustic modeling...
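
      A small sketch of the LTI idea described above, in plain Python. It covers only the easy case where the impulse response is known, not blind deconvolution. The impulse responses room and moved are made-up numbers standing in for "the space" and "the space after the source moves"; convolve and deconvolve are illustrative helpers, not a library API.

```python
def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def deconvolve(y, h, n):
    """Recover the first n samples of x from y = x * h (requires h[0] != 0)."""
    x = []
    for t in range(n):
        acc = y[t]
        # Remove the contribution of earlier samples through the later taps.
        for j in range(1, min(t, len(h) - 1) + 1):
            acc -= h[j] * x[t - j]
        x.append(acc / h[0])
    return x


dry = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5]
room = [1.0, 0.0, 0.5, 0.25, 0.0, 0.125]   # direct path plus reflections (assumed)
moved = [1.0, 0.0, 0.0, 0.5, 0.25, 0.0]    # reflections after a change (assumed)

wet = convolve(dry, room)
good = deconvolve(wet, room, len(dry))     # correct model of the space
bad = deconvolve(wet, moved, len(dry))     # stale model of the space

err_good = max(abs(a - b) for a, b in zip(good, dry))
err_bad = max(abs(a - b) for a, b in zip(bad, dry))
```

      With the right impulse response the dry signal comes back; with a stale one the error is large, which matches the point about the model changing as soon as something in the room moves.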

      --
      Don't blame me, I voted for Baltar.
  3. Nope. by Anonymous Coward · · Score: 0

    Only 50k to sell my soul for having them spy on more people... including myself?
    Nope.

    1. Re: Nope. by bill_mcgonigle · · Score: 1

      Only 50k to sell my soul for having them spy on more people... including myself?
      Nope.

      Of course not you - but the kinds of people who will submit are going to get job offers from the NRO. They are willing to make that deal, they're not bright enough to run off to industry, and they might have a glimmer of talent that cannot be cultivated in the university system. Plus, $50k isn't enough to quit and start a company, so it's a well-considered recruiting effort.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  4. 50k? by Thanshin · · Score: 1

    Call Nuance and tell them you are going to make a money injection in their R&D dept.

    I'm sure your 50k will make a real impact, when added to their 1.9 billion dollar revenue.

    1. Re:50k? by Anonymous Coward · · Score: 1

      Thing is, every huge company has a core of an idea (perhaps built by the founders on a weekend), that they're just milking for all it's worth... the $50k might motivate a lone wolf developer to build something that's qualitatively better than the multibillion-dollar company's core idea.

      For example, right now, all sound is filtered, transformed (frequency bands), quantized, and then those values are used to train a hidden Markov model (HMM)... that works for speakers in a quiet room, but doesn't for noisy environments or multiple speakers... somehow the brain manages to do that but we can't yet. Crazy thought (lone wolf developer?): once the HMM detects any meaningful speech, go back and adjust the filtering to isolate *that* voice, and perhaps repeat if a 2nd voice is detected. This is likely counter to what current methods are doing (they don't circle back to the frequency bands or quantization stages, which are mostly static). So yah... a lone wolf might implement something like that on a weekend, and it *might* be better for noisy environments.... but then their language model wouldn't work quite as well as the $2 billion company's :-/
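
      A rough sketch of the front end described above (frequency bands, then quantization), in plain Python. The frame length, the two-band split, and the codebook are made up for illustration, and real recognizers use more elaborate features. The HMM stage and the proposed feedback loop are not shown.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Naive DFT of one frame, with bin energies summed into equal-width bands."""
    n = len(frame)
    half = n // 2
    energies = []
    for k in range(half):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(coeff) ** 2)
    width = half // n_bands
    return [sum(energies[b * width:(b + 1) * width]) for b in range(n_bands)]


def quantize(vector, codebook):
    """Index of the nearest codeword (squared Euclidean distance)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vector, cw)) for cw in codebook]
    return dists.index(min(dists))


n = 16
low_tone = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 6 * t / n) for t in range(n)]

# Hypothetical two-entry codebook: "mostly low band" and "mostly high band".
codebook = [[64.0, 0.0], [0.0, 64.0]]

# Each frame becomes one discrete symbol; an HMM would be trained on such symbols.
symbols = [quantize(band_energies(f, 2), codebook) for f in (low_tone, high_tone)]
```

      The commenter's idea would amount to re-running the band and quantization steps with different settings after a first recognition pass, instead of treating them as fixed.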

    2. Re:50k? by Thanshin · · Score: 2

      Thing is, every huge company has a core of an idea (perhaps built by the founders on a weekend), that they're just milking for all it's worth... the $50k might motivate a lone wolf developer to build something that's qualitatively better than the multibillion-dollar company's core idea.

      You may be right, let's offer $50k to whoever sends another probe to a comet. Sure, it cost the ESA $1.4 billion, but a lone wolf could find a qualitatively better way to do the mission. By February 4, 2015.

      Slashdot is the last place where I expected to see an extremely difficult problem underestimated just because it's a computing problem.

    3. Re:50k? by Anonymous Coward · · Score: 0

      Yeah, er, no. This is not something you can knock up in a weekend. The methods used by the good software are protected by trade secrecy, not patents, so you don't really know what tricks they're doing. Hint: anything you can rattle off without thinking, the people who are experts at speech recognition and who are paid to work on speech recognition can also do. They also have weekends, in fact they have all week to play with ideas like these. Plus, your understanding of how speech recognition works is from 90s textbooks based on papers from the 70s and 80s. Maybe some F/OSS speech recognition software works like this. The bleeding edge is 30 years beyond that now. Ever notice how this stuff kind of works now, whereas it didn't 30 years ago?

  5. Out of touch with reality by MtHuurne · · Score: 2

    So they want a complex problem solved in 2 months (first test on Feb 4 and there are holidays in between), for which they will pay a relatively low amount and only to the winners. Even if the result wouldn't be used for spying, I don't think there would be many takers.

    1. Re:Out of touch with reality by SourceFrog · · Score: 2

      I am sick of these "challenges" that effectively try to get programmers to work for well below market rates. As if we're like children, a "challenge" is supposed to make us set aside months or years of income to work on a really difficult problem that, if we had to actually go out and do it for a company in the job market, we'd be paid $100K/year or more for. I think they probably attract young people who don't understand the value of their own time or skills, or who are more easily lured by childish notions like the idea that it's a "challenge", or some of these types of "challenges" attract good programmers from poor countries who are desperate to become more recognized in the longer term - in that case they may at least get something useful out of it, but still I'd rather see these "challenges" pay at *least* closer to market rates for programming labor. As they say in prostitution and marriage, don't 'give away the goods for free'.

      --
      My other UID is three digits.
    2. Re:Out of touch with reality by Thanshin · · Score: 1

      So they want a complex problem solved in 2 months (first test on Feb 4 and there are holidays in between), for which they will pay a relatively low amount and only to the winners. Even if the result wouldn't be used for spying, I don't think there would be many takers.

      Relatively low amount? For $50k it would have to be coded by volunteers and prison inmates.

      "It's breaking rocks with a hammer, being stabbed in the laundry, or coding the speech recognition thing."
      "Hmm, the laundry thing seems superfun but I'll pick hammering rocks. Give the coding gig to the guys in death row. They have nothing to lose anyway."

    3. Re:Out of touch with reality by dj245 · · Score: 1

      I am sick of these "challenges" that effectively try to get programmers to work for well below market rates. As if we're like children, a "challenge" is supposed to make us set aside months or years of income to work on a really difficult problem that, if we had to actually go out and do it for a company in the job market, we'd be paid $100K/year or more for...

      You're completely missing the point. They've found the Stargate and egyptologists are a dime a dozen. They need to form an elite team of programming and AI experts who will decode the symbols on the Stargate and defeat Apophis. This is just a fancy recruitment test.

      --
      Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    4. Re:Out of touch with reality by peragrin · · Score: 1

      Then they should code the tests into the next Call of Duty game. They can call it Call of Duty: Prometheus, and have it feature alien worlds and starships.

      --
      i thought once I was found, but it was only a dream.
    5. Re:Out of touch with reality by CaptainLard · · Score: 2

      Well they're not doing it on purpose. The DOD is just used to its contractors massively under-bidding to win the contract and then exploding the budget with 1,000 MBAs united in the goal of shareholder profit maximization. Just enter the contest and when it comes time to demonstrate the algorithm say the schedule has slipped to April...2022 and you'll need an extra $3 billion. You'll see, they won't even blink!

    6. Re:Out of touch with reality by SourceFrog · · Score: 1

      This is just a fancy recruitment test

      I don't think I've missed the point, as I'm saying the same thing - I just think it's a lousy way to do recruitment. Analogy time: Say you want to hire a sex worker. Here are two methods:

      1. Go find one that looks reasonable, initiate a negotiation. If you can find a mutually agreeable rate, hire her, otherwise continue looking for another one.

      2. Issue a "challenge" to all sex workers. Declare that every day for the next 30 days, every applicant must give you a free blow job. At the end of the 30 days you will declare a grand "winner", paying the best one $500.

      The difference between this analogy and the programmer challenge is that no sex worker would fall for the latter scenario.

      --
      My other UID is three digits.
    7. Re:Out of touch with reality by Anonymous Coward · · Score: 0

      I'm announcing my own $5 blowjob challenge.

      There's an interview process first, then a round of trials, then the winner will receive $5 in the form of an iTunes voucher.

      That's about as much sense as this announcement makes to me.

    8. Re:Out of touch with reality by mcswell · · Score: 1

      You might have a look at the IARPA releases on this, especially https://www.innocentive.com/ar.... Programmers are *not* being asked to release their software rights: "To receive an award, Solvers will not have to transfer their IP rights or grant a license to the Seeker – the purpose of the Challenge is to gauge how far recent advances in speech recognition have come in solving this important problem. With broad participation, this Challenge has the potential to provide IARPA with insights on the best next steps to stimulate research for solving this challenging problem." Of course, if someone does come up with a significant improvement on the state of the art, they might be in a good position to sell it--for >> $50k.

  6. Voice recognition - AI by ledow · · Score: 1

    Given my own personal experience with voice recognition, it's not a problem we can throw money at. We can throw money AWAY trying, but we haven't improved much in many, many years of trying.

    I don't have particularly poor speech or an unusual accent, and English-speakers all understand me - even foreign English speakers like the one I live with. But speech recognition has always been an absolute flop unless I want to learn how to talk to the computer, which is the exact opposite of what I want to happen.

    Since the first days of Dragon NaturallySpeaking, it's never been worth the training time even if all I'm doing is dictating serial numbers, or product codes, using simple single letters spaced out in a silent environment. Telling the difference between "eight" and "A" is much more involved than just context matching on a rough FFT of my voice.

    And, as has been pointed out, someone who can do this will get a damn sight more than $50k reward as the patents would be worth billions.

    To do it properly, we're really looking into problems that are the equivalent of the higher functions of AI.

    1. Re:Voice recognition - AI by bouldin · · Score: 1

      Telling the difference between "eight" and "A" is much more involved than just context matching on a rough FFT of my voice.

      To do it properly, we're really looking into problems that are the equivalent of the higher functions of AI.

      Maybe the problem isn't with the AI techniques we're using, it's with the FFT.

      FFT assumes a very periodic, stable signal. It doesn't handle transients well at all.
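
      One way to see the problem with transients: a short click spreads its energy evenly over every DFT bin, and the magnitude spectrum is the same no matter where in the frame the click occurs, while a steady tone lands in one clear peak. A minimal sketch with a naive DFT in plain Python (the frame length and test signals are illustrative assumptions):

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each bin of a naive discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


n = 16
early_click = [1.0 if t == 2 else 0.0 for t in range(n)]
late_click = [1.0 if t == 11 else 0.0 for t in range(n)]
tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

early_mags = dft_magnitudes(early_click)   # flat: every bin has magnitude 1
late_mags = dft_magnitudes(late_click)     # also flat: timing is only in the phase
tone_mags = dft_magnitudes(tone)           # energy concentrated at bins 3 and 13
```

      The timing of the click survives only in the phase, so anything that matches on a rough magnitude spectrum cannot tell the two clicks apart within a frame.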

  7. US Intelligence by Thanshin · · Score: 1

    When US Intelligence says something so clearly stupid, you always have to look for the subtext. The hidden message. The truth crouching behind the apparent idiocy.

    In this case, the hidden message seems to be "we are incompetent in even the simplest basics of our main task".

    1. Re:US Intelligence by gweihir · · Score: 1

      That is indeed the most plausible explanation...

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:US Intelligence by mcswell · · Score: 1

      Can you explain (without revealing your own stupidity) what you think is so stupid about this?

  8. Saw this coming by Ol+Olsoc · · Score: 2

    First person arrested will be Stephen Hawking.

    --
    The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
  9. Competitions by suso · · Score: 1

    As usual with competitions like this, you shouldn't settle for the prize money if you develop such a thing because it's worth quite a bit more.

  10. Outsourcing by Anonymous Coward · · Score: 0

    1) Give 10$ to someone in India.
    2) Use him for voice recognition (good enough). Software forwards audio input to him and outputs the typed text.
    3) ???
    4) 49.99k$ PROFIT!

    1. Re:Outsourcing by mcswell · · Score: 1

      Go read the solicitation. They thought of that. It won't work.

  11. Sorry, but no ... by Anonymous Coward · · Score: 0

    Anybody who is participating in a contest to provide technology to the Director of National Intelligence is a moron who should be tried for treason.

    Because I bet the Director of National Intelligence probably should.

  12. (pinky to mouth) by jeffb+(2.718) · · Score: 1

    Fifty THOUSAND dollars!

    1. Re:(pinky to mouth) by ClickOnThis · · Score: 1

      Nice one. You beat me to it.

      --
      If it weren't for deadlines, nothing would be late.
  13. omission by Anonymous Coward · · Score: 1

    Coincidentally, this competition, by its very introduction also reveals a method for making massive automated eavesdropping difficult. Unless it produces a success, that is.

    1. Re:omission by Anonymous Coward · · Score: 0

      Yes. Noise generators are installed in secure areas for precisely this reason.

  14. Ridiculous by Anonymous Coward · · Score: 0

    Ignoring the obvious "it'll be used for spying" thing, such technology would be worth far more than $50k. It's the sort of technology that gets your start-up bought by Google for a bazillion dollars because it's a difficult long-standing problem in the industry.

  15. Listening through noise or interference by jeffb+(2.718) · · Score: 1

    I remember a demo out of IBM, I believe, for recognizing controlled vocabulary in high-noise environments. It handily OUT-performed humans -- listening to the test audio, you couldn't really be sure there was a human voice at all, but the software detected and interpreted the speech with high accuracy.

    This demo would have been circa 2000. I can't help imagining that there's been more progress since then.

    The proposed task, where the interference is correlated with the original sound, seems like fertile ground for superhuman performance again. The original signal gets replicated and redundantly presented. Our brains are hard-wired to be confused by that, but it seems like a well-designed speech-recognition system could take advantage of it.

    1. Re:Listening through noise or interference by Anonymous Coward · · Score: 0

      Well if you convolve a signal with a 1Hz sinc wave the signal is "replicated and redundantly presented" but provably most of the information in the original signal is now lost. Interference correlated with the original sound is a convolution. It destroys information, it has to. It makes the problem harder, that's why our brains can't handle it, although our brains have context-based processing which allows us to recover a lot more than a system without that.

      Also, our brains are not hard-wired.
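
      The information-loss point in this comment can be shown with a much simpler kernel than a sinc. The sketch below swaps in a two-tap averager (an assumption chosen so the arithmetic is exact, not the filter named above) and uses circular convolution. Two different inputs produce the same output, so nothing downstream can tell them apart from the output alone.

```python
def circular_convolve(x, h):
    """Circular convolution of x with a short kernel h."""
    n = len(x)
    return [sum(h[j] * x[(t - j) % n] for j in range(len(h))) for t in range(n)]


averager = [0.5, 0.5]   # has a spectral null at the highest frequency

signal_a = [1.0, 3.0, 2.0, 0.0, -1.0, 1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # the component the kernel erases
signal_b = [a + d for a, d in zip(signal_a, alternating)]

out_a = circular_convolve(signal_a, averager)
out_b = circular_convolve(signal_b, averager)

inputs_differ = signal_a != signal_b
outputs_match = all(abs(a - b) < 1e-12 for a, b in zip(out_a, out_b))
```

      Whether a particular room response erases anything this completely is a separate question; the sketch only shows that a convolution can map distinct signals to the same result.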

    2. Re:Listening through noise or interference by bouldin · · Score: 1

      The proposed task, where the interference is correlated with the original sound, seems like fertile ground for superhuman performance again. The original signal gets replicated and redundantly presented. Our brains are hard-wired to be confused by that, but it seems like a well-designed speech-recognition system could take advantage of it.

      Mammalian auditory systems actually have a lot of wiring that seems dedicated to processing reverberation.

      I'm not familiar with the IBM demo you mention, but the key there is the controlled vocabulary. It was probably also trained on the speaker's voice. Those are huge constraints.

    3. Re:Listening through noise or interference by jeffb+(2.718) · · Score: 1

      I'm not familiar with the IBM demo you mention, but the key there is the controlled vocabulary. It was probably also trained on the speaker's voice. Those are huge constraints.

      I'm remembering that it was controlled-vocabulary, but speaker-independent. I think it was trained on spoken digits -- a very small vocabulary. It's been a long time, and I may be misremembering even the most basic details. Still, it was impressive to hear it picking out numbers where all I could hear was noise.

    4. Re:Listening through noise or interference by jeffb+(2.718) · · Score: 1

      Well if you convolve a signal with a 1Hz sinc wave the signal is "replicated and redundantly presented" but provably most of the information in the original signal is now lost. Interference correlated with the original sound is a convolution. It destroys information, it has to. It makes the problem harder, that's why our brains can't handle it, although our brains have context-based processing which allows us to recover a lot more than a system without that.

      Perhaps. As you can surely tell, this is well outside my expertise.

      Also, our brains are not hard-wired.

      Forgive my imprecise wording. Brain wiring is malleable, but there's a lot of built-in structure, especially around perception and language processing.

  16. Contest? by kingnite9915 · · Score: 1

    What? They didn't want to hire the same people who built the Obamacare site? I'm shocked!

  17. My Application by Overzeetop · · Score: 1
    --
    Is it just my observation, or are there way too many stupid people in the world?
  18. What a joke! by Anonymous Coward · · Score: 0

    Get the public to make your spy shit.

  19. intelligence? by l3v1 · · Score: 2

    So, who wants to be the one who improves the automatic speech2text capabilities of automatic wiretapping systems in the US for a few bucks? :))

    --
    I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
  20. 50K? by Anonymous Coward · · Score: 0

    These guys need to up their game.

  21. Readying Mechanical Turk...FTW? by xxxJonBoyxxx · · Score: 1

    Let's see...for $50K...I could probably write up a quick mobile app ($1K) that feeds microphone input into a streaming acceptance service on a server ($3K), that chops it up into wav files for Mechanical Turk processing. Fund that long enough to pass the POC stage ($2K), ride some odds (25%) and cash the check before the tech collapses = $6K for possible $12.5K win = $6.5K possible profit? Er...still no.
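
    The arithmetic above, spelled out as a quick expected-value calculation in Python. The cost labels are paraphrased from the comment, and the 25% odds are the commenter's own guess.

```python
# Cost items as listed in the comment (labels paraphrased).
costs = {
    "mobile app": 1_000,
    "streaming service": 3_000,
    "proof-of-concept funding": 2_000,
}
prize = 50_000
win_probability = 0.25   # the commenter's assumed odds

total_cost = sum(costs.values())
expected_winnings = prize * win_probability
expected_profit = expected_winnings - total_cost

print(f"cost ${total_cost:,}, expected winnings ${expected_winnings:,.0f}, "
      f"expected profit ${expected_profit:,.0f}")
```

    Note that the $6.5K is an expectation: three times out of four, under these assumptions, the outcome is simply a $6K loss.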

  22. They should get in touch with Raytheon... by Anonymous Coward · · Score: 0

    Raytheon already do a 300bps voice codec for highly noisy environments (helicopters!). Given the crossover between speech compression and speech recognition (speech recognition is essentially just a special case of speech compression), maybe together they could work something out.

  23. ASpIRE? by oldmac31310 · · Score: 1

    That acronym is an utter failure. It doesn't even work.

    --
    http://www.acetonestudio.com
  24. This is horrific by Deliveranc3 · · Score: 1

    Course we'll have a list of entrants! And that is probably a good thing!

    Where do people who would do this come from? Is it child abuse?

  25. Behind the curve by Anonymous Coward · · Score: 0

    Pfwa! The Ruskies did this in the forties/fifties.

    Check out the book "In the First Circle" by this guy named Aleksandr Solzhenitsyn.

  26. Only valid reason by Anonymous Coward · · Score: 0

    Really, the only valid reason to enter this competition is sabotaging their apparatus by making something that looks good at first but is a dead end (see Nazi Germany's nuclear scientists and their inflated calculations).

  27. US Intelligence by Anonymous Coward · · Score: 0

    No, it means they already have the technology, and to trick the public into thinking they don't only costs $50K.

  28. Presidential decree by Anonymous Coward · · Score: 0

    I propose that before this decade is out, America will put a microphone with speech recognition in every home and office on the planet.