Slashdot Mirror


Google Opens Access To Its Speech Recognition API, Going Head To Head With Nuance (techcrunch.com)

An anonymous reader quotes a report from TechCrunch: Google is planning to compete head-on with Nuance and other voice recognition companies by opening up its speech recognition API to third-party developers. To attract developers, the app will be free at launch with pricing to be introduced at a later date. The company formally announced the service today during its NEXT cloud user conference, where it also unveiled a raft of other machine learning developments and updates, most significantly a new machine learning platform. The Google Cloud Speech API, which will cover over 80 languages and will work with any application in real-time streaming or batch mode, will offer a full set of APIs for applications to "see, hear and translate," Google says. It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard. Google's move will have a large impact on the industry as a whole, and particularly on Nuance, the company long thought of as offering the best voice recognition capabilities in the business, and certainly the biggest company offering such services.

46 comments

  1. Saying "penis" to it just for the heck of it. by Anonymous Coward · · Score: 0

    Would anyone here be willing to admit that they would use this service just so they could repeatedly say "penis" to it for no reason other than to have it recognize that they're saying "penis" to it?

    1. Re:Saying "penis" to it just for the heck of it. by Anonymous Coward · · Score: 0

      No, it's pee innis.

  2. Nuance the Biggest by Ksevio · · Score: 3, Interesting

    It's not so much that Nuance is known for being the best for a long time, it's more that they've bought out all their competitors and have pretty much controlled the market.

    1. Re:Nuance the Biggest by WarJolt · · Score: 1

      It's not so much that Nuance is known for being the best for a long time, it's more that they've bought out all their competitors and have pretty much controlled the market.

      It's mostly that Google was afraid of losing market share to Alexa Voice Service, which Amazon opened up to developers a while ago.

    2. Re:Nuance the Biggest by Anonymous Coward · · Score: 3, Interesting

      We work in the transcription business. That is exactly what Nuance did, and does, especially in the medical transcription segment.

      American-based, native English-speaking transcriptionists are essentially just training Nuance's computers to do the transcriptionists' jobs. Once the voice recognition accuracy hits a certain mark, they outsource the editing of their now-trained, automated output to India or some other piss-poor country with lower wages and contract and labor laws more favorable to them.

      And we do that work for lower wages (usually a piece rate per line; if paid per hour, then with quotas to meet) than we got 5, 10, or even 20 years ago. Nuance's end-user software is buggy, proprietary, prone to crashes, and really not all that secure either... not to mention the security and privacy nightmare of sending the work offshore.

      In five years, our work will be hard to come by, as facilities continue buying into the bullshit Nuance sells. The only thing that might save some of our jobs is if Congress passes a law saying our medical records have to stay in the country unless absolutely needed for a traveling patient's care. Then we'd at least get the editing work.

      But that editing work (the part that does stay in the country) pays half or less of the rate for actually transcribing a document, and Nuance already pays half or even only a third of the rate a facility or doctor would pay a contracting transcriptionist directly. Facilities likely pay Nuance the same rate, but of course Nuance has to take its own, and the largest, piece of the pie. So not only are we working ourselves out of jobs by training Nuance's computers, but we get shitty pay to do it, and have no choice in the matter because of Nuance's chokehold on the market.

    3. Re:Nuance the Biggest by Anonymous Coward · · Score: 0, Interesting

      Oh well, another sore-loser business with no skills but hearing and typing.
      The whining is very interesting, though.

      Before the end of this decade, it is predicted that AI/machine learning is going to kill off five million jobs.

      The good thing is, you won't have to blame the piss-poor countries for it.

    4. Re:Nuance the Biggest by Livius · · Score: 3

      1) Transcription doesn't require the level of skill that practising medicine does, but it's skilled work and there is a lot more to it than typing.

      2) It's one thing to be replaced by a computer that genuinely replaces the work you do. It's another to lose your livelihood or have your income reduced by software that is terrible at the work. People using transcription software generally are getting less value for their money even though they might be paying less for the first draft, while the talents of transcriptionists who want the work are under-used.

    5. Re:Nuance the Biggest by Anonymous Coward · · Score: 0

      Also, their competitors tended to suck. I worked for one. Watching the president do the "weekly meeting", where he'd post about devoted user emails he'd obviously written himself and encourage the employees to astroturf the products on social media, and watching the sales department present their growth curve by saying "we had a little delay, but we're still on this exponential growth curve!" when the growth was actually on an entirely different, much, much flatter curve that would have to make quantum leaps to ever reconnect with the curves they kept publishing to potential investors, was hysterically funny, and it made me cry about how nonsensical their math and science were. And yes, I looked at the patents they were fighting over: the work was *obviously* lifted from the same approaches as Dragon NaturallySpeaking, and the new patents for both companies were *nonsense*. They were obvious derivations of much older work and should never have been patented.

      Hell, a lot of the work was derived from J. C. R. Licklider's work in the 1950s, and they kept insisting on trying to analyze sounds by frequency power bands, an approach that is computationally easy but has been known for decades not to work.

    6. Re:Nuance the Biggest by Tupper · · Score: 3, Interesting

      The nerds at Ma Bell used to provide very high-quality telephony; they were shocked and appalled when the market chose low-quality, low-cost telephony. The medical transcription market has gone through the same change.

      The documents, especially the ones used clinically, can suffer from lower-quality ASR and/or offshoring. Also, in the old days, light editing was usually part of the process. This happens less in today's price-obsessed market and sadly results in less readable reports.

      On the other hand, today it's possible to get turnaround times of zero, with document issues identified in real time by NLP. That is a really big improvement. (I don't know if Nuance has that, but if they don't, they will soon.)

  3. Privacy by markdavis · · Score: 4, Interesting

    >"It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard."

    Indeed. So does this mean Google will store and mine and analyze and profitize the spoken text data too?

    1. Re:Privacy by WarJolt · · Score: 1

      The speech goes into retraining the machine. They profit from the transcribed data as well.

    2. Re:Privacy by AHuxley · · Score: 1

      Re: "analyze and profitize the spoken text data too?"
      Pics, text, sound and any other environmental sensor data found networked will feed the ads ... :)
      Google looks to patent tech that listens to calls to promote ads (23 March 2012)
      http://www.cnet.com/news/googl... "..the patent application also looks into placing onto people's computers online ads that are influenced by data from environmental sensors--such as temperature, humidity, light, and sound. "

      --
      Domestic spying is now "Benign Information Gathering"
    3. Re:Privacy by Livius · · Score: 1

      does this mean Google will store and mine and analyze and profitize

      No, it doesn't mean that.

      Though only because Google is already doing it.

    4. Re:Privacy by mlw4428 · · Score: 1

      If you choose to use the product/service -- sure, why not? Do you think that creating, maintaining, and upgrading this kind of system is cheap?

    5. Re:Privacy by ScrewMaster · · Score: 2

      Google's entire approach to speech recognition is based on big data, so yes, they will be "mining" it in the sense that they will use it to continually improve the technology and improve accuracy for the individual user. I would be surprised if they didn't use that data for targeted ads (after all, that is what they do), but, this being Google, there will likely be an easy opt-out.

      --
      The higher the technology, the sharper that two-edged sword.
  4. Do I have to say it? by 93 Escort Wagon · · Score: 2

    To attract developers, the app will be free at launch with pricing to be introduced at a later date.

    The first one's always free...

    --
    #DeleteChrome
    1. Re:Do I have to say it? by bugs2squash · · Score: 1

      if they were to announce the future pricing now it might even be worth trying.

      --
      Nullius in verba
    2. Re:Do I have to say it? by ShanghaiBill · · Score: 4, Informative

      if they were to announce the future pricing now it might even be worth trying.

      Keep in mind that the VR API used to be open, then they closed it, screwing anyone using it. Now they are opening it up again "for free", but it will supposedly be yanked away yet again, when/if they finally decide on the pricing. Google has a terrible record of supporting their products. You would be foolish to rely on this API if you have any alternative.

    3. Re:Do I have to say it? by Anonymous Coward · · Score: 0

      That's what Google does, though. Create something amazing and then mothball it a year down the road for no apparent reason, fucking a hell of a lot of people over.

    4. Re:Do I have to say it? by 93 Escort Wagon · · Score: 2

      That's what Google does, though. Create something amazing ...

      Sometimes they do create it... but more often they buy it, run with it for a while, and then shut it down.

      --
      #DeleteChrome
    5. Re:Do I have to say it? by Anonymous Coward · · Score: 1

      I couldn't agree more. Google has established a pattern of either buying or creating something cool and then shutting it down when some new whim takes hold. They are like a little spoiled kid in a toy store. TBH, either one is annoying as hell.

  5. 2018 Headline by jwymanm · · Score: 2

    Google Closes Access To Its Speech Recognition API, 3rd party developers left scratching heads

  6. Guffaw. by SeaFox · · Score: 1

    ...the app will be free at launch with pricing to be introduced at a later date.

    /insert metaphor about drug dealers here

  7. Free QA by Anonymous Coward · · Score: 0

    Yes, thank you for the free QA.

  8. Let's try it out by Anonymous Coward · · Score: 0

    I'm using Google Voice to Text thing to write this, just to see how good it is or if it's today or not. Generally I think it works pretty good but every once in awhile it really fucked up my words. For instance in this message so far I see at least two words that is messed up.

  9. Pebble Time has been waiting for this by Wizarth · · Score: 4, Interesting

    I'm waiting to see if/how this affects Pebble Time. We've been wanting access to the Google Voice API for ages now. Personally I want it mostly for Google Now integration, which may or may not be separate.

  10. Hot Air by Anonymous Coward · · Score: 0

    Nothing to see here, folks. Google will end support for this in short order anyway. I recommend picking something else that won't be abandoned... which means something other than what Google offers.

    1. Re: Hot Air by Anonymous Coward · · Score: 0

      Care to point to another voice recognition app that is not Nuance? Thanks...

    2. Re: Hot Air by ScrewMaster · · Score: 2

      Nuance's Dragon NaturallySpeaking is about the most frustrating, ill-conceived, effectively unsupported, crash-prone, erratic and generally flaky application of its kind on the market. It's unstable, unpredictable, and regularly drives every user I know into apoplexy. The problem is, they just don't care. Really, they don't: bugs are left unaddressed for years, often through several major "revisions", because they know that there's nowhere else for users to go. That's especially true if one needs their specialized vocabularies.

      If anyone wants to know why monopolies are bad ... this is it.

      --
      The higher the technology, the sharper that two-edged sword.
    3. Re: Hot Air by Anonymous Coward · · Score: 0

      https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/

  11. Can we ditch the cloud? by Anonymous Coward · · Score: 0

    This is interesting but not really where I'd prefer to see speech recognition going. The current approach seems to be to do speech recognition in the cloud, which results in a huge level of concern about where your speech is being sent (smart TVs always listening to your living room etc.)

    It would be great to see someone address the privacy concern by developing a decent offline speech recognition system. There are a few free software efforts (CMU Sphinx, Julius, and Kaldi), but none seems fully mature, and there's little benchmarking to indicate how well they perform.

    1. Re:Can we ditch the cloud? by ScrewMaster · · Score: 1

      The problem with alternatives is that any successful ones will immediately run afoul of Nuance's intellectual property lawyers.

      --
      The higher the technology, the sharper that two-edged sword.
    2. Re:Can we ditch the cloud? by omnichad · · Score: 1

      The current approach seems to be to do speech recognition in the cloud

      There's a reason for this. They use a neural network and an absolutely massive dataset. They seeded this data set with GOOG-411 a few years before Google Now came out. Microsoft did the same thing with BING-411 when GOOG-411 shut down and now we have Cortana.

    3. Re:Can we ditch the cloud? by Anonymous Coward · · Score: 0

      There's a reason for this. They use a neural network and an absolutely massive dataset. They seeded this data set with GOOG-411 a few years before Google Now came out. Microsoft did the same thing with BING-411 when GOOG-411 shut down and now we have Cortana.

      All of the reasons are market/bullshit-based. We had more options for offline recognition a decade ago than we do now, and you're jumbling the training dataset and the running recognizer together as if they were the same thing.

      If you believe there is any technical reason we can't have useful offline recognition: http://arxiv.org/pdf/1603.0318...

    4. Re:Can we ditch the cloud? by Anonymous Coward · · Score: 0

      I agree with this. In 2013, a number of people figured out how to get Google's speech recognition to work offline. It required root and was definitely a hack, but I got it working on my HTC One V. It worked completely offline and its accuracy was great (and I mumble). It was also possible to add custom word lists, which could further improve accuracy. I was hoping to use offline voice recognition as a feature in an Android app for botanists, where a cell signal cannot be assumed, where sunlight makes it annoying to peck at a screen you can't read, and where being able to control the app by voice would be very useful. I was never able to get offline voice recognition working on any other phone, though, and requiring root and a bunch of funky steps kind of killed its practicality.

      This Stack Overflow question gives the gist of what one needed to do to get it working. I used some other resources as well which I can't locate right now. It's kind of a non-starter though, as Google clearly doesn't intend this functionality to be possible--and it might not be anymore. So you'd be running afoul of their policies or lawyers or whatever.

      An open-source solution would be great, but from what I've seen the existing projects are experimental, with changing APIs. Since Google's speech recognition worked fine offline, it's definitely possible to do. One would think the community could help out by providing speech samples and by helping with manual quality control (checking whether a piece of speech matches its machine-transcribed text, or transcribing difficult passages to help refine the algorithms). I don't really understand the approaches behind speech recognition, but providing training datasets (for many languages) seems like something interested people could help with.

  12. Mass surveillance... by Anonymous Coward · · Score: 0

    Now the NSA can spy on non-English speech... for free... good /rofl...

  13. Local STT is the optimum end game by fyngyrz · · Score: 1

    The world needs high quality STT that works when the net is down and isn't vulnerable to arbitrary changes in API, availability, and legal impediments.

    It's clearly one of the harder software problems, but I expect it to be solved in fairly short order; years, not decades.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:Local STT is the optimum end game by mrchaotica · · Score: 1

      The world needs high quality STT that works when the net is down and isn't vulnerable to arbitrary changes in API, availability, and legal impediments.

      Not to mention security issues (e.g., of the "sending all your private speech to the NSA" variety).

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    2. Re:Local STT is the optimum end game by fyngyrz · · Score: 1

      Likely your phone would be doing that anyway -- if the NSA cared even in the slightest about you in particular. They're doing it on every phone call anyway. Government is long out of control on privacy issues. Then there's the "smart TV" issue...

      Orwell was an optimist

      --
      I've fallen off your lawn, and I can't get up.
  14. Unrealistic expectations by fyngyrz · · Score: 2

    Any idea you might have that the market will do what you think is optimum is based upon a complete misunderstanding of markets.

    Markets often choose inferior performance options. High quality solutions often fail to gain, or keep, traction. No undertaking that doesn't have significant lobbying impact (which of course means high $) with the relevant legislature can reasonably expect its business model to be protected in the face of any particular eroding force. Once a particular solution to a problem has been chosen, it is very likely that any change has social hurdles to overcome: those having made the decision are invested; training costs and familiarization erect similar barriers; disruption of stockholder confidence can be a factor.

    --
    I've fallen off your lawn, and I can't get up.
  15. Don't get too comfy with. by Anonymous Coward · · Score: 0

    Like many of Google's products, if it doesn't take off or they can't profit from it, it'll be shut down, and you and your users will be left out in the cold.

    https://en.wikipedia.org/wiki/Category:Discontinued_Google_services

  16. Re:Hey! by omnichad · · Score: 1

    Homsar, is that you?