Slashdot Mirror


Google Opens Access To Its Speech Recognition API, Going Head To Head With Nuance (techcrunch.com)

An anonymous reader quotes a report from TechCrunch: Google is planning to compete with Nuance and other voice recognition companies head on by opening up its speech recognition API to third-party developers. To attract developers, the app will be free at launch with pricing to be introduced at a later date. The company formally announced the service today during its NEXT cloud user conference, where it also unveiled a raft of other machine learning developments and updates, most significantly a new machine learning platform. The Google Cloud Speech API, which will cover over 80 languages and will work with any application in real-time streaming or batch mode, will offer a full set of APIs for applications to "see, hear and translate," Google says. It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard. Google's move will have a large impact on the industry as a whole -- and particularly on Nuance, the company long thought of as offering the best voice recognition capabilities in the business, and most certainly the biggest offering such services.

13 of 46 comments

  1. Nuance the Biggest by Ksevio · · Score: 3, Interesting

    It's not so much that Nuance has long been known as the best; it's more that they've bought out all their competitors and pretty much control the market.

    1. Re:Nuance the Biggest by Anonymous Coward · · Score: 3, Interesting

      we work in transcription business. that is exactly what nuance did, and do, especially the medical transcription segment.

      american-based, native english speaking transcriptionists are essentially just training nuance's computers to do the transcriptionists' jobs. once the voice recognition accuracy hits a certain mark, they outsource the editing of their now trained and automated output to india or some other piss-poor country with lower wages and more favorable-to-them contract and labor laws.

      and we do that work for lower wages (usually a piece rate per line; if per hour then with quotas to meet) than we got 5 or 10 or even 20 years ago. nuance's end user software is buggy, proprietary, prone to crashes, and really not all that secure either... not even mentioning the security and privacy nightmare of sending the work offshore.

      in five years, our work will be hard to come by, as facilities continue buying into the bullshit nuance sells. the only thing that might save some of our jobs is if congress passes a law that says our medical records have to stay in the country unless absolutely needed for a traveling patient's care. then we'd at least get the editing work.

      but that editing work (that does stay in country) pays half or less the rate of actually transcribing a document, and nuance already pays half or even only a third the rate a facility or doctor would pay directly to a contracting transcriptionist. they likely pay nuance the same, but of course nuance has to have their own, and the largest, piece of the pie. so not only are we working ourselves out of jobs by training nuance's computers, but we get shitty pay to do it, and have no choice in the matter because of nuance's chokehold on the market.

    2. Re:Nuance the Biggest by Livius · · Score: 3

      1) Transcription doesn't require the level of skill that practising medicine does, but it's skilled work and there is a lot more to it than typing.

      2) It's one thing to be replaced by a computer that genuinely replaces the work you do. It's another to lose your livelihood or have your income reduced by software that is terrible at the work. People using transcription software generally are getting less value for their money even though they might be paying less for the first draft, while the talents of transcriptionists who want the work are under-used.

    3. Re:Nuance the Biggest by Tupper · · Score: 3, Interesting

      The nerds at Ma Bell used to provide very high quality telephony; they were shocked and appalled when the market chose low quality, low cost telephony. The medical transcription market has gone through the same change.

      The documents, especially the ones used clinically, can suffer from the lower quality of ASR and/or offshoring. Also, in the old days, light editing was usually part of the process. This happens less in today's price-obsessed market and sadly results in less readable reports.

      On the other hand, today it's possible to get turnaround times of 0 with document issues identified in real time by NLP. That is a really big improvement. (I don't know if Nuance has that, but if they don't, they will soon.)

  2. Privacy by markdavis · · Score: 4, Interesting

    >" Google says. It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard."

    Indeed. So does this mean Google will store and mine and analyze and profitize the spoken text data too?

    1. Re:Privacy by ScrewMaster · · Score: 2

      Google's entire approach to speech recognition is based on big data, so yes, they will be "mining" it in the sense that they will use it to continually improve the technology, and improve accuracy for the individual user. I would be surprised if they didn't use that data for targeted ads (after all, that is what they do) but being Google there will likely be an easy opt-out.

      --
      The higher the technology, the sharper that two-edged sword.
  3. Do I have to say it? by 93 Escort Wagon · · Score: 2

    To attract developers, the app will be free at launch with pricing to be introduced at a later date.

    The first one's always free...

    --
    #DeleteChrome
    1. Re:Do I have to say it? by ShanghaiBill · · Score: 4, Informative

      if they were to announce the future pricing now it might even be worth trying.

      Keep in mind that the VR API used to be open, then they closed it, screwing anyone using it. Now they are opening it up again "for free", but it will supposedly be yanked away yet again, when/if they finally decide on the pricing. Google has a terrible record of supporting their products. You would be foolish to rely on this API if you have any alternative.

    2. Re:Do I have to say it? by 93 Escort Wagon · · Score: 2

      That's what Google does, though. Create something amazing ...

      Sometimes they do create it... but more often they buy it, run with it for a while, and then shut it down.

      --
      #DeleteChrome
  4. 2018 Headline by jwymanm · · Score: 2

    Google Closes Access To Its Speech Recognition API, 3rd party developers left scratching heads

  5. Pebble Time has been waiting for this by Wizarth · · Score: 4, Interesting

    I'm waiting to see if/how this affects Pebble Time. We've been wanting access to the Google Voice API for ages now. Personally I want it mostly for Google Now integration, which may or may not be separate.

  6. Re: Hot Air by ScrewMaster · · Score: 2

    Nuance's Dragon NaturallySpeaking is about the most frustrating, ill-conceived, effectively-unsupported, crash-prone, erratic and generally flaky application of its kind on the market. It's unstable, unpredictable, and regularly drives every user I know into apoplexy. The problem is, they just don't care. Really, they don't: bugs are left unaddressed for years, often through several major "revisions", because they know that there's nowhere else for users to go. That's especially true if one needs their specialized vocabularies.

    If anyone wants to know why monopolies are bad ... this is it.

    --
    The higher the technology, the sharper that two-edged sword.
  7. Unrealistic expectations by fyngyrz · · Score: 2

    Any idea you might have that the market will do what you think is optimum is based upon a complete misunderstanding of markets.

    Markets often choose inferior performance options. High quality solutions often fail to gain, or keep, traction. No undertaking that doesn't have significant lobbying impact (which of course means high $) with the relevant legislature can reasonably expect its business model to be protected in the face of any particular eroding force. Once a particular solution to a problem has been chosen, it is very likely that any change has social hurdles to overcome: those having made the decision are invested; training costs and familiarization erect similar barriers; disruption of stockholder confidence can be a factor.

    --
    I've fallen off your lawn, and I can't get up.