Domain: nuance.com
Stories and comments across the archive that link to nuance.com.
Stories · 4
-
Google Opens Access To Its Speech Recognition API, Going Head To Head With Nuance (techcrunch.com)
An anonymous reader quotes a report from TechCrunch: Google is planning to compete with Nuance and other voice recognition companies head on by opening up its speech recognition API to third-party developers. To attract developers, the app will be free at launch with pricing to be introduced at a later date. The company formally announced the service today during its NEXT cloud user conference, where it also unveiled a raft of other machine learning developments and updates, most significantly a new machine learning platform. The Google Cloud Speech API, which will cover over 80 languages and will work with any application in real-time streaming or batch mode, will offer a full set of APIs for applications to "see, hear and translate," Google says. It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard. Google's move will have a large impact on the industry as a whole -- and particularly on Nuance, the company long thought of as offering the best voice recognition capabilities in the business, and most certainly the biggest offering such services. -
Better Tools For Disabled Geeks?
layabout writes "We've seen tremendous advances in user interfaces over the past few years. Unfortunately, those UIs and supporting infrastructure exclude the disabled. In the same timeframe there has been virtually no advance in accessibility capabilities. It's the same old sticky keys, unicorn stick, speech recognition, text-to-speech that kind-of, sort-of, works except when you need to work with real applications. Depending on whose numbers you use, anywhere from 60,000 to 100,000 keyboard users are injured every year — some temporarily, some permanently. In time, almost 100% of keyboard users will have trouble typing and using many if not all mobile computing devices. My question to Slashdot: Given that some form of disability is almost inevitable, what's keeping you from volunteering and working with geeks who are already disabled? By spending time now building the interfaces and tools that will enable them to use computers more easily, you will also be ensuring your own ability to use them in the future." Follow the link for more background on this reader's query.
This question is aimed mostly at the kind of disability we are susceptible to and I have been living with for the past 15 years. Even though we have speech recognition, it doesn't solve any problem except writing text. There have been a couple of attempts at making speech recognition more useful to programmers [0], but they have failed. The needs are clear:
[1] A working full-vocabulary, continuous recognition system on Linux.
[2] Tools that don't expect you to "speak the keyboard."
[3] Tools that let you edit as well as create code.
So why don't more geeks work on securing their own future, or at the very least, work to help their fellow geeks to stay on the economic ladder?
[0] VoiceCode and VR-Mode: VoiceCode is an amazing piece of work. It makes it possible for a disabled programmer to generate Python code very quickly. Unfortunately, it does not solve the editing problem. Even more unfortunately, it's hand-wearingly complicated to set up and get working. VR-Mode makes it possible to use Naturally Speaking's "Select and Say" mode in Emacs — that is, if you can get it to work. It seems to have drifted into non-functionality as Emacs has moved forward.
[1] Naturally Speaking works well, is reasonably cheap, and works somewhat under Wine today. If we can make it work reliably under Wine, it solves the problem in months rather than decades. Other tools such as Sphinx 1-4 are great IVR systems if you have a vocabulary and grammar under 15,000 words. In contrast, Naturally Speaking's working vocabulary is in the 100,000-word range. Any disabled user will choose Naturally Speaking because it works so much better than the nearest alternative. We have people who are injured now and need these tools. They can't afford to wait 10 years or more for an OSS solution.
[2] "Speaking the keyboard" refers to speech user interfaces developed by people who don't use speech recognition. They expect you to say too much, which creates a vocal form of RSI — see [3]. Listen to what disabled users do, not to what you think they should speak.
[3] See VoiceCode in [0]. Unfortunately, today's tools are only for writing code, not correcting code. Code correction is a very different process and must be spoken in a different way: "change index" instead of "search forward left bracket leave mark search forward right bracket copy region." This is also an example of "speaking the keyboard." -
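To make the contrast in [3] concrete, here is a minimal sketch of what an intent-level "change index" command might do behind the scenes. It is not taken from VoiceCode or any other tool mentioned here; the function names `select_index` and `change_index` are hypothetical, and the sketch assumes the "index" is simply the text between the next opening bracket and its matching closing bracket on one line.

```python
from typing import Optional, Tuple


def select_index(line: str, cursor: int = 0) -> Optional[Tuple[int, int]]:
    """Find the text between the next '[' at or after cursor and its matching ']'.

    Returns (start, end) offsets of the index expression, or None if there is
    no complete bracket pair. Hypothetical helper, for illustration only.
    """
    open_pos = line.find("[", cursor)
    if open_pos == -1:
        return None
    depth = 0
    for pos in range(open_pos, len(line)):
        if line[pos] == "[":
            depth += 1
        elif line[pos] == "]":
            depth -= 1
            if depth == 0:
                # Exclude the brackets themselves from the selection.
                return open_pos + 1, pos
    return None


def change_index(line: str, replacement: str, cursor: int = 0) -> str:
    """Replace the next index expression with the dictated replacement text."""
    span = select_index(line, cursor)
    if span is None:
        return line  # Nothing to change; leave the line untouched.
    start, end = span
    return line[:start] + replacement + line[end:]


if __name__ == "__main__":
    # One short utterance ("change index", then "j") instead of spelling out
    # every search, mark, and copy keystroke.
    print(change_index("total += values[i + 1]", "j"))
```

The point of the design is that the user names the thing to edit and the tool works out the keystrokes, rather than the user reciting the keystrokes aloud.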
Death of the Cell Phone Keypad As We Know It?
An anonymous reader writes, "According to a CNet article, two companies called Mobience and Nuance have created viable and possibly better alternatives to the standard cell phone keypad. 'Mobience, which is based in South Korea, has redesigned the ABC and Qwerty key layout, and come up with MobileQwerty. It's essentially the same three-letters-per-key system as the standard mobile keypad layout, but the letters have been rearranged in a Qwertyesque way to increase efficiency.' The other system developed by Nuance is a mobile speech platform that turns speech into text and replaces the keypad altogether. I was skeptical at first but the video of Nuance's software vs. Ben Cook, the ex world texting champion, is undeniably impressive." -
Speech Recognition, Voice Verification -- Free
ten thirty writes: "TECHNOCRAT.NET recently featured a great article regarding the dawning (well, it's only a few years old anyway) of speech recognition software within the open source community. In particular, the Sphinx project of Carnegie Mellon University is discussed, as well as some other systems such as Festival and a public domain project at the University of Missouri. The notion here is that the GUI, which has come so far over the past two decades, will eventually be supplanted, at least for some applications, by the VUI. The question is, will the open-source community allow the integration of this technology into our society to be spearheaded by closed-source vendors?"