Slashdot Mirror


Voice Is the Next Big Platform, But Amazon Already Owns It (backchannel.com)

Six million homes already have an Amazon device with its Alexa voice assistant -- about 5% of all households. But Backchannel argues that Amazon is already dominating the race to become the operating system for future voice-activated devices, with Forrester tech analyst James McQuivey pointing out that "having microphones in your environment is a lot more convenient than pulling out your phone." The Alexa-enabled Echo is a true unicorn, one of those rare products that arrives every few years and fundamentally changes the way we live... After years of false starts, voice interface will finally creep into the mainstream as more people purchase voice-enabled speakers and other gadgets, and as the tech that powers voice starts to improve.
Despite competition from Google Home, and a rumored "Home Hub" from Microsoft, Amazon "has a two-year jump on its competition, having first introduced the Echo speaker in November 2014," notes the article, adding that Amazon also "opened its platform early to third-party developers." (Alexa now has more than 5,000 "skills".) They argue that Amazon is already winning the war of the operating systems by familiarizing consumers with "a new computing interface -- a voice devoid of a screen -- that will eventually grow to be more ubiquitous and more useful than our smartphones... Soon, you'll speak your wants into the air -- anywhere -- and a woman's warm voice with a mid-Atlantic accent will talk back to you, ready to fulfill your commands."

13 of 229 comments

  1. Hmm... by Anonymous Coward · · Score: 5, Funny

    While in general I like the idea of a woman fulfilling my every command, I'm not sure it's worth it if she's constantly keeping tabs on me.

  2. Oh yeah, just what I need. by Anonymous Coward · · Score: 5, Insightful

    Some remote server listening in on everything I say, filtering every word, analyzing each sentence, etc.

    Say one wrong thing, and the appropriate authorities are automatically informed and dispatched. Tax evasion? IRS shows up at your door. Diesel fuel and fertilizer? FBI. Feel like killing your manager who's been driving you nuts all week? Local police.

    Sign me up.

    I don't understand why none of this stuff operates locally. It's always some remote server in the cloud. I remember having IBM ViaVoice (back then I think it was called "Voice Type Dictation" or "SimplySpeaking") on my goddam Pentium 75mhz computer. After about an hour of training, it would nail mostly everything I said. I find it incredibly difficult to believe that we don't have the hardware resources necessary to perform local speech-to-text and text processing inside your house without ever touching the internet.

    1. Re:Oh yeah, just what I need. by Anonymous Coward · · Score: 5, Informative

      It is always listening (if you do not turn the mic off), it does notify you when it accesses the cloud service and... it does process locally for its activation word. It's supposedly not always streaming your voice to the cloud (since you'd wind up noticing that on your internet usage anyways). Processing voice isn't the problem, it's the decision engine behind there that figures out what you wanted from the words input and then serves it up to you. On your device, this would be painfully slow, since it does require an AI of its own. So... stream voice to cloud, use eleventy-bajillion processors to do it all and the device is dumb and therefore cheap on the customer end. All the Echo, Dot, Siri, Cortana really do is record, stream, and play back your data... that's called an MP3 player and should be as cheap as one. That should be your complaint, that these things cost more than 20 bucks when they are the definition of a dumb terminal.

      From https://www.amazon.com/gp/help/customer/display.html?pop-up=1&nodeId=201602230#Question10

      "Amazon Echo and Echo Dot FAQs
      Amazon Echo and Echo Dot are far-field Alexa-enabled devices.

      1. How do Amazon Echo and Echo Dot recognize the wake word?

      Amazon Echo and Echo Dot use on-device keyword spotting to detect the wake word. When these devices detect the wake word, they stream audio to the Cloud, including a fraction of a second of audio before the wake word.

      2. How do I know when Amazon Echo or Echo Dot are streaming my voice to the Cloud?

      When Amazon Echo or Echo Dot detect the wake word, when you press the action button on top of the devices, or when you press and hold your remote's microphone button, the light ring around the top of your Amazon Echo turns blue, to indicate that Amazon Echo is streaming audio to the Cloud. When you use the wake word, the audio stream includes a fraction of a second of audio before the wake word, and closes once your question or request has been processed. Within Sounds settings in the Alexa App (Settings > [Your Device Name] > Sounds), you can enable a 'wake up sound,' a short audible tone that plays after the wake word is recognized to indicate that the device is streaming audio. You can also enable an 'end of request sound' that will play a short audible tone at the end of your request, to indicate that the connection has closed and the device is no longer streaming audio.

      3. Can I turn off the microphone on Amazon Echo and Echo Dot?

      Yes, you can turn Amazon Echo or Echo Dot's microphone off by pushing the microphone on/off button on the top of your device. When the microphone on/off button turns red, the microphone is off. The device will not respond to the wake word, nor respond to the action button, until you reactivate the microphone by pushing the microphone on/off button again. Even when the device’s microphone is off, Amazon Echo or Echo Dot will still respond to requests you make through your remote."

      All that aside, yes they could change this with a firmware update and you'd never know. And that would be why I will not be buying or using one. To find out they did it I'd have to commit a federal felony (thanks DMCA!).
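
      The wake-word behaviour the FAQ describes (listen locally, then stream audio including a fraction of a second from before the wake word) can be pictured as a small ring buffer. What follows is only a sketch of that idea, not Amazon's implementation: `stream_after_wake`, the `is_wake_word` detector and the `preroll_frames` size are hypothetical names made up for illustration, and closing the stream at the end of a request is left out.

```python
from collections import deque


def stream_after_wake(frames, is_wake_word, preroll_frames=3):
    """Yield the frames that would be uploaded.

    frames: iterable of audio chunks (any objects).
    is_wake_word: hypothetical on-device detector, called once per frame.
    preroll_frames: assumed number of frames kept from before the wake word.
    """
    preroll = deque(maxlen=preroll_frames)  # stays on the device until triggered
    triggered = False
    for frame in frames:
        if triggered:
            yield frame                 # streaming: every frame goes out
        elif is_wake_word(frame):
            triggered = True
            yield from preroll          # the short stretch before the wake word
            yield frame
        else:
            preroll.append(frame)       # older frames fall off and are never sent


if __name__ == "__main__":
    heard = ["a", "b", "c", "d", "alexa", "weather", "today"]
    print(list(stream_after_wake(heard, lambda f: f == "alexa", preroll_frames=2)))
```

      In this toy model, anything older than the pre-roll window is overwritten locally and never leaves the device, which is the property the FAQ claims. Whether a given firmware actually behaves that way is exactly what can't be checked from the outside.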

    2. Re:Oh yeah, just what I need. by nospam007 · · Score: 5, Insightful

      "I don't understand why none of this stuff operates locally. "

      It's the fault of the couple of dozen missing IBM Watson type computers in your basement.

    3. Re:Oh yeah, just what I need. by postbigbang · · Score: 4, Insightful

      And everyone in the room is heard, each noise, appliance, looking for digital filtration, notation, and archiving while the folly of Big Data pores over such stuff, conflates, infers, and makes decisions based upon it, then rats the info to anyone who would pay.

      Then there are those that won't pay, just merely barge in with (or without) a warrant or writ of assistance to mangle the data for The State's purpose.

      Unicorn my ass. Rat Fink Stool Pigeon.

      --
      ---- Teach Peace. It's Cheaper Than War.
  3. Six million Alexa installs... compared to? by tlambert · · Score: 5, Informative

    Six million Alexa installs... compared to?

    A billion Apple devices with Siri... http://www.theverge.com/2016/1...

    Uh, who owns it again?

  4. I'll remain Luddite on this by No+Longer+an+AC · · Score: 5, Funny

    I have no desire to talk to my devices and I definitely don't want them listening to me either.

    I spent about 5 minutes playing with Okay Google on my phone and it wasn't very good. About 6 months later it finally responded to some music I was listening to, and I realized I had never turned it off.

    And it really pisses me off when I am going through some voice prompt system and I can't just press a number for my response - it insists on a voice response. No, we don't speak the same language and your voice recognition system sucks.

    I also was very resistant to using a mouse and I also keep a pen and note paper in my desk.

    I was wrong about mice I guess - they are actually useful.

    But I see no use in these Alexa thingies. I could see getting a sarcastic parody device though. "Hey, Alexis, what's the weather like today?"

    "Look out the window, you moron! It's December. It's probably cold. Either that or it's very cold. It might even be snowing!"

    Just thinking of some of the commercials I've seen....

    1) Alexa, turn off the lights. Okay, haven't we had this technology ever since the Clapper was a thing? Clap On! Clap Off!

    2) Alexa, order more tape. Okay, right - like I order so much tape that Alexa knows what brand I buy, what kind I need and I'm not even concerned at all about the price because of course I'm going to get it from Amazon.

    3) Alexa, what's the weather like in Miami? If I really cared, I could easily look that up on the internet.

    This doesn't even pass the "Wow factor" test let alone the "do I need or even want it?" test.

    And I'd be willing to bet that within 5 minutes of getting one I'd be going all Samuel Jackson on it. "English, motherfucker. Do you speak it?"

    "Say 'what' one more time! I dare you!"

  5. ...and listen to everything in mic range. by jbn-o · · Score: 4, Insightful

    Soon, you'll speak your wants into the air -- anywhere -- and a woman's warm voice with a mid-Atlantic accent will talk back to you...

    (read in a woman's warm voice with a mid-Atlantic accent) ...and your computer will listen to everything in mic range. No need for that activity light on the mic/camera; it was operated by proprietary (read: always untrustworthy) software to begin with, and wasn't present on trackers (a more honest name for the devices also known as cell phones, mobile phones). You'll come to expect omnipresent listening, ostensibly waiting for you to give the command to signal that the computer should do something for you so you feel like you're in control. But in reality your computer has been doing something for so many proprietors all along—letting an uncountable number of parties spy on you. Because you brought these devices and services into your home, your car, and your workplace. Revel in the convenience of never really knowing if you're alone.

    And don't worry: they're not spying on you for your safety. The spying "feature" works on your tracker, your home computers, and various needlessly Internet-enabled devices like your next refrigerator, a child's toy, a lightbulb socket, and more.

  6. Stasi by Anonymous Coward · · Score: 5, Funny

    I live in Berlin. When I explain voice platforms to people, I roughly say: "in former times, they came into your flat, installed mics and even fixed the wallpaper whenever cabling was necessary. When you were back in, all was done and clean. *All costs were taken up by the state*".

    These days you gotta pay for it. And you gotta fix the cabling mess yourself. Now tell me Socialism was worse!

  7. Oh please by Billly+Gates · · Score: 3, Interesting

    Google won.

    Google knows where I live, where I work, where the airport is if I travel, what flight I'm on, when restaurants in my area close, etc.

    All the geeks in my IT department say "OK Google, when does X close?" Or "OK Google, how far is X?" when looking at traffic while we drive. Amazon already lost and I see no value in such a device. Our phones know all the information based on habits and can even track traffic.

  8. Wait for us, we're the leader... by QuietLagoon · · Score: 4, Insightful

    ... a rumored "Home Hub" from Microsoft ...

    Microsoft is still trying to live in its halcyon days when it seemed Microsoft could kill competition's products just by announcing that they had a similar product in beta.

    If it weren't for Microsoft's stranglehold on corporate computing, Microsoft would have been a footnote by now...

  9. A little late... by Sqreater · · Score: 3

    This battle has already been lost. Cameras are everywhere and growing in number. Everything you do or say on the internet is parsed now. Your data is stolen on a regular basis. I've had Amazon Echo since the beginning and it is integrating itself into my life skill by skill. What's the weather? What's my commute time? What's on my calendar? Put this on my calendar. Buy this; buy that. (I have a mountain of TP sent by just telling her to buy it since it is already on my list, for example) Music. Lights control. Heat and air. And more skills are coming. She learns things using AI. When it comes to my car I'll have it even better. Put down that crank mentality and get a car with a starter.

    --
    E Proelio Veritas.
  10. Scaling things up by DrYak · · Score: 3, Informative

    I don't understand why none of this stuff operates locally. It's always some remote server in the cloud. I remember having IBM ViaVoice (back then I think it was called "Voice Type Dictation" or "SimplySpeaking") on my goddam Pentium 75mhz computer. After about an hour of training, it would nail mostly everything I said. I find it incredibly difficult to believe that we don't have the hardware resources necessary to perform local speech-to-text and text processing inside your house without ever touching the internet.

    The problems are scaling it up and the finer small details.

    Regarding speech :

    Modern offline speech-to-text technology can reach about 95% accuracy. (Being able to feed back based on past context to tell which homophone makes more sense, etc.)
    - Which is damn cool already (it's only 1 in 20 words that needs to be fixed! Fucking impressive!!!)
    - And is pretty useful for dictating thoughts for those people who speak faster than they type (i.e. most random joe six-packs outside /. and especially outside of steno communities): they can mostly speak what they want and only fix things here and there (only a single word every 20, or about a word every 2-3 sentences).
    - But that's completely useless at the scale required for the Siri / Alexa / Cortana / whatever type of constant speech flux of commands. The point is to completely do away with keyboard and mouse. Not to have to pull out a keyboard (or pull your smartphone out of your pocket) to correct every third sentence you speak to your home assistant.
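
    As a back-of-the-envelope check of those figures, here is a small sketch of the arithmetic. It assumes that errors are independent from word to word and that a sentence is about 8 words long. Both are simplifying assumptions of mine, not measurements, and the function names are made up for the example.

```python
def words_per_error(accuracy):
    """Average number of words between two recognition errors."""
    return 1.0 / (1.0 - accuracy)


def clean_sentence_probability(accuracy, words_per_sentence):
    """Chance that every word of one sentence is recognized correctly,
    assuming word errors are independent (a simplification)."""
    return accuracy ** words_per_sentence


if __name__ == "__main__":
    WORDS_PER_SENTENCE = 8  # assumed average sentence length
    for accuracy in (0.95, 0.999):
        gap = words_per_error(accuracy)
        clean = clean_sentence_probability(accuracy, WORDS_PER_SENTENCE)
        print(f"{accuracy:.1%}: one error every ~{gap:.0f} words, "
              f"{clean:.1%} of sentences come through clean")
```

    Under these assumptions, 95% means one error every 20 words, i.e. one every 2.5 sentences, which matches the "every 2-3 sentences" estimate. At 99.9% that becomes one error every 1000 words.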

    The only practical application would be speaking in robotic, rigid sentences, with "military-type radio speak" rigidity.
    (Strict word ordering: "[name], [order: [verb] [noun] ]". Fixed protocols: the AI should ack what it understood and ask for confirmation, "[user], you ask me to [verb] the [noun] ?", and the user should confirm/correct, "Yes do it [=fixed sentence] / No [=fixed sentence], [followed by new order]")
    That is the kind of speech protocol that leaves very little ambiguity and little risk of error (that's why it's used by military, law enforcement, catastrophe responders, or simply people working outdoors in very noisy radio conditions: ski teachers of a club spread across mountains, in my personal experience).
    That could work nearly flawlessly with modern tech.
    But it is very far from the "having a casual discussion with your assistant" experience that most companies are wanting to sell.
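
    To make the rigid protocol concrete, here is a minimal sketch of the "[name], [verb] [noun]" grammar and the confirm/correct step. The vocabulary, the assistant name "computer" and all function names are invented for illustration. It also assumes the speech has already been turned into text.

```python
import re

# Assumed toy vocabulary; a real deployment would define its own lists.
VERBS = {"open", "close", "start", "stop"}
NOUNS = {"door", "window", "heater", "music"}

# Strict word order: "<name>, <verb> [the] <noun>" and nothing else.
ORDER = re.compile(
    r"^(?P<name>\w+),\s*(?P<verb>\w+)\s+(?:the\s+)?(?P<noun>\w+)$",
    re.IGNORECASE,
)


def parse_order(utterance, assistant_name="computer"):
    """Return (verb, noun) for a well-formed order, else None."""
    match = ORDER.match(utterance.strip())
    if match is None or match.group("name").lower() != assistant_name:
        return None
    verb = match.group("verb").lower()
    noun = match.group("noun").lower()
    if verb not in VERBS or noun not in NOUNS:
        return None
    return verb, noun


def confirmation_prompt(user, order):
    """The ack sentence the assistant reads back before doing anything."""
    verb, noun = order
    return f"{user}, you ask me to {verb} the {noun}?"


def handle_reply(reply, pending, assistant_name="computer"):
    """Apply the fixed answers: 'Yes do it' or 'No, <new order>'."""
    text = reply.strip()
    lowered = text.lower()
    if lowered == "yes do it":
        return "execute", pending
    if lowered == "no" or lowered.startswith(("no,", "no ")):
        # Drop the old order; whatever follows "No" is parsed as a new one.
        correction = text[2:].lstrip(" ,")
        return "reconfirm", parse_order(correction, assistant_name)
    return "repeat", pending  # not a fixed sentence: ask again


if __name__ == "__main__":
    order = parse_order("Computer, open the door")
    print(confirmation_prompt("Dave", order))
    print(handle_reply("No, computer, close the window", order))
```

    Anything outside the fixed word order is simply rejected rather than guessed at. That is where the low error rate comes from, and also why it feels nothing like a casual conversation.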

    To reach that level of fluent conversation, current experience shows (100% fully autonomous real-time text subtitling, 100% fully autonomous real-time translation, etc.) that you need several orders of magnitude more accuracy (think 99.9% accuracy. Only one missed word every thousand. Or in practice an error every day or so). And due to the law of diminishing returns, that means fuck-tons more processing power. Several data centers' worth of processing in your basement.
    (Don't believe me? Look at YouTube's auto-generated subtitles. And Google certainly throws more processing power at them than a simple desktop computer.)

    And all the above is only about *parsing* the speech (i.e.: getting the speech-to-text accurate enough). Then you need to make *sense* out of the speech.

    Again, with modern technology, making the system react to a bunch of preset commands is trivial (the kind where you write a plug-in to get new commands supported) and could probably be handled on a Raspberry Pi.
    But again the things that these companies are trying to sell to random users are much more complex : "Having a natural conversation with your assistant".
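
    For a sense of how small the preset-command case is, here is a sketch of a plug-in style command table. The phrases and handler names are made up for the example, and recognition is assumed to have already produced a text transcript.

```python
COMMANDS = {}  # maps an exact preset phrase to its handler


def command(phrase):
    """Decorator a plug-in uses to register a handler for one phrase."""
    def register(func):
        COMMANDS[phrase.lower()] = func
        return func
    return register


@command("lights on")
def lights_on():
    return "turning the lights on"


@command("lights off")
def lights_off():
    return "turning the lights off"


def dispatch(transcript):
    """Run the handler for a recognized phrase; reject anything else."""
    handler = COMMANDS.get(transcript.strip().lower())
    if handler is None:
        return "unknown command"
    return handler()


if __name__ == "__main__":
    print(dispatch("Lights ON"))
    print(dispatch("make it cozy in here"))
```

    Supporting a new command is one more decorated function, and the lookup is a dictionary access. The cost explodes only when "make it cozy in here" is supposed to work too.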

    That requires three things:
    - tons more processing (goodbye, Raspberry Pi)
    - tons more of reference data (much more than a few commands that the user has custom pre-configured)
    - fuck ton of data gathering... (recording every command spoken by every user)
    - ...coupled with analysis of reject / mis-interpreted command... (most probably by huma

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]