Annual Smart Speaker IQ Test (loupventures.com)

Posted by msmash on Friday December 21, 2018 @07:25AM from the battle-royale dept.

Research firm Loup Ventures published its annual Smart Speaker IQ Test this week. Like earlier iterations of the test, it put the top smart assistants and speakers head-to-head, grading them on a wide range of queries and commands. From the report: We asked each smart speaker the same 800 questions, and they were graded on two metrics: 1. Did it understand what was said? 2. Did it deliver a correct response? The question set, which is designed to comprehensively test a smart speaker's ability and utility, is broken into 5 categories:
Local -- Where is the nearest coffee shop?
Commerce -- Can you order me more paper towels?
Navigation -- How do I get to uptown on the bus?
Information -- Who do the Twins play tonight?
Command -- Remind me to call Steve at 2 pm today.

It is important to note that we continue to modify our question set in order to reflect the changing abilities of AI assistants. As voice computing becomes more versatile and assistants become more capable, we will continue to alter our test so that it remains exhaustive. Results: Google Home continued its outperformance, answering 88% correctly and understanding all 800 questions. The HomePod correctly answered 75% and only misunderstood 3, the Echo correctly answered 73% and misunderstood 8 questions, and Cortana correctly answered 63% and misunderstood just 5 questions.

A command they all need to honor by nwaack · 2018-12-21 07:32 · Score: 4, Funny

before anyone should ever put one of these in their house: "Alexa/Siri/Google, stop spying on me."
1. Re:A command they all need to honor by hawguy · 2018-12-21 07:45 · Score: 4, Funny
  
  Reminds me of this:
  https://www.reddit.com/r/The_D...
  For those that don't want to click:
  
  People from the 60's: "I better not say that or the government will wiretap my house"
  People today: "Hey wiretap, do you have a recipe for pancakes?"
2. Re:A command they all need to honor by swillden · 2018-12-21 08:17 · Score: 4, Interesting
  
  Sound drivers are user-removable, yes they are. You can verify non-function of the speakers and mic on most systems. Again, conflating phones, PC's and "smart" assistants is reductive in terms of actual security.
  Well, it is for people who actually disable the microphones on their laptop and cell phone (which would make it not a "phone" any more, wouldn't it?). Do you do that? If so, your commitment to privacy is impressive. Also misguided, but impressive.
  For the other 99.999% of the population, hawguy has a very good point. If you believe that companies are willing to violate their claims about what their devices do (which, note, is often illegal), then you have to assume that any and all of them might be listening to you. If you believe they're honest about what their devices do (and again, note that you don't have to believe in their honorable nature or good intentions to believe that, just their unwillingness to risk the legal and PR disaster that could result from lying), then smart speakers are fine, because they only record/transmit after their hotword is spoken and they let you review and optionally delete everything they recorded.
  To make my evaluation of these risks clear, I carry a cellphone with multiple microphones and cameras, use a laptop with integrated microphone and camera and a desktop with an attached Logitech microphone/camera -- with drivers properly installed and the peripheral fully functional because I use it for video conferencing -- and I have eight smart speakers scattered around my house and I'm contemplating buying a ninth.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
3. Re:A command they all need to honor by nwaack · 2018-12-21 08:25 · Score: 2
  
  Just because the smart speaker is the only device that advertises that it's listening to you, that doesn't mean it's the only device that is.
  Yes, but it's the only one whose MAIN PURPOSE IS TO SPY ON YOU. While unfortunate and annoying that all those other things you listed *might* be spying on you from time to time, they have a ton of other uses. And, in most cases, you can turn the "spy stuff" off. Whereas the only use for a smart speaker is to listen in on every single thing you do. If someone chooses to put these in their house they're welcome to do so, but I'll pass.
4. Re:A command they all need to honor by markdavis · 2018-12-21 08:25 · Score: 2, Interesting
  
  >"you can't reliably lock down a laptop."
  Yes you can, to the highest degree of what is even possible, when it is running Linux. You are in control of which distro, what things are loaded, what services are available and running, how it is configured, have 100% root control, when and how it is updated, and all the code is open source.
5. Re:A command they all need to honor by dunkelfalke · 2018-12-21 09:15 · Score: 2
  
  Only if you write your own firmware for every piece of your hardware.
  
  --
  "It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
6. Re: A command they all need to honor by jrumney · 2018-12-21 11:12 · Score: 2
  
  Even running Linux you are not in full control on any recent laptop that ships with Intel Management Engine or the AMD equivalent.
Re:Spyspeaker test you mean? by hawguy · 2018-12-21 07:41 · Score: 3, Informative

Why the fuck would anyone allow that shit in your home? Basically everything you say can and will be recorded for future law enforcement fishing expeditions.
That's not correct -- only anything you say after the wake-word is recorded. (unless, of course, you use the device to call your boss and talk crap about him and get fired).
If you have evidence that the devices have been used for general spying without having said the wake-word, I'd like to see it.
Re:Be best. by Tablizer · 2018-12-21 07:46 · Score: 2

A 3-way debate between Alexa, Siri and Trump.. who would win?
Probably the goldfish.

--
Table-ized A.I.
Quite a jump up for Siri by 93 Escort Wagon · 2018-12-21 07:47 · Score: 4, Insightful

Last year it was at 52%, now it's at 75%. Google increased from 81% to 88%.
But still... even when understanding my query isn't an issue, I've found that typing/clicking is faster than talking for setting up most things - the exceptions being "set a timer" and "when I get home, remind me to ...".

--
#DeleteChrome
1. Re:Quite a jump up for Siri by thegarbz · 2018-12-21 08:22 · Score: 4, Insightful
  
  I've found that typing/clicking
  Even when it requires any of the following?:
  a) starting a laptop
  b) unlocking a phone with a passcode
  c) getting out of your chair because it's not within reach
  d) needing to wash your hands
  e) needing to drop what you are currently holding on to
  f) no fuckit, this should be a) right at the very top: taking your eyes off the road
  The context around our actions is far more important than any action itself.
Would've liked to see Mycroft by aitikin · 2018-12-21 07:58 · Score: 4, Interesting

It would've been nice if they put a Raspberry Pi with Mycroft in this as well. I'd actually be interested in the results of that one.

--
"Don't meddle in the affairs of a patent dragon, for thou art tasty and good with ketchup." ~ohcrapitssteve
Alexa, kill Kenny by Joe_Dragon · 2018-12-21 08:00 · Score: 4, Funny

Alexa, kill Kenny
1. Re:Alexa, kill Kenny by Anonymous Coward · 2018-12-21 08:12 · Score: 2, Funny
  
  Oh my god, she killed Kenny!
Re:Spyspeaker test you mean? by Anonymous Coward · 2018-12-21 08:01 · Score: 2, Informative

Do you think the wake word algorithm is perfect?
https://www.npr.org/sections/thetwo-way/2018/05/25/614470096/amazon-echo-recorded-and-sent-couples-conversation-all-without-their-knowledge
Re:Be best. by Oswald McWeany · 2018-12-21 08:10 · Score: 5, Funny

A 3-way debate between Alexa, Siri and Trump.. who would win?
In a three way debate between those three you'd end up getting a $5 billion border wall ordered on your Amazon account by accident and be encouraged to buy a newer more expensive wall next year that is missing a headphone port.

--
"That's the way to do it" - Punch
bummer by cascadingstylesheet · 2018-12-21 08:13 · Score: 2

I thought they administered an actual IQ test ... now that would be interesting ...
Re:Practical usage examples? by Kristoph · 2018-12-21 08:18 · Score: 4, Interesting

I gave one each to my kids so they can play music, send and receive messages, and ask random questions while they're doing homework. I found that a better alternative than giving them a device with a screen.
I find the interactions kids have with these things very interesting because after a while the device becomes integral to their workflow. My daughter will sometimes ask Siri dozens of questions an hour when she is doing something Siri is familiar with (like chemistry, geography, history and so on).
I could, of course, personally look up the density of sugar or some historical fact or whatever when my daughter needs help with that, but I am not always available, and even when I am, I am not adding much to the interaction.
Percentage improvement in TFA is wrong by Solandri · 2018-12-21 08:36 · Score: 5, Insightful

You can't compare improvement as a percentage of success rate because the value of a % changes depending on what your success rate is. e.g. Increasing from 10% to 15% successes is not very impressive, while improving from 94% to 99% is very impressive, even though they're both a 5 percentage point improvement. To correctly compare, you have to invert and compare based on proportional decrease in failure rate.

Google
88% in 2018, or 12% failure rate
81% in 2017, or a 19% failure rate
12/19 = 0.63, or a 37% reduction in failures compared to last year

Siri
75% in 2018, or 25% failure rate
53% in 2017, or a 47% failure rate
25/47 = 0.53, or a 47% reduction in failures compared to last year

Alexa
72% in 2018, or 28% failure rate
63% in 2017, or a 37% failure rate
28/37 = 0.76, or a 24% reduction in failures compared to last year

Cortana
63% in 2018, or 37% failure rate
56% in 2017, or 44% failure rate
37/44 = 0.84, or a 16% reduction in failures compared to last year
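
The arithmetic above can be checked with a few lines of Python. This is only a sketch, assuming the rounded success percentages quoted in this comment; the function name `failure_reduction` is made up for illustration.

```python
def failure_reduction(prev_success_pct, curr_success_pct):
    """Proportional drop in failure rate between two years."""
    prev_fail = 100 - prev_success_pct
    curr_fail = 100 - curr_success_pct
    return 1 - curr_fail / prev_fail

# (2017 success %, 2018 success %) as quoted in the comment above
scores = {
    "Google": (81, 88),
    "Siri": (53, 75),
    "Alexa": (63, 72),
    "Cortana": (56, 63),
}

reductions = {name: failure_reduction(prev, curr)
              for name, (prev, curr) in scores.items()}

for name, r in reductions.items():
    print(f"{name}: {r:.0%} fewer failures than last year")
```

With these inputs Siri shows the largest proportional drop in failures and Cortana the smallest, matching the ordering in the figures above.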

The same problem crops up when comparing car MPG, which is actually the inverse of fuel consumption, so a given MPG gain represents smaller fuel savings the higher the starting MPG. e.g. Switching from a 20 MPG vehicle to a 25 MPG vehicle saves 3.6x as much fuel as switching from a 40 MPG vehicle to a 45 MPG vehicle despite both improvements being 5 MPG.
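
The 3.6x figure follows from comparing gallons burned over the same distance. A quick sketch, assuming an arbitrary 1000-mile trip (the distance cancels out of the ratio; `gallons_saved` is an illustrative name):

```python
def gallons_saved(old_mpg, new_mpg, miles=1000):
    """Fuel saved over a fixed distance when switching vehicles."""
    return miles / old_mpg - miles / new_mpg

low = gallons_saved(20, 25)    # 50 gal - 40 gal
high = gallons_saved(40, 45)   # 25 gal - about 22.2 gal
ratio = low / high

print(f"20->25 MPG saves {low:.1f} gal, 40->45 MPG saves {high:.1f} gal")
print(f"ratio: {ratio:.1f}x")
```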

It also crops up in disk speed benchmarks, which are done in MB/s, when your perception of speed is the inverse (how many seconds you wait for an op to complete). So the "huge" improvement in sequential speeds from 500 MB/s for a SATA SSD to 3000 MB/s for a NVMe SSD actually matters a lot less than a "tiny" improvement in 4k read speeds from 30 MB/s to 50 MB/s.
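
To put rough numbers on the disk example, here is a sketch that converts throughput into waiting time. It assumes, purely for illustration, that the same 1000 MB is read in each case; real workloads mix sequential and 4k reads in very different proportions.

```python
def seconds_to_read(megabytes, mb_per_s):
    """Time spent waiting for a read at a given throughput."""
    return megabytes / mb_per_s

DATA_MB = 1000  # hypothetical workload size, not from the benchmarks

# Sequential reads: SATA SSD at 500 MB/s vs NVMe SSD at 3000 MB/s
seq_saved = seconds_to_read(DATA_MB, 500) - seconds_to_read(DATA_MB, 3000)

# 4k reads: 30 MB/s vs 50 MB/s
small_saved = seconds_to_read(DATA_MB, 30) - seconds_to_read(DATA_MB, 50)

print(f"sequential: {seq_saved:.2f} s saved")
print(f"4k reads:   {small_saved:.2f} s saved")
```

Under that assumption the 4k improvement saves several times more waiting than the sequential one, even though its MB/s gain looks much smaller.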
Re:Practical usage examples? by ShanghaiBill · 2018-12-21 08:38 · Score: 4, Insightful

... the downsides, like cost and snoop risk?
The Alexa Dot costs $29. That is about the price of an extra large pizza.
The "snoop risk" is nonsense promulgated by dumb people who are trying to sound smart. It only records the sentence after the keyword. This is documented behavior, and has been confirmed by many people running packet sniffers. Your cell phone, with all its 3rd party apps, is a FAR greater "snoop risk" than your speaker.
Re:Practical usage examples? by ShanghaiBill · 2018-12-21 08:44 · Score: 2, Interesting

My kids have found the smart speakers especially helpful for their foreign language classes.
Re:Practical usage examples? by ShanghaiBill · 2018-12-21 09:20 · Score: 4, Insightful

There may indeed be a vast conspiracy of thousands of Amazon employees willfully and blatantly violating federal and state laws, and sworn to secrecy, for no obvious benefit to themselves, and risking jail time and a hundred billion dollar collapse in market capitalization if the secret is exposed ... in order to record inane kitchen chatter. But that is getting into serious tinfoil hat territory. If you believe this, yet think it is okey-dokey to own a cell phone, which has a vastly greater spying capability and exploitable attack surface, then you are a moron.
Alexa... by rthille · 2018-12-21 09:23 · Score: 2

Alexa, define "begs the question".

--
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
Re:Practical usage examples? by Anonymous Coward · 2018-12-21 09:45 · Score: 2, Interesting

Our house is all smart lights and "smart" stuff.. heck even the dishwasher talks with alexa. Does it make us more productive? probably not.. However, being able to ask when the dishwasher and clothes dryer will be done, or have it turn on the office lights or bedroom lights while walking down the hall is nice, same with turning off the lights.
Seeing the front door camera and the backyard cameras are nice (backyard cuz we have bears and the dogs lose their shit if they can corner a bear) anyway, it's all just convenient. Plus all the random day to day stuff, like before the "smart" stuff we just went to google and typed a question, now for the most part we just ask Alexa.
We also have August smart locks in our house, and I've heard about all the security risks that brings, but like, before we had an august, the security risk was a rock smashing the window and someone getting inside. TBH, before we had the august, half the time we forgot to lock the door (canada :p) The screened versions of echo are great for digital photoframes as well. Also, they work pretty good as whole house speakers. I didn't think we would use the "list" type features, but here we are 6 months later and when I go grocery shopping I'm looking at our list made through an echo. Downsides are right on.. this shit was expensive, but it's pretty cool! There is a snoop risk, but there is a snoop risk in all our devices. Devices with cameras are covered, but they can still listen.
They of course are not directly accessible via the internet, but yeah, it would suck if they got hacked. Anyway, all is great for now.. if something shitty happens we'll reevaluate, like everything else!
Re: Practical usage examples? by Kristoph · 2018-12-21 11:20 · Score: 2

I think you are either not a parent or, if you are, you are probably doing it wrong.
As a parent, your goal is to teach your children to think and solve problems independently and assist them only when it's clear they need that assistance. If I hover over my daughter to 'help' her do her homework, that is not conducive to independent problem solving. But I am certainly there when she needs help understanding a concept or idea.
However sometimes my daughter will want to verify some fact - like the density of a chemical or some date of significance. I don't know these things so I can look them up and tell her or I can give her a computer and she can do that ( in US middle schools most kids get a computer or an iPad or something and this is commonly what they use to look up facts ). I happen to think that a screen is something of a distraction so when my daughter is solving a pen and paper problem I discourage her from using a computer and, if she does need to look up a fact, she can just ask the smart speaker.
The same applies to other things: if she wants to play some music she can do that through smart speaker without a screen, if she wants to remember something she can add a voice note, if she wants to send a message she can do that too. It's very helpful.
Re:Practical usage examples? by dromgodis · 2018-12-22 01:25 · Score: 2

The "snoop risk" is nonsense promulgated by dumb people who are trying to sound smart.
That strikes me as an unexpectedly bold (I avoid the word "dumb") statement. I didn't think that anyone denied the snoop risk.

It only records the sentence after the keyword
Even if this is true *now*, it can change at any time by the command of a number of actors, e.g. the device/service suppliers, authorities, spy agencies, hackers, ...
As with all data collection, the *current* intent may be good but the data can very easily end up in the hands of bad actors. It can be the original actors with a changed agenda, or it can be new actors. And what someone calls "good" may be what you call "bad".