Annual Smart Speaker IQ Test (loupventures.com)
Research firm Loup Ventures published its annual Smart Speaker IQ Test this week. Like earlier iterations of the test, it put the top smart assistants and speakers head-to-head, grading them on a wide range of queries and commands. From the report: We asked each smart speaker the same 800 questions, and they were graded on two metrics: 1. Did it understand what was said? 2. Did it deliver a correct response? The question set, which is designed to comprehensively test a smart speaker's ability and utility, is broken into 5 categories:
Local -- Where is the nearest coffee shop?
Commerce -- Can you order me more paper towels?
Navigation -- How do I get to uptown on the bus?
Information -- Who do the Twins play tonight?
Command -- Remind me to call Steve at 2 pm today.
It is important to note that we continue to modify our question set in order to reflect the changing abilities of AI assistants. As voice computing becomes more versatile and assistants become more capable, we will continue to alter our test so that it remains exhaustive. Results: Google Home continued its outperformance, answering 86% correctly and understanding all 800 questions. The HomePod correctly answered 75% and only misunderstood 3, the Echo correctly answered 73% and misunderstood 8 questions, and Cortana correctly answered 63% and misunderstood just 5 questions.
Before anyone should ever put one of these in their house: "Alexa/Siri/Google, stop spying on me."
Sure, but which one is more fun to shoot? Tune in next week when we line them up on a fence along with some beer cans, and launch them into the air for a skeet shoot shotgun test.
Why the fuck would anyone allow that shit in their home? Basically everything you say can and will be recorded for future law enforcement fishing expeditions.
That's not correct -- only anything you say after the wake-word is recorded. (unless, of course, you use the device to call your boss and talk crap about him and get fired).
If you have evidence that the devices have been used for general spying without having said the wake-word, I'd like to see it.
Of course Google will get more questions right. They own a search engine and can fix things so their Google Home can find answers. Besides, I've got better things to do than to ask it stupid questions. I want something that will make me lazy. I want something that will actually work with all of my smart home devices. I want something that will actually hear me. I have both and mainly use Alexa, while Google Home is a backup/troubleshooter.
A real test would include every feature, not just pick and choose the best feature and claim the device is the best because of it.
Probably the goldfish.
Table-ized A.I.
Last year Siri was at 52%, now it's at 75%. Google increased from 81% to 88%.
But still... even when understanding my query isn't an issue, I've found that typing/clicking is faster than talking for setting up most things - the exceptions being "set a timer" and "when I get home, remind me to ...".
#DeleteChrome
I'm more interested in the IQ of the people that own these things. How stupid do you have to be to let some huge corporation record everything you say?
It would also have been nice if they had included Samsung's Bixby, you know, just for laughs.
Does anyone have sufficient success stories to justify these things? Sure, you can ask about the weather or traffic while getting dressed for work in the morning, but does that alone override the downsides, like cost and snoop risk?
If your work or hobbies keep your hands busy*, I can maybe see enough scenarios not covered by a smartphone, but what about others?
* I know what joke you're considering. Skip.
It would've been nice if they put a Raspberry Pi with Mycroft in this as well. I'd actually be interested in the results of that one.
"Don't meddle in the affairs of a patent dragon, for thou art tasty and good with ketchup." ~ohcrapitssteve
Alexa, kill Kenny
Do you think the wake word algorithm is perfect?
https://www.npr.org/sections/thetwo-way/2018/05/25/614470096/amazon-echo-recorded-and-sent-couples-conversation-all-without-their-knowledge
What kind of questions can they answer without web access?
A 3-way debate between Alexa, Siri and Trump.. who would win?
In a three-way debate between those three, you'd end up getting a $5 billion border wall ordered on your Amazon account by accident and be encouraged to buy a newer, more expensive wall next year that is missing a headphone port.
"That's the way to do it" - Punch
I thought they administered an actual IQ test ... now that would be interesting ...
These results don't match my personal experience, at least. Google's command support has gotten worse: they dropped support for various phrases when they switched from "Google Now" to "Google Assistant" (or whatever they're calling it now). And even phrases it SHOULD know only work half the time. Sometimes things need to be phrased very awkwardly to work, too. These devices still absolutely fail at natural language, and work better when you speak closer to what you would type on a terminal, without extra words. "Timer 10 minutes" works, but asking it to "set a timer for 10 minutes" has a higher chance of failure, because there are more words it can misinterpret.
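A deliberately crude way to see the point about extra words: if each spoken word were recognized correctly with a fixed, independent probability, the chance of getting every word right shrinks with phrase length. The 95% per-word accuracy below is a made-up figure for illustration only, and real assistants can often recover the intent even when a word is misheard, so this sketch overstates the effect.

```python
PER_WORD_ACCURACY = 0.95  # hypothetical figure, not a measured property of any assistant


def chance_all_words_correct(phrase: str, p: float = PER_WORD_ACCURACY) -> float:
    """Probability that every word is recognized, assuming independent per-word errors."""
    return p ** len(phrase.split())


for phrase in ("timer 10 minutes", "set a timer for 10 minutes"):
    print(f"{phrase!r}: {chance_all_words_correct(phrase):.1%}")
```

Under those assumptions the three-word command comes out at about 85.7% and the six-word one at about 73.5%.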
"only anything you say after the wake-word is recorded" - YOU supplied this claim, YOU supply empirical proof of that. It's a very questionable claim as multiple cases have shown ongoing eavesdropping for various reasons/excuses.
It's the documented behavior of the device and confirmed by Amazon. You're the one making the extraordinary claim, so the burden of proof is on you.
I'm not aware of any claim that wasn't explained by the device being activated by the user, either by the wake word or an inadvertent phone call.
You can't compare improvement as a percentage of success rate, because the value of a percentage point changes depending on what your success rate is. E.g. increasing from 10% to 15% successes is not very impressive, while improving from 94% to 99% is very impressive, even though both are a 5-percentage-point improvement. To compare correctly, you have to invert and compare based on the proportional decrease in failure rate: 94% to 99% cuts failures from 6% to 1% (an 83% reduction), while 10% to 15% only cuts them from 90% to 85% (about a 6% reduction).
Google
88% in 2018, or a 12% failure rate
81% in 2017, or a 19% failure rate
12/19 = 0.63, or a 37% reduction in failures compared to last year
Siri
75% in 2018, or a 25% failure rate
53% in 2017, or a 47% failure rate
25/47 = 0.53, or a 47% reduction in failures compared to last year
Alexa
72% in 2018, or a 28% failure rate
63% in 2017, or a 37% failure rate
28/37 = 0.76, or a 24% reduction in failures compared to last year
Cortana
63% in 2018, or a 37% failure rate
56% in 2017, or a 44% failure rate
37/44 = 0.84, or a 16% reduction in failures compared to last year
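The failure-rate arithmetic above can be reproduced with a short Python sketch. The success rates are the ones quoted in this comment (they differ slightly from the summary's figures), and `failure_reduction` is just an illustrative helper name.

```python
# Success rates in percent, as quoted in the comment: (2017, 2018).
RATES = {
    "Google": (81, 88),
    "Siri": (53, 75),
    "Alexa": (63, 72),
    "Cortana": (56, 63),
}


def failure_reduction(old_success: float, new_success: float) -> float:
    """Fraction by which the failure rate shrank from the old year to the new one."""
    old_fail = 100 - old_success
    new_fail = 100 - new_success
    return 1 - new_fail / old_fail


for name, (y2017, y2018) in RATES.items():
    print(f"{name}: {failure_reduction(y2017, y2018):.0%} fewer failures")
```

The same helper also captures the 10% vs. 94% example: `failure_reduction(94, 99)` is far larger than `failure_reduction(10, 15)`.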
The same problem crops up when comparing car MPG. MPG is the inverse of fuel consumption (gallons per mile), so the same MPG gain saves less fuel the higher the starting MPG. E.g. switching from a 20 MPG vehicle to a 25 MPG vehicle saves 3.6x more fuel than switching from a 40 MPG vehicle to a 45 MPG vehicle, despite both improvements being 5 MPG.
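A quick check of the 3.6x figure, converting MPG to gallons per mile before subtracting (a minimal sketch; the function name is illustrative):

```python
def gallons_saved_per_mile(old_mpg: float, new_mpg: float) -> float:
    """Fuel saved per mile driven when switching vehicles, in gallons per mile."""
    return 1 / old_mpg - 1 / new_mpg


low = gallons_saved_per_mile(20, 25)   # 0.05 - 0.04 = 0.01 gal/mile
high = gallons_saved_per_mile(40, 45)  # 0.025 - 0.0222... gal/mile
print(f"20->25 MPG saves {low / high:.1f}x as much fuel as 40->45 MPG")
```

Over 1,000 miles, that works out to about 10 gallons saved versus about 2.8 gallons.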
It also crops up in disk speed benchmarks, which are reported in MB/s, while your perception of speed is the inverse (how many seconds you wait for an operation to complete). So the "huge" improvement in sequential speeds from 500 MB/s for a SATA SSD to 3000 MB/s for an NVMe SSD actually matters a lot less than a "tiny" improvement in 4k read speeds from 30 MB/s to 50 MB/s.
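The same inversion applied to the SSD example, assuming (hypothetically) that 1 GB is read in each access pattern. Which improvement matters more in practice depends on how much of your real workload is sequential versus small random reads.

```python
def seconds_saved(old_mb_s: float, new_mb_s: float, megabytes: float) -> float:
    """Wall-clock seconds saved transferring `megabytes` at the new speed instead of the old."""
    return megabytes / old_mb_s - megabytes / new_mb_s


DATA_MB = 1000  # assumed workload: 1 GB transferred in each access pattern

sequential = seconds_saved(500, 3000, DATA_MB)  # SATA -> NVMe, sequential reads
random_4k = seconds_saved(30, 50, DATA_MB)      # 4k random reads
print(f"sequential: {sequential:.1f} s saved, 4k random: {random_4k:.1f} s saved")
```

With those assumptions the sequential upgrade saves about 1.7 s per GB, while the 4k improvement saves about 13.3 s.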
Do you think the wake word algorithm is perfect?
https://www.npr.org/sections/thetwo-way/2018/05/25/614470096/amazon-echo-recorded-and-sent-couples-conversation-all-without-their-knowledge
Amazon explained what happened: it was still a wake-word activation, even if unintended.
"Echo woke up due to a word in background conversation sounding like 'Alexa.' Then, the subsequent conversation was heard as a 'send message' request. At which point, Alexa said out loud 'To whom?' At which point, the background conversation was interpreted as a name in the customers contact list. Alexa then asked out loud, '[contact name], right?' Alexa then interpreted background conversation as 'right'. As unlikely as this string of events is, we are evaluating options to make this case even less likely."
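Amazon's explanation describes a multi-step dialogue, and a toy state machine makes the chain explicit. The stage names, the exact-match rules, and the contact name used in the example are hypothetical; this is not Amazon's implementation, just a model of the sequence quoted above.

```python
from enum import Enum, auto
from typing import Iterable, Set


class Stage(Enum):
    IDLE = auto()             # waiting for the wake word
    LISTENING = auto()        # woke up, waiting for a request
    AWAIT_RECIPIENT = auto()  # asked out loud: "To whom?"
    AWAIT_CONFIRM = auto()    # asked out loud: "[contact name], right?"
    SENT = auto()             # message recorded and sent


def run_dialogue(heard: Iterable[str], contacts: Set[str]) -> Stage:
    """Feed successive (mis)recognized utterances through the toy dialogue."""
    stage = Stage.IDLE
    for utterance in heard:
        if stage is Stage.IDLE:
            stage = Stage.LISTENING if utterance == "alexa" else Stage.IDLE
        elif stage is Stage.LISTENING:
            stage = Stage.AWAIT_RECIPIENT if utterance == "send message" else Stage.IDLE
        elif stage is Stage.AWAIT_RECIPIENT:
            stage = Stage.AWAIT_CONFIRM if utterance in contacts else Stage.IDLE
        elif stage is Stage.AWAIT_CONFIRM:
            stage = Stage.SENT if utterance == "right" else Stage.IDLE
        if stage is Stage.SENT:
            break  # the message has gone out; nothing later can undo it
    return stage


# Background chatter misheard at every step, as in Amazon's account:
print(run_dialogue(["alexa", "send message", "bob", "right"], contacts={"bob"}))
```

Any single mismatch sends the toy model back to `IDLE`, so a message goes out only when all four recognitions misfire in a row, which is the chain Amazon calls unlikely.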
A 3-way debate between Alexa, Siri and Trump.. who would win?
Have them duel it out on Jeopardy.
sigs are for losers (except to point out that sigs are for losers)
Alexa, define 'begs the question'.
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
Okay, he is smart at entertainment and manipulating a sufficiently large portion of the population using catchy sound-bites and bravado. He's pretty much dumb at anything else. I've never heard a coherent logical train of thought involving more than 2 steps from him on anything. Okay, once, when he was explaining why a beauty contestant should not have won. But, that probably means his dick has more working neurons than his brain.
At which point, the background conversation was interpreted as a name in the customers contact list. Alexa then asked out loud, '[contact name], right?' Alexa then interpreted background conversation as 'right'
This reminds me of my old non-flip, non-smart phone. It had a keypad lock but still allowed emergency calls while the keypad was locked. So if jostling in your pocket hit 15783791342, that was interpreted as a call to 112, just as 5991531 would be interpreted as a call to 911. Bad input was ignored, but did not cancel the digits already entered. So you were always working your way toward dialing an emergency call in your pocket.
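A hypothetical model of the keypad behavior described above: a digit is kept only if it extends the start of some emergency number, and any other digit is silently ignored without clearing what was already entered. The function name and the two-number list are assumptions for illustration; the actual phone firmware may have worked differently.

```python
from typing import Optional

EMERGENCY_NUMBERS = ("112", "911")  # assumed list; the comment mentions these two


def locked_keypad_dials(keypresses: str) -> Optional[str]:
    """Return the emergency number that ends up dialed, or None if no call is placed."""
    entered = ""
    for digit in keypresses:
        candidate = entered + digit
        if any(number.startswith(candidate) for number in EMERGENCY_NUMBERS):
            entered = candidate
            if entered in EMERGENCY_NUMBERS:
                return entered  # a complete emergency number: the call goes out
        # otherwise the digit is ignored and `entered` is left unchanged
    return None


print(locked_keypad_dials("15783791342"))
print(locked_keypad_dials("5991531"))
```

Because unmatched digits never reset the buffer, any long enough run of random key presses that happens to contain 1, 1, 2 (or 9, 1, 1) in order eventually places a call.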
I really can't imagine Amazon et al. shutting down the entire smart speaker network à la Lavabit because a gag-order warrant ordered them to record everything from a particular subscriber.
You know, it's like "private" VPNs. The gov't shows up demanding visitor IPs for a particular site and the service says "we don't keep them." "OK, here's your warrant. Start."
- js.
Neither can you.
Seven puppies were harmed during the making of this post.
You're the one making the extraordinary claim
Corporations lie. There's nothing extraordinary about that - it's rather ordinary and even expected behavior nowadays.
The privacy issue is not about unwarranted recording. It's about how most people (including, apparently, most people on Slashdot) don't understand that the recordings these companies do have are valuable enough on their own. All kinds of new information can be derived from your voice data:
- Your mood
- How your relationship is going (google it)
- Certain illnesses
Then the questions themselves can reveal a lot.
- Intelligence level (do you use complex words? Do you ask a lot of 'dumb' questions?)
- Life phase / unwanted pregnancy / money problems.
That second part is also quite valuable, as the questions you ask in the home might be more flippant and thus more revealing.
Thirdly, we know Amazon and other companies fingerprint your voice (in order to distinguish you from other household members, for example). This means that if your voice is recorded in another location, you will be recognized as having been there.
That last thing is important too as Amazon et al are slowly moving to always-listening devices. Again, this has been on Slashdot, and would be a logical progression we can all see coming.
All this is used to profile you. The profiles data brokers build are routinely used against your interests, such as when banks, insurers and employers access them via hip software packages. Welcome to the age of the continuous background check.
The questions listed are the types of questions these "assistants" are designed to answer. Go off the beaten path, and you get much worse results.
For example, ask:
"What street am I on?"
"What city am I in?"
"How many people are in my contact list?"
"How many miles did I travel yesterday?"
"When is my next dentist appointment?"