Slashdot Mirror


Annual Smart Speaker IQ Test (loupventures.com)

Research firm Loup Ventures published its annual Smart Speaker IQ Test this week. Like earlier iterations of the test, it put the top smart assistants and speakers head-to-head, grading them on a wide range of queries and commands. From the report: We asked each smart speaker the same 800 questions, and they were graded on two metrics: 1. Did it understand what was said? 2. Did it deliver a correct response? The question set, which is designed to comprehensively test a smart speaker's ability and utility, is broken into 5 categories:
Local -- Where is the nearest coffee shop?
Commerce -- Can you order me more paper towels?
Navigation -- How do I get to uptown on the bus?
Information -- Who do the Twins play tonight?
Command -- Remind me to call Steve at 2 pm today.

It is important to note that we continue to modify our question set in order to reflect the changing abilities of AI assistants. As voice computing becomes more versatile and assistants become more capable, we will continue to alter our test so that it remains exhaustive.
Results: Google Home continued its outperformance, answering 88% correctly and understanding all 800 questions. The HomePod correctly answered 75% and misunderstood only 3, the Echo correctly answered 73% and misunderstood 8 questions, and Cortana correctly answered 63% and misunderstood just 5 questions.

2 of 129 comments

  1. Re:Be best. by Oswald McWeany · Score: 5, Funny

    A 3-way debate between Alexa, Siri and Trump.. who would win?

    In a three-way debate between those three, you'd end up with a $5 billion border wall accidentally ordered on your Amazon account, and be encouraged to buy a newer, more expensive wall next year that is missing a headphone port.

    --
    "That's the way to do it" - Punch
  2. Percentage improvement in TFA is wrong by Solandri · Score: 5, Insightful

    You can't compare improvement as a percentage of success rate, because a percentage point is worth more the closer the success rate gets to 100%. For example, increasing from 10% to 15% success is not very impressive, while improving from 94% to 99% is very impressive, even though both are 5-percentage-point improvements: the first removes only about 6% of the failures (90% down to 85%), while the second removes about 83% of them (6% down to 1%). To compare correctly, you have to invert the figures and compare the proportional decrease in failure rate.

    Google
    88% in 2018, or a 12% failure rate
    81% in 2017, or a 19% failure rate
    12/19 = 0.63, or a 37% reduction in failures compared to last year

    Siri
    75% in 2018, or a 25% failure rate
    53% in 2017, or a 47% failure rate
    25/47 = 0.53, or a 47% reduction in failures compared to last year

    Alexa
    72% in 2018, or a 28% failure rate
    63% in 2017, or a 37% failure rate
    28/37 = 0.76, or a 24% reduction in failures compared to last year

    Cortana
    63% in 2018, or a 37% failure rate
    56% in 2017, or a 44% failure rate
    37/44 = 0.84, or a 16% reduction in failures compared to last year
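    The failure-rate comparison can be reproduced in a few lines of Python. This is a minimal sketch: the function name failure_reduction is hypothetical, and the inputs are the rounded percentages quoted in this comment, so the outputs inherit that rounding.

```python
def failure_reduction(old_success_pct: float, new_success_pct: float) -> float:
    """Fractional drop in failure rate between two success rates given in percent."""
    old_failure = 100.0 - old_success_pct
    new_failure = 100.0 - new_success_pct
    return 1.0 - new_failure / old_failure


# (2017 success %, 2018 success %), rounded figures as quoted in this comment
scores = {
    "Google": (81, 88),
    "Siri": (53, 75),
    "Alexa": (63, 72),
    "Cortana": (56, 63),
}

for name, (y2017, y2018) in scores.items():
    print(f"{name}: {failure_reduction(y2017, y2018):.0%} fewer failures")

# The two 5-percentage-point gains from the opening paragraph
print(f"10% -> 15%: {failure_reduction(10, 15):.1%} fewer failures")
print(f"94% -> 99%: {failure_reduction(94, 99):.1%} fewer failures")
```

    This prints 37%, 47%, 24% and 16% for the four assistants, then 5.6% and 83.3% for the two hypothetical gains, matching the hand calculations above.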

    The same problem crops up when comparing car MPG. MPG is the inverse of fuel consumption (gallons per mile), so each extra MPG saves less fuel the higher you start. For example, switching from a 20 MPG vehicle to a 25 MPG vehicle saves 3.6x more fuel than switching from a 40 MPG vehicle to a 45 MPG vehicle (over 100 miles, 5 gallons down to 4, versus 2.5 gallons down to about 2.22), even though both are 5 MPG improvements.
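    A quick way to check the 3.6x figure is to convert MPG back into fuel used over a fixed distance. In this sketch the helper names gallons_used and gallons_saved and the 100-mile trip length are illustrative assumptions; any distance gives the same ratio.

```python
def gallons_used(miles: float, mpg: float) -> float:
    """Fuel consumed over a distance; consumption is the inverse of MPG."""
    return miles / mpg


def gallons_saved(miles: float, old_mpg: float, new_mpg: float) -> float:
    """Fuel saved over a distance by moving from old_mpg to new_mpg."""
    return gallons_used(miles, old_mpg) - gallons_used(miles, new_mpg)


MILES = 100  # assumed trip length; the ratio below does not depend on it
low = gallons_saved(MILES, 20, 25)   # 5.0 - 4.0 = 1.0 gallon
high = gallons_saved(MILES, 40, 45)  # 2.5 - 2.22 = about 0.28 gallon
print(f"20->25 MPG saves {low:.2f} gal, 40->45 MPG saves {high:.2f} gal, ratio {low / high:.1f}x")
```

    Because the distance cancels out of the ratio, the comparison holds for any trip length.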

    It also crops up in disk speed benchmarks, which are reported in MB/s, while your perception of speed is the inverse (how many seconds you wait for an operation to complete). So the "huge" improvement in sequential speeds from 500 MB/s for a SATA SSD to 3000 MB/s for an NVMe SSD actually matters a lot less than a "tiny" improvement in 4K read speeds from 30 MB/s to 50 MB/s: per gigabyte read, the first cuts the wait from 2 s to about 0.33 s, while the second cuts it from about 33 s to 20 s.
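    The same inversion applied to the SSD figures, as a sketch: it converts throughput into seconds spent waiting per gigabyte. The helper name seconds_per_gb is hypothetical, 1 GB is taken as 1000 MB, and it assumes the same amount of data is moved under each access pattern, which real workloads will not match exactly.

```python
def seconds_per_gb(mb_per_s: float) -> float:
    """Seconds spent waiting to move 1 GB (taken as 1000 MB) at a given throughput."""
    return 1000 / mb_per_s


# Assumption: the same 1 GB is read sequentially in one case and as 4K random reads in the other
seq_saved = seconds_per_gb(500) - seconds_per_gb(3000)  # 2.0 s down to about 0.33 s
rand_saved = seconds_per_gb(30) - seconds_per_gb(50)    # about 33.3 s down to 20.0 s
print(f"Sequential: {seq_saved:.1f} s saved per GB")
print(f"4K random:  {rand_saved:.1f} s saved per GB")
```

    The sequential upgrade saves about 1.7 s per gigabyte and the 4K upgrade about 13.3 s, which is why the "tiny" MB/s gain can be the one you actually notice.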