Slashdot Mirror


Siri Team Didn't Learn About HomePod Until 2015, After Amazon Echo Debuted (9to5mac.com)

The Information (paywalled) has published a lengthy report today covering the development of Siri. The article documents Siri's tumultuous changes in leadership and management over the last few years, indicating that Siri 1.0's infrastructure was very creaky, which held back the service. From a report: One of the most interesting anecdotes is the claim that Apple's HomePod team didn't meet with the Siri group until 2015 (Amazon Echo debuted in late 2014). The story says Apple had originally considered launching the speaker without Siri. The big takeaway from The Information's reporting is that Siri launched with a poorly scalable infrastructure that caused bottlenecks for years after it launched in 2011. At the initial release, the popularity of Siri 'exceeded expectations' and led to a lot of unreliability. The backend was not designed to handle enough users. Apple has spent the intervening years modernising the system apparently.

4 of 31 comments

  1. I disagree by SuperKendall · · Score: 3, Insightful

    For all Apple's strengths in consumer electronics, they're not great at infrastructure

    That's true in some cases but definitely not all.

    iTunes, for example, has been pretty reliable as far as delivering music. Same for the App Store, which has worked extremely well in delivering a high volume of apps for years.

    iCloud used to be bad, but it has actually been really stable and has performed well for at least the past year or so. Siri may have had trouble answering some questions at first, but since launch it has been pretty reliable about delivering some response almost all the time.

    One huge win has been push notifications where the Apple infrastructure has been SUPER reliable and could handle a ton of traffic pretty much from day one.

    So I'm not sure which teams you have worked with, but Apple does have very good infrastructure teams. Just like with any company though, not EVERY team is going to have amazing and super-competent people working on it...

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  2. Re:If Siri wasn't a surveillance app... by jcr · · Score: 2

    You can't convince me that Apple, or any of the voice recognition players, are dedicating more processing power in their central servers on a per usage-basis than the mobile devices have natively inside them.

    You don't know what you're talking about, but don't let that stop you!

    The reason that Siri uploads the audio to a server is that the language models that the recording is scanned against are huge. The bigger the model, the more accurate the recognition can be. You might be happy to replicate all that storage on every single device, but most people wouldn't be.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  3. Re:Those statements may be true by 93+Escort+Wagon · · Score: 3, Insightful

    I understand what you're saying, but I don't think it's the whole explanation either.

    I'm in the Apple ecosystem, and I've attempted to use Siri a fair bit. I will ask her non-personal questions along the lines of "what time is the Seahawks game tonight?" * - she will interpret the words correctly but simply respond with "Here's what I found on the web regarding 'what time is the Seahawks game tonight'".

    That doesn't seem like a privacy issue; it's more of a "Siri isn't particularly good at determining context" issue. Siri falls back to the "here's what I found on the web" default - which I assume is intentional for any case when context can't be determined - far too often.

    * This question was made up for exposition purposes - it's possible Siri might actually handle this specific query better than described

    --
    #DeleteChrome
  4. Re:If Siri wasn't a surveillance app... by EndlessNameless · · Score: 2

    You can't convince me that Apple, or any of the voice recognition players, are dedicating more processing power in their central servers on a per usage-basis than the mobile devices have natively inside them. Apple phones are multi-gigahertz computing devices with more DSP power inside them than your PC has.

    First of all, speech recognition doesn't run on standard DSPs at all. The platform probably uses the DSP's native noise reduction functionality to get a cleaner input, but modern speech recognition is based on deep learning, which is accelerated on GPUs or custom ASICs. Neither of those things was present in any phone when Apple and Google first shipped voice recognition. All those fancy DSPs for talking and media consumption do jack for speech recognition.

    Second, they don't need an insane amount of compute power. The servers alleviate memory pressure and reduce storage demands on the device. Even a limited language model dataset is over a GB. Compare that to a small buffer, maybe a few KB, to tx/rx the voice stream and response.

    Mainstream phones ship with 2-3 GB RAM and 16-32 GB storage, which is a serious issue for voice recognition. Budget devices have even less. There should be no issues running the Siri backend service on a decent workstation, but a phone is too cramped. I wouldn't be surprised if Siri's model data is much larger than a GB.
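To put rough numbers on the memory argument above, here is a back-of-the-envelope sketch. It uses only the figures quoted in this comment (a language model of at least 1 GB, phones with 2-3 GB RAM and 16-32 GB storage). The 4 KB stream buffer is an assumed stand-in for "a few KB", and the function names are hypothetical. None of this reflects measured Siri numbers.

```python
# Back-of-the-envelope footprint comparison using the rough figures quoted above.
# These are the comment's estimates, not measured values for any real service.

GB = 1024 ** 3
KB = 1024

MODEL_BYTES = 1 * GB          # lower bound: "even a limited language model dataset is over a GB"
STREAM_BUFFER_BYTES = 4 * KB  # assumption: "a few KB" taken as 4 KB


def share(part_bytes, total_bytes):
    """Return part_bytes as a percentage of total_bytes."""
    return 100.0 * part_bytes / total_bytes


def footprint_report(ram_gb, storage_gb):
    """Compare an on-device model against a small streaming buffer for one phone config."""
    ram = ram_gb * GB
    storage = storage_gb * GB
    return {
        "model_pct_of_ram": share(MODEL_BYTES, ram),
        "model_pct_of_storage": share(MODEL_BYTES, storage),
        "buffer_pct_of_ram": share(STREAM_BUFFER_BYTES, ram),
    }


if __name__ == "__main__":
    # The two ends of the "mainstream phone" range given in the comment.
    for ram_gb, storage_gb in [(2, 16), (3, 32)]:
        report = footprint_report(ram_gb, storage_gb)
        print(
            f"{ram_gb} GB RAM / {storage_gb} GB storage: "
            f"model = {report['model_pct_of_ram']:.1f}% of RAM, "
            f"{report['model_pct_of_storage']:.2f}% of storage; "
            f"buffer = {report['buffer_pct_of_ram']:.6f}% of RAM"
        )
```

On the low end of that range, a 1 GB model held fully in memory would take half the RAM on its own, while the streaming buffer is a rounding error. That is the trade-off the server-side design avoids.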

    As DRAM and flash densities increase, the ability to run speech recognition software locally will improve. You could get away with much smaller datasets if you are willing to trade off for a smaller vocabulary or lower accuracy, but who would do that? People have complained for years about poor accuracy, and we have finally beaten that problem, for the most part.

    --
    According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.