Slashdot Mirror


Siri Keeps Your Data For Two Years

New submitter LeadSongDog writes with news that Apple has provided information on how long it holds onto voice search data used by its digital assistant software Siri. Speaking to Wired, an Apple representative said the data is kept for two years after the initial query. "Here’s what happens. Whenever you speak into Apple’s voice-activated personal digital assistant, it ships it off to Apple’s data farm for analysis. Apple generates a random number to represent the user and it associates the voice files with that number. This number, not your Apple user ID or email address, represents you as far as Siri’s back-end voice analysis system is concerned. Once the voice recording is six months old, Apple “disassociates” your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes." This information came in response to requests for clarification of Siri's privacy policy, which was not very clear as written. The director of privacy group Big Brother Watch said, "There needs to be a very high justification for retaining such intrusive data for longer than is absolutely necessary to provide the service."
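
For readers who want the timeline spelled out, here is a rough sketch of the retention schedule as described in the summary: a random number stands in for the user, the number is stripped at six months, and the clip is gone within 18 further months (6 + 18 = 24 months, the "two years" in the headline). Every name in it (VoiceClip, new_user_number, apply_retention, the two constants) is made up for illustration, as are the 64-bit width of the number and the assumption that deletion happens exactly at the 24-month mark. Nothing here reflects Apple's actual code.

```python
import secrets
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Thresholds taken from the summary; the names are hypothetical.
DISASSOCIATE_AFTER_MONTHS = 6        # user number removed from the clip
DELETE_AFTER_MONTHS = 6 + 18         # "up to 18 more months" => 24 in total


@dataclass
class VoiceClip:
    recorded_on: date
    user_number: Optional[int]       # random stand-in for the user, or None


def new_user_number() -> int:
    """Random identifier unrelated to any account ID or email (assumed 64-bit)."""
    return secrets.randbits(64)


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1                  # the current month is not complete yet
    return months


def apply_retention(clip: VoiceClip, today: date) -> Optional[VoiceClip]:
    """Return the clip as it would be stored on `today`, or None once deleted."""
    age = months_between(clip.recorded_on, today)
    if age >= DELETE_AFTER_MONTHS:
        return None                                  # past two years: dropped
    if age >= DISASSOCIATE_AFTER_MONTHS:
        return VoiceClip(clip.recorded_on, None)     # kept, but anonymous
    return clip                                      # still tied to the number
```

So a clip recorded on 15 January 2013 would keep its number until 15 July 2013, sit around without it after that, and be dropped by 15 January 2015, if the description is accurate and deletion happens right at the deadline.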

6 of 124 comments

  1. Comparison with Google search? by Anubis+IV · · Score: 4, Interesting

    Anyone have the timeline for Google's disassociation and destruction of search queries? I'm curious how Apple's policies compare against those.

    1. Re:Comparison with Google search? by fazey · · Score: 5, Insightful

      You mean Google has an option to hide your search history from you?

    2. Re:Comparison with Google search? by Anubis+IV · · Score: 4, Interesting

      From what I can tell, disabling Google History doesn't seem to come with a promise that Google doesn't keep that data somewhere else. What they say they'll do is stop using your History to present targeted advertising for you across their services, or you can choose to delete individual items from your search history, that way they aren't considered when it comes to determining your interests and the like. What they very carefully seem to avoid saying is that they completely delete your queries from all of their systems, so I wouldn't be surprised if they're still using them in some sort of anonymized form for product improvement purposes, tracking trends, or other things of that sort.

    3. Re:Comparison with Google search? by Anubis+IV · · Score: 4, Informative

      Well, I've been searching since I made the comment, and the best I've found so far is this thread where a Google rep confirms that for every image search they keep a thumbnail of the item that was clicked on, as well as the IP address for 9 months (after which it gets anonymized), and identifying information for the cookie associated with you for 18 months (after which it gets anonymized and the IP address gets partially destroyed). What that means is that they never fully destroy the data, and that if the query was self-identifying in some way, someone could still tie all of the queries you made together since they would still be associated with the cookie data, even if that cookie data is no longer associated with you.

      Take it with a grain of salt, however, since that's from back in 2011. As we all know, these tech companies have made big strides to protect our privacy better since then. Wait, no, I have that backwards.

  2. Re:Rotten to the core. by Megahard · · Score: 4, Informative

    I just tried it with Siri and it also punts to Wolfram Alpha, so the answers are identical. There are no lakefront properties.

    --
    I eat only the real part of complex carbohydrates.
  3. Sample data... by sl3xd · · Score: 4, Interesting

    Everyone I've ever spoken to or read about in the field of voice recognition tells me that having samples of people's voices is critical to improving it... and getting those samples (mainly the raw quantity of samples) is the biggest problem they face.

    So it doesn’t surprise me at all that anyone keeps a massive archive of samples... the sample data can be critical in improving voice recognition.

    As an aside: Google Voice's voice mail feature does more or less the same thing... and the reasoning is the same also: More sample data means better voice recognition.

    I can't help but shake my head at the comparison:

    Google samples user voices, reads (and transcribes) voice mail, reads your email, your stock information and then feeds it into their advertising engine, and does this for four years and counting; reaction: Meh...

    Apple samples voices, anonymizes them, uses them to improve voice recognition over a period of two years; reaction: EVIL! APPLE MUST DIE!

    --
    -- Sometimes you have to turn the lights off in order to see.