Siri Keeps Your Data For Two Years
New submitter LeadSongDog writes with news that Apple has provided information on how long it holds onto voice search data used by its digital assistant software Siri. Speaking to Wired, an Apple representative said the data is kept for two years after the initial query.
"Here’s what happens. Whenever you speak into Apple’s voice activated personal digital assistant, it ships it off to Apple’s data farm for analysis. Apple generates a random numbers to represent the user and it associates the voice files with that number. This number — not your Apple user ID or email address — represents you as far as Siri’s back-end voice analysis system is concerned. Once the voice recording is six months old, Apple “disassociates” your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes."
This information came in response to requests for clarification of Siri's privacy policy, which was not very clear as written. The director of privacy group Big Brother Watch said, "There needs to be a very high justification for retaining such intrusive data for longer than is absolutely necessary to provide the service."
Anyone have the timeline for Google's disassociation and destruction of search queries? I'm curious how Apple's policies compare against those.
My guess is the overlap between "people who complained Siri wasn't accurate" and "people who dont want apple keeping any Siri data so they can make it better" is pretty close to perfect.
Google reads your mail. Apple listens to your ravings. Don't like it, don't use it. And they only keep 'your' (ie identifable) data 6 months.
I just tried it with Siri and it also punts to Wolfram Alpha so the answers are identical. There's no lakefront properties.
I eat only the real part of complex carbohydrates.
Everyone I've ever spoken to or read about in the field of voice recognition tells me that having samples of people's voices is critical to improving it... and getting those samples (mainly the raw quantity of samples) is the biggest problem they face.
So it doesn’t surprise me at all that anyone keeps a massive archive of samples... the sample data can be critical in improving voice recognition.
As an aside: Google Voice's voice mail feature does more or less the same thing... and the reasoning is the same also: More sample data means better voice recognition.
I can't help but shake my head at the comparison:
Google samples user voices, reads (and transcribes) voice mail, reads your email, your stock information and then feeds it into their advertising engine, and does this for four years and counting; reaction: Meh...
Apple samples voices, anonymizes it, uses it it improve voice recognition over a period of two years; reaction: EVIL! APPLE MUST DIE!
-- Sometimes you have to turn the lights off in order to see.
...and have since 2007 These two great blog posts cover the details "Taking steps to further improve our privacy practices" http://googleblog.blogspot.co.uk/2007/03/taking-steps-to-further-improve-our.html and "
How long should Google remember searches? " http://googleblog.blogspot.co.uk/2007/06/how-long-should-google-remember.html an example from it "By anonymizing our server logs after 18-24 months, we think we’re striking the right balance between two goals: continuing to improve Google’s services for you, while providing more transparency and certainty about our retention practices." Google are suprisingly forthcoming about how and what they do with your data, which clashes sharply with Apple(pretend the don't) or Microsoft(who run hate campaigns)