Why Google's Wi-Fi Payload Collection Was Inadvertent
Reader Lauren Weinstein found a blog post that gives a good, fairly technical explanation of why Google's collection of Wi-Fi payload data was incidental, and why it's easy to collect Wi-Fi payload data accidentally in the course of mapping Wi-Fi access points. "Although some people are suspicious of their explanation, Google is almost certainly telling the truth when it claims it was an accident. The technology for Wi-Fi scanning means it's easy to inadvertently capture too much information, and be unaware of it. ... It's really easy to protect your data: simply turn on WPA. This completely stops Google (or anybody else) from spying on your private data. ... Laws against this won't stop the bad guys (hackers). They will only unfairly punish good guys (like Google) whenever they make a mistake. ... [A]nybody who has experience in Wi-Fi mapping would believe Google. Data packets help Google find more access-points and triangulate them, yet the payload of the packets do nothing useful for Google because they are only fragments."
Of course it was accidental. After all, their corporate slogan is "Don't be evil". Obviously they wouldn't do anything that would be evil.
Tequila: It's not just for breakfast anymore!
Inadvertent or not, Google broke laws in some countries. Accidentally breaking the law doesn't eliminate responsibility or culpability - even if people shouldn't have left their WiFi unsecured.
If I accidentally run over someone with my car because I wasn't paying attention to what I was doing, it doesn't absolve me of the liability - even if that old lady had it coming, er, was jaywalking.
Laws won't stop the bad guys, but if you have laws you can at least punish them if you catch them. Claiming Google are the good guys (based on what? their motto?) and saying therefore there should not be laws is just ridiculous.
Nothing so far explains why they stored the data. Recording names of access points? Okay. Recording locations of access points? Mmmmaybe. Recording data retrieved by connecting to unsecured access points? No. How can that data be used for any honest purpose? And let's be clear about this: collecting and storing data is an act directed by software, which was written by a person or persons acting under direction, ostensibly to a specification. Find those specifications and the people who gave the directions, and you will come closer to finding the truth as well as those responsible.
The argument is that capturing data packets is useful to find the SSID of access points which send beacon frames with blank SSID field or where only a client is within range but not the access point itself. That argument is bogus. The mobile devices which will later use the mapped SSIDs and BSSIDs to calculate their own position do not see anything but the beacon frames. It is therefore entirely sufficient to capture just the beacon frames.
There is a legitimate argument that Google was just lazy (or "scientific") by capturing everything they can get in the field and analyzing later. There is however no technical reason for this and we should not make one up to defend Google.
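To make the beacon-only point concrete: the frame type sits in the first byte of the 802.11 header, so a capture tool can decide whether to keep a frame without ever touching its payload. A minimal Python sketch, assuming each frame is raw 802.11 bytes with any radiotap header already stripped. The function names are made up for illustration and say nothing about how Google's software actually worked.

```python
# Sketch: classify an 802.11 frame from the first byte of its frame control field.
# Bit layout of that byte: bits 0-1 protocol version, bits 2-3 type, bits 4-7 subtype.

MANAGEMENT = 0       # frame type 0 = management
BEACON_SUBTYPE = 8   # management subtype 8 = beacon


def frame_type(frame: bytes) -> int:
    """Return the 2-bit frame type (0 management, 1 control, 2 data)."""
    return (frame[0] >> 2) & 0b11


def frame_subtype(frame: bytes) -> int:
    """Return the 4-bit frame subtype."""
    return (frame[0] >> 4) & 0b1111


def is_beacon(frame: bytes) -> bool:
    """True only for management frames with the beacon subtype."""
    return (
        len(frame) > 0
        and frame_type(frame) == MANAGEMENT
        and frame_subtype(frame) == BEACON_SUBTYPE
    )


def keep_only_beacons(frames):
    """Drop everything except beacons; data frames never reach storage."""
    return [f for f in frames if is_beacon(f)]
```

A beacon starts with the byte 0x80 and a plain data frame with 0x08, so the check costs one shift and one mask per frame.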
If you broadcast your data via radio, why on earth would you expect anyone to consider it private?
Encryption. If you need it, use it.
A government is a body of people notably ungoverned - AC
So what TFA is saying is that the issue isn't simply Google snooping on networks and collecting data? And that there may have been a legitimate reason for this whole situation? And that it's blown out of proportion? STOP RUINING MY REASONS TO BE ANGRY AT GOOGLE!
My concern with what Google, and many other firms, are doing is that they are dedicating huge amounts of resources to collecting huge amounts of data on people. As profit-making entities, these firms must at some point monetize this data to get a return on investment. Therefore, if Google is keeping data other than basic access point information, then they must be planning to do something with it.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Despite what everyone thinks (and how it seems to the uninformed), it very likely was accidental. If I were tasked with correlating access points to their locations, the simplest way would be to dump raw wireless traffic to one file and raw GPS data to another. Later, you can zip them both up, run some analysis, and get the data you want out.
It'd be real easy to forget to filter the packets you dump to only anonymous, non-data-carrying packets. More than likely the people who designed it just forgot to, or figured it would be no big deal if they just never used that info. Sloppy engineering maybe, but certainly not malicious.
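For what it's worth, the "dump two files, join them later" approach can be sketched in a few lines. This is an illustrative Python sketch, not anyone's actual pipeline: it assumes the wireless log has already been reduced to (timestamp, bssid, signal_dbm) tuples and the GPS log to (timestamp, lat, lon) tuples, and it assumes at least one GPS fix exists. All names are hypothetical.

```python
from bisect import bisect_left


def locate_sightings(sightings, fixes):
    """Attach the nearest-in-time GPS fix to each access point sighting.

    sightings: iterable of (timestamp, bssid, signal_dbm)
    fixes:     non-empty iterable of (timestamp, lat, lon)
    Returns a list of (bssid, signal_dbm, lat, lon).
    """
    fixes = sorted(fixes)                 # order the GPS log by timestamp
    times = [f[0] for f in fixes]
    located = []
    for t, bssid, signal in sightings:
        i = bisect_left(times, t)         # first fix at or after time t
        # Compare the fix just before t with the fix at or after t.
        candidates = fixes[max(i - 1, 0):i + 1]
        _, lat, lon = min(candidates, key=lambda f: abs(f[0] - t))
        located.append((bssid, signal, lat, lon))
    return located
```

Nothing in the join needs a packet payload, which is why a forgotten filter could go unnoticed: the output looks the same either way.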
They accidentally recorded parts of publicly broadcast data....
It is not much different from a phone recording a conversation in a busy environment and being blamed for accidentally recording parts of other people's conversations that you walked past...
Yes, they should have only saved the SSID, location, and signal strength. Instead, they used off the shelf software which saved more data. There is no reason to believe this was intentional.
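As a rough illustration of what "only the SSID, location, and signal strength" would look like, here is a Python sketch that reduces a beacon frame to a small record and discards the rest. It assumes raw 802.11 beacon bytes without a radiotap header, a 24-byte MAC header followed by 12 bytes of fixed parameters, and signal and position values supplied by the caller. `ApRecord` and the function names are invented for this example.

```python
from typing import NamedTuple, Optional

MAC_HEADER_LEN = 24     # frame control, duration, three addresses, sequence control
FIXED_PARAMS_LEN = 12   # timestamp (8), beacon interval (2), capability info (2)
SSID_TAG = 0            # tagged parameter number 0 carries the SSID


class ApRecord(NamedTuple):
    bssid: str
    ssid: str
    signal_dbm: int
    lat: float
    lon: float


def parse_ssid(frame: bytes) -> Optional[str]:
    """Walk the tag/length/value list after the fixed parameters."""
    pos = MAC_HEADER_LEN + FIXED_PARAMS_LEN
    while pos + 2 <= len(frame):
        tag, length = frame[pos], frame[pos + 1]
        value = frame[pos + 2:pos + 2 + length]
        if tag == SSID_TAG:
            return value.decode("utf-8", errors="replace")
        pos += 2 + length
    return None  # no SSID element found


def summarize_beacon(frame: bytes, signal_dbm: int, lat: float, lon: float) -> ApRecord:
    """Keep only the fields needed for mapping; the frame itself is not stored."""
    bssid = ":".join(f"{b:02x}" for b in frame[16:22])  # address 3 is the BSSID in a beacon
    return ApRecord(bssid, parse_ssid(frame) or "", signal_dbm, lat, lon)
```

A hidden network shows up with an empty SSID string, which is still enough to map the BSSID.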
That's fine and legal to do in the USA, as you have no expectation of privacy using unencrypted broadcast:
http://www.law.cornell.edu/uscode/uscode18/usc_sec_18_00002511----000-.html
TITLE 18 > PART I > CHAPTER 119 > 2511
(g) It shall not be unlawful under this chapter or chapter 121 of this title for any person—
(i) to intercept or access an electronic communication made through an electronic communication system that is configured so that such electronic communication is readily accessible to the general public;
(v) for other users of the same frequency to intercept any radio communication made through a system that utilizes frequencies monitored by individuals engaged in the provision or the use of such system, if such communication is not scrambled or encrypted.
In the US, if you transmit in the clear on unlicensed spectrum, they can legally pick it up due to two different, non-overlapping legal clauses. (Note: I am not a lawyer, this is not legal advice, this is but one of several possibly relevant laws, etc.)
The problem is they didn't need to do so, and it creeps people in the US out. So even here where it is legal, they probably shouldn't have from a PR point of view.
In some other countries it is not legal to collect that data, and doing so unintentionally might lower your penalties, but still does not make it legal.
Blessed are the pessimists, for they have made backups.
Basically, Google probably could have swept this under the rug, and most companies would have. Google, on the other hand, came out as the only source. There were no accusations, nor any indication that this information would leak, yet Google freely informed the public that this was an accident and took responsibility. Maybe there was some underlying motive, maybe there's information we don't have, but with all the info that's out right now it seems Google acted as a good Samaritan.
Any geek worth their salt also never makes mistakes. Myself, I think I made a mistake once many years ago, and for my negligence I was rightfully whipped. Now of course I never make them; my work is always perfect.
The thing most people forget to ask, but was asked in this article, is something you conveniently forgot to mention. Here it is:
What possible use could google have for this data? What would be their motive here?
As the article says, there's almost no personal data in the emails. Even if there is, there's so little of it that what useful purpose could it serve? You'd have a hard time correlating it to any one person, or even finding out what it is. There's going to be so little data here, and it'll be so fragmented, that turning it into anything useful would be impossible.
On the other hand, why would google risk collecting this data when they knew what was going to happen if it got out? The risk vs. reward here just doesn't make sense. They're going to risk their reputation on... what? Collecting a few fragments of unencrypted wifi traffic that probably contains so little information and could very well be generated by a bot running on your machine.
I'm not going to believe google did this on purpose until someone can give me a motive that doesn't sound like something from a UFO convention.
You may find your mistake early, after gigabytes worth of data. Then you fix it before it becomes TB or PB of data. Right?
We're all allowed mistakes. A mistake of this size from the uber-geeks of Google isn't a mistake. It's negligence... not quite of BP's size, but just as shamelessly stupid.
---- Teach Peace. It's Cheaper Than War.
You do: ensure that its broadcast power is low enough that the signal does not escape the walls of your dwelling, and encrypt the traffic (WPA2 preferably).
No privacy was violated. It's not like the guy in the van drove up to the house and shoved an antenna through the mail slot. I mean, this is like complaining that the guy making a movie in his backyard recorded your shouting over his fence. Don't shout, then!
All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
No, this is complaining that they are identifying that you have an access point at all and then (presumably) making that information publicly available. Setting the power so the signal doesn't escape the house - while still reaching all areas of the house - is not practical. It also puts the onus on you to "hide" rather than on them to obtain permission before publicizing information about you. As for your analogy, I think this is a better one: this is like them driving up beside your house and looking in the windows with binoculars and then publicising to the world the contents of your house.
The tyrant will always find a pretext for his tyranny - Aesop
You make an excellent point.
For my part, I'd like to point out that if Google wanted to read your email, they wouldn't bother collecting wifi data. They'd just read yer fucking email.
I think what is more likely is that someone came to the engineer and said they needed to get the data and nobody really bothered to think of the privacy concern since it was going to be used internally anyway. Sure, if the engineer was told that the requirements demanded better privacy, he could have stripped the payloads, but if someone asked you to just get the data, it's less likely you'd think of that as a problem.
I would redefine it as sloth on the part of the management for not considering the issues, as opposed to lazy engineers.
Regardless of whether it's accidental, or difficult as the OP suggests, the reality is that both of those are merely excuses and rationalizations for externalizing the bad effects of behavior while privatizing the profits. Try translating those excuses to another industry and see how satisfying an answer they are. Consider medicine: there are undeniable benefits to modern therapies. However, because it's hard to get right, we don't just accept any random treatment. Before companies unleash their new products upon the public we require that they take the time to ensure, as much as possible, that they are safe and don't have unintended effects. You may suggest that Google isn't a medical company, that its products and services won't be killing anyone or causing them to grow a third eyeball, and that it therefore doesn't have the same obligations. OK, then how about banking? Credit reporting? Private investigators? Mining companies?
Entirely outside any other arguments, I find it hilariously ironic that Google -- the company staffed entirely by PhDs, by the most brilliant minds in the industry, by saints who'll do nothing wrong -- always comes back to "look we have this awesome idea with splendid (but vague and non-specific) benefits beyond making us incredibly wealthy, however there are significant downsides for the rest of you and those downsides are hard to avoid." Which makes me think that maybe they aren't so smart, which means that maybe their idea isn't so great. Isn't the point of being smart that you can do things that are hard? QED.
The thing most people forget to ask, but was asked in this article, is something you conveniently forgot to mention. Here it is:
What possible use could google have for this data? What would be their motive here?
As the article says, there's almost no personal data in the emails. Even if there is, there's so little of it that what useful purpose could it serve? You'd have a hard time correlating it to any one person, or even finding out what it is. There's going to be so little data here, and it'll be so fragmented, that turning it into anything useful would be impossible.
On the other hand, why would google risk collecting this data when they knew what was going to happen if it got out? The risk vs. reward here just doesn't make sense. They're going to risk their reputation on... what? Collecting a few fragments of unencrypted wifi traffic that probably contains so little information and could very well be generated by a bot running on your machine.
I'm not going to believe google did this on purpose until someone can give me a motive that doesn't sound like something from a UFO convention.
What if this were a calculated marketing maneuver designed to test the waters and find out how much people really care about privacy and the possible hard-to-justify violation thereof? This is, after all, a company that would make far less money if everyone had excellent online privacy. How much people are willing to protect that privacy and how much outrage they express at real or perceived violations of it could be very important data to a company like Google.
This is data that would be difficult for Google to obtain from their usual channels. Just like in politics, it has to become an "issue" and then the reaction can be assessed. A privacy matter that collects little or no directly sensitive information (thus protecting Google from potential liability) that still raises the issue and gets people talking about it would be perfect for this purpose. That's exactly what happened here.
The more successful a company, the more resources it possesses, the more talent it has hired, the more difficult it becomes to believe that they'd make trivial mistakes that most Slashdotters, acting alone with an infinitesimal fraction of the same resources, would have easily avoided. Good long-term strategy looks a lot like things just happening to work out a certain way as a product of chance. It's possible someone at Google could have made the incredibly trivial mistake that caused this chain of events. What's unlikely is that among all of the managers, designers, and programmers involved in this project, not one person noticed such a mistake.
It is a miracle that curiosity survives formal education. - Einstein
Your ends-justifies-the-means concept holds no water.
My wifi access points are a matter of public knowledge. After all-- they're freaking radios. What's not public knowledge is anything after the location of it, and its authentication- if any.
The data that flows there is mine, and no one else's. The other MAC addresses associated with the AP are also my business, and no one else's. Differing jurisdictions have different views of the severity of the theft committed by their mindlessly stupid, shark-like gobbling. I hope they suffer the higher of the common denominators of justice.
At the time of this writing, the parent post is marked "Troll".
How is this trolling? Consequentialism is a valid thing to argue against. Granted, you may disagree with parent's opinion of what is and is not a private component of a Wi-Fi transmission. If you disagree with him that a violation has occurred then you would necessarily also disagree that Google should suffer legal action from any sort of justice system. If that's the case, then the respectable non-cowardly way to handle it is to argue against it and take him to task.
I'll spell this out since a lot of mods clumsily fail to grasp a few basic concepts. "Troll" is something of an accusation or judgment. That doesn't change because you express it by selecting it from a menu rather than directly confronting the poster. As such, it requires at least some kind of positive indication. Specifically, it would require a good reason to believe that the parent poster could not conceivably express the above as a sincere opinion and is saying it merely to get a reaction out of others. There is no such indication here.
This reminds me of too many Apple discussions, in which the fanboyism towards $popular_company is stronger than the love of free speech or the ability to handle opinions with which you disagree. I don't particularly care so much about the waste of a perfectly good mod point. Rather, the hypocrisy is what needs to be pointed out.
It is a miracle that curiosity survives formal education. - Einstein
Your selective quoting and attempted sarcasm are rather pointless since I was merely pointing out the flaw in the suggestion I received. But your attempt at wit is noted.
As for your analogy, it is not apt. Let me fix it for you:
"If you want to get to the library, go down Main Street and take a left at the house that has a big screen TV and large leather couch in the living room."
Either you get that privacy is being increasingly encroached upon and that encroachment is a problem, or you don't. You don't seem to get that so I really see no point in further "discussion" with you (and wouldn't anyway since you seem to need to massage your ego by attempted wit and sarcasm). If it will make you feel better go ahead and have the last word. Make it a four letter one if you like.
The tyrant will always find a pretext for his tyranny - Aesop
That would make sense if Google wrote all of the code themselves. However, they used many off-the-shelf, open-source tools to perform their data collection.
The default in those tools is to grab all the frames. So the guy who put together the tools (who probably was not a privacy-minded person) says "It works great! We have the data that we want, see?" and shows the finished product to his boss. The boss, who might have been more privacy-minded, probably looked at the finished product, saw no personal information, and gave it a checkmark, completely missing the intermediate data product that no one was using.
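One way to keep that intermediate product harmless is to truncate data frames to their headers before anything is written to disk. A hedged Python sketch of the idea, assuming raw 802.11 frames without a radiotap header and the basic 24-byte three-address header (QoS and four-address frames carry a few more header bytes, which this simplification would cut short). The function names are hypothetical and do not come from any particular capture tool.

```python
DATA = 2             # 802.11 frame type 2 = data
MAC_HEADER_LEN = 24  # assumption: basic three-address MAC header


def strip_payload(frame: bytes) -> bytes:
    """Return data frames cut down to the MAC header; other frames unchanged."""
    is_data = len(frame) > 0 and ((frame[0] >> 2) & 0b11) == DATA
    return frame[:MAC_HEADER_LEN] if is_data else frame


def sanitize_capture(frames):
    """Apply strip_payload to every frame before it is stored."""
    return [strip_payload(f) for f in frames]
```

The addresses in the header are still available for spotting access points through their clients, but the user's traffic never lands in the file.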