Can Translucency Save Privacy In the Cloud?
MikeatWired writes "Jon Udell writes that when it was recently discovered that some iPhone apps were uploading users' contacts to the cloud, one proposed remedy was to modify iOS to require explicit user approval. But in one typical scenario that's not a choice a user should have to make. A social service that uses contacts to find which of a new user's friends are already members doesn't need cleartext email addresses. If I upload hashes of my contacts, and you upload hashes of yours, the service can match hashes without knowing the email addresses from which they're derived. In the post Hashing for privacy in social apps, Matt Gemmell shows how it can be done." (Read more, below.)
"Why wasn't it? Not for nefarious reasons, Gemmell says, but rather because developers simply weren't aware of the option to uses hashes as a proxy for email addresses. A translucent solution encrypts the sensitive data so that it is hidden even from the operator of the service, while enabling the two parties (parents, babysitters) to rendezvous. How many applications can benefit from translucency? We won't know until we start looking. The translucent approach doesn't lie along the path of least resistance, though. It takes creative thinking and hard work to craft applications that don't unnecessarily require users to disclose, or services to store, personal data. But if you can solve a problem in a translucent way, you should. We can all live without more of those headlines and apologies."
All my contacts upload their hash regularly.
Well... mostly on the weekends.
Hashing is more difficult than not hashing.
Customers are not going to stay away just because your security is atrocious.
So only legislation (or serious liability) is left to get this off the ground.
Gonna start generating the contact-data rainbow tables right now!
I can see how:
Uncle Bob 01234 123456
might be smart matched with:
Robert Smith +1 1234 123456
I'll be interested to see the hashing algorithm that will allow the hashes to be matched.
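One way the two entries above could match is to hash only a normalized phone number and ignore the contact name entirely. A rough sketch, assuming a default country code of 1 and a naive rule that strips leading trunk zeros; `normalize_phone` and `phone_hash` are hypothetical helpers, and a real app would need a proper numbering-plan library.

```python
import hashlib
import re


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number to '+<country code><national digits>'."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if raw.strip().startswith("+"):
        return "+" + digits  # already carries a country code
    # Assumption: a leading 0 is a trunk prefix and can be dropped.
    return "+" + default_country_code + digits.lstrip("0")


def phone_hash(raw: str) -> str:
    """Hash the normalized number, never the contact's display name."""
    return hashlib.sha256(normalize_phone(raw).encode("ascii")).hexdigest()


a = phone_hash("01234 123456")      # stored under "Uncle Bob"
b = phone_hash("+1 1234 123456")    # stored under "Robert Smith"
print(a == b)  # True
```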
So only legislation (or serious liability) is left to get this off the ground.
You would really rely on legislatures to get the wording of such a law correct and not impede what we can do with mobile devices?
Apple is already changing the system to require user permission when accessing contacts. One of the main apps at fault, Path, has already switched voluntarily to using hashes.
So why go to the trouble of crafting regulation to solve a problem taking care of itself already? All you can do is make things more annoying for people.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
apps wouldn't be free - 99c
You've unlocked 3rd level achievement in "LOL, Didn't Read!" tier: "Didn't Read The Fucking Title"
Nevermind, we'll just blame Micro$oft for making your computer slow
More seriously. What are you planning to match all that to? Don't you think just headers/filenames will do?
a) There's no salting here - you're looking for matches, after all - so reversing numbers is trivial and reversing e-mails is much easier than reversing unsalted password hashes, as entropy is much lower.
b) And even without reversing you can still build relationship graphs.
So, how exactly does this help privacy?
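To make point (a) concrete, here is a small sketch of reversing an unsalted phone-number hash by enumeration. The number, the known prefix, and the four-digit search space are made up to keep the loop short; `crack_phone_hash` is a hypothetical name.

```python
import hashlib
from typing import Optional


def h(number: str) -> str:
    """Unsalted SHA-256 of a phone number string."""
    return hashlib.sha256(number.encode("ascii")).hexdigest()


def crack_phone_hash(target: str, prefix: str, suffix_digits: int) -> Optional[str]:
    """Try every number with the given prefix until one hashes to target."""
    for n in range(10 ** suffix_digits):
        candidate = f"{prefix}{n:0{suffix_digits}d}"
        if h(candidate) == target:
            return candidate
    return None


leaked = h("+15550142")  # what the service would have stored
print(crack_phone_hash(leaked, "+1555", 4))  # +15550142
```

A full ten-digit number only raises the loop bound to 10**10 hashes, which is still a small job for commodity hardware.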
How do you make money from free cloud apps, if it's not by selling the private information you extract from your customers' files? I thought the cloud efficiency (good service at low cost) came by design from tapping into privacy.
Video of some good progressive thrash music
Almost yearly, I (like most Americans) get a small statement/disclaimer entitled "Notice of Disclosures" or something to that effect from the various banking, insurance, and other institutions I regularly do business with. I believe the only reason they send it is a legal requirement. It tells me all of the different bits of information they have on me and what they do with it/how they resell it, or excuse me, "share it with valued business partners." Some things I can opt out of, which I eagerly do, while for others I simply do not have that option. I've become jaded enough that I am under the impression much of this information is more valuable than my "core" business: the bank sees my regular spending habits and can estimate my income by tracking direct deposits. This is the sort of thing marketers salivate over.
For online companies such as Facebook, the model is hardly different. For companies that give away applications for your tablet or only charge you a small amount of money, this can be an appealing revenue source, whether as a secondary one beyond the $1 they charge you or even the primary one.
Given all this, I have a hard time believing that Path or any similar company, when building a product such as this, even remotely cares to try NOT to see your information, since that would cut them off from such revenue sources. Only when the media yanks the covers off do things get more interesting. Seeing your contacts lets them assemble a web of contacts which has value, whether for immediate return or as an "asset" for the organization should they ever sell it (and revise their privacy notices).
Let's see how things go once the dust settles and whether they have committed to any of these things.
some things to consider:
- when you hash a telephone number, a rainbow table is easily generated
- even when you have IDs that are real pseudonyms, with no option to crack them, you can still correlate: "ah, user X knows Y, who is known by Z, too".
So uploading contact data is exposing private things, even when the nodes are ano(pseudo)nymous and only the edges of the social graph are known.
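The second point can be sketched in a few lines: even if every contact is an opaque ID, the uploads alone reveal who shares acquaintances. The user and ID names below are placeholders, and `who_knows` is a hypothetical helper.

```python
from collections import defaultdict

# Each uploader sends a set of opaque contact IDs (hashes or pseudonyms).
uploads = {
    "user_X": {"id_Y", "id_Q"},
    "user_Z": {"id_Y", "id_R"},
}


def who_knows(uploads: dict) -> dict:
    """Invert the uploads: for each opaque ID, list who has it as a contact."""
    known_by = defaultdict(set)
    for uploader, contacts in uploads.items():
        for contact in contacts:
            known_by[contact].add(uploader)
    return known_by


known_by = who_knows(uploads)
shared = {c: sorted(u) for c, u in known_by.items() if len(u) > 1}
print(shared)  # {'id_Y': ['user_X', 'user_Z']}
```

Nothing was cracked here; the edge "X and Z both know Y" falls out of the data as uploaded.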
The issue is not how companies that want to preserve privacy can do so. They will find a way and the described solution is rather obvious. The question is how to stop companies that do not care about your privacy at all and _want_ to upload all your data to their own servers.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Once a provider has a large enough db, they can look for firstname.lastname@gmail.com or, knowing from the contacts distribution the region and language of the users, something like @free.fr or @yahoo.co.jp
The problem isn't taking care of itself. We are seeing Apple, Google and Facebook doing rearguard actions because they are afraid of regulation and lawsuits.
Lawsuits perhaps, but they are more afraid of CUSTOMERS. They want to serve CUSTOMERS better (and also avoid lawsuits).
Moreover, the Europeans are doing it already, so why not copy^H^H^H harmonize with their laws in America?
Well that's how we get SOPA. Great plan. Not.
Just because Europeans are willing to submit to tyranny why should the U.S.? Why should anyone?
Let's also bring over the vast array of cameras from the U.K. while we are at it! All of the security nazis' wet dreams can come true across the globe, and when you cough I can find out from my command center two continents away!
"There is more worth loving than we have strength to love." - Brian Jay Stanley
The alpha-channel in JPEG sucks.
Join the Slashcott! Feb 10 thru Feb 17!
Maybe they knew, maybe they didn't know about this translucency method, but in the end one thing I am fairly certain of is that "they" want all your information; it's how "they" get valid emails, make money and build profiles.
If your information is obfuscated in any way the reliability of what they want to do is diminished and therefore not worth as much.
"If any question why we died, Tell them because our fathers lied."
Especially if they are passed in the clear.
All that has to be done is record lots of hashes from as many phones as possible.
Then from a single phone you identify all of its hashes. From those hashes you have a phone-hash list, which can identify those phones that have missing elements - yes, there will be some duplicates.
All that is necessary then is to go to the server to identify which hash is the owner's identification equivalent.
This is an aggregation attack - and it works to identify each person, and without knowing the data that was hashed, but deriving that data from the hash and external information.
Of course they will want to spam people who haven't registered with their "social" service yet, so they need to harvest plaintext e-mail addresses / names and put the blame on you when they send them a spammy invitation. Remember, this is the "you are the product" market, practical solutions are whatever brings in more users/cash, not things that protect privacy as much as possible ...
"I love my job, but I hate talking to people like you" (Freddie Mercury)
If iPhone users cared about their privacy, they wouldn't be iPhone users.
And don't tell me companies don't use hashing because they don't know how to implement it. This is deliberate. They want your contacts. Data = money, personal data of real people plus who-knows-whom = more money.
It's not that the developer was too overworked or too stupid to come up with a hashing scheme. Quite the opposite. Often these applications exist only to fish for user data. The solution is to assume that all applications are vile and to limit their access to user data the same way they are denied access to other resources.
It is time to stop worrying about the 10-20 companies who make their money from violating privacy and selling data to advertisers. Just because Google and Facebook have become popular with this business model during the past decade doesn't mean that we should give up century-old principles or that we have to protect this business model for all eternity.
The vast majority of all companies can or could do respectable business without violating privacy. Notwithstanding the software patent nightmare, it is possible to make products and sell them to customers to the satisfaction of both parties. It is ridiculous how very few companies, which only have become global players after stockholders have artificially inflated their values, can steer the public discussion about what should be possible for them and what not. We need laws not only to protect our privacy and stop parasitic, advertising-based business models, but also to put an end to frivolous EULAs and the copyright, trademark, and patent nonsense.
It's time to give the power back to real companies, who actually offer real products and who are interested in sustainable business based on making their customers happy. There are plenty of those, and it's pissing me off that a few black sheep get all the media attention.
Yours sincerely,
aaaaaaargh!
Slashdot Ranter
Simply generate a (random) salt every time you want to find common addresses between you and the other person, and then send that salt to the other person (through the centralized server). All the server sees here is the salt: the server cannot use rainbow tables.
Then both you and your correspondent send your address book, hashed with the salt, to the server, and the server can tell which "friends" you have in common. And the server still doesn't know who is who.
Rainbow tables defeated. Plain and simple.
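A sketch of the per-session salt scheme just described, assuming SHA-256 over salt plus lowercased address and a 16-byte random salt; `salted_hash` is a hypothetical name. Note the assumption baked into the scheme: the server relays the salt, so this only rules out tables precomputed in advance, not guesses hashed with the salt after the fact.

```python
import hashlib
import secrets


def salted_hash(salt: bytes, email: str) -> str:
    """Hash an address together with the salt agreed for this session."""
    return hashlib.sha256(salt + email.strip().lower().encode("utf-8")).hexdigest()


# One party picks a fresh salt and passes it to the other via the server.
session_salt = secrets.token_bytes(16)

mine = {salted_hash(session_salt, e) for e in ["dave@example.com", "erin@example.org"]}
theirs = {salted_hash(session_salt, e) for e in ["Dave@Example.com", "frank@example.net"]}

# Server side: compare the two uploads without seeing any address.
common = mine & theirs
print(len(common))  # 1
```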
Why are we not doing this for passwords too? Every site on the internet shouldn't need to store a plaintext password. Does there exist an algorithm by which a site owner could send the salt, the user hashes with his password, and the site owner can tell the password is the same, without actually having the password?
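For comparison, a common server-side arrangement is to store only a random salt and a slow salted hash, and to recompute the hash at login. This is a sketch of that variant, not of the client-side scheme the question asks about; the iteration count and the names `make_record` and `verify` are assumptions.

```python
import hashlib
import hmac
import os


def make_record(password: str, iterations: int = 200_000) -> tuple:
    """Create what the site stores: salt, iteration count, derived digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify(password: str, record: tuple) -> bool:
    """Recompute the digest from the submitted password and compare."""
    salt, iterations, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


record = make_record("correct horse battery staple")
print(verify("correct horse battery staple", record))  # True
print(verify("wrong guess", record))                   # False
```

The site never keeps the plaintext password, though it does see it briefly at login.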
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
the hashed salt+password becomes the password.
And worse - it is sent in plain text.
And salts do not prevent rainbow tables, they just make them bigger.
To have all possible passwords (8+salt) only requires about 3 GB of data.
Disks are now 3/4 TB in size, and soon will be 8/16 TB.
It would take a fair amount of time to generate a rainbow table, but possible.
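For a sense of scale, here is a back-of-the-envelope calculation of how many 8-character passwords exist and how large a plain lookup table of them would be. The alphabet sizes, the 32-byte digest, and the 16-bit salt are assumptions, and a real rainbow table stores only chain endpoints, so it is smaller than these raw figures.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def raw_table_bytes(entries: int, salt_bits: int = 0,
                    digest_bytes: int = 32, plaintext_bytes: int = 8) -> int:
    """Size of a plain table holding every (password, digest) pair
    for every possible salt value."""
    return entries * (2 ** salt_bits) * (digest_bytes + plaintext_bytes)


for name, size in [("digits", 10), ("lowercase", 26), ("printable ASCII", 95)]:
    n = keyspace(size, 8)
    print(f"{name}: {n:.2e} passwords, "
          f"{raw_table_bytes(n) / 1e12:.3g} TB unsalted, "
          f"{raw_table_bytes(n, salt_bits=16) / 1e12:.3g} TB with a 16-bit salt")
```

Each extra bit of salt doubles the table, which is the sense in which salts make precomputed tables "bigger".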
All the problems people had with mainframes you are starting to see with the remaining mainframes, "the cloud". No one cares about your data as much as you do, no matter what they tell you or promise.
A bad actor could rather easily convert the hashes back to email addresses. All he needs is a good source of email addresses (readily available from the dirtbags who supply spammers), which he can then hash and index. Takes some computer resources, that's all.
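The "hash and index" step can be sketched as a plain dictionary lookup. The address list below stands in for a purchased or leaked list, and `build_index` and `unmask` are hypothetical names; unsalted SHA-256 is assumed.

```python
import hashlib


def h(email: str) -> str:
    """Unsalted SHA-256 of a lowercased address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_index(known_addresses: list) -> dict:
    """Map hash -> address for every address the attacker already has."""
    return {h(a): a.strip().lower() for a in known_addresses}


def unmask(uploaded_hashes: list, index: dict) -> dict:
    """Recover the address behind each uploaded hash found in the index."""
    return {x: index[x] for x in uploaded_hashes if x in index}


index = build_index(["alice@example.com", "bob@example.org", "carol@example.net"])
uploaded = [h("bob@example.org"), h("unknown@example.com")]

recovered = unmask(uploaded, index)
print(sorted(recovered.values()))  # ['bob@example.org']
```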
A good actor need merely not misuse the email addresses in the first place.
The root of all these problems is that any idiot with a text editor can call themselves a "web developer" these days. The barrier to entry is extremely low, and the result is a very large group of people who have no forethought about what they're actually doing. They take the most naïve path from start to finish and end up creating all these security and privacy holes real programmers have long since learned to avoid.
Case in point: people still store passwords and credit card info in plaintext, typically behind sloppy PHP or Ruby scripts that are vulnerable to SQL injection. Feed that stolen data into a simple script that tests the passwords against a handful of popular services like GMail, Facebook, Hotmail, Paypal etc. Within minutes, you have a few dozen accounts ready to be abused all over the web without the user's knowledge - all because of one idiot who didn't know how to protect his users' info.
All this talk of securing the cloud is futile. It's like putting a dozen deadbolts on your front door, then leaving a spare set of keys under your neighbour's welcome mat.
-Billco, Fnarg.com
I've been thinking along similar lines for around the last 6 months. The thing that got me started thinking about it was the compromise of MtGox. I had an account there with a strong password, and a single-use e-mail address. When they were compromised, my password was fine because it was one of those dozen character random passwords that are getting such a bad rap these days, but it also wasn't shared with any other accounts.
But, the problem was that the e-mail address I used there, which I had white-listed because it was used only on that service, started getting quite a lot of spam.
I started imagining that they could have stored the hash of my address on their publicly accessible systems. If I needed to do a password reset, they could still verify that I had the address and send a password reset message to it, without having the address stored in their database.
If the service needs to send e-mails as part of the service, you could imagine a second server which is isolated from public access, but has an API for updating e-mail addresses and sending the messages.
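A small sketch of the password-reset idea above: the public-facing system keeps only a digest of the address, and the user supplies the address again when asking for a reset. The account table, the sample address, and the names `email_digest` and `can_send_reset` are all hypothetical.

```python
import hashlib
import hmac


def email_digest(email: str) -> str:
    """Digest kept in place of the address itself."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Public-facing store: username -> digest of the registered address.
accounts = {"alice": email_digest("alice-onlyhere@example.com")}


def can_send_reset(username: str, claimed_email: str) -> bool:
    """Allow a reset mail only if the claimed address matches the digest."""
    stored = accounts.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, email_digest(claimed_email))


print(can_send_reset("alice", "Alice-OnlyHere@example.com"))  # True
print(can_send_reset("alice", "mallory@example.com"))         # False
```

If this table leaks, an attacker gets digests rather than a ready-made mailing list, subject to the guessing attacks discussed elsewhere in the thread.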
Extending this to finding friends is a pretty good idea. I never use the "look for friends" function because I don't want to give out my address book to these places; it has too often led to spam for me and my friends in the past. But if I could upload hashes of those addresses, I'd be more interested in doing it.
This paper: Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization by Paul Ohm from the University of Colorado Law School is the best summary why it won't work even if people do it: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 TL/DR: There are just far too many ways to infer the real meanings from a network of hashes.
This is a temporary solution that simply will not work for any cloud service of significant size. I do program with hashing and other encryption mechanisms and I do know exactly what hashing does.
Hashing is a 1 to 1 algorithm. So given any single input (like an email address), the hash output will remain the same. So any hash value can quickly be verified to match a known email address by simply running the hashing algorithm on a known email address and matching it to the existing hash. If you get a match, then you know the hash is really the email address you just hashed.
Any cloud service of any size will have many email addresses to run the hashing algorithm against. So any email address within the system can be hashed and checked against the stored hashes. So if two users are using the system and have each other in their contact list then the cloud service will "know" the true email address of these users despite each being a hash of the other in their contact list.
Salt is a mechanism to further obscure the original content. Salt is simply a 'secret word' added to the original unencrypted text (email) prior to hashing. It can be used to prevent such hash comparisons as mentioned above. However, this won't work in this case, as the hashing needs to be done either by the user or by the cloud service. If it is done by the user, then the salt needs to be known by all users (in order to make sure the hash output is the same for the same email addresses) and therefore is public and no longer a 'secret'. If the salt secret is held by the cloud service, then the service can use this salt at any time to do the lookups described above.
Further, if all your contact email addresses are hashed, then what possible use could there be in sending them your contacts in the first place? If the service doesn't involve using the email addresses, then why would you be sending them to the cloud service in the first place? If the cloud service doesn't require the data you are sending to it in order to provide the service, then it is an unethical service, and you should not be using it.
Hashing algorithms have value for comparing text and binary data. It is like a signature. If you know the hash value for some given data and you know how that hash was created (including the salt), then you can use the same hashing algorithm to compare the hashes and verify that the data was not changed, because it is known to be very hard to "fake" a hash. That is, if you know the email input and you know the hash output, it is nearly impossible to create a totally different email address that will produce the same hash output as the initial one. Therefore, hashing is very useful in verifying that data was not changed; for example, a piece of code could be checked to make sure no virus was inserted before you run it.
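The integrity check described above might look like this in practice. The sample bytes and the names `sha256_hex` and `is_unmodified` are made up for illustration.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex digest that a publisher would list next to a download."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_hex: str) -> bool:
    """Check received bytes against the published digest."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


original = b"print('hello')\n"
published = sha256_hex(original)          # obtained through a trusted channel

tampered = original + b"import os\n"      # something was appended in transit

print(is_unmodified(original, published))  # True
print(is_unmodified(tampered, published))  # False
```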
Hashing algorithms are not useful in obscuring data. They are best at verifying that no changes have occurred. They do not provide a good mechanism of getting back to the original data.
I do not know the 'Path' cloud service or iPhone app. I don't use it and don't want to bother reading about it. I will just say that it seems apparent that the app was downloading customers' address books in order to collect information that was not required. If the service can work without the data and merely asks the user for permission, then I doubt the address info is required. If this is true, then it seems that the service was simply collecting as much information from the app user as possible in order to create searchable data-mining information that can later be sold to 3rd parties. I suggest you do not use this service. If the app really does require contact information and email addresses to work, then I think PKI encryption is much more appropriate, where the public and private keys are retained on two separate servers and the user's encrypted contact information is on the server with the public key. And the private key is held on a highly secure server which only processes and decrypts in
if it's collision free, it's a one-to-one mapping, and hence theoretically reversible. collisions could result in, well, collisions...