Gravatars Can Leak Users' Email Addresses
abell writes "Gravatar offers a global avatar service, using an MD5 hash of the user's email as avatar ID. This piece of information in some cases is enough to retrieve the original email address. Testing a simple attack on stackoverflow.com, I was able to determine the email addresses of more than 10% of the site's users."
If this is directly related to MD5 (as it would seem), let's hope Gravatar switches to another algorithm. Of course, this won't do much about the existing hashes I suppose.
It would have been trivial for them to just add a secret salt string to the email before hashing, and that would have solved most of the problem. It is possible that they wanted to be "nice", in that if they go out of business, anyone can regenerate the IDs without them. But, as this guy has shown, that's not a great idea.
Here's my own Gravatar hash:
b835b33911b93c136d8e61cbbbe6736d
Who will be the first to crack it?
I'm no expert in cryptography, but would it be helpful for them to add a salt? (Unless they do that already, of course)
Can anyone tell me if the "you can add extra stuff after a +" that GMail lets you do is standard in the RFC for all email addresses? If it is, then to "fix" this you could sign up to Gravatar with an email address that has a random string after an added "+", and the brute-force search on hashes would be much, much harder. (Assuming that your email provider implements that part of the standard.)
Unless I'm missing something, the article can be summarized: "Guess the person's email address, check if the md5 hash of the address you guessed matches the Gravatar. If it matches you guessed correctly."
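To make that summary concrete, here is a minimal sketch of the guess-and-check step. It assumes the avatar ID is the MD5 hex digest of the trimmed, lower-cased address; the names `gravatar_id` and `confirm_guess` and the example addresses are made up for illustration.

```python
import hashlib


def gravatar_id(email: str) -> str:
    """Return the MD5 hex digest of the trimmed, lower-cased address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def confirm_guess(target_hash: str, guesses):
    """Return the first guess whose hash equals target_hash, else None."""
    wanted = target_hash.lower()
    for guess in guesses:
        if gravatar_id(guess) == wanted:
            return guess
    return None


# Hypothetical demo: the "leaked" hash is computed locally so the example is
# self-contained; in the scenario discussed it would come from an avatar URL.
leaked = gravatar_id("michael.smith@example.com")
guesses = ["msmith@example.com", "michael.smith@example.com"]
print(confirm_guess(leaked, guesses))
```

Note that nothing here inverts MD5: the hash only confirms a guess that was already on the list.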
Nothing to see here. Move along...
In other news, all password hashes can eventually be cracked by brute force... Oh noes!
Can anyone say Rainbow Tables? Tweak the algorithm to output valid e-mail addresses. As for the salt, as long as it isn't known, while it can make it computationally difficult, it won't stop some addresses from being hacked using the aforementioned method.
...I thought "Gravatar" was a new theoretical exotic particle, like a Graviton, especially when used with the following "can leak", but this actually makes more sense - sort of - though I don't know if "leak" is the best verb here. In any case, I gotta stop reading science journals late at night.
It must have been something you assimilated. . . .
Do you consider your email address private info, need-to-know only? With a decent spam filter and easy-to-use block features, it really isn't a problem. I provide mine to pretty much anyone who asks. The only thing I do is keep it in a non-scrapable format, to keep it from getting on too many spam lists.
Maybe I am missing the point, but who cares?
I understand that there is this huge number of people that think that an email address is private information, but why?
Gravatar just needs every user to supply a "salt" along with their email wherever their Gravatar is used; they could even call it a password. Combine the password/salt with the email to generate the hash. This would make guessing the email from the hash much more difficult.
TFA suggests trying email addresses related to the user's ID on some site and domain names of large hosting companies (for example, Michael Smith might be msmith@example.com, or michael.smith@example.com) and testing whether or not their md5sum is the same as the one associated with the avatar. However, a bad guy could just send any message to all such addresses and hope one hits. Of course, he might accidentally be spamming some other suckers with the same name, but no true villain would be bothered by this sort of collateral damage.
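A rough sketch of how such a candidate list could be built from a display name and a list of domains. The patterns and the function name `candidate_addresses` are illustrative guesses, not the list TFA actually used; each candidate would then be hashed and compared against the avatar ID.

```python
def candidate_addresses(full_name: str, domains):
    """Yield plausible addresses for a 'First Last' style name."""
    parts = full_name.lower().split()
    if not parts:
        return
    first, last = parts[0], parts[-1]
    local_parts = [
        f"{first}.{last}",    # michael.smith
        f"{first[0]}{last}",  # msmith
        f"{first}{last}",     # michaelsmith
        f"{first}_{last}",    # michael_smith
        first,                # michael
    ]
    seen = set()
    for local in local_parts:
        for domain in domains:
            address = f"{local}@{domain}"
            if address not in seen:  # skip duplicates (e.g. one-word names)
                seen.add(address)
                yield address


for address in candidate_addresses("Michael Smith", ["example.com", "example.org"]):
    print(address)
```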
The rainbow table suggestion is more serious, since someone could find out your email address even if your screen name is different from the name in your email. (So if you registered at a site as "anonymous_user", but provided the Gravatar people with an email address containing your real name, then the bad guys could find out your real name.) This is bad, but as a mitigating factor, the real name has to be in the rainbow table in the first place, so it is probably fairly common. If the villain finds out your name is Michael Smith, he probably still has no idea which Michael Smith you are.
Wow. You can glean information from the Internets. I didn't realize that.
I actually *just* (20 minutes ago) put my picture up there. Can you guess my email ;)
that addy has a different icon
Crap. What did the new CSS do with the "Post anonymously" option??
Use your email address with "+randomsequence"@
Randomsequence will have to be consistent between the user and the sites they want the gravatar to work at, but it will generate an MD5 hash different than their actual address; yet if the site sends email to the user with it the user will receive it.
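A small sketch of the "+randomsequence" idea, assuming a provider that delivers `local+tag@domain` to `local@domain` (this is provider-specific behaviour, not something every mail server does). The helper names and the example address are hypothetical.

```python
import hashlib
import secrets


def tagged_address(email: str, tag_bytes: int = 8) -> str:
    """Insert a random '+tag' before the '@' of an address."""
    local, _, domain = email.rpartition("@")
    if not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    tag = secrets.token_hex(tag_bytes)  # two hex characters per byte
    return f"{local}+{tag}@{domain}"


def md5_hex(text: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased text."""
    return hashlib.md5(text.strip().lower().encode("utf-8")).hexdigest()


plain = "user@example.com"
tagged = tagged_address(plain)
print(tagged)
print(md5_hex(plain))
print(md5_hex(tagged))
```

With the default 8 random bytes, an attacker who guesses the base address still has 2^64 possible tags to try. The tagged address has to be stored, since the same one must be given to every site where the avatar should appear.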
If you have an MD5 hash of a file or phrase, like an email, and have candidates, you can compare them and see if there's a match! Video at 11!
But seriously, this approach isn't really novel, just a novel application of existing technology. That said, it only affects users whose emails were already easily guessable. If your username is Jon Robert and your email is jon.robert@gmail.com...well, if I were guessing without this, I'd guess that first anyway. All this does is permit you to confirm an email. This is an exploit, but not really all that dangerous a one, because it doesn't reveal emails; it only lets you confirm that the email you guessed exists and is the right one.
The approach of using rainbow tables, only discussed briefly, is a bit more concerning, and I'd like to see more about this.
But is this significantly easier than other methods of harvesting email addresses? Spammers already do dictionary attacks on big providers like yahoo. It's not clear to me that this method is a better way of generating a list of email addresses. If you carry out a dictionary attack on yahoo.com, you're going to come up with probably tens of millions of valid email addresses. If you carry out this attack on gravatar.com, how many addresses are you going to get for your trouble? 10% of gravatar's users, apparently -- which I'm guessing is not really that big a number. Remember, once a spammer has a botnet, it costs him zero to send out one more spam to test whether a particular address is valid. Therefore the dictionary attack is free.
The defense against dictionary attacks is also exactly the same as the defense against this attack: either don't use a big email provider, or use a big email provider but pick a username that has a lot of characters (so it's not vulnerable to brute-forcing) and is also not vulnerable to dictionary attacks.
... the emails about ad I got today on my email address I used to register a gravatar.
Email addresses are usernames. They are not secret information. If somebody can be bothered enough to find your email address through brute-forcing the MD5 hash of it; you've got bigger problems.
Far more than "10% of stackoverflow.com's users" can have their email addresses GUESSED far faster. Likely your email address is also FAR easier to establish through a simple Google search on your pseudonyms.
If for some odd reason you want your email address to be secret, for the same reason as wanting a secret pseudonym or using a false name when signing up, register a fake email address instead (and set it up for forwarding). You're giving your email address in clear text to the site's owner and all the internet hops in between him and you ANYWAY.
It's important to learn to distinguish between what is a secret and what is not; and if you want to make things secret, at what level you should put your trust.
``OK, so ten out of ten for style, but minus several million for good thinking, yeah?''
What if Gravatar published a public key, and sites displaying Gravatars pointed their image links to encrypt(gravatar_id + random_salt)? It seems like this would solve the problem, since people viewing the page can't get access to the users' real Gravatar IDs. Sure, the forum sites would still see your Gravatar ID, but they already have your email address in the first place.
Gravatar! ... for i shall be your provider... your companion, meh, YOUR MASTER!
I more or less agree with you that this isn't particularly newsworthy (is Gravatar all that widely used?), except for the fact that if they had bothered to add a random, secret salt before hashing, everything would have been secure (or rather, as secure as the secret salt).
> In other news, all password hashes can eventually be cracked by brute force... Oh noes!
True, but that is like saying "No encryption which uses a key smaller than the length of the ciphertext is secure": mathematically true, but not true in practice.
I think what you should have said instead was:
"In other news, doing security is harder than you think."
It's not exactly big news that a system based on MD5 hashes is susceptible to dictionary-style attacks; this should be obvious to anyone who understands how hashes work. In order for this particular attack to work, the attacker already has to have some reasonable guesses as to what your e-mail address is; the Gravatar trick only confirms the address. So it seems to me that the amount of additional data leaked is fairly small.
OTOH, I suppose I'm somewhat desensitized to this sort of thing, since I've had the same primary e-mail address for something like 15 years (going back to the days when I was rather active on Usenet). My e-mail address is already in every spammer database on the planet, so I don't see how a few more people knowing it could make things any worse!
A) Isn't the point of it to be a public system, so that sites can accept users' email addresses, then find the gravatars themselves?
I suppose you're right. In which case no trivial workaround can exist (because the attacker just pretends to be a website wanting to discover the guessed emails' avatars). OTOH, if Gravatar would implement a two-step API for getting the information, and implement rate limits on the API, doing the attack could be made much, much harder.
I vaguely remember looking at the Gravatar site when it opened up a long time ago, but personally I have no use for avatars and prefer not to have a global net persona (or at least one which is trivially assembled from all of the little persona pieces I have spread around).
B) Wouldn't it be equally easy to reverse engineer the salt string, with your own known test email? (As long as the salt is shorter than some limit maybe)
The whole point of using a salt (in my eyes, anyway) is that it should be long enough that brute forcing it is unreasonable.
From Gravatar's FAQ:
MD5 isnt strong enough encryption, they’ve cracked that havent they?
MD5 is plenty good for obfuscating the email address of users across the wire. if you’re thinking of rainbow tables, those are all geared at passwords (which are generally shorter, and less globally different from one another) and not email addresses, furthermore they are geared at generating anything that matches the hash, NOT the original data being hashed. If you are thinking about being able to reproduce a collision, you still don’t necessarily get the actual email address being hashed from the data generated to create the collision. In either case the work required to both construct and operate such a monstrocity would be prohibitively costly. If we left your password laying around in the open as a plain md5 hash someone might be able to find some data (not necessarily your password) which they could use to log in as you... Leaving your email address out as an md5 hash, however, is not going to cause a violent upsurge in the number of fake rolex watch emails that you get. Lets face it there are far more lucrative, easier, ways of getting email address. I hope this helps ease your mind.
So, they might have already thought about this vulnerability and dismissed it as not interesting.
They could still fix their concept by providing an API where a website wanting to discover the avatar for a given email first hashes the email with MD5 and then the Gravatar URL which is generated redirects them to a link to the image (which contains no information about the email address, or perhaps uses a salted hash). This, in conjunction with rate limiting the number of queries per website, could provide a relatively secure way to do what they want.
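One way to picture that proposal is the toy in-memory service below: the site submits the email hash, gets back a random token that says nothing about the address, and is cut off after too many queries per time window. The class `AvatarLookup`, its methods, and the limits are all hypothetical; this is not Gravatar's API.

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Optional


class AvatarLookup:
    """Toy lookup service: email hash in, opaque image token out."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._tokens = {}                  # email hash -> opaque token
        self._recent = defaultdict(deque)  # site -> timestamps of recent queries

    def register(self, email_hash: str) -> str:
        """Assign (once) a random token that carries no information about the email."""
        if email_hash not in self._tokens:
            self._tokens[email_hash] = secrets.token_urlsafe(16)
        return self._tokens[email_hash]

    def lookup(self, site: str, email_hash: str, now: Optional[float] = None):
        """Return the token for email_hash (or None); raise if site is over its limit."""
        now = time.monotonic() if now is None else now
        recent = self._recent[site]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            raise RuntimeError("rate limit exceeded for " + site)
        recent.append(now)
        return self._tokens.get(email_hash)
```

The page would then link to the image by token, so visitors never see the hash. A limit of a few thousand lookups per day would hardly bother a blog, but it would slow down anyone testing millions of guessed addresses.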
Agreed. In fact, when I first created a gravatar, this "newly discovered" problem immediately occurred to me; I suspect the same is true for many other gravatar users.
I guess you could add a salt yourself, at least if your email provider works like gmail, and allows you to supply a meaningless string after a +. If the first part of your email address is guessable from your username, you could do something like:
homburg+randomsalt@gmail.com
Some email providers have a simple way of giving you a throwaway ID. E.g., example+slashdotnospam@gmail.com is delivered to example@gmail.com.
Say my name is Lary Page. If my email id is lary.page@gmail.com, I can still protect myself so that you will never get my email id.
MD5 (lary.page@gmail.com) = "1b8dbe98e2b1138fd3ba34e26fc55107".
So I provide my email id as lary.page+1b8dbe98e2b1138fd3ba34e26fc55107@gmail.com. If I gave you the md5 of that id, you'll find it hard to get back to lary.page@gmail.com.
Try, the MD5 hash of the above email id is 803efbc80ead933f28d0704d43d1f63b.
I think most of us figured out this possibility within 30 seconds of seeing how Gravatar worked.
One solution would be to have a private salt known only to Gravatar and the implementing website. Gravatar could determine the correct salt to use based on the referrer.
Of course this would mean each subscriber would need to be hashed against each salt in the Gravatar database.
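A sketch of what that per-site scheme might look like. Note the swap: it uses HMAC-SHA256 with a per-site key rather than MD5 with a prepended salt. The site names, keys, and function names are placeholders.

```python
import hashlib
import hmac


def site_avatar_id(site_key: bytes, email: str) -> str:
    """Keyed hash of the address; guesses cannot be checked without site_key."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(site_key, normalized, hashlib.sha256).hexdigest()


def build_index(site_keys: dict, emails: list) -> dict:
    """Map (site, avatar id) -> email: one entry per subscriber per site."""
    return {
        (site, site_avatar_id(key, email)): email
        for site, key in site_keys.items()
        for email in emails
    }


# Placeholder keys; real ones would be long random secrets shared per site.
site_keys = {"blog.example": b"blog-secret-key", "forum.example": b"forum-secret-key"}
index = build_index(site_keys, ["a@example.com", "b@example.com"])
print(len(index))
```

The index grows as subscribers times sites, which is the cost mentioned above. It also only protects against outsiders: a site that holds its own key can still check guesses.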
In either case, I don't think it's really that big a deal.
Call me when he finds a way to determine the email after gravatar starts adding a pinch of salt to the hashed emails...
Using &#64; instead of @ is enough to stop most e-mail harvesting bots, I don't see them brute-forcing MD5s any time soon.
It is obvious to everyone who understands how it works.
Geez...
Since the email address of Gabe (from Valve) is well known, and Gravatars can be used in a pseudo-anonymous way, I searched images.google.com for the hash of his email. Not found. Either Gabe doesn't post on Gravatar-powered forums, or he uses a different email address.
So if you use Gravatars, anyone who knows your email address can search for your posts. This is obvious from the use of MD5. Hashing your address with MD5 keeps spam bots from collecting it, but that is all it does; it isn't privacy.
-Woof woof woof!
This is not related to the MD5 algorithm or use of salts. The fact is that Gravatar wants sites to use Gravatar without sending loads of requests to gravatar.com. Therefore Gravatar must provide a "client-side" API for generating Gravatar avatar URLs based on the known constant, email addresses. Sure, they could have salted things, but whatever they do, there's an essentially open source function somewhere that takes an email address and converts it to a Gravatar URL. As the algorithm is available to anyone, any attack can use it to check intelligent guesses against the known algorithm result.
There really isn't anything Gravatar can do without changing their design to decouple avatar URLs from email addresses. Basically whenever anyone registers an account with a blog, the site would have to ask Gravatar for the user's Gravatar avatar URL -- and probably poll on some regular basis in case users add Gravatar avatars later. The blog would then have to persist this data in their databases for later look-up when comments are viewed. This is certainly possible, and could probably be designed in a way that doesn't add additional load to Gravatar's servers. But compared to the current implementation, which can be added to blogs with very minimal coding (probably just a couple lines in PHP), to do this more safely would require persistence-layer/database schema changes that would severely limit the attractiveness of Gravatar.
That's why I usually use a new Hotmail address made from the site's name and my own to keep track of everything that comes from there, so if anything is compromised, I usually know where it came from. Also, I have no worries about someone getting my address, as it is irrelevant: it is not my real one.
The point of TFA is that one can identify the user's email address from the hash. The question is, why hash the email address. It could be just as easy to hash an integer value unique for the user. Hell, it could even be an incremental. Who cares if somebody can identify that joe@nothing.com has a Gravatar ID of 123. That ID can't be traced back to any specific Gravatar account, as the link between ID and email address would be internal.
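A minimal sketch of the incremental-ID idea, with a made-up `AvatarRegistry` class standing in for Gravatar's internal database. The email-to-ID table stays private; only the integer would ever appear in an avatar URL.

```python
import itertools


class AvatarRegistry:
    """Toy registry that hands out incremental avatar IDs."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._id_by_email = {}  # private: never exposed in avatar URLs

    def id_for(self, email: str) -> int:
        """Return the existing ID for this address, or assign the next one."""
        key = email.strip().lower()
        if key not in self._id_by_email:
            self._id_by_email[key] = next(self._next_id)
        return self._id_by_email[key]


registry = AvatarRegistry()
print(registry.id_for("joe@nothing.com"))
```

The trade-off is that a site can no longer compute the ID on its own; it would have to ask the registry for it once and store the answer.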
The important part of the trick is that you have to assume the email address is the same as the username and then compare the hashes of that name @yahoo.com, @hotmail.com, @gmail.com, and other popular email services. Because people that use those webmail addresses have never received spam before.
If any spammer did try this, I would expect them to be very pissed off to discover that after all that work they already had 99% or more of those addresses to begin with.
If your email address is common-word@famousprovider.com, then the spammers have already put your email address into their lists. Why not? They don't care if 95% of the mail they send bounces, and they don't care if they target any specific person; the "hit" rate they need to make a profit is negligible. I see spam attempts to thousands of never-existed addresses on my colo, and my home domain is pretty damn obscure. I'm sure Gmail gets hits from aaron.aardvark through zephram.zymurgy continually.
Pass the salt please.