Gravatars Can Leak Users' Email Addresses
abell writes "Gravatar offers a global avatar service, using an MD5 hash of the user's email as avatar ID. This piece of information in some cases is enough to retrieve the original email address. Testing a simple attack on stackoverflow.com, I was able to determine the email addresses of more than 10% of the site's users."
No, it's not related to MD5 itself. Period.
It would have been trivial for them to just add a secret salt string to the email before hashing, and that would have solved most of the problem. It is possible that they wanted to be "nice", in that if they go out of business, anyone can regenerate the IDs without them. But, as this guy has shown, that's not a great idea.
It's not, any hashing function would be subject to the same problem. If you RTFA you'll find that they just brute force combinations of the user name and common email domains.
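A minimal Python sketch of the guessing attack described above. The function names and the sample usernames and domains are illustrative assumptions, not taken from the article; the normalization (trim, lowercase, then MD5) follows Gravatar's published hashing convention.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def guess_email(target_hash, usernames, domains):
    """Try every username@domain combination; return the first match or None."""
    for name in usernames:
        for domain in domains:
            candidate = f"{name}@{domain}"
            if gravatar_hash(candidate) == target_hash:
                return candidate
    return None


# Hypothetical demo: the "leaked" hash is computed locally from a made-up address.
target = gravatar_hash("jane.doe@example.com")
found = guess_email(target, ["jdoe", "jane.doe"], ["example.org", "example.com"])
print(found)
```

Swapping MD5 for any other unkeyed hash changes nothing here: the attacker just calls the same public function on each guess.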
To actually fix this would require not hashing (only) the email address. You could mix in some secret salt with the email before hashing, or you could use encryption (with a secret key), or you could just hand out unique identifiers which are associated with addresses only in the Gravatar database. I don't know if any of these are feasible for this particular application though.
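As a rough illustration of the "secret salt" option, here is a Python sketch that uses an HMAC as the keyed hash. This is a swapped-in technique for illustration, not anything Gravatar is known to do; `SECRET_KEY` and `keyed_avatar_id` are hypothetical names, and the scheme only helps if the key never leaves the server.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would be random and kept private.
SECRET_KEY = b"hypothetical-server-side-secret"


def keyed_avatar_id(email: str, key: bytes = SECRET_KEY) -> str:
    """Derive an avatar ID that cannot be recomputed without the key."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


print(keyed_avatar_id("user@example.com"))
```

An attacker who guesses the right address still cannot confirm the guess, because computing the ID requires the key.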
Game! - Where the stick is mightier than the sword!
Here's my own Gravatar hash:
b835b33911b93c136d8e61cbbbe6736d
Who will be the first to crack it?
Can anyone tell me if the "you can add extra stuff after a +" that Gmail lets you do is standard in the RFC for all email addresses? If it is, then to "fix" this you could sign up to Gravatar with an email address that has a random string after an added "+", and the brute force search on hashes would be much, much harder. (Assuming that your email provider implements that part of the standard.)
The attack doesn't rely on MD5 itself or MD5 collisions. It would work no matter what hashing algorithm was used.
MD5 collisions actually don't help the attacker here. In fact, an MD5 collision would simply be a false positive in this case (the attacker thinks they've found the email address, but they haven't).
I disagree.
Granted, those are basically very unsophisticated databases that just store lookup values, but it's relatively easy to bruteforce an MD5 hash down into one of the possible original strings (obviously with any algorithm that has a fixed output size with limitless inputs like MD5 there are infinite inputs that will hash down to a single md5sum, but when you're trying to get a valid email address out of a hash it's easy to pick the right one). Couple that with the fact that in this situation, you know that the entire string is lowercased and probably 60% of the gravatar emails (probably more like 90% actually) are going to come from one of four or five domains... reversal becomes quite easy. If you're bored, you could spin up a few Amazon EC2 or Rackspace Cloud Server instances to dump out some large tables. One each for gmail, yahoo, msn, aol, whatever else; it'd be a very simple script to make. You could probably cover every alphanumeric email address under 12 characters overnight, at a cost of about a dollar and ten minutes of scripting.
The thing to realize here is that gravatar doesn't md5 emails to hide them from people who want to obscure their identity, just to obscure them from spambots. So it's really a non-issue. If you're that concerned, leave your blog comments with a fake email address.
How are sites slashdotted when nobody reads TFAs?
And you didn't think of Gravitar instead? Kids these days...
http://en.wikipedia.org/wiki/Gravitar
Do you consider your email address private info, need-to-know only? With a decent spam filter and easy-to-use block features, it really isn't a problem. I provide mine to pretty much anyone who asks. The only thing I do is keep it in a non-scrapable format, to keep it from getting on too many spam lists.
It's quite well known that MD5 shouldn't be used for anything privacy related, given the fact that it's been exploited quite publicly in recent history.
An email address isn't private... I suspect that MD5 was just a convenient way to get a fixed-length ID. I'd be more worried about collisions, but I'm too lazy to calculate how many avatars would be required before that might become a problem.
I actually *just* (20 minutes ago) put my picture up there. Can you guess my email ;)
Use your email address with "+randomsequence"@
Randomsequence will have to be consistent between the user and the sites they want the gravatar to work at, but it will generate an MD5 hash different than their actual address; yet if the site sends email to the user with it the user will receive it.
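A small Python sketch of why the "+randomsequence" trick defeats a guess of the plain address. The helper names, the example address, and the tag are made up for illustration, and it assumes the mail provider actually delivers plus-tagged mail.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def tagged(email: str, tag: str) -> str:
    """Insert +tag before the @ of an address."""
    local, _, domain = email.rpartition("@")
    return f"{local}+{tag}@{domain}"


plain = "user@example.com"
with_tag = tagged(plain, "x9k2qv")  # hypothetical random tag

print(with_tag)
print(md5_hex(plain))
print(md5_hex(with_tag))
```

The two digests are unrelated, so an attacker has to guess the tag as well as the address.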
But is this significantly easier than other methods of harvesting email addresses? Spammers already do dictionary attacks on big providers like yahoo. It's not clear to me that this method is a better way of generating a list of email addresses. If you carry out a dictionary attack on yahoo.com, you're going to come up with probably tens of millions of valid email addresses. If you carry out this attack on gravatar.com, how many addresses are you going to get for your trouble? 10% of gravatar's users, apparently -- which I'm guessing is not really that big a number. Remember, once a spammer has a botnet, it costs him zero to send out one more spam to test whether a particular address is valid. Therefore the dictionary attack is free.
The defense against dictionary attacks is also exactly the same as the defense against this attack: either don't use a big email provider, or use a big email provider but pick a username that has a lot of characters (so it's not vulnerable to brute-forcing) and is also not vulnerable to dictionary attacks.
What I'm wondering is why this matters at all. A spammer would just send emails to [your username]@[every common email domain]. Why would they bother to check if it's the correct address or not?
Not really, since the salt would need to be publicly known for Gravatar to work (and it would break any backwards compatibility to add it in now). This was a 'social engineering' attack, not a rainbow table lookup – it pieced the name together with common providers to find a matching MD5. Salt would just add a single extra step.
I believe it's exactly the same problem/attack as was brought up about MicroID in the past. The idea of Pavatar is a much better way to do this sort of avatar-finding (though the decentralisation comes with its own problems), since it relies on a public web address instead of a semi-private e-mail address.
It is, actually. If you don't include the -n option for echo, it will append a \n to the string, changing the MD5, which is the hash you got.
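The same effect can be shown in Python (the address is a placeholder): hashing the string with a trailing newline, as plain `echo` would emit it, gives a different digest from hashing it without one, as `echo -n` would.

```python
import hashlib

email = "user@example.com"  # placeholder address

# Equivalent of `echo -n`: no trailing newline is hashed.
without_newline = hashlib.md5(email.encode("utf-8")).hexdigest()

# Equivalent of plain `echo`: a trailing "\n" becomes part of the input.
with_newline = hashlib.md5((email + "\n").encode("utf-8")).hexdigest()

print(without_newline)
print(with_newline)
```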
Email addresses are usernames. They are not secret information. If somebody can be bothered enough to find your email address through brute-forcing the MD5 hash of it; you've got bigger problems.
Far more than "10% of stackoverflow.com's users" can have their email addresses GUESSED far faster. Likely your email address is also FAR easier to establish through a simple Google search on your pseudonyms.
If you for some odd reason want your email address to be secret, for the same reason as wanting a secret pseudonym or using a false name when signing up, register a fake email address instead (and set it up for forwarding). You're giving your email address in clear text to the site's owner and all the internet hops in between him and you ANYWAY.
It's important to learn to distinguish between what is a secret and what is not; and if you want to make things secret, at what level you should put your trust.
``OK, so ten out of ten for style, but minus several million for good thinking, yeah?''
Doubt it. There are 26 letters and 10 digits, and in addition "." is very common in email addresses, so you get 37 possibilities for each position. 37 to the 12th power is 6582952005840035281 hashes to run, and even at 10^9 hashes per second (one gigahash a second, which would require on the order of a few hundred cores), you'd still need about 208 years to do that many hashes. Then you need to look up each of them in Gravatar and analyze the result for a hit or miss.
"Every alphanumeric email address under 12 characters" is in fact much too large a keyspace to reasonably cover overnight with a "very simple script".
It's not a large enough keyspace to be cryptographically secure, but it's large enough to not be trivially exhaustible.
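The arithmetic can be checked with a few lines of Python. The 37-symbol alphabet and the 10^9 hashes-per-second rate are the figures from this comment; the second calculation, for local parts strictly shorter than 12 characters, is an added variation.

```python
ALPHABET_SIZE = 26 + 10 + 1          # letters, digits, and "."
HASHES_PER_SECOND = 10**9            # rate assumed in the comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Local parts of exactly 12 characters.
keyspace_12 = ALPHABET_SIZE ** 12
years_12 = keyspace_12 / HASHES_PER_SECOND / SECONDS_PER_YEAR

# Local parts of 1 to 11 characters ("under 12").
keyspace_under_12 = sum(ALPHABET_SIZE ** n for n in range(1, 12))
years_under_12 = keyspace_under_12 / HASHES_PER_SECOND / SECONDS_PER_YEAR

print(keyspace_12, round(years_12, 1))
print(keyspace_under_12, round(years_under_12, 1))
```

Either way the result is years rather than one night, and that is for a single domain.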
It's not exactly big news that a system based on MD5 hashes is susceptible to dictionary-style attacks; this should be obvious to anyone who understands how hashes work. In order for this particular attack to work, the attacker already has to have some reasonable guesses as to what your e-mail address is; the Gravatar trick only confirms the address. So it seems to me that the amount of additional data leaked is fairly small.
OTOH, I suppose I'm somewhat desensitized to this sort of thing, since I've had the same primary e-mail address for something like 15 years (going back to the days when I was rather active on Usenet). My e-mail address is already in every spammer database on the planet, so I don't see how a few more people knowing it could make things any worse!
That's assuming email addresses are random sequences of letters, digits and dots.
If you're a spammer and don't mind missing the email of Mr. q9x7.3f.1zzp@hotmail.com, a phone book would probably provide an effective dictionary for narrowing that keyspace considerably.
From Gravatar's FAQ:
MD5 isnt strong enough encryption, they’ve cracked that havent they?
MD5 is plenty good for obfuscating the email address of users across the wire. if you’re thinking of rainbow tables, those are all geared at passwords (which are generally shorter, and less globally different from one another) and not email addresses, furthermore they are geared at generating anything that matches the hash, NOT the original data being hashed. If you are thinking about being able to reproduce a collision, you still don’t necessarily get the actual email address being hashed from the data generated to create the collision. In either case the work required to both construct and operate such a monstrocity would be prohibitively costly. If we left your password laying around in the open as a plain md5 hash someone might be able to find some data (not necessarily your password) which they could use to log in as you... Leaving your email address out as an md5 hash, however, is not going to cause a violent upsurge in the number of fake rolex watch emails that you get. Lets face it there are far more lucrative, easier, ways of getting email address. I hope this helps ease your mind.
So, they might have already thought about this vulnerability and dismissed it as not interesting.
They could still fix their concept by providing an API where a website wanting to discover the avatar for a given email first hashes the email with MD5 and then the Gravatar URL which is generated redirects them to a link to the image (which contains no information about the email address, or perhaps uses a salted hash). This, in conjunction with rate limiting the number of queries per website, could provide a relatively secure way to do what they want.
Security through Obscurity is a reference to the METHOD being obscure. Your encryption codes and salts are SUPPOSED to be obscure!!!
Some email providers have a simple way of giving you a throwaway ID. E.g., mail to example+slashdotnospam@gmail.com is delivered to example@gmail.com.
Say my name is Lary Page. If my email id is lary.page@gmail.com, I can still protect myself so that you will never get my email id.
MD5 (lary.page@gmail.com) = "1b8dbe98e2b1138fd3ba34e26fc55107".
So I provide my email id as lary.page+1b8dbe98e2b1138fd3ba34e26fc55107@gmail.com. If I gave you the md5 of that id, you'll find it hard to get back to lary.page@gmail.com.
Try it: the MD5 hash of the above email ID is 803efbc80ead933f28d0704d43d1f63b.
Or, use john -incremental -stdout. This will test reasonable names first, while not being restricted to RL names only.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Using &#64; instead of @ is enough to stop most e-mail harvesting bots; I don't see them brute-forcing MD5s any time soon.
This is not related to the MD5 algorithm or use of salts. The fact is that Gravatar wants sites to use Gravatar without sending loads of requests to gravatar.com. Therefore Gravatar must provide a "client-side" API for generating Gravatar avatar URLs based on the known constant, email addresses. Sure, they could have salted things, but whatever they do, there's an essentially open source function somewhere that takes an email address and converts it to a Gravatar URL. As the algorithm is available to anyone, any attack can use it to check intelligent guesses against the known algorithm result.
There really isn't anything Gravatar can do without changing their design to decouple avatar URLs from email addresses. Basically, whenever anyone registers an account with a blog, the site would have to ask Gravatar for the user's Gravatar avatar URL, and probably poll on some regular basis in case users add Gravatar avatars later. The blog would then have to persist this data in its database for later look-up when comments are viewed. This is certainly possible, and could probably be designed in a way that doesn't add additional load to Gravatar's servers. But compared to the current implementation, which can be added to blogs with very minimal coding (probably just a couple of lines in PHP), doing this more safely would require persistence-layer/database schema changes that would severely limit the attractiveness of Gravatar.
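For reference, the "client-side" function in question is roughly the following, shown here in Python rather than PHP. It is a sketch of the MD5-based scheme discussed in this thread (trim, lowercase, MD5, append to the avatar path); the function name and default size are illustrative choices.

```python
import hashlib


def gravatar_url(email: str, size: int = 80) -> str:
    """Build an avatar URL from an email address using the public MD5 scheme."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{digest}?s={size}"


print(gravatar_url("User@Example.com"))
```

Because anyone can run this function, anyone can test a guessed address against a hash seen in a page's image URLs.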
Bolex make [motion picture] cameras, not watches, and were very important in the early television news reels. Even today they are a staple in film schools.
1) register as a website with gravatar, find out how long the salt is
2) register on stackoverflow with your email address
3) enumerate the possibilities until you find the hash of your own address and therefore the salt
4) extract 8000+ emails from stackoverflow
5) repeat for other sites
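Step 3 above can be sketched in Python as follows. Everything specific here is an assumption made for illustration: that the salt is simply prepended to the address, that it is a short lowercase string, and the names `salted_id` and `recover_salt`.

```python
import hashlib
import string
from itertools import product


def salted_id(salt: str, email: str) -> str:
    """Hypothetical salted scheme: MD5 of salt followed by the address."""
    return hashlib.md5((salt + email).encode("utf-8")).hexdigest()


def recover_salt(known_email, known_hash, salt_length, alphabet=string.ascii_lowercase):
    """Enumerate every salt of the given length until the known hash is reproduced."""
    for chars in product(alphabet, repeat=salt_length):
        salt = "".join(chars)
        if salted_id(salt, known_email) == known_hash:
            return salt
    return None


# Demo with a made-up 3-letter salt and the attacker's own address.
my_email = "me@example.com"
observed = salted_id("qzx", my_email)
print(recover_salt(my_email, observed, 3))
```

A long random salt makes this enumeration impractical, so the approach depends on the salt being short or otherwise guessable.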
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Correct: the attack here is:
Take a big site with thousands of users, many using their (sorta) "real names".
Permute these names with some known big email provider hostnames.
Send them all some spam.
It does not really matter if 90% of those email addresses are incorrect; the rest will hit.
I would not do the MD5 validation thing, why should I?
We need a "-1 Plain wrong" moderation option!
The salt can be user and website dependent (4 bytes user/4 bytes website for an 8-byte salt), although I think that the added complexity won't be welcomed by the website owners.