Gravatars Can Leak Users' Email Addresses
abell writes "Gravatar offers a global avatar service, using an MD5 hash of the user's email as avatar ID. This piece of information in some cases is enough to retrieve the original email address. Testing a simple attack on stackoverflow.com, I was able to determine the email addresses of more than 10% of the site's users."
No, it's not related to MD5 itself. Period.
It would have been trivial for them to just add a secret salt string to the email before hashing, and that would have solved most of the problem. It is possible that they wanted to be "nice", in that if they go out of business, anyone can regenerate the IDs without them. But, as this guy has shown, that's not a great idea.
I'm sure that there will be a lot more emails offering mostly new Bolex watches in a few inboxes around...
Science advances one funeral at a time- Max Planck
Or just add some salt?
It's not, any hashing function would be subject to the same problem. If you RTFA you'll find that they just brute force combinations of the user name and common email domains.
To actually fix this would require not hashing only the email address: you could mix some secret salt into the email before hashing, you could use encryption (with a secret key), or you could just hand out unique identifiers that are associated with the address only in the Gravatar database. I don't know if any of these are feasible for this particular application, though.
Game! - Where the stick is mightier than the sword!
Here's my own Gravatar hash:
b835b33911b93c136d8e61cbbbe6736d
Who will be the first to crack it?
A normal implementation of salt (with the salt in plaintext along with the hash) would not help in this case.
Can anyone tell me whether the "you can add extra stuff after a +" feature that Gmail offers is standard in the RFC for all email addresses? If it is, then to "fix" this you could sign up to Gravatar with an email address that has a random string after an added "+", which would make the brute-force search on hashes much, much harder. (This assumes your email provider implements that part of the standard.)
Unless I'm missing something, the article can be summarized: "Guess the person's email address, check if the md5 hash of the address you guessed matches the Gravatar. If it matches you guessed correctly."
Nothing to see here. Move along...
In other news, all password hashes can eventually be cracked by brute force... Oh noes!
The attack doesn't rely on MD5 itself or MD5 collisions. It would work no matter what hashing algorithm was used.
MD5 collisions actually don't help the attacker here, in fact, an MD5 collision would simply be a false positive for this case (the attacker thinks they've found the email address, but they haven't).
I did, in fact, RTFA. It points out that even in the absence of search space limiting tricks employed by the author, rainbow tables could be used to achieve the same goal. Adding salt would have made the problem quite a bit tougher for an attacker, but wouldn't have put it completely out of reach. It's quite well known that MD5 shouldn't be used for anything privacy related, given the fact that it's been exploited quite publicly in recent history.
512 MB RAM, 20 GB disk, 200 GB transfer, five datacenters. $19.95/month.
...I thought "Gravatar" was a new theoretical exotic particle, like a Graviton, especially when used with the following "can leak", but this actually makes more sense - sort of - though I don't know if "leak" is the best verb here. In any case, I gotta stop reading science journals late at night.
It must have been something you assimilated. . . .
Yeah, I read it wrong :). Salt probably would've helped a bunch, though.
I disagree.
Granted, those are basically very unsophisticated databases that just store lookup values, but it's relatively easy to bruteforce an MD5 hash down into one of the possible original strings (obviously with any algorithm that has a fixed output size with limitless inputs like MD5 there are infinite inputs that will hash down to a single md5sum, but when you're trying to get a valid email address out of a hash it's easy to pick the right one). Couple that with the fact that in this situation, you know that the entire string is lowercased and probably 60% of the gravatar emails (probably more like 90% actually) are going to come from one of four or five domains... reversal becomes quite easy. If you're bored, you could spin up a few Amazon EC2 or Rackspace Cloud Server instances to dump out some large tables. One each for gmail, yahoo, msn, aol, whatever else; it'd be a very simple script to make. You could probably cover every alphanumeric email address under 12 characters overnight, at a cost of about a dollar and ten minutes of scripting.
The thing to realize here is that Gravatar doesn't MD5 emails to protect people who want to obscure their identity, just to hide the addresses from spambots. So it's really a non-issue. If you're that concerned, leave your blog comments with a fake email address.
How are sites slashdotted when nobody reads TFAs?
Do you consider your email address private info, need-to-know only? With a decent spam filter and easy-to-use block features, it really isn't a problem. I provide mine to pretty much anyone who asks. The only thing I do is keep it in a non-scrapable format, to keep it from getting on too many spam lists.
In order for Gravatar to work, the algorithm has to be publicly known. Which means every site uses the same salt (pointless) or each domain has its own salt, which can be determined from the referrer header (not only pointless, since a potential attacker knows what site they're on, but it would also make the service pretty much impossible to implement). The only other option would be two-way encryption with some sort of per-domain shared key, but given that most of the point of Gravatar is simplicity of implementation, that's just not going to happen.
Maybe I am missing the point, but who cares?
I understand that there is this huge number of people that think that an email address is private information, but why?
Gravatar just needs every user to supply a "salt" along with their email wherever their Gravatar is used; they could even call it a password. Combine the password/salt with the email to generate the hash. This would make guessing the email from the hash much more difficult.
Wow. You can glean information from the Internets. I didn't realize that.
It's quite well known that MD5 shouldn't be used for anything privacy related, given the fact that it's been exploited quite publicly in recent history.
An email address isn't private... I suspect that MD5 was just a convenient way to get a fixed-length ID. I'd be more worried about collisions, but I'm too lazy to calculate how many avatars would be required before that might become a problem.
You hit the nail on the head. If people use these, they should use an alias (I know Hushmail and Yahoo both offer alias functionality) that they can filter incoming mail with.
Even better, because Gravatar is essentially Alice and Bob, they should have gone with a salt (64 bits is "meh", 128 is decent, 256 is good for the foreseeable future), SHA-256, and a site key that only their backend database knows. This way, it would be immensely difficult to associate the hash with an E-mail address even if the attacker suspected both were connected.
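A minimal sketch of the keyed-hash idea in Python. It swaps in HMAC-SHA256, the standard construction for mixing a secret key into a hash, as a stand-in for "salt + SHA-256 + site key", and assumes a hypothetical `SERVER_KEY` held only by the avatar service's backend; the `avatar_id` helper is illustrative, not Gravatar's API.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the avatar service's backend.
SERVER_KEY = secrets.token_bytes(32)


def avatar_id(email: str, key: bytes = SERVER_KEY) -> str:
    """Keyed hash of a normalized email; useless to anyone without `key`."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


print(avatar_id("someone@example.com"))
```

Without the key, an attacker can no longer hash a guessed address and compare. The catch, raised elsewhere in this thread, is that third-party sites could then no longer compute the ID themselves.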
Best of all would be to have Gravatar use random nonces and have their backend database store the nonce -> user tuple. This way, there is no algorithm that would allow an attacker to decisively correlate the pictures with E-mail addresses. Even better would be a many-to-one ratio, so a user can have hundreds of nonces and an attacker couldn't use frequency analysis to figure out an E-mail address.
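A sketch of the random-nonce design, assuming a hypothetical in-memory `NonceDirectory` standing in for Gravatar's backend database:

```python
import secrets
from typing import Dict, Optional


class NonceDirectory:
    """Hypothetical backend table: opaque avatar ID -> account."""

    def __init__(self) -> None:
        self._owner_by_nonce: Dict[str, str] = {}

    def issue(self, account_email: str) -> str:
        # 128 random bits with no mathematical relation to the email address.
        nonce = secrets.token_hex(16)
        self._owner_by_nonce[nonce] = account_email
        return nonce

    def resolve(self, nonce: str) -> Optional[str]:
        # Only the backend can map an ID back to its owner.
        return self._owner_by_nonce.get(nonce)


directory = NonceDirectory()
# Many-to-one: the same user can hand a different ID to every site.
ids = [directory.issue("someone@example.com") for _ in range(3)]
print(ids)
```

Because each ID comes from `secrets.token_hex`, there is nothing to hash or guess; the trade-off is that every lookup has to go through the service that holds the table.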
I actually *just* (20 minutes ago) put my picture up there. Can you guess my email? ;)
that addy has a different icon
Crap. What did the new CSS do with the "Post anonymously" option??
Use your email address with "+randomsequence"@
Randomsequence will have to be consistent between the user and the sites where they want the gravatar to work, but it will generate an MD5 hash different from that of their actual address; yet if the site sends email to that address, the user will receive it.
But is this significantly easier than other methods of harvesting email addresses? Spammers already do dictionary attacks on big providers like yahoo. It's not clear to me that this method is a better way of generating a list of email addresses. If you carry out a dictionary attack on yahoo.com, you're going to come up with probably tens of millions of valid email addresses. If you carry out this attack on gravatar.com, how many addresses are you going to get for your trouble? 10% of gravatar's users, apparently -- which I'm guessing is not really that big a number. Remember, once a spammer has a botnet, it costs him zero to send out one more spam to test whether a particular address is valid. Therefore the dictionary attack is free.
The defense against dictionary attacks is also exactly the same as the defense against this attack: either don't use a big email provider, or use a big email provider but pick a username that has a lot of characters (so it's not vulnerable to brute-forcing) and is also not vulnerable to dictionary attacks.
Find free books.
What I'm wondering is why this matters at all. A spammer would just send emails [your username]@[every common email domain]. Why would they bother to check if it's the correct address or not?
Not really, since the salt would need to be publicly known for Gravatar to work (and it would break any backwards compatibility to add it in now). This was a 'social engineering' attack, not a rainbow table lookup – it pieced the name together with common providers to find a matching MD5. Salt would just add a single extra step.
I believe it's exactly the same problem/attack as was brought up about MicroID in the past. The idea of Pavatar is a much better way to do this sort of avatar-finding (though the decentralisation comes with its own problems), since it relies on a public web address instead of a semi-private e-mail address.
Email addresses are usernames. They are not secret information. If somebody can be bothered enough to find your email address through brute-forcing the MD5 hash of it; you've got bigger problems.
Far more than "10% of stackoverflow.com's users" can have their email addresses GUESSED far faster. Likely your email address is also FAR easier to establish through a simple Google search on your pseudonyms.
If you for some odd reason want your email address to be secret, for the same reason as wanting a secret pseudonym or using a false name when signing up, register a fake email address instead (and set it up for forwarding). You're giving your email address in clear text to the site's owner and all the internet hops in between him and you ANYWAY.
It's important to learn to distinguish between what is a secret and what is not; and if you want to make things secret, at what level you should put your trust.
``OK, so ten out of ten for style, but minus several million for good thinking, yeah?''
Doubt it. There are 26 letters and 10 digits, and in addition '.' is very common in email addresses. Thus you get 37 possibilities for each position. 37 to the 12th power is 6582952005840035281 hashes to run, and even if you do 10^9 per second (i.e. one gigahash a second, which would require on the order of a few hundred cores), you'd still need 208 years to do that many hashes; then you need to look up each of them in Gravatar and analyze the result for a hit or miss.
"every alphanumeric email-address under 12 characters" is in fact much too large a keyspace to reasonably cover overnight with a "very simple script".
It's not a large enough keyspace to be cryptographically secure, but it's large enough to not be trivially exhaustible.
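The estimate above is easy to check. A minimal sketch, assuming the same 37-symbol alphabet and a flat rate of 10^9 MD5 hashes per second (the commenter's assumptions, not measurements), and ignoring the Gravatar lookup step:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
ALPHABET_SIZE = 37          # 26 letters + 10 digits + '.'
HASHES_PER_SECOND = 10**9   # the "one gigahash a second" assumption


def keyspace_exact(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of local parts of exactly `length` characters."""
    return alphabet_size ** length


def keyspace_up_to(max_length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of local parts of 1..max_length characters."""
    return sum(alphabet_size ** k for k in range(1, max_length + 1))


def years_to_exhaust(candidates: int, rate: float = HASHES_PER_SECOND) -> float:
    """Wall-clock years to hash every candidate once at `rate` hashes/second."""
    return candidates / rate / SECONDS_PER_YEAR


exact_12 = keyspace_exact(12)
under_12 = keyspace_up_to(11)
print(exact_12)                                   # 6582952005840035281
print(f"{years_to_exhaust(exact_12):.1f} years")  # 208.6 years
print(f"{years_to_exhaust(under_12):.1f} years")  # 5.8 years
```

Strictly, "under 12 characters" means lengths 1 to 11; summing those gives about 1.8 × 10^17 candidates, or roughly 5.8 years at the same rate. That is far smaller than the 12-character figure, but still nowhere near overnight.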
What if Gravatar published a public key, and sites displaying Gravatars pointed their image links to encrypt(gravatar_id + random_salt)? It seems like this would solve the problem, since people viewing the page can't get access to the users' real Gravatar IDs. Sure, the forum sites would still see your Gravatar ID, but they already have your email address in the first place.
There are two attacks here. The primary attack has absolutely nothing to do with the hash used. They just checked likely email addresses based on user names. The example given was user Michael Smith, for whom they then checked things like michael.smith@majoremailprovider.com and so on. This method, which uses nothing specific to MD5, got around 10% of the emails. Another attack, which did use hash collision detection, got only 1%.
You're dead on about using thousands of hashes. The practice hurts an attacker far more than it hurts legitimate users. It's called key stretching, or key strengthening.
I think you need to stop giving crypto advice for the day, it's not going very well.
Salting would help a bit here, but far more effective would be key stretching. Hash the email, then feed the hash back through the hash function a few thousand times. The extra computation doesn't have much of an impact when generating a single email identifier, because hash functions are blazing fast, and 1,000 iterations is still blazing fast. But the extra computation grievously hurts people who are using brute force to create rainbow tables, making the whole thing take thousands of times longer.
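A minimal sketch of that key-stretching idea, assuming MD5 (to match Gravatar) and a hypothetical `stretched_email_id` helper. Python's standard library also offers `hashlib.pbkdf2_hmac`, a standardized form of the same idea.

```python
import hashlib


def stretched_email_id(email: str, iterations: int = 1000) -> str:
    """Hash a normalized email, then re-hash the digest `iterations - 1` more times."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).digest()
    for _ in range(iterations - 1):
        # Feed the hash back through the hash function.
        digest = hashlib.md5(digest).digest()
    return digest.hex()


print(stretched_email_id("someone@example.com"))
```

Because the scheme is still public and unsalted, an attacker can still test a guessed address; stretching only makes each guess about 1,000 times more expensive, for the attacker and the legitimate site alike.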
So keep the salt secret to the server so at least someone has to brute force it?
I more or less agree with you that this isn't particularly newsworthy (is Gravatar all that widely used?), except for the fact that if they had bothered to add a random, secret salt before hashing, everything would have been secure (or rather, as secure as the secret salt).
> In other news, all password hashes can eventually be cracked by brute force... Oh noes!
True, but that is like saying "No encryption which uses a key smaller than the length of the ciphertext is secure": mathematically true, but not true in practice.
I think what you should have said instead was:
"In other news, doing security is harder than you think."
Salt could work if it was only known between the web site owner and Gravatar. After all the users only need the hash to download the avatar. But I guess that would be security through obscurity, and we don't want that.
It's not exactly big news that a system based on MD5 hashes is susceptible to dictionary-style attacks; this should be obvious to anyone who understands how hashes work. In order for this particular attack to work, the attacker already has to have some reasonable guesses as to what your e-mail address is; the Gravatar trick only confirms the address. So it seems to me that the amount of additional data leaked is fairly small.
OTOH, I suppose I'm somewhat desensitized to this sort of thing, since I've had the same primary e-mail address for something like 15 years (going back to the days when I was rather active on Usenet). My e-mail address is already in every spammer database on the planet, so I don't see how a few more people knowing it could make things any worse!
A) Isn't the point of it to be a public system, so that sites can accept users' email addresses, then find the gravatars themselves?
I suppose you're right. In which case no trivial workaround can exist (because the attacker just pretends to be a website wanting to discover the guessed emails' avatars). OTOH, if Gravatar would implement a two-step API for getting the information, and implement rate limits on the API, doing the attack could be made much, much harder.
I vaguely remember looking at the Gravatar site when it opened up a long time ago, but personally I have no use for avatars and prefer not to have a global net persona (or at least one which is trivially assembled from all of the little persona pieces I have spread around).
B) Wouldn't it be equally easy to reverse engineer the salt string, with your own known test email? (As long as the salt is shorter than some limit maybe)
The whole point of using a salt (in my eyes, anyway) is that it should be long enough that brute forcing it is unreasonable.
But you can easily get a list of known good domains and common user names (or you can just get a list of email addresses) which significantly reduces the search space.
That's assuming email addresses are random sequences of letters, digits and dots.
If you're a spammer and don't mind missing the email of mr. q9x7.3f.1zzp@hotmail.com, a phone book would probably provide an effective dictionary for narrowing that keyspace considerably
From Gravatar's FAQ:
MD5 isn’t strong enough encryption, they’ve cracked that, haven’t they?
MD5 is plenty good for obfuscating the email address of users across the wire. If you’re thinking of rainbow tables, those are all geared at passwords (which are generally shorter, and less globally different from one another) and not email addresses; furthermore, they are geared at generating anything that matches the hash, NOT the original data being hashed. If you are thinking about being able to reproduce a collision, you still don’t necessarily get the actual email address being hashed from the data generated to create the collision. In either case the work required to both construct and operate such a monstrosity would be prohibitively costly. If we left your password lying around in the open as a plain md5 hash, someone might be able to find some data (not necessarily your password) which they could use to log in as you... Leaving your email address out as an md5 hash, however, is not going to cause a violent upsurge in the number of fake Rolex watch emails that you get. Let’s face it, there are far more lucrative, easier ways of getting email addresses. I hope this helps ease your mind.
So, they might have already thought about this vulnerability and dismissed it as not interesting.
They could still fix their concept by providing an API where a website wanting to discover the avatar for a given email first hashes the email with MD5 and then the Gravatar URL which is generated redirects them to a link to the image (which contains no information about the email address, or perhaps uses a salted hash). This, in conjunction with rate limiting the number of queries per website, could provide a relatively secure way to do what they want.
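A rough sketch of that two-step design, assuming a hypothetical `AvatarLookupService` with an in-memory table and a sliding-window limit per requesting site (the names, path format, and limits are all illustrative):

```python
import hashlib
import secrets
import time
from typing import Dict, List, Optional


class AvatarLookupService:
    """Hypothetical two-step avatar lookup with a per-site rate limit."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # md5(email) -> random image token; the token reveals nothing about the email.
        self._token_by_hash: Dict[str, str] = {}
        # site -> timestamps of that site's recent queries (sliding window).
        self._queries_by_site: Dict[str, List[float]] = {}

    def register(self, email: str) -> None:
        email_hash = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
        self._token_by_hash[email_hash] = secrets.token_hex(16)

    def lookup(self, site: str, email_hash: str,
               now: Optional[float] = None) -> Optional[str]:
        """Exchange an email hash for an opaque image path, subject to the limit."""
        now = time.monotonic() if now is None else now
        recent = [t for t in self._queries_by_site.get(site, [])
                  if now - t < self.window_seconds]
        if len(recent) >= self.max_queries:
            raise PermissionError(f"rate limit exceeded for {site}")
        recent.append(now)
        self._queries_by_site[site] = recent
        token = self._token_by_hash.get(email_hash)
        return None if token is None else f"/avatar-image/{token}"
```

The image path is a random token, so a page that displays it leaks nothing about the address; an attacker has to go through `lookup`, where each guess costs one rate-limited query. A spammer registering many "sites" would weaken this, so in practice the limit would need to be tied to something harder to fake.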
I disagree.
Granted, those are basically very unsophisticated databases that just store lookup values, but it's relatively easy to bruteforce an MD5 hash down into one of the possible original strings
No, it's not. Or at least, it only is if you have truly awesome amounts of time or computing resources to spend. Hence lookup databases like those you reference.
Agreed. In fact, when I first created a gravatar, this "newly discovered" problem immediately occurred to me; I suspect the same is true for many other gravatar users.
Security through Obscurity is a reference to the METHOD being obscure. Your encryption codes and salts are SUPPOSED to be obscure!!!
get a list of email addresses
If they had that, they wouldn't need to do anything now would they?
If the gravatar makes the pairing trivial then it's trivial to automate. And so the spam filter will have to iterate.
To clear that up: rather than spamming the email address, spammers will likely target the blog that displays the gravatar.
I guess you could add a salt yourself, at least if your email provider works like Gmail and allows you to supply a meaningless string after a +. If the first part of your email address is guessable from your username, you could do something like:
homburg+randomsalt@gmail.com
True. And -that- is feasible. I was just commenting on the claim that you can exhaustively search all 12-character alphanum strings in a trivial amount of time. you cannot.
The salt is not exactly supposed to be obscure. If it was then it would be just a password, and this is not the case as it would be called a password and not a salt. The salt should be available to the entities generating the hash (in this case the web sites). Now find me a practical way to distribute the salt to the legitimate web sites without the bad guys knowing...
Some email providers have a simple way of giving you a throw away id. E.g example+slashdotnospam@gmail.com is sent to example@gmail.com.
Say my name is Lary Page. If my email id is lary.page@gmail.com, I can still protect myself so that you will never get my email id.
MD5 (lary.page@gmail.com) = "1b8dbe98e2b1138fd3ba34e26fc55107".
So I provide my email id as lary.page+1b8dbe98e2b1138fd3ba34e26fc55107@gmail.com. If I gave you the md5 of that id, you'll find it hard to get back to lary.page@gmail.com.
Try it: the MD5 hash of the above email ID is 803efbc80ead933f28d0704d43d1f63b.
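The trick can be sketched in Python, assuming a provider that delivers `local+tag@domain` to `local@domain` (the helper names are hypothetical):

```python
import hashlib
import secrets


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def plus_address(address: str, tag: str) -> str:
    """Insert '+tag' before the '@'; the provider must support plus addressing."""
    local, domain = address.rsplit("@", 1)
    return f"{local}+{tag}@{domain}"


base = "lary.page@gmail.com"
# The scheme described above: the tag is the MD5 of the base address.
derived = plus_address(base, md5_hex(base))
# Alternative: a random tag, which an attacker cannot recompute from a guess.
random_tagged = plus_address(base, secrets.token_hex(8))

print(derived)           # address to give to Gravatar-using sites
print(md5_hex(derived))  # what the avatar hash would be
```

One caveat: because the tag in this scheme is derived from the base address, someone who knows the scheme can repeat it for each guessed address. A random tag, as suggested earlier in the thread, gives an attacker nothing to recompute.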
I think most of us figured out this possibility within 30 seconds of seeing how Gravatar worked.
One solution would be to have a private salt known only to Gravatar and the implementing website. Gravatar could determine the correct salt to use based on the referrer.
Of course this would mean each subscriber would need to be hashed against each salt in the Gravatar database.
In either case, I don't think it's really that big a deal.
Or, use john -incremental -stdout. This will test reasonable names first, while not being restricted to RL names only.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Call me when he finds a way to determine the email after Gravatar starts adding a pinch of salt to the hashed emails...
Using an HTML entity such as &#64; instead of a literal @ is enough to stop most e-mail harvesting bots; I don't see them brute-forcing MD5s any time soon.
By the birthday paradox, you'd have a 50% chance of at least one collision after roughly 2^64 avatars (since the MD5 hash size is 128 bits).
It's obvious to everyone who understands how it works.
Geez...
As the email of Gabe (from Valve) is well known, and Gravatars can be used in a pseudo-anonymous way, I tried searching images.google.com for the hash of his email. Not found. Either Gabe doesn't post in Gravatar-powered forums, or he uses a different email address.
So if you use Gravatars and other people know your email, they can search for your posts. This is obvious from the use of MD5. With your address hashed with MD5, spam bots can't collect the address, but that's all; it is not privacy.
-Woof woof woof!
Wouldn't it be easier then to just email john.doe@gmail.com, john.doe@hotmail.com, and john.doe@aol.com, instead of passing the email addresses through a cloud based online avatar brute force MD5 email validating script?
That's assuming email addresses are random sequences of letters, digits and dots.
If you're a spammer and don't mind missing the email of mr. q9x7.3f.1zzp@hotmail.com, a phone book would probably provide an effective dictionary for narrowing that keyspace considerably
That's assuming nothing. You know how to read? Parent is talking about covering the ENTIRE range of emails under 12 characters with those characters.
This is not related to the MD5 algorithm or use of salts. The fact is that Gravatar wants sites to use Gravatar without sending loads of requests to gravatar.com. Therefore Gravatar must provide a "client-side" API for generating Gravatar avatar URLs based on the one known constant: email addresses. Sure, they could have salted things, but whatever they do, there's an essentially open source function somewhere that takes an email address and converts it to a Gravatar URL. As the algorithm is available to anyone, an attacker can use it to check intelligent guesses against the known algorithm result.
There really isn't anything Gravatar can do without changing their design to decouple avatar URLs from email addresses. Basically, whenever anyone registers an account with a blog, the site would have to ask Gravatar for the user's Gravatar avatar URL, and probably poll on some regular basis in case users add Gravatar avatars later. The blog would then have to retain this data in its database for later look-up when comments are viewed. This is certainly possible, and could probably be designed in a way that doesn't add additional load to Gravatar's servers. But compared to the current implementation, which can be added to blogs with very minimal coding (probably just a couple of lines in PHP), doing this more safely would require persistence-layer/database schema changes that would severely limit the attractiveness of Gravatar.
my blog
By using this exploit, spammers get additional useful data about users: they'll know each user's full name in most cases. They'll know that the user is interested in the site he's commenting on. They'll know what language he speaks. Basically, they can compose much more compelling emails with a higher probability of getting through and even being seen as relevant to the recipient.
Bolex make [motion picture] cameras, not watches, and were very important in the early television news reels. Even today they are a staple in film schools.
1) register as a website with gravatar, find out how long the salt is
2) register on stackoverflow with your email address
3) enumerate the possibilities until you find the hash of your own address and therefore the salt
4) extract 8000+ emails from stackoverflow
5) repeat for other sites
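Steps 2 and 3 can be sketched as a toy brute force, assuming the site computes `md5(salt + email)` with a short lowercase salt (the function names and the 2-character salt are made up for illustration):

```python
import hashlib
import itertools
import string
from typing import Optional


def salted_md5(salt: str, email: str) -> str:
    """The (assumed) site-side scheme: md5 of salt followed by the address."""
    return hashlib.md5((salt + email).encode("utf-8")).hexdigest()


def recover_salt(known_email: str, observed_hash: str,
                 alphabet: str, max_len: int) -> Optional[str]:
    """Try every salt up to max_len characters against a hash of our own address."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if salted_md5(candidate, known_email) == observed_hash:
                return candidate
    return None


# Step 2: we registered with our own address and read our avatar hash off the site.
my_hash = salted_md5("qz", "me@example.com")
# Step 3: enumerate salts until one reproduces it.
print(recover_salt("me@example.com", my_hash, string.ascii_lowercase, 2))  # qz
```

The search grows as 26^n for an n-letter salt, so a long random salt makes step 3 infeasible.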
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
This programmer used a bot to gather over 8k email addresses. So it's pretty useless against spam.
It's useless against bots specifically created to line up the user name with the Gravatar hash. That bot will only work on Stackoverflow.
If all you have is the hash, then bots would be pretty useless.
Correct: the attack here is:
Take a big site with thousands of users, many using their (sorta) "real names".
Permute these names with some known big email provider hostnames.
Send them all some spam.
It does not really matter if 90% of those email addresses are incorrect; the rest will hit.
I would not do the MD5 validation thing, why should I?
we need an "-1 Plain wrong" moderation option!
Yeah, I read it wrong :). Salt probably would've helped a bunch, though.
no, salt wouldn't help because it would have to be public and therefore known to the attacker, right?
I'm no expert in cryptography, but would it be helpful for them to add a salt? (Unless they do that already, of course)
The salt would have to be secret, which would ruin the whole point of other sites being able to calculate the md5 and use the gravatar. Making it public wouldn't work, because it would then be known to the attacker.
The salt doesn't have to be known. Gravatar URLs work like this:
http://en.gravatar.com/site/implement/url
The association email -> avatar is done through an MD5 hash function. If you register on a website with username@mailprovider.com, the website will compute the hash of your email address (in this case 476c8a979eed603fb855dca149c7af6b) and associate the avatar URL
http://www.gravatar.com/avatar/476c8a979eed603fb855dca149c7af6b?d=identicon
to your profile. All other websites using gravatars will associate the same url to your profile, because the computation of
md5sum ( username@mailprovider.com )
will always yield the same result.
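The URL construction described above fits in a few lines of Python. This sketch assumes the normalization (trim whitespace, lowercase) that Gravatar's implementation notes describe; the `gravatar_url` helper name is hypothetical, and the `d=identicon` default is copied from the example URL above.

```python
import hashlib


def gravatar_url(email: str, default: str = "identicon") -> str:
    """Build an avatar URL from the md5 of the normalized address."""
    normalized = email.strip().lower()  # trimmed, lowercased address
    email_hash = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"http://www.gravatar.com/avatar/{email_hash}?d={default}"


# Every site computes the same URL for the same address.
print(gravatar_url("username@mailprovider.com"))
```

Nothing in this function is secret, which is exactly why anyone can run it on a guessed address and compare the result.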
Now if you only want to use it to stop people smurfing, you could salt it:
md5 ( 'my site salt' + 'username@mailprovider.com' ) instead of just md5 ( 'username@mailprovider.com' )
You could give people an option to not salt too, if they want to be recognisable across sites.
Of course you'd have to verify the email address for this to work. Or you could md5 ( 'salt'+'password') (or even 'salt' + 'IP address' ) instead and not have accounts at all. Mind you then you'd just have a 4chan style tripcode. Still it does stop smurfing.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
You can never give bad crypto advice. If it's insecure you've put a vulnerability in the wild that people might pick up on and then you can sell it to the Ukrainian mafia. If it's secure you get a reputation for giving good advice which leads for more opportunities for slipping in the odd insecure advice. And then it's time to cash in with the Ukrainians.
So why bother searching? Since the only reason Gravatar obscures email addresses is to stop spammers, the spammers can just send email to all addresses that correspond to [common user name]@[common domain]. In fact, that's exactly what they do. There's no need to waste time and money breaking MD5 hashes.
He who lights his taper at mine, receives light without darkening me.
Actually, it's not infeasible. If you own a botnet with 100,000 machines at your disposal, you could set them to cranking through these hashes. If they could crank through at your estimated speeds (which are generous, given that most infected machines are likely to be slower, but it still gets the point across) they'd crack it in less than a day. Even if the problem were two orders of magnitude harder than you suggest, it would still be doable in about two months. You are still correct with your tightly qualified statement that "every alphanumeric email-address under 12 characters" is in fact much too large a keyspace to reasonably cover overnight with a "very simple script", but the margin of comfort is pretty darn thin.
And if that is too hard, they can simplify the problem by reducing the search space. 12 alphanumeric characters is an arbitrary limit, and if they want to scan them all with their botnet in under a day, they'll just set a timer on the loop and be happy with the output they get -- it may only be all 10 character names instead of 11 or 12, but that's still sufficient for their evil purposes.
The worse news is that the guys with 100,000 bots in their net are the exact same spammers who have a business motive to come up with email addresses.
Problems of scale are more complex to analyze than they first appear. The bad guys have resources beyond what you might picture, and unlike cryptography where there is only one correct key, they will derive value from partial results.
John
Wull, anyway
No thanks to kdawson who perpetrated this and alerted every spammer with a slashdot tab open that they could start harvesting email addresses there and how to do it.
DORK! No pie for you!
*Repent! Quit Your Job! Slack Off! The World Ends Tomorrow and You May Die!
The salt can be user- and website-dependent (4 bytes user/4 bytes website for an 8-byte salt), although I think the added complexity won't be welcomed by the website owners.
>> An email address isn't private... I suspect that MD5 was just a convenient way to get a fixed length id. I'd be more worried about collisions, but i'm too lazy to calculate how many avatars would be required before that might become a problem.
2^128^.5 = 2^64
Phew, I'll have to take a break after that one.
-- I was raised on the command line, bitch
That's why I usually create a new Hotmail address, made from the site's name and my own, to keep track of everything that comes from there; if anything is compromised, I usually know where it came from. I also don't worry about someone getting my address, since it's irrelevant: it's not my real one.
I was thinking of modding you up but instead I'll expand on your explanation: This is not using MD5 collisions or "reversing" the hash in any way. Using a better hashing algorithm would only slow this attack slightly. All TFA is saying is that if you have a list of potential email addresses that might belong to the user, you can hash them and find out if one of them is correct (NO DUH).
You can generate a list of potential emails using usernames, so you could run through GameboyRMH@aol, gmail (got me!), hotmail, etc, and compare hashes until you find a match. 10% of Gravatar users have an email address of the format username@popularemailprovider.com. The attack this guy used depends on the user's email being of this format.
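The attack amounts to a few lines of Python. A sketch, assuming a hypothetical list of popular providers and a username that is used verbatim as the mailbox name:

```python
import hashlib
from typing import Iterable, Iterator, Optional

# Hypothetical list of popular providers; a real attacker would use a longer one.
POPULAR_PROVIDERS = ["gmail.com", "yahoo.com", "hotmail.com", "aol.com"]


def email_hash(email: str) -> str:
    """Gravatar-style hash: md5 of the trimmed, lowercased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidates(username: str, providers: Iterable[str]) -> Iterator[str]:
    """username@provider for each provider (the format the attack relies on)."""
    for provider in providers:
        yield f"{username.lower()}@{provider}"


def guess_email(username: str, observed_hash: str,
                providers: Iterable[str] = POPULAR_PROVIDERS) -> Optional[str]:
    """Return the first candidate whose hash matches the avatar hash, if any."""
    for address in candidates(username, providers):
        if email_hash(address) == observed_hash:
            return address
    return None


# The avatar hash is visible in the page; the username is shown next to it.
observed = email_hash("exampleuser@gmail.com")
print(guess_email("ExampleUser", observed))  # exampleuser@gmail.com
```

Swapping MD5 for another unkeyed hash would change nothing here, since each guess is simply hashed and compared.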
"When information is power, privacy is freedom" - Jah-Wren Ryel
The point of TFA is that one can identify the user's email address from the hash. The question is, why hash the email address? It could be just as easy to hash an integer value unique to the user. Hell, it could even be an auto-incrementing counter. Who cares if somebody can identify that joe@nothing.com has a Gravatar ID of 123? That ID can't be traced back to any specific Gravatar account, as the link between ID and email address would be internal.
But they already have that. They know that korin43 likes computers (and using other websites they could find my name). Now they try every combination of korin43 + common domain name and check md5's to see if it's the right one, but why would they bother? They could skip the md5 step and just send emails. Worst case scenario: They send [number of domains - 1] extra emails (basically free so who cares).
Sounds like the problem is the authentication system, not the gravatar service. I bet you could crack 90% of these by just trying [username]@gmail.com, yahoo.com and hotmail.com.
The important part of the trick is that you have to assume the email address is the same as the username and then compare the hashes of that name @yahoo.com, @hotmail.com, @gmail.com, and other popular email services. Because people that use those webmail addresses have never received spam before.
If any spammer did try this, I would expect them to be very pissed off to discover that after all that work they already had 99% or more of those addresses to begin with.
If your email address is common-word@famousprovider.com, then the spammers have already put your email address into their lists. Why not? They don't care if 95% of the mail they send bounces, and they don't care if they target any specific person; the "hit" rate they need to make a profit is negligible. I see spam attempts to thousands of never-existed addresses on my colo, and my home domain is pretty damn obscure. I'm sure Gmail gets hits from aaron.aardvark through zephram.zymurgy continually.
2^128^.5 = 2^64
Phew, I'll have to take a break after that one.
That's an approximation for a 50% probability though... right? I'd be more inclined to think that anything over a one-in-a-million probability of a collision is unacceptably high, as a collision would break things, but that's what I'm too lazy to figure out. It's not purely a maths problem.
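The one-in-a-million figure can be worked out with the standard birthday approximation p ~ 1 - exp(-n^2 / (2N)), treating MD5 outputs as uniformly random over N = 2^128 values (an idealization, not a property MD5 is proven to have):

```python
import math

N = 2 ** 128  # number of possible MD5 outputs


def collision_probability(n: float) -> float:
    """Approximate chance of at least one collision among n random 128-bit hashes."""
    return -math.expm1(-n * n / (2 * N))


def items_for_probability(p: float) -> float:
    """Invert the approximation: how many hashes give collision probability p."""
    return math.sqrt(2 * N * -math.log1p(-p))


print(items_for_probability(0.5) / 2 ** 64)  # about 1.18
print(f"{items_for_probability(1e-6):.2e}")  # about 2.61e+16
```

This gives about 1.18 × 2^64 avatars for a 50% chance, and roughly 2.6 × 10^16 avatars before the collision probability reaches one in a million, far beyond any realistic user count.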
How is that going to work? If each site uses a different salt, it will produce a different hash for the same email, thereby defeating the whole purpose of Gravatar.
Yes. For a birthday attack, math says you need about the square root of the number of possible hash values to get a collision.
2^64 is a lot of items, which is why hashing is still useful.
But back to TFA, these items should be salted with a secret salt to make the data unusable to outsiders.
eg: md5('mypass'+$youremail) = useless information to hackers