Zebras Get Less Spam Than Aardvarks
MojoKid writes "A recent study (PDF) by Richard Clayton at Cambridge University determined that the first letter of someone's email address directly affects how much spam they receive. As shown in the graph at either link above, email addresses with numbers as their first characters receive even fewer spam emails. The corpus used in the study was 8 weeks' worth of email from the UK ISP Demon Internet, just over half a billion messages, of which 56% was deemed to be spam."
Spammers will now alter their programs to start with "z" and numbers, so they can get the people who aren't as desensitized by spam.
We had an ex-city manager, who had a son named Zachary Z. Zoul.
Zebras and aardvarks don't eat Spam. Or ham.
The conclusion is ridiculous. There's more spam for addresses starting with 'a' than with 'z' because there is more traffic to those addresses. See the graph. The line in the graph is the only solid piece of information, and it is just a lot of noise around the mean value of 56%; if anything, it indicates the opposite conclusion.
The state you are in while your HEAD is detached... - wait, what?
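To make the count-versus-share distinction concrete, here is a minimal Python sketch. The per-letter counts are invented for illustration and are not figures from Clayton's paper; the point is only that a letter can receive far more spam in absolute terms while a smaller share of its mail is spam.

```python
# Hypothetical per-letter mail counts, invented for illustration only
# (NOT figures from the paper): letter -> (spam, non_spam).
COUNTS = {
    "a": (51_000, 49_000),  # busy letter: lots of mail of both kinds
    "m": (28_000, 22_000),
    "z": (1_200, 800),      # quiet letter: little mail overall
}

def spam_share(spam: int, non_spam: int) -> float:
    """Fraction of all mail addressed to this letter that is spam."""
    return spam / (spam + non_spam)

for letter, (spam, ham) in COUNTS.items():
    # Raw count and share can point in opposite directions.
    print(f"{letter}: {spam:>6} spam messages, {spam_share(spam, ham):.0%} of its mail")
```

With these made-up numbers, 'a' gets over forty times as many spam messages as 'z', yet a smaller fraction of its mail is spam, which is why comparing raw counts says more about traffic volume than about targeting.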
I thought most people in the know see a far higher percentage; my ISP records over 95%:
Xs4all statistics
Makes me wonder about the rest.
"The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
I know nobody actually bothered to read it, but from the graph it looks like there are many more email addresses starting with an "a" than with a "z". The former get about as much spam as legitimate email, while the latter get about two to three times as much spam as legitimate email.
This post contains no useful information, no need to mod it down.
And yes, they get tons of spam: about 99.999% of connection attempts are spam, but a couple of RBLs and SpamAssassin take care of it. If I turn the protection off, I get about 10,000 spams per hour, which seems to be limited by the server itself; if the server were faster, it would probably get even more. With the filters on, I get about 1 message per hour, which is more acceptable. I don't like the idea of RBLs, but I see no other way to handle the problem: if you are a spammer, then I don't want to talk to you, ever. Stupid idiots. It is also interesting that all the brute-force attacks I have observed start at 'a'. So the best passwords will start with 'z'.
Excuse me, but please get off my Pennisetum clandestinum, eh!
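The remark about brute-force attacks starting at 'a' only matters if the attacker enumerates candidates in strict alphabetical order. A minimal sketch, assuming a hypothetical attacker that tries every lowercase string of a fixed length in order ("aaaaa", "aaaab", ..., "zzzzz"), shows how many guesses precede a given password:

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: the attacker tries only a-z

def guesses_before(password: str) -> int:
    """Candidates a naive attacker tries before reaching `password`,
    assuming it walks every same-length lowercase string alphabetically."""
    index = 0
    for ch in password:
        # Read the password as a base-26 number with 'a' = 0 ... 'z' = 25.
        index = index * len(ALPHABET) + ALPHABET.index(ch)
    return index

print(guesses_before("apple"))  # 274070
print(guesses_before("zebra"))  # 11495822 (out of 26**5 = 11881376)
```

Against this naive enumerator a leading 'z' does push a password near the end of the search. Real cracking tools, however, typically try dictionary words and common patterns first, so the sketch covers only the alphabetical case the comment describes.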
I'd guess that addresses with numbers at the beginning are often invalid, so they don't bother with them. I get spam attempts addressed to message IDs, which are generally something like 238947529345user@example.com.
But in the real world there is no such thing as perfection. It is a philosophical construct.
Indeed, the PDF paper says this is measuring the rate of filtering AFTER the Spamhaus blocklists have been applied, and the spam rate is measured by their "Cloudmark" spam detection tool. Importantly, if their tool sucks enough that people opt out of it entirely, all of those users' email is counted as "not-spam". But as long as these effects are not influenced by the first letter, that's okay.
Unfortunately, the paper tries very hard to present a very silly notion about 'a' versus 'z'. The important concept here isn't order, it's letter frequency, and they should have sorted the letters by that to plot their regression.
Effectively, spam is a combination of email harvesting and email guessing. Harvesting email addresses contributes to spam, but probably builds lists that closely resemble the distribution of valid inboxes. Guessing attacks generally do not reflect the distribution of letters used in English (the language of the ISP's host nation, and presumably of most of the users and domains hosted). The assumption isn't that these attacks stop before they make it to Z, but that they overweight z*@example.com. So more spam is sent per valid inbox to those addresses than to addresses starting with more common letters. The paper goes on to say that a lot of those messages land in nonexistent mailboxes, relative to the more populated leading letters.
They go on to try to quantify the difference but seem to fail for various reasons, including the aforementioned Spamhaus filtering.
I Browse at +4 Flamebait
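The letter-frequency argument above can be sketched numerically. Every input here is a hypothetical assumption, not data from the paper: harvested spam is taken to arrive in proportion to the number of valid inboxes for each first letter, while guessing attacks send the same volume to every first letter. Sorting letters by inbox count, as suggested above, rather than alphabetically makes the trend visible:

```python
# Hypothetical inputs, invented for illustration (not from the paper).
VALID_INBOXES = {"a": 900, "m": 500, "q": 40, "z": 20}
HARVESTED_PER_INBOX = 10    # harvested lists track real inboxes
GUESSED_PER_LETTER = 2_000  # guessing spreads evenly over first letters

def spam_per_inbox(inboxes: int) -> float:
    """Total spam aimed at one first letter, divided by its valid inboxes."""
    harvested = HARVESTED_PER_INBOX * inboxes
    guessed = GUESSED_PER_LETTER  # same volume however many real users exist
    return (harvested + guessed) / inboxes

# Most common leading letter first, per the letter-frequency argument.
by_frequency = sorted(VALID_INBOXES.items(), key=lambda kv: kv[1], reverse=True)
for letter, inboxes in by_frequency:
    print(f"{letter}: {inboxes:>4} inboxes, {spam_per_inbox(inboxes):6.1f} spam per inbox")
```

Under these assumptions the guessed volume is shared among fewer real users as the letter gets rarer, so spam per valid inbox rises steadily down the frequency-sorted list, without any need for attacks to "stop before Z".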
Open Source Sysadmin
Looking at that graph, it would be more interesting to see the signal-to-noise ratio for each of the letters and numbers. Names beginning with an 'A' do indeed receive more spam, but also far more non-spam. In fact, it looks to be more like 50:51 (non-spam : spam), whereas at first glance email addresses beginning with a 'P' receive 40:60.
Spammers work from lists of email addresses, and those email addresses are typically sorted by domain and then alphabetically. So, the receiving domain gets a rush of emails for users with addresses beginning with A, B, C etc. But usually (at some point) many mail systems will detect that there is a spam attack in progress and they will block subsequent messages of the same format or from the attacking IP address (depending on the spam filtering setup in place).
So, put simply, the people beginning with "A" get fresh spam that the adaptive filters don't yet detect. By the time the run gets to "Z", a good filter will automatically block the attack.
What's sad is that I watch spam attacks often enough to know this.
Never email donotemail@WeAreSpammers.com
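The sorted-list explanation above can be illustrated with a small simulation. It is a sketch of the mechanism the commenter describes, not evidence that this mechanism explains the paper's data; the recipient list and the rule "block the campaign after N copies" are hypothetical simplifications of real adaptive filters:

```python
import string

def delivered_by_letter(recipients: list[str], block_after: int) -> dict[str, int]:
    """Simulate one spam run against a single domain.

    Hypothetical assumptions: the spammer mails `recipients` in alphabetical
    order, and the receiving filter blocks the whole campaign once it has
    seen `block_after` copies.
    """
    delivered: dict[str, int] = {}
    seen = 0
    for rcpt in sorted(recipients):
        seen += 1
        if seen > block_after:
            break  # filter has fingerprinted the campaign; the rest is blocked
        first = rcpt[0]
        delivered[first] = delivered.get(first, 0) + 1
    return delivered

# Ten hypothetical users for each first letter a..z.
users = [f"{c}user{i}" for c in string.ascii_lowercase for i in range(10)]
print(delivered_by_letter(users, block_after=50))
# {'a': 10, 'b': 10, 'c': 10, 'd': 10, 'e': 10}
```

Only the early letters receive anything before the filter kicks in. Real filters react more gradually and spammers do not always send in order, so this shows the direction of the effect rather than its size.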
I think most of the spam targeted at a message ID comes from crawling USENET.
On my server, I see lots of e-mail with a "rcpt to:" that matches the regex "(mpg\.)?[a-f0-9]+\@news\.domain\.com". This is the format that INN uses to create message IDs.
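A minimal Python sketch of applying that pattern to RCPT TO addresses. "news.domain.com" is the poster's placeholder for their own news host, and anchoring the whole address with `re.fullmatch` is an assumption about how the check would be used:

```python
import re

# Pattern quoted above; "news.domain.com" stands in for the real news host.
MSGID_RCPT = re.compile(r"(mpg\.)?[a-f0-9]+@news\.domain\.com")

def looks_like_message_id(rcpt: str) -> bool:
    """True if the whole RCPT TO address matches the message-ID-style pattern."""
    return MSGID_RCPT.fullmatch(rcpt) is not None

for rcpt in ["3f9a2c01@news.domain.com",
             "mpg.deadbeef@news.domain.com",
             "alice@news.domain.com"]:
    print(rcpt, looks_like_message_id(rcpt))
```

Because the pattern accepts any run of hex digits, an ordinary local part spelled only with the letters a to f (for example "cafe") would also match, so this works best as one signal among several rather than as a hard reject rule.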
I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)
Now, let's see if that holds true.
Zebras already have big penises!
FRA: STFU GTFO