Zebras Get Less Spam Than Aardvarks

Posted by kdawson on Sunday August 31, 2008 @07:18AM from the so-do-johns-and-smiths dept.

MojoKid writes "A recent study (PDF) by Richard Clayton at Cambridge University determined that the first letter of someone's email address directly affects how much spam they receive. As shown in the graph at either link above, email addresses with numbers as their first characters receive even fewer spam emails. The corpus used in the study was 8 weeks' worth of email from the UK ISP Demon Internet, just over half a billion messages, of which 56% was deemed to be spam."

28 of 115 comments

You know what this means by Shajenko42 · 2008-08-31 07:20 · Score: 5, Insightful

Spammers will now alter their programs to start with "z" and numbers, so they can get the people who aren't as desensitized by spam.
1. Re:You know what this means by Anonymous Coward · 2008-08-31 07:31 · Score: 2, Funny
  
  Hi. I note you don't publish your gmail address on /. Try that and then tell me about the spam you haven't seen.
2. Re:You know what this means by Chandon+Seldon · 2008-08-31 07:32 · Score: 2, Informative
  
  Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.
  
  --
  -- The act of censorship is always worse than whatever is being censored. Always.
3. Re:You know what this means by E+IS+mC(Square) · 2008-08-31 07:55 · Score: 4, Informative
  
  Nothing is perfect when it comes to this. But they are the best among all 'free' email providers I have used - by miles. Now get in and flag them as spam - next time, you may receive fewer.
4. Re:You know what this means by cypherwise · 2008-08-31 08:37 · Score: 4, Insightful
  
  I'm (incorrectly?) assuming this comment was facetious. Missing only 100 of 35,214 (a catch rate of about 99.7%) is a pretty damn good ratio when it comes to this type of thing.
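  The catch rate behind that ratio, as a quick Python check using the figures quoted above (35,214 spam messages, roughly 100 of them missed). The function name is just for illustration:

```python
def catch_rate(total_spam: int, missed: int) -> float:
    """Return the percentage of spam messages the filter caught."""
    if total_spam <= 0:
        raise ValueError("total_spam must be positive")
    return 100.0 * (total_spam - missed) / total_spam

rate = catch_rate(35_214, 100)
print(f"{rate:.2f}% caught")  # 99.72% caught
```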
5. Re:You know what this means by AngryLlama · 2008-08-31 09:05 · Score: 5, Funny
  
  Why would spammers look for email addresses in their own working directory (./)? I guess I am just not up-to-date on my spamming techniques.
6. Re:You know what this means by lysergic.acid · 2008-08-31 09:52 · Score: 2, Insightful
  
  but, like the article says, there are fewer people whose e-mail addresses start with z or numbers. so they'd be getting fewer hits by targeting those starting characters. there are already more spam messages being targeted at "zebras" per legitimate target than there are spam messages being targeted at aardvark addresses.
  so the smart thing for spammers to do is to stop wasting time with zebra addresses, since they'd have a higher chance of actually reaching a real mailbox by targeting more popular character ranges.
7. Re:You know what this means by arth1 · 2008-08-31 10:47 · Score: 2, Interesting
  
  I think this might be a distant relative of Benford's law (the one that shows that about 30% of the numbers in many real-world data sets start with the digit "1", not the roughly 11% (one in nine) one might expect).
  Going through some crack and John the Ripper logs, I saw a good correlation between an initial letter's position in the alphabet and how often it occurs, not only for the passwords but also for the user names.
  Based on pure letter frequency, you'd expect a typical E-R-S-T-N ranking, but this doesn't appear to be the case for the initial letter: it is far more often "a" than "e" or "s".
  (The letter "r" is special and overrepresented due to the "root" user ID not only being ubiquitous on Unix-like systems, but also being the prime target for crack and john.)
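  For reference, Benford's law gives the probability of a leading digit d as log10(1 + 1/d). A short Python sketch comparing that with the naive uniform expectation of 1/9 per digit:

```python
import math

def benford_probability(digit: int) -> float:
    """Probability that the leading digit is `digit` under Benford's law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be 1-9")
    return math.log10(1 + 1 / digit)

print(f"uniform guess: {1 / 9:.1%}")  # uniform guess: 11.1%
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")  # digit 1 comes out at 30.1%
```

  The probabilities telescope to log10(10) = 1, so they form a proper distribution over the nine possible leading digits.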
8. Re:You know what this means by uncqual · 2008-08-31 17:07 · Score: 2, Interesting
  
  I find gmail almost perfect at classifying spam as such.
  
  Unfortunately, gmail is my only mail account where I feel I have to scan the spam "folder" every week or so to look for false positives -- of which there are a couple a month. My other accounts, which receive more mail and more spam (both as a percentage and an absolute number), have given so few false positives that I don't bother looking in the spam folders on those accounts.
  
  So, unfortunately, I end up looking at all the spam on gmail and just a little of it on other accounts. I don't think this is a win.
  
  --
  Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
I bet this guy gets the least amount of spam by Anonymous Coward · 2008-08-31 07:32 · Score: 2, Interesting

We had an ex-city manager who had a son named Zachary Z. Zoul.
1. Re:I bet this guy gets the least amount of spam by eln · 2008-08-31 08:21 · Score: 4, Funny
  
  Did the city manager get fired because every time anyone tried to talk to him about city management, he would say, "There is no city manager, only Zoul"?
  I'm so sorry.
2. Re:I bet this guy gets the least amount of spam by davidbrit2 · 2008-08-31 11:22 · Score: 2, Funny
  
  No, I think it was the sleeping above his covers thing that did it.
This is silly by knappe+duivel · 2008-08-31 07:32 · Score: 4, Funny

Zebras and aardvarks don't eat Spam. Or ham.
What? by pablomme · 2008-08-31 07:34 · Score: 5, Informative

The conclusion is ridiculous. There's more spam for addresses starting with 'a' than with 'z' because there is more traffic to those addresses. See the graph. The line in the graph is the only solid piece of information, and it is just a lot of noise around the mean value of 56%; if anything, it indicates the opposite conclusion.

--
The state you are in while your HEAD is detached... - wait, what?
1. Re:What? by Oidhche · 2008-08-31 07:52 · Score: 5, Insightful
  
  Indeed. The conclusion that I'd draw from presented data is that there are more e-mail addresses beginning with 'a' than with 'z' (and that very few addresses begin with a number). Even the percentage of spam is nearly meaningless. To find anything about which addresses receive more spam, you should look at the average amount of spam per e-mail address in a given group, not the total number of messages.
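  A minimal Python sketch of that per-address average, grouped by first character. The (address, is_spam) input format and the toy data are hypothetical stand-ins for real mail logs:

```python
from collections import defaultdict

def spam_per_address(messages):
    """Average spam messages per distinct address, grouped by first character.

    `messages` is an iterable of (address, is_spam) pairs; this input
    format is a hypothetical stand-in for real mail logs.
    """
    addresses = defaultdict(set)    # first char -> distinct addresses seen
    spam_counts = defaultdict(int)  # first char -> total spam messages
    for address, is_spam in messages:
        if not address:
            continue  # skip empty addresses rather than crash
        key = address[0].lower()
        addresses[key].add(address.lower())
        if is_spam:
            spam_counts[key] += 1
    return {key: spam_counts[key] / len(addrs) for key, addrs in addresses.items()}

# Hypothetical toy data: "a" gets more spam in total, but less per address.
log = [
    ("albert@example.com", True), ("albert@example.com", False),
    ("alice@example.com", True),
    ("anna@example.com", True), ("anna@example.com", False),
    ("zed@example.com", True), ("zed@example.com", True),
]
print(spam_per_address(log))  # {'a': 1.0, 'z': 2.0}
```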
2. Re:What? by Oidhche · 2008-08-31 07:56 · Score: 3, Insightful
  
  No. Look at the data. It shows the total number of messages received by Alberts and Zeds. It's painfully obvious that Alberts receive far more of both spam and genuine messages than Zeds. Not because the average Albert gets more messages than the average Zed, but because there are more Alberts than Zeds.
3. Re:What? by 4thAce · 2008-08-31 08:13 · Score: 2, Informative
  
  According to the PDF, this graph is for all email addresses, not for 'real' addresses, which they define, more or less, as those addresses which receive at least one non-spam email every other day. Since they are looking only at Demon's logs, not the contents of actual mailboxes, they have to use this heuristic to filter out the bogus combinations that the spammers are trying.
  If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A addresses than for Z addresses, as asserted in the summary.
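  One way to sketch that "real address" filter in Python, assuming "at least one non-spam email every other day" means an average of at least one non-spam message per two days over the observation window (the paper's exact rule may differ). The counts below are hypothetical:

```python
from collections import Counter

def real_addresses(ham_counts: dict[str, int], days: int) -> set[str]:
    """Addresses that averaged at least one non-spam message every other day.

    ham_counts maps address -> non-spam messages seen over `days` days.
    This is one possible reading of the heuristic, not necessarily the paper's rule.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return {addr for addr, n in ham_counts.items() if n >= days / 2}

def spam_share_by_letter(spam_counts, ham_counts, real):
    """Percentage of spam per initial letter, counting only 'real' addresses.

    Assumes `real` came from real_addresses(), so every address in it has
    at least one non-spam message and the division below is safe.
    """
    spam, ham = Counter(), Counter()
    for addr in real:
        key = addr[0].lower()
        spam[key] += spam_counts.get(addr, 0)
        ham[key] += ham_counts.get(addr, 0)
    return {k: 100 * spam[k] / (spam[k] + ham[k]) for k in sorted(ham)}

# Hypothetical counts; "zz9" looks like a guessed address with no real mail.
ham = {"alice@example.com": 40, "zoe@example.com": 30, "zz9@example.com": 0}
spam = {"alice@example.com": 60, "zoe@example.com": 20, "zz9@example.com": 500}
real = real_addresses(ham, days=56)  # 8 weeks, as in the study
print(spam_share_by_letter(spam, ham, real))  # {'a': 60.0, 'z': 40.0}
```

  Dropping the guessed "zz9" address is what flips the picture: counted over all addresses, "z" would look far spammier.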
  
  --
  Inventor of the LOLbalrog meme.
Very little spam at demon.uk by Teun · 2008-08-31 07:36 · Score: 2, Interesting

56% deemed spam?
I thought most in the know see a far higher percentage; my ISP records over 95%:
Xs4all statistics
Makes me wonder about the rest.

--
"The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
The f*** article says otherwise by paulatz · 2008-08-31 07:54 · Score: 5, Informative

I know nobody actually bothered to read it, but from the graph it looks like there are many more email addresses starting with an "a" than with a "z". The former get about as much spam as legit emails, while the latter get about 2 or 3 times more spam than legit emails.

--
this post contains no useful information, no need to mod it down
My domains start with a by flyingfsck · 2008-08-31 08:07 · Score: 2, Insightful

and yes, they get tons of spam: about 99.999% of connection attempts are spam, but a couple of RBLs and SpamAssassin take care of it. If I turn the protection off, I get about 10,000 spams per hour, which seems to be a limitation of the server; if the server were faster, it would probably get more spam. With the filters on, I get about 1 message per hour, which is more acceptable. I don't like the idea of RBLs, but I see no other way to handle the problem: if you are a spammer, then I don't want to talk to you, ever. Stupid idiots. It is also interesting that all brute-force attacks that I have observed start at 'a'. So the best passwords will start with 'z'.
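For anyone wondering how an RBL check works: DNS-based blocklists are queried by reversing the octets of the connecting IPv4 address and looking it up as a name under the list's zone. A rough Python sketch follows; the zone passed in is up to you (zen.spamhaus.org is used only as an example), and real lists have usage policies and specific return codes that this simplification ignores:

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an A record for the reversed address.

    Simplification: any answer counts as "listed"; real lists use specific
    127.0.0.x return codes and may refuse queries from public resolvers.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # NXDOMAIN (not listed) or a lookup failure

print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))  # 99.2.0.192.zen.spamhaus.org
```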

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Re:Unexpected by xaxa · 2008-08-31 08:16 · Score: 2, Interesting

I'd guess that addresses with numbers at the beginning are often invalid, so they don't bother with them. I get spam attempts addressed to message IDs, which are generally something like 238947529345user@example.com.
Sorry to break this to you. by Colin+Smith · 2008-08-31 08:17 · Score: 2, Informative

But in the real world there is no such thing as perfection. It is a philosophical construct.

--
Deleted
Re:Filters by xenocide2 · 2008-08-31 08:23 · Score: 2, Insightful

Indeed, the PDF paper says this is measuring the rate of filtering AFTER using Spamhaus black holes, and the measured rate comes from their custom "Cloudmark" spam detection tool. Importantly, if their tool sucks enough that people opt out of it entirely, all of their email is counted as "not-spam". But as long as these effects are not influenced by the first letter, that's okay.
Unfortunately, the paper tries very hard to present a very silly notion about 'a' versus 'z'. The important concept here isn't order, it's letter frequency, and they should have sorted the letters by that to plot their regression.
Effectively, spam is a combination of email harvesting and email guessing. Harvesting email addresses contributes to spam, but probably builds lists closely resembling the distribution of valid inboxes. Guessing attacks generally do not reflect the distribution of letters used in the English language (the language of the ISP's host nation, and presumably of most of the users and domains hosted). The assumption isn't that these attacks stop before they make it to Z, but that they overweight z*@example.com. So more spam is sent to those addresses per valid inbox than to addresses starting with more common letters. And the paper goes on to say that, relative to more populated leading letters, a larger share of those messages land in nonexistent mailboxes.
They go on to try to quantify the difference but seem to fail for various reasons, including the aforementioned Spamhaus filtering.
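A toy Python model of that argument, under loudly simplified assumptions: guessed-address spam is the same for every letter, harvested spam scales with the number of real inboxes, and the letters are ranked by inbox frequency rather than alphabetically. All numbers are hypothetical:

```python
def spam_per_inbox(valid_inboxes, guessed_per_letter, harvested_per_inbox):
    """Spam per valid inbox for each initial letter, most common letter first.

    valid_inboxes: letter -> number of real mailboxes starting with that letter.
    guessed_per_letter: spam aimed at guessed addresses, assumed equal for every
        letter (a simplifying assumption, not a measured figure).
    harvested_per_inbox: spam from harvested lists, assumed proportional to the
        number of real inboxes, so it adds the same amount to each inbox.
    """
    ranked = sorted(valid_inboxes, key=valid_inboxes.get, reverse=True)
    return [
        (letter, harvested_per_inbox + guessed_per_letter / valid_inboxes[letter])
        for letter in ranked
        if valid_inboxes[letter] > 0  # letters with no real inboxes are skipped
    ]

# Hypothetical counts, for illustration only.
print(spam_per_inbox({"a": 500, "m": 200, "z": 20},
                     guessed_per_letter=10_000, harvested_per_inbox=5.0))
# [('a', 25.0), ('m', 55.0), ('z', 505.0)]
```

Plotted against frequency rank instead of alphabetical position, the rare letters fall at the end simply because they are rare, which is the point about sorting by letter frequency before fitting a regression.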

--
I Browse at +4 Flamebait
Open Source Sysadmin
Signal to Noise ratio by aembleton · 2008-08-31 09:10 · Score: 2, Insightful

From looking at that graph, it would be more interesting to see the signal-to-noise ratio for each of the letters and numbers. Those names beginning with an 'A' do indeed receive more spam, but also far more non-spam. In fact it looks to be more like 50:51 (non-spam : spam), whereas at first glance those email addresses beginning with a 'P' receive 40:60.
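Turning per-letter counts into that kind of non-spam : spam split is a one-liner per letter. A Python sketch with made-up counts (not read off the actual graph):

```python
def ham_spam_split(ham: int, spam: int) -> tuple[float, float]:
    """Return (non-spam %, spam %) for one initial letter's message counts."""
    total = ham + spam
    if total == 0:
        raise ValueError("no messages for this letter")
    return 100 * ham / total, 100 * spam / total

# Hypothetical per-letter counts (non-spam, spam), not taken from the paper.
counts = {"a": (4_900, 5_100), "p": (400, 600)}
for letter, (ham, spam) in counts.items():
    h, s = ham_spam_split(ham, spam)
    print(f"{letter}: {h:.0f}:{s:.0f}")  # a: 49:51, then p: 40:60
```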
Yes, the beginning of the alphabet gets more spam. by Dynamoo · 2008-08-31 09:22 · Score: 4, Informative

Yes, the beginning of the alphabet gets more spam, and it's really very simple to explain why.
Spammers work from lists of email addresses, and those email addresses are typically sorted by domain and then alphabetically. So, the receiving domain gets a rush of emails for users with addresses beginning with A, B, C etc. But usually (at some point) many mail systems will detect that there is a spam attack in progress and they will block subsequent messages of the same format or from the attacking IP address (depending on the spam filtering setup in place).
So, put simply, the people beginning with "A" get nice new spam that the adaptive filters don't detect. By the time it gets to "Z", a good filter will automatically block the attack.
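A toy Python simulation of that effect, assuming a hypothetical adaptive filter that lets through the first `block_after` copies of a campaign and blocks the rest (real filters are far less tidy):

```python
def deliver_campaign(recipients, block_after):
    """Simulate one spam run against a single domain.

    Recipients are mailed in alphabetical order, and a hypothetical adaptive
    filter blocks every copy after the first `block_after` it has seen.
    Returns (delivered, blocked) lists of addresses.
    """
    delivered, blocked = [], []
    for sent_so_far, rcpt in enumerate(sorted(recipients)):
        if sent_so_far < block_after:
            delivered.append(rcpt)
        else:
            blocked.append(rcpt)
    return delivered, blocked

users = ["zoe@example.com", "adam@example.com", "mike@example.com",
         "bob@example.com", "yuri@example.com"]
delivered, blocked = deliver_campaign(users, block_after=2)
print(delivered)  # ['adam@example.com', 'bob@example.com']
print(blocked)    # ['mike@example.com', 'yuri@example.com', 'zoe@example.com']
```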
What's sad is that I watch spam attacks often enough to know this.

--
Never email donotemail@WeAreSpammers.com
Re:Unexpected by nabsltd · 2008-08-31 09:50 · Score: 2, Insightful

I think most of the spam targeted at a message ID comes from crawling USENET.
On my server, I see lots of e-mail with a "rcpt to:" that matches the regex "(mpg\.)?[a-f0-9]+\@news\.domain\.com". This is the format that inn uses to create message IDs.
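The same check in Python's `re` module, using the pattern quoted above ("news.domain.com" is the commenter's placeholder domain; the helper name is hypothetical):

```python
import re

# Pattern quoted in the comment; "news.domain.com" is a placeholder domain.
MESSAGE_ID_RCPT = re.compile(r"(mpg\.)?[a-f0-9]+@news\.domain\.com")

def looks_like_message_id(rcpt: str) -> bool:
    """True if an SMTP RCPT TO address matches the message-ID-style pattern."""
    return MESSAGE_ID_RCPT.fullmatch(rcpt.strip("<>")) is not None

print(looks_like_message_id("<mpg.1a2b3c@news.domain.com>"))  # True
print(looks_like_message_id("alice@news.domain.com"))         # False
```

`fullmatch` anchors the pattern at both ends, so a message-ID-like string embedded in a longer address does not count as a match.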
Oh really? by Antony-Kyre · 2008-08-31 12:46 · Score: 2, Interesting

I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)
Now, let's see if that holds true.
Even spammers know.. by The+Creator · 2008-08-31 12:53 · Score: 2, Funny

Zebras already have big penises!

--

FRA: STFU GTFO