Slashdot Mirror


Eisenstadt's Analysis Of 8 Years' Worth Of Email

Hylton writes "Thought this might be of interest: Marc Eisenstadt's saved every email he's gotten over the past eight years, including spam, and run an analysis of it."

14 of 230 comments

  1. Apparently the analysis is still running by Anonymous Coward · · Score: 5, Funny

    on their webserver.

  2. Spam by thinkliberty · · Score: 5, Funny

    I have received more spam in the past week than I have legitimate email in the past 10 years.

    1. Re:Spam by BosstonesOwn · · Score: 5, Funny
      I have analyzed it all and apparently the people sending me these spam messages know my plight.

      I need a bigger penis

      I need teen sluts who suck **** on webcams

      And apparently I shouldn't be telling anyone this, but this nice man in Nigeria, who is the lawyer in charge of my long lost grandfather mutambi wikimbo, is trying to get me $5 million american dollars, but I have to pay a tax of $5 thousand american dollars to Nigeria and he will gladly handle it for me. What a swell guy.

      --
      This package Does Not Contain a Winner
  3. Remembering when.. by Bite-lover · · Score: 5, Funny

    Must be nice to be able to look back on porn-spam and feel old. 'Hot XXX - Newcomer Jenna!'

    --
    Bite me. Seriously, I enjoy it.
  4. Article Text by Anonymous Coward · · Score: 5, Informative

    February 11, 2005
    Eight years of email stats, pass 1
    Posted by Marc Eisenstadt

    What's the reality behind the 'email overload' talk? Let's look at some numbers... personal numbers.

    To kick things off, I've got a huge email archive. I started emailing in the early ArpaNet days, around 1972, and haven't stopped since. My archive has been extremely thorough for at least the past 12 years (and, in case you think I'm nuts for keeping all of these, my actual regret from a scientific/archive perspective is that I don't have the earlier ones too!). Why? Let's just say that one day I planned to do an analysis of it all... types of mails, social networks, the whole works. But things got a little out of hand.... (anyone lookin' for some data, give me a shout... but first read on)...

    Most of this 'storage mania' was triggered by a casual comment in around 1992 or 1993 by Ron Baecker, of the University of Toronto, a longtime research colleague and acquaintance and someone whose work I have long admired and respected. Ron asked me, "given ultra-cheap storage and ultra-fast search, both clearly on their way, why would you ever need either to delete or indeed to accurately file/categorize your emails?"

    OK, so as a little personal experiment, I decided to keep 'em, and to see what happened. The quick story is that migrating across machines, operating systems, and preferred email clients, plus being a bit cavalier about the whole thing, has meant that although all the emails are 'there' in various archive files, it takes a little work to get 'em all back in a harmonious form, that is with all headers intact and no duplicates (the main formats are Vax mails, Unix mails, Mac Eudora, PC Eudora, Outlook Express, and Outlook).

    The longer story, with some data and preliminary analysis, begins like this:

    Even though I haven't had the time or motivation thus far to put in the harmonization work required to get all the data in one format and with duplicates eliminated, I nevertheless thought that a little 'first pass' set of totals (with my estimate of their accuracy) would be interesting, and maybe even provide a little coarse empirical support for Stowe's "Just Say No To Email" campaign.

    So I quickly eyeballed-and-tallied the most coherent of the archives, spanning eight years of emails, from January 1st 1997 to December 31st 2004. The totals are real enough, but the 'eyeballing' was needed to assess the approximate proportion of spam and duplication involved in the emails. A more detailed analysis later will enable me to do these more accurately. I've indicated my estimate of the margin for error in the third column, and my estimate for the percentage of spam received (and I mean real spam: i.e. either 'greedily-lookin-for-suckers' or 'low-down-mean-and-nasty spam', not conference announcements - you know what I'm talkin' about). For 2003, this number is precise, because I filtered off such spam using SpamAssassin, and counted them! 2004 spam numbers are an extrapolation, but the totals are accurate, as explained below. Here goes:

    TABLE 1: Eisenstadt's 1997-2004 email totals

    Year    Emails received    Est. Error    Est. Spam
    1997     4320              20%            2%
    1998     3996              20%            3%
    1999     6821              10%            5%
    2000     7580               5%            6%
    2001     6125               5%            7%
    2002     6497               5%           10%
    2003    13092               1%           37.6%
    2004    13889               1%           40%

    2003 is the most accurate, because (unlike earlier years when I was changing clients and machines) I have all emails in one clean format and all spam preserved, auto-filtered by SpamAssassin into a folder that I look at only a few times a year, scanning rapidly for false rejections. Incidentally, that falsely rejected email rate appears to be roughly 1 in 5000: good enough for me! By 2004, although I kept all emails, I got fed up keeping the spam even for analysis purposes, and can't even be bothered to scan it, so stuff auto-filtered by SpamAssassin is now deleted without my looking at it - so the column 4 '40% spam' in the lower

  5. Einstein? by kristopher · · Score: 5, Funny

    Don't misread like I did. I was like, what the hell was Einstein doing with email..

  6. Re:Indeed by iced_773 · · Score: 5, Informative

    I should point out that you shouldn't respond to spam under ANY circumstances - it just verifies to the spammer that your address exists.

  7. Re:Link seems to be down... by Vario · · Score: 5, Informative

    This is the Google cache linked with slashcode: http://64.233.183.104/search?q=cache:GshwWambHvEJ:www.corante.com/getreal/archives/2005/02/11/eight_years_of_email_stats_pass_1.php

    It still tries to access the original site, so it is rather slow, but you can read the article.

  8. GMail by GNUALMAFUERTE · · Score: 5, Interesting

    This is pretty interesting (sadly I can't access TFA).
    Google should have such a program: there should be a preference in your GMail account where you can allow/deny Google to take stats out of your email. A lot of interesting information could be collected, for example, amount of spam vs. legitimate email, % of each kind of spam (viagra, drugs, porn, etc.), spam by country, % of text/HTML email, and even other interesting stats not email-related, for example, language analysis, frequent misspellings, topics of interest by age, etc. I would gladly allow Google to make such stats; it can be done in such a way that no personal/sensitive information would be leaked.

    (Thinks about what he has just said, and puts tinfoil hat on)

    ALMAFUERTE

    --
    WTF am I doing replying to an AC at 5 A.M on a Friday night?
  9. Raymond Chen's Analysis... by ticklish2day · · Score: 5, Interesting

    Microsoftie Chen's analysis, slashdotted a while ago, has pictures too!

  10. If you think this article is about spam, read end by linuxbaby · · Score: 5, Interesting

    If you think this article is about spam, make sure you read it all the way to the end. It's not.

    He's questioning the entire technology of email as an effective way of communicating.

    Analyzes not just the spam-count in his email, but the work-time needed to respond to the non-spam emails, too.

    This is one of the most thought-provoking articles posted on Slashdot in a long time.

  11. Re:Back in the old days... by Ian+Action · · Score: 5, Funny
    I'm sorry you're so upset...

    So, would you like to buy some ink cartridges?

    --
    Why am I not rapping? I am rapping with you in a way.
  12. Re:Femto's Law of Email by dvdeug · · Score: 5, Insightful

    Given enough time, nearly every email becomes irrelevant.

    Given enough time, nearly everything becomes irrelevant. That job resume you're writing up now is going to be pretty irrelevant in 3 years; but that doesn't mean you can ignore it now.

  13. Own domain offers new methods by 4Lancer.net · · Score: 5, Insightful

    Having your own domain offers a neat way of tracking where spam comes from. For example, if you see the email I use here, I will know any spam that comes from someone getting my address from here. Of course, /. isn't the best example. Say I sign up at a website, misfitriprapper.com. I will use misfitriprapper.com as the username before the @4la... I use this method EVERYWHERE. I just sent an email last night to Epson support. My email address? epson.com@4la... We all learned years ago not to trust anybody, so I don't even trust the big companies like Epson.

    --
    All your searching needs (and free money!) - 4Lancer.net
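
The per-site address trick from comment 13 can be sketched in a few lines of Python. This is only an illustration, not the commenter's actual setup: the domain `example.org` and the function names `alias_for` and `leaked_by` are made up here, since the real domain is elided in the comment.

```python
from email.utils import parseaddr
from typing import Optional

# Hypothetical domain; the original comment elides the real one.
MY_DOMAIN = "example.org"


def alias_for(site: str, domain: str = MY_DOMAIN) -> str:
    """Build the address to give a site, e.g. epson.com@example.org."""
    return f"{site.strip().lower()}@{domain}"


def leaked_by(recipient: str, domain: str = MY_DOMAIN) -> Optional[str]:
    """Return the site this address was handed to, or None if it is not ours."""
    # parseaddr copes with both "a@b" and "Name <a@b>" forms of a To: header.
    _, addr = parseaddr(recipient)
    local, sep, host = addr.rpartition("@")
    if not sep or not local or host.lower() != domain.lower():
        return None
    return local.lower()


if __name__ == "__main__":
    print(alias_for("Epson.com"))
    print(leaked_by("Epson Support <epson.com@example.org>"))
```

When spam arrives, the local part of the recipient address names the site that was given it. The scheme presumably relies on the domain accepting mail for any local part (a catch-all mailbox), which the comment does not spell out.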