Slashdot Mirror


Kevin Rose Load Tests Gmail

SishGupta writes "Load Testing Gmail - fillmybox@gmail.com A few weeks ago, Kevin Rose of The Screen Savers decided to load test Google's new email service, Gmail. He asked everyone to email him their favourite 5MB attachments to 'fillmybox@gmail.com.' The test Gmail account is now 102% maxed out. You can read about the test and the results at Kevin Rose.com (his weblog)."

19 of 366 comments

  1. Just in case the page gets slashdotted by Anonymous Coward · · Score: 0, Interesting

    Posting anonymously to avoid karma whoring...

    A few weeks ago, you may remember that we decided to load test Google's new email service, Gmail. I called on all of you to email me your favorite 5MB attachments to "fillmybox@gmail.com". Well, we did it! My Gmail account is now 102% maxed out.

    Here are the results of the test:

    July 12th: Created account and promoted it on The Screen Savers. Within 5 minutes, Gmail processed over 300 mail messages (most with 5+ MB attachments). 10 minutes into the test, I started receiving various internal server error messages and was no longer able to log in. Proceeded to log in with other Gmail accounts to ensure this was not a site-wide problem. All other accounts worked fine.

    July 13-17th: Still unable to log in. At this point, I'm feeling I must have triggered some type of internal fraud flag that suspended my account. I have received 2,000+ emails (to my personal non-Gmail address) from viewers who have received bounced mail when attempting to message the Gmail account. So how many emails were sent? No way to tell for sure, but considering that our network is in 50+ million homes, that I plugged it twice, and I received over 2,000 complaints from people who actually took the time to dig around and find my personal email address, I'm thinking we hit Gmail with around 50-75,000+, 5MB+ emails in a 10-15min window.

    July 26th: Finally able to log back in, and the account is full! It took 686 email messages that were received on July 12th-15th. For some reason Gmail allowed me to go past my 1000MB limit to 1023MB (1GB=1024MB). The account can no longer receive mail messages (they are bounced) or send outgoing mail. The strange thing is that Google doesn't send any type of warning emails once you are nearing or reach your limit - but I'm sure this will be added in the final release.
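
    The "102%" figure and the per-message average follow directly from the numbers in the entry above. A minimal sketch of that arithmetic (the variable names are illustrative, not anything Gmail exposes):

```python
# Figures quoted in the post; names are illustrative only.
QUOTA_MB = 1000   # advertised limit
USED_MB = 1023    # what the account actually held when it stopped accepting mail
MESSAGES = 686    # messages stored when the account filled up

percent_full = USED_MB / QUOTA_MB * 100
avg_message_mb = USED_MB / MESSAGES

print(f"{percent_full:.1f}% of quota used")
print(f"{avg_message_mb:.2f} MB per stored message on average")
```

    The average works out well below 5MB, so either many stored messages were smaller than requested or the usage counter measures something other than raw message size; the post does not say which.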

    The only odd thing about the entire test is the login issue. It's strange to me that all of my other accounts worked fine, but for some reason the test account was frozen. [THEORY] I have a feeling this had to do with mail message processing. Google knows that 80% of mail messages are text, and we all know that text is highly compressible. That said, they probably only have around 200-300MB of storage allocated for each 1GB account (obviously this will fluctuate up to 1GB depending on the user's mail content). My take on this is that they have a huge series of RAID arrays at their server farm. Every time an email comes in, it is compressed and stored in that user's account on the RAID. Upon account login the user's data is decompressed, indexed, and moved to some type of RAM/solid state drive for quick access. Once this is complete, it is then displayed to the end user (a 1-2 second process). I have a feeling that due to the huge number of incoming emails, I had some type of processing timeout occur. The server was trying to write data to my library (thousands of new msgs), while retrieving and displaying data - resulting in errors. A few days after things settled down (and a few cron jobs later), Gmail was able to sort things out [END THEORY]. At some point I will talk to my buddy over at Google and have him find out how they really do it.

    Anyhow.. Here are a few pics of the maxed out account:

    - 1023MB
    - Outgoing Error Message

    Sorry for the rambling and poor post structure, it's 2:35am and I can hardly keep my eyes open, I'm off to bed. night.

    +krose
    -- sign up for my newsletter to find out what I'm up to --

  2. And your point is ? by IanBevan · · Score: 4, Interesting

    Mod me down as flamebait if you really must, but what really was the point of this exercise? I'm sure Google would find it an interesting test - assuming they've not already tried it themselves - but as the author says, he's never actually told anybody at Google about it. It just doesn't strike me as particularly constructive...

    1. Re:And your point is ? by TheLink · · Score: 2, Interesting

      "Kevin Rose is a wannabe hacker (or cracker, whatever term you want to use) who tries to portray himself as a technology guru. "

      Sure he's not that "1337" but you're probably just jealous - he has a girlfriend and you don't, he actually gets email from nonspammers, etc etc.

      It's a US TV show, what are you expecting, d'oh? As it is, it's already so "1337" for the TV bosses that they're dumbing TechTV down into another channel.

  3. Next step, try the spam filters by sssmashy · · Score: 4, Interesting

    I received over 2,000 complaints from people who actually took the time to dig around and find my personal email address, I'm thinking we hit Gmail with around 50-75,000+, 5MB+ emails in a 10-15min window.

    Think of all the spam that one of these accounts could hold. I propose testing Gmail's spam filters next: disseminate your Gmail addy to porn sites, and everywhere else it will likely be harvested by a spam bot. Sit back, and let the spam roll in. It should be interesting to see just how fast this sucker fills up with ads for penis enlargers.

    1. Re:Next step, try the spam filters by Lshmael · · Score: 2, Interesting
      See Gmail Spam Filter Testing or Spam My Gmail Account (prattboy@gmail.com)

      Not surprisingly, Pratt's account maxed out at 102% or 1023 MB. Unlike Kevin Rose, Pratt's account filled up two months ago. Rose's test, however, was not about filling his account:
      [THEORY] I have a feeling this had to do with mail message processing. Google knows that 80% of mail messages are text, and we all know that text is highly compressible. That said, they probably only have around 200-300MB of storage allocated for each 1GB account (obviously this will fluctuate up to 1GB depending on the user's mail content). My take on this is that they have a huge series of RAID arrays at their server farm. Every time an email comes in, it is compressed and stored in that user's account on the RAID. Upon account login the user's data is decompressed, indexed, and moved to some type of RAM/solid state drive for quick access. Once this is complete, it is then displayed to the end user (a 1-2 second process). I have a feeling that due to the huge number of incoming emails, I had some type of processing timeout occur. The server was trying to write data to my library (thousands of new msgs), while retrieving and displaying data - resulting in errors. A few days after things settled down (and a few cron jobs later), Gmail was able to sort things out [END THEORY].
    2. Re:Next step, try the spam filters by Motherfucking+Shit · · Score: 4, Interesting
      Think of all the spam that one of these accounts could hold.
      I set up a Gmail account just over a month ago (on June 23rd). After I used it for a couple of test messages with friends, I set up a few of my most spammed email accounts to forward to Gmail. As of now, I have 67497 spam messages, using 360 MB (36%) of my 1000 MB.

      Gmail has gotten better at catching spam on its own, but it's not great yet. I use SpamAssassin and score anything over 6.1 as spam. Gmail sends stuff with scores as high as 8 straight to my inbox. Granted, it's easy to set up a system that works for me; it's hard to set up a system that works for everyone.

      One thing I've found really interesting is the ability to instantly search through 67,000+ spams! It's amazing how prolific the "random words to defeat Bayesian filters" spam tactic has become. Just about every word I've tried appears somewhere within the contents of 67,000 spams...

      Search results for: in:anywhere anthropomorphic 1 - 20 of about 80

      Search results for: in:anywhere antagonistic 1 - 20 of about 150

      Search results for: in:anywhere necromancy 1 - 20 of 61

      Search results for: in:anywhere juxtaposition 1 - 20 of 58

      Search results for: in:anywhere loquacious 1 - 20 of 51

      It's crazy. I wasted a few minutes last week searching through my Gmail spam archive trying to find a word that didn't appear anywhere, and came up with very few successes. If nothing else, Gmail is probably the world's biggest and most accurate archive of spam.
      --
      "BSD: Free as in speech. Linux: Free as in beer. Windows 10: Free as in herpes." --Man On Pink Corner in #52607549.
  4. conspiracy theories by F2F · · Score: 2, Interesting

    we all want to know how google does it, don't we?

    here's what he thinks:

    Google knows that 80% of mail messages are text, and we all know that text is highly compressible. That said, they probably only have around 200-300MB of storage allocated for each 1GB account (obviously this will fluctuate up to 1GB depending on the user's mail content). My take on this is that they have a huge series of RAID arrays at their server farm. Every time an email comes in, it is compressed and stored in that user's account on the RAID.

    this should be closer to the truth: Venti: a new approach to archival storage

  5. Re:Whoah by Kris_J · · Score: 4, Interesting
    I can tell you that the theory is realistic, having run several compressed filesystems and generally having an interest in (transparent) compression, but I can't say if it's correct. It sounds a little wrong as it's fairly easy to say "no, don't try to compress files with extension X because it won't work". More likely Gmail choked on the decoding of attachments -- as you wouldn't store them in a wasteful 7-bit format.
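
    The point about skipping files that won't compress can be illustrated with a small sketch. It assumes zlib as a stand-in for whatever compressor a mail store might use (nothing here reflects Gmail's actual design), with repetitive text playing the role of a plain-text message and random bytes playing the role of an already-compressed attachment:

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive plain text stands in for a text-only message.
text = b"Thanks for your order. Your item has shipped.\n" * 2000

# Random bytes stand in for already-compressed data (JPEG, MPEG, ZIP).
blob = os.urandom(len(text))

print(f"text ratio: {ratio(text):.3f}")
print(f"blob ratio: {ratio(blob):.3f}")
```

    Real mail is far less repetitive than this sample, so the text ratio here is an optimistic bound rather than a typical figure.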

    I hammered my own Gmail account by forwarding up all my old messages using a Eudora filter. I was sending as many as 2,000 messages in a 15 minute period at one stage. While Gmail didn't lock me out, some messages took a particularly long time to appear. These messages were typically old automated receipts, such as eBay messages, that all look very similar but are in fact separate conversations. I'm guessing that there's a lot of overhead when a message arrives to determine if it's related to existing messages.

  6. I've been doing my own load testing. by Steamhead · · Score: 2, Interesting

    I signed up my GMail account to every Apple mailing list, mainly because I am a developer and want a searchable archive of exactly the mailing lists I want.

  7. Re:false advertising, and email wars by lakeland · · Score: 4, Interesting

    Nobody can read 1GB of text. Therefore the only way to use a gig of email is if either a) it isn't text, or b) you're not actually reading it.

    For instance, people getting MPEGs in the mail won't notice the difference between 1000MB and 1024MB. Similarly, people subscribed to a dozen mailing lists, hoping to use Google to quickly find any message, won't notice the difference since a few days' email will fill up the difference.

    To make it really clear... say you can read 100 text emails a day. Now, if those emails are text they'll be about 5k, or around half a meg a day. So you're talking about six _years_ worth of email before you fill your box, with the extra 24MB getting you an extra month on your six years. For people getting ten text emails a day worth keeping, 1GB will probably hold enough email for life.
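
    The estimate above can be checked with a few lines. This sketch assumes 1 MB = 1000 KB and a 365-day year; the constant names are illustrative:

```python
# Assumptions taken from the comment: 100 text emails a day at about 5 KB each.
EMAILS_PER_DAY = 100
EMAIL_KB = 5

daily_mb = EMAILS_PER_DAY * EMAIL_KB / 1000  # about half a meg a day


def years_to_fill(quota_mb: float) -> float:
    """Years of mail at the assumed daily rate before the quota is reached."""
    return quota_mb / daily_mb / 365


extra_days = 24 / daily_mb  # what the extra 24MB buys

print(f"{years_to_fill(1000):.1f} years to fill 1000MB")
print(f"{extra_days:.0f} extra days from the additional 24MB")
```

    Under these assumptions the result is a little under six years, and the extra 24MB buys closer to a month and a half than a month, which is in line with the rounded figures in the comment.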

  8. Re:1GB = 1024MB so... by drinkypoo · · Score: 3, Interesting
    The simple fact is that mega means million, and giga means billion. Giga simply does not mean 2^30. Hence, it does not make any sense whatsoever to call 2^30 bytes a gigabyte, any more than it makes sense to call 2^20 bytes a megabyte.

    Why is it so hard for people to admit when they are doing something stupid and correct it? The idea that we should continue doing something simply because it is entrenched is folly at best and is better described as arrogant. I find the idea that we should do something simply because it is the way it has always been done to be absolutely horrific.

    Apparently the moderatorship agrees with you that I am wrong, because they have moderated my comment as flamebait, in spite of an utter lack of intent to flame. I simply want words to have as few meanings as possible. The English language, made up as it is of smatterings of all different languages, is complicated enough without me having to now consider all different possible meanings for a technical lexicon as well, while at the same time trying to retain knowledge of assorted programming languages, operating system commands, and so on. Hence, I attempt to do my part against entropy - it can never be stamped out, but it can be minimized on a local scale.

    Now you probably think (more than before) that I am a wanker. However, we are faced with incontrovertible proof that you are a coward, and as such I will not allow your opinion to bother me more than is required to write this comment.

    I state my assertion once more: mega means one million, 10^6. To try to use the prefix to mean something else, besides the idea of something very large which makes sense given that it is from a Greek word meaning "great", is a mistake. To make it try to mean 2^20 is sheer folly. Continuing in this vein, we see that giga means one billion (10^9) and is descended from Greek gigas, or "giant", and that tera means 10^12 (one trillion) and is descended from the Greek word for monster, but none of these words (or parts of words) has a meaning that has anything to do with powers of two. To take a word with a specific meaning, and to assign it a similar but different meaning cannot be anything but ridiculous. To attempt to correct such an error might be impossible, but it won't stop me from trying to do what I see as essentially the right and more importantly logical thing. Computers are tools of logic - why encumber the very language with which we describe them with illogic?

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  9. Re:1GB = 1024MB so... by Anonymous Coward · · Score: 1, Interesting

    Drives are advertised in Fake Gigabytes. To convert FGB (Fake Gigabyte) to GB, multiply by the Fake Gigabyte Constant equal to 10^9/2^30, or approximately 0.93. For example, a HD claiming to have 40 GB actually has 40 FGB, or 40 * 0.93 = 37.2 GB.
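
    The conversion is just the ratio of a decimal gigabyte (10^9 bytes) to a binary one (2^30 bytes). A minimal sketch, with a hypothetical helper name:

```python
# Ratio of a decimal gigabyte (10**9 bytes) to a binary gigabyte (2**30 bytes).
FGB_CONSTANT = 10**9 / 2**30


def advertised_to_binary_gb(advertised_gb: float) -> float:
    """Convert a drive's advertised (decimal) GB to binary GB."""
    return advertised_gb * FGB_CONSTANT


print(f"constant: {FGB_CONSTANT:.4f}")
print(f"40 advertised GB is about {advertised_to_binary_gb(40):.2f} binary GB")
```

    Using the unrounded constant gives a slightly larger figure than multiplying by 0.93.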

  10. Re:gmail invite? by rohan_leader · · Score: 1, Interesting

    good point, i've just done the research :)

    In that case, you can send it to my obscure isp address at hungyao@telus.net. Hopefully, they haven't done any funny business with blocking invites.

    Thanks :)

  11. anything but rediculous by fireboy1919 · · Score: 1, Interesting

    Yeah...let me further your line of reasoning and say that there is no reason that we should have switched away from using base 1 for everything, since it's the first way, and every other approach is merely a redundant way to write it, and therefore part of the plot to lead to the eventual heat-death of the universe.

    Computers use powers of two for every kind of calculation. The most important reason of all to do the measurements this way is because it's easier. It wasn't random, wasn't folly, wasn't totally ridiculous.

    The redefinition allows us to use the units in a calculable way. It also makes a kind of sense to redefine mega, giga, and tera in terms of base 2 because a byte is a base 2 unit. Why not just go all out when you're using them and make everything else base 2 as well? You may note that bits, which come out to a nice, round number in every number base, are measured in base 10. A terabit is 10^12 bits.

    It's too much to ask that a microcontroller that reports usage have half of its hardware devoted to base conversion, especially when the result may come out to some terrible fraction. To use your statement, I find that using mega, giga, and tera with the original meanings just because they're entrenched is folly at best and is better regarded as arrogant.

    --
    Mod me down and I will become more powerful than you can possibly imagine!
    1. Re:anything but rediculous by Detritus · · Score: 2, Interesting
      Computers use powers of two for every kind of calculation.

      Except when they don't.

      I've seen base 3, 10, 40, 100, 256.

      --
      Mea navis aericumbens anguillis abundat
  12. Re:Woops! by shaitand · · Score: 2, Interesting

    Drinky, you're still not fully corrected. While those notations do exist, they serve little purpose and aren't likely to ever be widely adopted.

    I can write a new notation on a napkin or a webpage, that doesn't make it correct. Only widespread acceptance can do that. And outside of a few engaging in debates regarding the subject on slashdot, nobody has accepted it.

    Personally I might be more inclined if they reversed the standard *B's and *iB's. Since the correct value for *B's was always powers of 2, the marketing guys should be the ones forced to move to *iB.

    Do that and I'll accept the change... since it's the tech industry that has to adopt it to make it stick, not the marketing industry, they need to cater to making US not have to change anything.

    If they do it that way, we change nothing we do, and completely ignore the rest of the notation (since there is no use for any of it outside marketing) and carry on blissfully with our lives.

  13. stress testing conversations by adpowers · · Score: 4, Interesting

    I've done a little bit of my own stress testing. However, I've done it a little bit differently. I wanted to see how Gmail handled huge conversations. I e-mailed my brother and we spammed each other back and forth in the same thread, seeing if we could influence the Ads. After a while we started adding more people to the conversation (our current test thread has nine people). We started out by hitting Reply All and saving the quotes from the previous e-mail. It became a huge list of >>> near the bottom and eventually Gmail clipped the messages. After a few hundred replies, opening the thread became slower and slower. When it reached 426 replies, it took me a week to finally get into it. With that I made one last reply and closed the thread. Hey, just out of curiosity, I opened the thread now and it loaded pretty easily. I wonder if they have optimized their behind-the-scenes engine to make it faster for large conversations. Maybe I'll continue the thread. Also, if you want to be part of the new test thread, just send an e-mail to adpowers@gmail.com.

    Anyway, here is my Gmail stress test.

    Also, you'll notice I have a few mailing lists on the side. I only read the Freenet one, but I subscribed to the Linux Kernel list and some others because I know them to be high traffic. Gmail is pretty impressive and they seem to be optimizing it even more.

  14. DDoS by losttoy · · Score: 1, Interesting

    The guy ran the *test* without informing Gmail people and his *promotion* resulted in Gmail receiving 70-80K 5MB mails in 10-15 minutes. Isn't that tantamount to a DDoS? Shouldn't the guy be booked for something?

    PS - I RTFA

  15. Re:Old news by hattmoward · · Score: 2, Interesting

    I know what you mean; I never even liked The Screen Savers! For a while, about 350 TotalFark members were running a Music Ring, where everyone was required to use a GMail account. We moved about 500-600 MB of mail into 350 mailboxes each day. I never saw a hiccup speed-wise when downloading attachments; whatever connection I was on would max out. We did run up against some GMail-imposed limitations that are intended to control load. First, outgoing messages are limited to 100 recipients; any beyond that are silently removed. Second, GMail limits a single client to a handful (3 or 4) of simultaneous connections. (For example, if you have 3 or 4 attachment downloads running, any new connections will block until another connection closes or the timeout is reached.) Third, and this is actually documented, but not well, GMail limits single messages to 10MB in size. What they don't explain, and what I had to explain repeatedly, is that you actually have a limit for attachments of 7.5MB, because of Base64 encoding for MIME.
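
    The 7.5MB figure follows from Base64 turning every 3 bytes of attachment into 4 characters of message. A rough sketch of that calculation, taking the 10MB limit from the comment above and using Python's standard base64 module to confirm the 4/3 expansion:

```python
import base64

MESSAGE_LIMIT_MB = 10  # per-message limit quoted in the comment

# Base64 encodes every 3 input bytes as 4 output characters,
# so only 3/4 of the message size is available for raw attachment data.
max_attachment_mb = MESSAGE_LIMIT_MB * 3 / 4

# Confirm the expansion on a small payload.
payload = bytes(3000)
encoded = base64.b64encode(payload)

print(f"largest raw attachment: about {max_attachment_mb} MB")
print(f"expansion factor: {len(encoded) / len(payload):.3f}")
```

    MIME also adds line breaks and headers on top of the encoded data, so the usable attachment size in practice would be slightly lower than this figure.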

    Oh well, this is Slashdot; what do you expect? =)