Slashdot Mirror


Google Apps Gets a 99.9% Guarantee

David Gerard passes along a posting on Google's official blog announcing that they have extended the three-nines SLA for the Premier Edition of Google Apps from Gmail alone to also cover the Calendar, Docs, Sites, and Google Talk services. 99.9% uptime translates to roughly 45 minutes a month of downtime, and the blog post puts this in context with Gmail's historical reliability, which has been between three and four times as good over the last year (10-15 min./mo.). It also claims, based on research by an outside group, that Gmail's historical reliability beats that of in-house hosted solutions such as GroupWise and Exchange, on average. Reader Ian Lamont adds an article in The Standard that digs down into the details of the SLA, revealing, for instance, that outages of less than 10 minutes aren't counted against the monthly 45 minutes.
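
As a rough check on the downtime figure in the summary, the sketch below converts an availability percentage into a monthly downtime budget. The function name `downtime_budget_minutes` and the default 30-day month are assumptions made for illustration; the SLA itself may define the measurement period differently.

```python
def downtime_budget_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Return the minutes of downtime per month allowed at the given availability."""
    total_minutes = days_in_month * 24 * 60
    # The allowed downtime is the unavailable fraction of the month.
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for days in (30, 31):
        minutes = downtime_budget_minutes(99.9, days)
        print(f"99.9% over {days} days: {minutes:.2f} minutes of downtime allowed")
```

Under these assumptions the budget comes to a little over 43 minutes for a 30-day month and a little under 45 minutes for a 31-day month, which is consistent with the rounded figure quoted in the summary.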

37 of 155 comments

  1. Umm... by Sylos · · Score: 3, Insightful

    So if I have 60 one-minute downtimes, I'm still within the 99.9% uptime range? I call shenanigans.

    --
    'Number-memorizing Chinese people.'-Anon
    1. Re:Umm... by Creepy+Crawler · · Score: 4, Insightful

      Most likely it's the time for node crash detection and load balancing to take effect.

      If service is that bad or intermittent, nobody would buy service there.

    2. Re:Umm... by ILongForDarkness · · Score: 4, Interesting
      Well, if they cache the current session locally and it is just the connection to the back end that you lose temporarily, I think it would be all right. Losing data sucks. That said, who uses desktop suites without a crash? "Hopefully" (not sure if that is the right word to use when referring to an outage), they manage to have the downtime clumped together and planned in non-peak hours for the region (say, upgrades done on the first Saturday of the month at midnight or something).

      My big concern with this type of offering is that it increases a company's dependence on its internet line. If your network is down, not only can you not retrieve files, read email, or browse, you now can't work in your productivity software either. Essentially, if you're doing a job that requires a computer in this environment, you can't work whenever the internet or network has a hiccup. I like having something else to do in the rare instances where the network isn't working right.

      Add to that the fact that wireless and laptops are becoming more important in companies (and wireless is flaky at the best of times, IMHO), and you're really courting disaster, not just in terms of outages but in terms of accidental data loss. Say your not-so-technologically-gifted colleague decides to walk over to your desk with their laptop to show you the spreadsheet they've been working on. They get out of range of the router they were using and, presto, a session timeout and the chance of data loss.

    3. Re:Umm... by cgenman · · Score: 2, Interesting

      Is that 99.9% uptime or 99.9% planned uptime? Many companies refer (rather facetiously) to *planned* uptime, which means that you can have unlimited downtime so long as it isn't unplanned.

    4. Re:Umm... by aaarrrgggh · · Score: 2, Informative

      The concept of "unplanned downtime" seems to originate in the banking world, where something as benign as daylight saving time could force you to take down the mainframe for two hours. It has unfortunately spread to other industries (healthcare records management pops up). The real question is whether Google's application architecture requires planned downtime for the service as a whole or only for individual users.

      Based on their roots, I would expect them to be able to do any upgrades in the ten minute window they exclude from their SLA.
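
The question at the top of this thread (sixty one-minute outages) can be sketched in a few lines. This is only an illustration of the rule as reported in the summary, that outages of less than 10 minutes are not counted; the function name `counted_downtime` and the way outages are represented are assumptions, and the actual SLA may measure downtime differently.

```python
def counted_downtime(outage_minutes, min_counted=10):
    """Sum only the outages that last at least `min_counted` minutes.

    `outage_minutes` is a list of individual outage durations in minutes.
    Shorter outages are ignored, mirroring the reported exclusion.
    """
    return sum(m for m in outage_minutes if m >= min_counted)


if __name__ == "__main__":
    sixty_blips = [1] * 60  # sixty separate one-minute outages
    print("actual downtime:", sum(sixty_blips), "minutes")
    print("counted downtime:", counted_downtime(sixty_blips), "minutes")
```

Under that reading, an hour of scattered one-minute blips would not count against the monthly allowance at all, while a single 30-minute outage would count in full.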

  2. Re:Wait.. by mikael_j · · Score: 5, Informative

    It's called a cluster. "The cloud" is a really annoying buzzword for software as a service.

    /Mikael

    --
    Greylisting is to SMTP as NAT is to IPv4
  3. What about internet downtime? by Dan+East · · Score: 4, Insightful

    Yes, but what is the average company's internet downtime versus their LAN downtime for a single-campus outfit?

    So instead of LAN / Exchange Server (or whatever is being used) you now have LAN / WAN / Google downtime. WAN gateway downtime is probably the weakest link in the chain, so wouldn't the total downtime be greater using something internet based?

    --
    Better known as 318230.
    1. Re:What about internet downtime? by vadim_t · · Score: 4, Insightful

      With an internal server, the mail you got stays there, so you can still read it and compose replies. With an internal SMTP server you can queue emails for delivery even if they don't get out (nice for laptops that may not stay around until the connection comes back). With an internal IM server you can keep talking to people inside the company and, depending on the server, can queue messages until the connection comes back.

      Now if you happen to use, say, Gmail, then you're out of luck. You can't read your mail, can't compose replies, can't IM people in the next room. All you can do is sit there and wait for somebody to fix the problem.

    2. Re:What about internet downtime? by mysidia · · Score: 3, Insightful

      So instead of LAN / Exchange Server (or whatever is being used) you now have LAN / WAN / Google downtime. WAN gateway downtime is probably the weakest link in the chain, so wouldn't the total downtime be greater using something internet based?

      E-mail is internet based and isn't going to work if your WAN is down, regardless (you can't e-mail anyone, or receive e-mail from other people).

      One of the costs of using a service like Google Apps is the increased need to design a proper resilient network at your site that won't go down.

      If you are multi-homed and have dual WAN links that take independent paths, with a standby router, and you ensure your ISP provides redundancy and your network is properly designed according to industry standards and respected network equipment manufacturers' best practices, then a failure of your internet connection is unlikely.

      Much less likely than the probability of failure of a single mail server.

      The cost of internet link failure or congestion is significant for companies that rely on internet-based resources and online communications for productivity.

      For companies that conduct eCommerce, it is unthinkable to have the website going down, or to not have planned enough capacity for the network connection to meet all anticipated needs in a failure scenario. Bad connectivity is already costly, even without relying on application service providers for business apps.

      In a well-designed setup, the WAN itself should not much reduce that 99.999% figure. Although yes, there are some new failure modes introduced.

      Loss of connectivity to Google, for example, even if the network is otherwise working. Some unexpected Tier 1 depeering à la Sprint/Cogent may cause issues on rare occasions.
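
The LAN / WAN / Google chain discussed in this subthread can be put into numbers with a small sketch. It assumes the components fail independently, which real networks often do not, and the availability figures in the example are hypothetical placeholders rather than measurements. The helper names `series` and `parallel` are likewise illustrative.

```python
def series(*availabilities):
    """Availability of a chain where every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Availability of redundant components where any one being up is enough."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


if __name__ == "__main__":
    lan, wan, hosted = 0.9999, 0.995, 0.999  # hypothetical figures
    single_link = series(lan, wan, hosted)
    dual_link = series(lan, parallel(wan, wan), hosted)
    print(f"single WAN link: {single_link:.4%}")
    print(f"dual WAN links:  {dual_link:.4%}")
```

With these placeholder numbers a single WAN link dominates the total downtime, while a redundant pair brings the chain back close to the hosted service's own figure, which is roughly the argument made in both directions above.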

    3. Re:What about internet downtime? by aaarrrgggh · · Score: 2, Informative

      Google appliance unfortunately is just for search. Here's to hoping they add app support as well in the future.

    4. Re:What about internet downtime? by aaarrrgggh · · Score: 2

      My company's IT budget is likely a good order of magnitude smaller: we're a 20-person organization. Not having your network on UPS is just stupid!

      We are in a major metropolitan area, but we have a UPS for workstations even if they are being used by a part-time student just above minimum wage. We get about four hits a year, and that alone is enough for it to make sense.

      We also have a T1 and ADSL from different providers. While automated failover isn't in place, it is on our list as time allows.

  4. Re:Wait.. by Anonymous Coward · · Score: 4, Insightful

    Google is a company. Saying "Google doesn't have 100% uptime" makes as much sense as saying "Microsoft takes 40 minutes to install". What specifically are you trying to say?

  5. 3 9's is meaningless without customer support by syousef · · Score: 4, Interesting

    The 99.9% guarantee is great, if there's someone to talk to who'll actually look at the problem when those three 9s aren't met. Otherwise it's marketing propaganda.

    --
    These posts express my own personal views, not those of my employer
  6. Server uptime is not the issue. by B5_geek · · Score: 3, Informative

    The issue is your internet connection AND your ISP's connection to the world. Your connection to the world is more likely to go down than a Google cluster is. Think of how often telcos, ISPs, and major hubs go down. This is the point of having LOCAL copies of apps/servers/services: the odds that the hub/switch dies (with nothing else in-house to patch around it) are very slim compared to the odds of internet connectivity going south.

    --
    "The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
    1. Re:Server uptime is not the issue. by Predius · · Score: 5, Informative

      As a commercial user of Google Apps, I have observed this not being the case. GMail does go down, and the cause is not our connectivity. What's worse is when there is a problem, all the 'phone support' does is tell you to post on their forums... not impressed.

    2. Re:Server uptime is not the issue. by Predius · · Score: 3, Insightful

      Gee... you don't think I haven't brought it up, multiple times, with data? I pointed out the pitfalls before we jumped in, and we got bit. If I had control we'd be off GMail, but it's not my final decision.

      That doesn't make my observation any less salient.

  7. Nothing has 100% uptime by EsJay · · Score: 3, Insightful

    If your organization will fail without 100% email uptime - bon chance in the real world, mon friend, bon chance.

    Make sure your users have a phone directory available on their local PCs (or paper copies on their cubicle walls). Have a phone tree notification scheme in place in case the network is REALLY down.

    And prepare for the troublesome PRODUCTIVITY SURGE when your users cannot reach the Internet!

    1. Re:Nothing has 100% uptime by tomhudson · · Score: 4, Funny

      It's "bonne chance"...

      ... his Google Apps spellchecker only has a 99.9% SLA, you ignorant clod!

  8. Re:Wait.. by Drakonik · · Score: 2, Interesting

    A Beowulf cluster?

  9. What I actually posted by David+Gerard · · Score: 4, Funny

    was their claim that this is 4x fewer outages than on-site-maintained Exchange or GroupWise.

    (Notes, of course, gets 45 minutes of uptime a year.)

    --
    http://rocknerd.co.uk
  10. Re:Still beta? by David+Gerard · · Score: 2, Informative

    The service they sell isn't beta. The service they give away is what they inflict new features on.

    --
    http://rocknerd.co.uk
  11. Wow, that's pretty terrible by yttrstein · · Score: 4, Informative

    I achieved four nines (99.99%) 8 years ago with Netscape's broken mail server "Suite Spot" running on a (at the time) three-year-old Sun E450 with 4 gigs of RAM. As I recall, it served about 120,000 clients on a large cable network in Chicago.

    This whole "new web" thing is very pretty, but it seems like about three steps back to me.

    1. Re:Wow, that's pretty terrible by hax0r_this · · Score: 4, Insightful

      That may be true, but what you were able to achieve and what you guarantee clients you will achieve are two very different things.

  12. Re:Wait.. by game+kid · · Score: 3, Informative

    It's a King Arthur cloud, maaan. Get with the times!

    --
    You can hold down the "B" button for continuous firing.
  13. Re:Wait.. by TooMuchToDo · · Score: 3, Insightful

    On a related subject, next person who says "in the cloud" is going to get cockpunched. As parent said, there are no clouds, just highly available clusters.

  14. Re:Wait.. by moosesocks · · Score: 4, Informative

    There'd be no need for a Beowulf-type cluster in this case.

    Have a bunch of machines running identical instances of Apache, and randomly fire requests at them individually. This balances the load, and ensures that the servers themselves aren't a single point of failure.

    It's quite a bit more complicated than this in reality, although you should get the basic idea.

    Beowulf is typically used for clusters that seek to emulate a supercomputer (usually for scientific number-crunching), rather than a server. For this reason, something like Google's setup would more typically be referred to as a "server farm"

    --
    -- If you try to fail and succeed, which have you done? - Uli's moose
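
The "randomly fire requests at identical servers" idea from the comment above can be sketched as follows. This is a toy illustration only: the backend names are hypothetical, the health check is a stand-in callback, and nothing here describes how Google actually routes requests.

```python
import random


def pick_backend(backends, is_up, rng=random):
    """Pick one healthy backend at random.

    `backends` is a list of server names, `is_up` is a callable that
    reports whether a given backend is currently healthy, and `rng` is
    anything with a `choice` method (the `random` module by default).
    """
    healthy = [b for b in backends if is_up(b)]
    if not healthy:
        raise RuntimeError("no healthy backends available")
    return rng.choice(healthy)


if __name__ == "__main__":
    servers = ["web1", "web2", "web3"]  # hypothetical identical Apache nodes
    down = {"web2"}                     # pretend this node has crashed
    for _ in range(5):
        print(pick_backend(servers, lambda name: name not in down))
```

Because every node serves the same content, losing one only shrinks the pool; requests keep flowing to the remaining nodes, which is why no single server is a point of failure in this scheme.
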
  15. Re:Wait.. by Anonymous Coward · · Score: 3, Funny

    Yeah, punch those bastards. Punch 'em so hard they'll go flying up high in the sky. In the cloud, even.

  16. Re:Wait.. by osu-neko · · Score: 2, Informative

    Google doesn't have 100% uptime? They have never gone down when I've noticed, guess its that sweet cloud setup they have there.

    Seriously? I see it happen at least once every few weeks or so. It's usually very temporary, like as in less than a minute, but I'm quite familiar with the look of Google's error/service unavailable page...

    --
    "Convictions are more dangerous enemies of truth than lies."
  17. Re:Wait.. by glwtta · · Score: 2, Insightful

    I thought it was a really annoying buzzword for compute capacity as a service?

    --
    sic transit gloria mundi
  18. Re:Wait.. by Midnight+Thunder · · Score: 2, Funny

    It's called a cluster, "The cloud" is a really annoying buzzword for software as a service.

    And from my experience clouds are full of unpredictable vapour and they have this annoying tendency to turn to rain - not really something I would want for my data ;)

    --
    Jumpstart the tartan drive.
  19. Re:Wait.. by Anonymous Coward · · Score: 2, Funny

    I am a clod, you insensitive cloud!

  20. Re:Push by networkzombie · · Score: 2, Funny

    Push email is actually very important when there are donuts in the break room. When you alert everyone, they all get the email at the same time and no one gets left out of the Monday morning coffee and donuts feeding frenzy (gotta be fast to get the eclairs, though).

  21. Microsoft has 5 nines ... by tomhudson · · Score: 3, Funny

    0.00099999.

    Hey, it's five nines ... and with all the "exceptions" and bogus metrics in Google's SLA, they're not offering 3 nines.

  22. Sysadmin = Punching Bag by lucm · · Score: 2, Interesting

    When Google is down, all you get is access to lousy forums with little or no support, while your end users keep asking for an ETA or at least for an explanation. You end up being a punching bag for the failure of a solution you probably never agreed with and that was forced down your throat by the management.

    I guess this is an OK deal for small businesses with no technical employees, but as soon as your user headcount goes over 20, Novell GroupWise or Microsoft Small Business Server becomes more interesting. And when hosted locally, it will at least work as internal groupware and allow users to access shared documents while the internet connection is down.

    --
    lucm, indeed.
  23. Penalties? by Jeff+Hornby · · Score: 2, Interesting

    Google guarantees 99.9% uptime, right? So what do you get if they don't deliver? A lollipop? A cookie? A profound apology personally signed by Larry and Sergey?

    Actually you get extra time.

    If the system is down for between 45 minutes and 7.2 hours, you get an extra three days. 7.2 hours is pretty much a full business day if it starts at the wrong time.

    If the system is down for 7.2 hours to 36 hours you get 7 free days.

    And if the system is down for more than 36 hours you get 15 free days.

    I don't know about the rest of you, but most of my clients would be losing at least tens and perhaps hundreds of thousands of dollars an hour if all of their key systems went bust. Email is down? No communications because not only is that a communication channel, that's also where you keep most of your contact information. Productivity suites are down? There goes work for the entire office for the duration. Not only are they unable to create new documents, they're unable to access existing information.

    You can say what you want about Microsoft Office (or even move to something else like OpenOffice or StarOffice) but at least when something happens to Office, it only stops one user. If Google goes down, your entire enterprise grinds to a halt for the duration.

    --
    Why doesn't Slashdot ever get slashdotted?
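
The credit tiers listed in the comment above (45 minutes to 7.2 hours, 7.2 to 36 hours, more than 36 hours) can be written as a small lookup. The tier values are taken from the comment as posted; the function name `service_credit_days` and the handling of the exact boundary values are assumptions, since the comment does not say which tier a boundary falls into.

```python
def service_credit_days(downtime_hours: float) -> int:
    """Return the free days credited for a month's counted downtime.

    Tiers follow the comment above. Boundary handling (<=) is an
    assumption made for this sketch.
    """
    if downtime_hours <= 0.75:   # within the 45-minute allowance
        return 0
    if downtime_hours <= 7.2:
        return 3
    if downtime_hours <= 36:
        return 7
    return 15


if __name__ == "__main__":
    for hours in (0.5, 1, 8, 40):
        print(f"{hours} h of downtime: {service_credit_days(hours)} free days")
```

Note that the credit is a fixed number of extra service days and does not scale with the money a customer may have lost during the outage, which is the complaint the comment goes on to make.
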
  24. Re:Wait.. by dkf · · Score: 2, Informative

    If you need to search through your 100GB of indexed documents, you want to be able to transparently break up that search query over multiple machines.

    Actually, it's building the index of the documents that is especially computationally intensive. Particularly chunky is the algorithm to assign a significance score to each document. Once you've done that, actual searching can then be done by merging streams of information suitably, which is pretty easy to do fast.
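
One common reading of "merging streams" is intersecting sorted lists of document IDs taken from a prebuilt index, one list per search term. The sketch below assumes that reading; the function name `intersect_sorted` and the sample ID lists are hypothetical, and the comment does not say this is what any particular search engine does.

```python
def intersect_sorted(a, b):
    """Return the values present in both sorted lists, in order.

    Walks both lists once with two cursors, so the cost is linear in
    the combined length rather than requiring a full scan per lookup.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    docs_with_uptime = [1, 3, 5, 8]  # hypothetical document IDs for one term
    docs_with_sla = [3, 4, 5, 9]     # hypothetical document IDs for another
    print(intersect_sorted(docs_with_uptime, docs_with_sla))
```

The expensive part, building and scoring the sorted lists, happens ahead of time; answering a query is then a cheap single pass over lists that already exist.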

    --
    "Little does he know, but there is no 'I' in 'Idiot'!"
  25. 99.9% uptime of a pain in the butt? by Roadkills-R-Us · · Score: 2, Insightful

    Google Apps is NOT enterprise ready. It's taken us a month, an outside consultant, and a week's worth of intermittent, screwed-up email to even get close to what we had before, email-wise. We haven't had any time to work on calendars, etc. It was extremely difficult getting Google's attention at all, much less a path to anyone who could actually help. This has been the most painful rollout I've worked on in years.

    "It all depends on what your definition of 'evil' is."

    YMMV. I would only recommend Google Apps to a competitor I wanted to hurt. 8^)