Why eCommerce Sites collapse
Rahul Mehra writes "ZDNet has an interesting article about how eBay and other e-commerce sites collapse under heavy loads. It talks about how massive growth, incomplete planning, rising expectations (24x7 uptimes) and immature technology all contribute." This train of thought, for me at least, leads to a neo-Luddite question - what do you folks think?
whenever you are talking about serious work. You are back to SP clusters and s/390s and S/70s and E10ks and so on.
The Schwab issue is clearer than they are making it out to be -- I have some knowledge of this going back two years, and Schwab has some complete idiots running things, still, even after a series of disasters (some of which didn't make the news). I really can't explain it in any way other than that people who have gotten MBAs seem to only trust other people with MBAs, no matter how poorly they perform. I set up my account there as soon as I could, but after hearing really unpleasant stories for a few years, I finally went to Fidelity.
s/390s aren't unreliable, and Parallel Sysplex stuff works well dynamically, but if a) you are basing your maintenance window and procedures on a salesman's promises to an MBA and b) you aren't keeping the better mainframers because of pay and poor treatment, you will have problems.
Similarly, Cisco routers aren't unreliable. Encryption makes a 30% performance hit. If you are about to be swamped by transactions (if the market is tanking, for instance), then turning the encryption off is a command decision that you "get paid the big bucks for." Not doing so and having systems choke is not a problem with Cisco, any more than undersizing the systems is a technical problem.
I am relatively confident in Schwab -- I would be confident enough to keep my money there if they would take my money as seriously as I do and spend less on MBAs in technical positions and more on technical people in technical positions.
And no, to the best of my knowledge (I have several funds and they have positions in everyone out there), I own no Schwab or Fidelity stock.
At least, that's the way a lot of companies seem to treat sysadmins. A good sysadmin, who keeps systems running smoothly, *appears* to be doing nothing. Why should such a person be paid very much just to run backups and turn a few screws? they'll ask. The sysadmin is viewed as a high-tech janitor, and is given about as much respect. This often results in companies only hiring one sysadmin, or worse, foisting the sysadmin's duties onto other people on a standby basis. So when the fires come, the company is suddenly understaffed. A bad sysadmin, who's always recovering from crashes, restoring backups, rerouting network traffic, looks like a busy employee. If not for him, our machines would all be down, right? they will say. So what we have here is a scenario that favors either bad sysadmins, overworked sysadmins, or standby sysadmins who are actually full-time employees with other stuff to do and worry about. Welcome to hell.
Does anyone else think that everyone out there *expecting* eBay to be available every second of every day is a bit extreme? I mean look at it this way: you can, most of the time, go on eBay, probably find something close to what you are looking for for about half the price of retail, and even order the damn thing straight to your door within a few days. And you bitch when the service burps?
It's really sickening to hear that people can't get a grip on how far technology has come, and expect it to be way farther than it is.
You should never take life too seriously - You'll never get out of it alive.
One thing the article didn't mention directly, but which plays a very important part in the stability/quality of your infrastructure, is the quality of your sysadmins. It doesn't matter if your boxes are triply redundant, hot-swappable, never-go-down systems if the sysadmins inadvertently blow away key files periodically. Lots and lots of IT places seem to hire semi-trained monkeys as sysadmins and then wonder why their site is always going down. Look at the chart of outages on the second or third page of that article: notice how often "Failed software upgrade" appears? The problem is that the hardware vendor is usually blamed for those kinds of problems, which draws attention away from the true problem of unqualified sysadmins. Of course most of the Slashdot crowd doesn't fall in that category.
I read the internet for the articles.
>> turning the encryption off is a command
>> decision that you "get paid the
>> big bucks for."
Gee, if I were a hacker, I'd *never ever* wait until a big event (eg. market goes to hell) to start dumping to disk if I had managed to hack into a decent-sized ISP (or worked there and was a pissy sort of person). The prestige for showing an online broker to be vulnerable has to be pretty significant, especially if you moonlight as a "security consultant" or whatever.
Maybe I'm just a wuss, but it seems like
s/get paid/get fined/g;
is a distinct possibility if the ruse is uncovered. (It's also a tacky thing to do)
I suppose that with those sorts of loads, you could make a case for it being statistically infeasible to pull any real information out without a huge amount of disk space to dump the packets onto and a lot of time to pore through them... but people don't change their passwords very often, and you could probably assemble useful information in a reasonable amount of time. And a decent lawyer should have no trouble spooking a jury into overreacting if a trial came to pass.
Either way I submit that the magnitude of the negative publicity that would ensue would make such a decision very hard to justify.
Why not colocate at, say, Above.Net, and rely on their monster pipes for the big loads? It's not like it would cost that much more, and you rely on an extremely high caliber of technical staff to keep things running.
>> Encryption makes a 30% performance hit
In my experience you're off by almost an order of magnitude, in terms of CPU load. If you're only talking about packet throughput, then yeah, the handshake, key exchange, and renegotiation every few minutes adds about 30%. It seems like CPU power is usually the bottleneck in doing SSL transactions on big fat pipes, though.
>> people who have gotten MBAs seem to only trust
>> other people with MBAs
I had the misfortune of working at one of the top business schools in the country for about a year, and this is what I perceived: MBAs without a physical science or engineering background are categorically inept at technical decisions, no matter how much they think they have learned by reading InfoWorld. Negotiation is an MBA's strong point; following through is Someone Else's Job, as best as I could make out. So why don't they recognize that they are likely to make more money (enough to offset the cost) if they hire the best (and most expensive) technical staff? Beats me...
'Cause otherwise you're presenting an opening for someone else to gain publicity as Those Guys That Suck Less (tm) and steal your mindshare and profits. That can't possibly be lost on MBAs. (can it?)
Remember that what's inside of you doesn't matter because nobody can see it.
Doing it right the first time of course isn't that easy, but once it works, don't break it. It must be possible to run a 24x7 site for an entire year while stuck on Gilligan's Island, without any way to contact the rest of the world, including the site.
NASA has computers in buildings where at any moment deadly chemicals are around (a few seconds after a small leak, everyone in the building is dead!). Do you think that their IS wants to touch the computers? Not unless they first send everyone else home and empty those tanks. If it wasn't so heavy they would probably insist on space suits too.
You decide a year or more in advance how much bandwidth you will get. Then decide how many customers that will support, and you don't allow marketing to sell to any more customers. That's right, you refuse to allow more onto the system. Marketing can deal with this if you make them, and long-term satisfaction will go up.
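As a rough illustration of the sizing step described above, here is a minimal Python sketch. The link size, the per-customer bandwidth budget, and the headroom percentage are hypothetical numbers picked for the example; none of them come from the article or from any real site.

```python
def max_customers(link_kbps, per_customer_kbps, headroom_pct=30):
    """Customers a link can carry after reserving headroom_pct percent for bursts.

    All inputs are illustrative assumptions. Integer math is used so the
    result does not depend on floating-point rounding.
    """
    usable_kbps = link_kbps * (100 - headroom_pct) // 100
    return usable_kbps // per_customer_kbps


def can_sell_another(current_customers, cap):
    """The rule from the post: once the cap is reached, marketing stops selling."""
    return current_customers < cap


# Hypothetical example: a 45,000 kbit/s link, 20 kbit/s budgeted per customer.
cap = max_customers(45_000, 20)
```

The point of the sketch is only that the cap is computed from the bandwidth you committed to in advance, and the sales decision is checked against that cap rather than the other way around.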
Once you know how much bandwidth you will have, you make sure you have computers that can deal with it. Mainframes have been doing 24x7 for years. Unix is very close to matching that (with Sun's redundant hot-swappable systems perhaps better, not that Sun is the only choice). I have seen triple-redundant systems with a polling mechanism where if one computer gives a different result it is shut off. Guess what: none of this is cheap. That's right, doing business on the internet in volume isn't cheap. Spend the money on systems that will stay up, and enough power that you don't run out, and you will run 24x7. There are plenty of companies that make equipment that is meant for this use.
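The triple-redundant polling mechanism mentioned above can be sketched as a simple majority vote. This is an assumption about how such a voter might work, not a description of any particular vendor's system; the unit names and values are hypothetical.

```python
from collections import Counter


def vote(results):
    """Majority vote over redundant units.

    results maps a (hypothetical) unit name to the value it computed.
    Returns (majority_value, dissenters), where dissenters are the units
    that disagreed and would be candidates for being shut off.
    """
    counts = Counter(results.values())
    value, n = counts.most_common(1)[0]
    if n <= len(results) // 2:
        # No strict majority: the voter cannot tell who is right.
        raise RuntimeError("no majority among redundant units")
    dissenters = sorted(name for name, r in results.items() if r != value)
    return value, dissenters
```

With three units, one faulty result is outvoted by the other two; if all three disagree, the voter refuses to pick a winner.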
Last, and foremost: hire system administrators who have proven they can keep the systems running 24x7, and pay them to do so. These people are older, in their 50s or so. Hire the best of the experienced, and then give them a deal: you pay them to keep the systems up, whether they are there or not. They should soon find a paycheck arriving every two weeks, with only a few hours a month of work.
Remember, design the system so you can run it from Gilligan's Island (no access by you) without your boss realising, and you will do fine.
Of course the reality is that you do have to replace crashed hard drives, but with RAID-6 (RAID-5 plus more redundancy; RAID-6 isn't officially defined) that can be done at any time. You do need to buy more backup tapes once in a while, but automated backups are the norm in 24x7 environments.
This ZDNet article is probably the most fact-filled piece I've read from them.
I tend to agree with the theory that many of these companies still think like start-ups; they act like they don't have any money to spend! Perhaps they're just not aware of where their money is best spent. I can't say I know the start-up web content business mentality to its very ends, but when money is tight you start betting against catastrophe, and hope your odds are good. Duplicate server hardware is expensive for a small shop, but when you have billions of dollars in revenue, and your _entire_ business relies on your information infrastructure, the least you should do is build a duplicate server farm right down to the cables on the power supplies.
Yeah, you'll blow a million dollars on it, and you might not need it, but the maintenance costs are lower than the cost of losing your auction site, on-line trading service, bank, or retail market for five days.
You co-locate services at multiple network access points. You use reliable software--the kind you have source code to, so you're not on the phone at midnight with a "knowledge engineer" across the country who is trained in taking bug reports. You need to fix the problem so you hire people who can.
You spread the load at all points (you have multiple web servers, multiple database servers, multiple administration access points, redundant networking hardware), and you always have ample staff around for that 4:00 AM breakage.
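Spreading load across multiple servers can be sketched as a round-robin picker that skips machines marked unhealthy. This is a minimal illustration in Python; the class name and the backend names are hypothetical, and a real deployment would do this in a load balancer or DNS layer rather than in application code.

```python
class RoundRobin:
    """Hand out backends in turn, skipping any that are marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def pick(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

Removing a name from `healthy` (for example after a failed health check) takes that server out of rotation without disturbing the others.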
Using age as a disqualifier?
Someone who's 50 has a chance to have 30+ years experience in the field. Let's see you, hot shot 25 year old, have 30 years experience.
--
Ben Kosse
Remember Ed Curry!
That first link should have read 'eBay problems probably preventable'.
The first link basically says that the eBay guys weren't paranoid enough about making sure the setup was reliable. This is always a problem. (hey, I'm working on a commercial web site that only got a proper sys-admin 2 years after it started...). Little side-note - one guy says Sun's clustering stuff is not that great... I know Sun have been a bit late in starting to do clustering stuff, but I've also heard that what they have done is pretty good, *shrug*. Actually, they just announced version 3 last week, which also allows clustering of 16 Starfires, for 1024 processors. (they're also making the source code for this available...)
I remember seeing a blurb recently on Microsoft's site slamming Sun for causing the problems at Ebay. According to MS, the Sun server failed, causing the outage, while the NT front-end servers were golden. Lots of factors were cited, including the E10K's sensitivity to config changes, reliance on a smaller domain server, and other factors.
:-)
Now we learn that the problem was caused by Ebay, and Ebay alone, by not keeping up on their vendor patches, and that Sun had fixed this particular bug quite some time earlier.
It would seem that MS needs to print a retraction. Any bets on when we'll see it?
Agreed; it's always more interesting when you can hear the voice of an author or an editor, rather than the bland predigested output of a committee. The Cluetrain manifesto is a good argument that companies shouldn't be homogeneous and faceless on the Web.
Looks like the definition of load balancing to me...
Look at AOL. They introduced their flat rate scheme, had constant service problems, infuriated all their customers -- and now they're by far the dominant ISP. The goal here is market share. It doesn't matter how much your customers bitch, how much other people hate you, how many "Why XXX Sucks" pages there are about you. The only thing that matters is how many bodies you can claim.
Providing quality service probably means that you're doing something wrong, just like making a profit does.
What I'm listening to now on Pandora...
Thinking that over for another minute, that's much more true for cases like AOL where the system was inadequate for the new load than here where they had plenty of hardware but didn't maintain it properly...
It's funny how people's expectations get so out of whack when dealing with technology. Knowing how to use a paintbrush does not make one an artist. Yet when it comes to computers everyone knows what's best. E-commerce sites are just one example, but they are a VERY GOOD example.
We let pilots fly the airplanes; we let chefs cook the dinner; but we cannot let technical experts exert technical expertise. Sometimes it's scary.
Oracle uses files or logical volumes, which are basically glorified disk partitions. My experience is on HP-UX, but what generally happens is that root creates logical volumes for oracle which are accessed via /dev in the root filesystem. Once the LV's are created and opened, nothing should be able to read/write blocks in them except Oracle, under oracle's own user id. It's a basic device locking process.
Apparently Solaris screwed up this arrangement and wrote some blocks in Oracle's space. It's odd that Oracle was then able to crash the OS - the only reason I can think of is that Solaris put something really critical in those blocks, and Oracle overwrote them for some reason while it was aborting.
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
No, it's a basic layered design which puts block i/o on an open layer underneath the file system. A kludge is when you need a facility like this and have to work around the OS to get it, eg partition magic, Norton utilities, etc.
The logical volume layer is a great thing to work with in normal situations. Mirroring, striping, RAID, backups, and failover all work at this level. To give an example, if you want to do a hot backup of a mirrored filesystem, you can split off one mirror, mount the copy and fsck it, dd it to tape, and then merge the storage back into the mirror, without disturbing the primary FS. That works for oracle instances as well (just substitute some oracle commands for fsck above).
I am hardly an Oracle (or Sun, for that matter) expert, but I thought Oracle used its own filesystem?
Also, note that Microsoft's view on the matter is nowhere near the actual cause of the problem. It's as if Microsoft was keeping tabs on this Oracle/Sun combo and decided to come forward with their "competitive analysis" when the time was right. Looks like they had some "Halloween" documents on Oracle/Sun too... ;-)
-------
Warning: Slashdot may contain traces of nuts.
Every job I've ever gone into was a mess. When I first took the job I have now I was pulling my hair out, this place was so bad. I mean everything here was wrong: the users' habits, the way they did things, the software they used, everything was just fucked. Now, a year later, things have started to calm down and I find myself with a LOT of free time to implement back burner projects.
... back to the books ...
A sysadmin should never look idle. There are always things to do, things to improve.
The reason I stayed was that I have so much control over things, and people will listen to me.
And as clueless as my users are, I still like most of them.
Unfortunately the slashdot/linux today conspiracy - lately - is really hurting my productivity. Urgh
support gun control: take guns from cops
I totally agree! I've been preaching this for a couple of years and am implementing a very cool, full-featured middleware solution now.
I read a description of what eBay had a few months ago and was shocked at the predictable crash they were heading toward.
The thing is you can't easily patch a monolithic system to run on loose clusters with replication and redundancy. It will appear much more attractive to continue down the monolithic road and add hot-spares.
Few people seem to get what it takes to build truly scalable and reliable systems.
sdw
Stephen D. Williams
The entire Dell "Shopping Cart" idea *IS* dynamic. Change an option, click a button, new price on same .asp.
I've done some ASP work and it's pretty memory intensive. Kudos to Dell for making it work -- it's slow as molasses sometimes, but it's never been down in my experience.
Three Step Plan:
1. Take over the world.
2. Get a lot of cookies.
3. Eat the cookies.
But seriously... Planning planning planning. Don't run everything on one box, keep backups, have backup plans, etc, etc. If you don't do these things your site is bound to have problems.
This sig is false.
The difference is much bigger than just sheer numbers. Let's give Dell the benefit of the doubt and say that they have twice as many browsers pointed at them. They serve up mostly static pages. The parts that are dynamic are not very dynamic. Now look at eBay. They have a site that is nearly 100% dynamic, what with the very nature of their business. Add in the personalized things that users can set up and you have a system that is working several orders of magnitude harder than a site with heavier traffic but static content. Now, I don't think that Dell has anywhere near the number of hits. eBay junkies (my parents are antique dealers) check the site almost constantly throughout the evening. Once you've seen a Dell laptop, you know what it comes with. eBay customers expect a constant change in the site and thus check every few minutes as auctions that they are interested in draw near a close. eBay is easily a more taxed site.
neutrino
History has the relation to truth that theology has to religion-i.e. none to speak of. - Lazarus Long
Anonymous Coward wrote:
> woah, hemos you have to explain your thinking
> on this one. i cant even come close to finding
> anything with a neo-luddite feel to it in this
> article.
I think what he's talking about is the angle the
author is taking: "Could this be the death of
e-commerce?"
Answer: No, it won't. Next?
Code has become bloated... I remember when I was in development, we had to fit our software on a low density floppy or two, since most of our users would not have HD floppies (Europe was a major factor in this decision) and more than two floppies would raise the Cost of Goods.
Appears to me that a lot of programmers, webmasters and networking people have forgotten how to optimise their crap.
I remember a LARGE bank in Malaysia running their servers on DOS(!) doing transactions at the rate of a couple of thousand a day. Where have we lost our ability to optimise code, data and our thoughts?
This is all very interesting to read about. One would think that the internet is generating "unheard-of amounts" of load on various systems for the first time. Mainframes (IBM, Unisys, Amdahl) have taken much more than this in terms of load or transactions per second. The problem that I see is that people tend to abandon architectures that have worked in the past for new cool things that vendors tend to shove down their throats. CICS, or for that matter virtually any transaction-intensive database on a proper mainframe, could handle that load (some of my customers are doing 20-30K transactions per second and they are in no way "big" users).
At times, the whole internet revolution reminds me of the "client server" phase that the industry went through. Ziff Davis was one of the proponents of this phase (well, they had to sell them damn magazines, didn't they?), often claiming that a Novell file server would be damaging to companies like IBM. Well, perhaps it is time to step back and examine how some of the legacy systems have worked (heck... imagine your bank telling you that their systems got overloaded on pay day!?!). Then let's see how we can adapt them to the Internet. IBM is doing an awesome job on this and so is HP. I strongly believe that the systems we're seeing today are "prototypes" doing proofs of concept, waiting on the big iron boxes to become internet enabled.
One more point. Most of the classic "brick and mortar" businesses, people who know their technology, customers, systems.. are NOT internet or e-business enabled. Let's drop a few names of the Dow Jones components.. Ford, GM, GE, Dow, Coke etc. do more business than the e-business startups and probably process more transactions per day on their mainframes. I'd be more concerned about what happens when they start up their internet "storefronts"... Ok.. just a few random thoughts before I head into work...
All the touchy-feely bullshit on the web. Countless self-absorbed homepages, insipid rantings and more. Electronic Navel Gazing I'd say. Denis Leary was hilarious, especially that one advert with the kid crying about keeping the net free, and Leary pops up to ask how his mom and dad paid for the computer he was using...
Blar.
I think that may be part of the problem with some sites in regard to 24/7/365 uptime. Mostly what I have seen is that Microsoft products tend to work well for the tasks that MICROSOFT specifically thinks that you will use them for. Although it is technically feasible to run such a system for heavy-duty services, I would rather choose IBM for its reliability as a vendor of database tools and support if I had to choose a proprietary solution. However, Linux or BSD (which has some pretty optimised code for fast net access, from all indications) would be a better idea if someone is there to get it up and running.
The death of one man is a tragedy; the death of a million is a statistic --Joseph Stalin
As it stands now, eBay's auctions are so time-critical that they're in the same league as online brokerages. And speaking of brokerages...
Fidelity is running TV ads (plastered all over Pirates of Silicon Valley last night) touting the speed of their systems and how seconds count, with a quick disclaimer at the end of the ad that response time depends on network conditions. This is a pet peeve of mine: ads with disclaimers which make the rest of the ad meaningless. Example: "99c Big Macs! That's right, 99 cents! Only 99 cents! Prices may vary." But the point is that they're promoting the idea that the internet is suitable for real-time transactions, even though they recognize that it isn't quite there.
You should invest in MBA Technologies; they're a good company and you obviously love MBAs...
-- your knees hurt, don't they?
...and you don't allow marketing to sell to any more customers. That's right, you refuse to allow more onto the system. Marketing can deal with this if you make them,...
This reminds me of the IBM TV commercial where Bob is at an AA-like meeting... "No one here is stupid"... Then Bob tells them that he forgot to tell his staff to ramp up the website for more hits because of their new PR. Then they all turn on Bob... "That WAS stupid, Bob."
That was funny. You got me thinking about other Fox computer-related specials:
America's Funniest Core Dumps
When Spammers Attack
I Married A SysAdmin
Real Life Reboots
Totally Shocking Backups -- Caught On Tape
/* Alright -- quit yer groanin' */
Save the whales. Feed the hungry. Free the mallocs.
'Prior Proper Planning Prevents Piss-Poor Performance'.
If we can teach this to grunts, -why- do those who are allegedly more intelligent fail -repeatedly- to learn it?
Ah, well. I recall when Comdisco failed in the attempt they made to show Shwlob what was about to happen in a simple email system, too.
Cheers,
Drieux
...the easy way is -always- mined...
We plan, backup, build redundant systems, isolate production from testing and implementation, and still every now and then something happens that makes you realize how young all this technology really is, and that bottlenecks still exist.
I am just coming off a twenty hour day repairing problems in a production system. Both members of a cluster affected (by the clustering software itself of course). In the end we end up hacking out the best fix available on the fly.
Dependability is expensive, and that expense is often hard to justify to economy minded business people. Add to that the fact that even the most secure, stable, and isolated system will eventually break and it is a recipe for some very long days for those of us who answer the pages when it all falls down.
Good thing I enjoy this kind of work. Now it's off to a nap, then back to the office to listen to a vendor tell me his next release will address the trouble, and to explain to a few business folks that simply stating a system will be up 24/7 doesn't make it so.
OTOH, I'm not fazed by ebay's problem. I'll never hand over my CC to Amazon, and half the stuff I look for is uniquely weird - the only kind of things you can get on ebay. The crabbers are probably newbies - geez, live with it, it happens, you know? If they ever used Lynx they'd realize how far browsing on the net has come!
I remember seeing ads for Lotus that went like "the net is screaming for capitalists" or something... I suppose what hurt is that one ad had the line "short stories that nobody reads". That seemed awfully crass. The net thrives on its humanity. Take out the humanity and what have you got? The soulless machine that writers have been warning us about for ages (just reread Fahrenheit 451 - so hard to believe it was written in the 50s!)
It hurts me that everything is done for eyeballs and money. Newbies will never realize how great it was to surf. They aren't wary of schemes that focus on them as a psychographic and target market. They think, "Oh how nice, they want to give me free webspace". They don't think, "Gee, I don't like having my page cluttered with ads that I don't agree with."
Ebay is pretty cool. I snagged a lot of good books dirt cheap there. But it is getting harder. Before, I could make a bid a few days ahead and still win. Now I have to wait until the last few seconds to swoop down on everyone else.
You want to keep in mind that Dell is using UNIX servers on the backend. It's my understanding that they have several Sun E10K's and IBM S70s. (All in Dell black and grey, of course)
Wrote anonymous on Monday:
> Seriously though 24/7 can be done with
> present day technology. The phone system
> comes to mind.
On Thursday/Friday two Swedish magazines carried
a story about upstart "Bluetail", a spinoff from
Ericsson. These people have a telecom background
and their "Mail Robustifier" product is just out
in release 1.0. Written in the Erlang programming
language used by Ericsson in telephone exchanges,
it does load sharing between "mail servers" (I'm
not sure whether this means SMTP or POP3/IMAP)
and promises 99.999 % uptime. The targeted
market is large or medium scale ISPs.
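To put a figure like 99.999 % in perspective, an availability percentage can be converted into allowed downtime per year. The sketch below is plain arithmetic in Python; the function name is made up for the example, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


five_nines = downtime_minutes_per_year(99.999)  # a little over 5 minutes
two_nines = downtime_minutes_per_year(99.0)     # about 5,256 minutes (3.65 days)
```

So "five nines" leaves room for roughly five minutes of outage in a whole year, while 99 % allows more than three and a half days.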
With more problems like eBay's we should see more
telecom people moving over to doing web-related
products. Either telecom companies will change
their business or there will be spinoffs like
Bluetail, http://www.bluetail.com/
Having worked in a number of environments (small start-ups, educational and corporate), there are some things I have noticed that people do not seem to realize:
1) the technology is a lot more fragile than the marketroids will have you believe and the engineers want to believe.
2) Complexity seems to increase as 2 to the nth power where n is the number of components. E.g.,
2 servers have 4 ways they can interact, while 3 would have 8 ways to interact (not counting the fact that there may be many software interdependencies on and between machines).
3) Planning is key, and the central tenet should always be KISS (keep it simple and stupid -or- keep it simple, stupid!).
4) The larger the environment, the more it has a life of its own.
5) The larger the environment the more crucial communication becomes.
6) #5 above can lead to information overload.
7) There is no substitute for an intelligent, well-trained, well-led staff. All the certification programs and fancy admin tools cannot substitute for that.....
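One way to read the 2-to-the-nth claim in point 2 is that each component can independently be in one of two conditions (say, up or down), so the number of combined states doubles with every component added. That interpretation is an assumption on my part; the small Python sketch below just enumerates the states under it, with hypothetical server names.

```python
from itertools import product


def joint_states(components):
    """Every up/down combination for the given components.

    Assumes each component has exactly two states, which is one possible
    reading of the 2**n growth described in the list above.
    """
    return list(product(("up", "down"), repeat=len(components)))


two = joint_states(["server1", "server2"])              # 4 combinations
three = joint_states(["server1", "server2", "server3"])  # 8 combinations
```

Under this reading, ten components already give 1,024 combined states, which is why testing every combination stops being practical very quickly.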
My $.02
putting the 'B' in LGBTQ+
Which is larger:
1. The number of people buying new $1000+ Dell computers.
2. The number of people wanting to bid on collectibles at eBay.
They're hardly in the same class.
What about the people who only look? I bet Dell gets a lot of window shoppers. More than eBay, maybe.
Ok, so eBay goes down and everyone gets all up in arms about it because this stuff is not 100% reliable.
Hey, we knew that. Even the best systems out there are expected to be down a few minutes a year, and most of 'em (including those "super-reliable" Suns) are on the order of a couple of *days* a year. Throw a relational database into the equation and, well, reliability ain't so hot.
There are ways to deal with that, and eBay didn't do ANY of them.
At a minimum they should have had a hot backup available, PARTICULARLY for the single point of failure -- the database. With a hot backup they could have been back online in a matter of a couple of minutes. It was insane to bet their business on a single Sun/Oracle box! Whoever made that decision should be out on the street.
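The effect of a single point of failure versus a hot backup can be estimated with the usual series/parallel availability formulas. The sketch below assumes failures are independent and that failover is instantaneous, which real systems only approximate; the 99 % figures are hypothetical, not measurements of any Sun or Oracle product.

```python
import math


def series(*availabilities):
    """All parts must be up (e.g. server AND database): availabilities multiply."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """Any one part is enough (e.g. primary OR hot backup): unavailabilities multiply."""
    return 1 - math.prod(1 - a for a in availabilities)


# Hypothetical numbers: a 99%-available server in front of a 99%-available database.
single_box = series(0.99, 0.99)
# The same database with an equally reliable hot backup standing by.
with_backup = parallel(0.99, 0.99)
```

Chaining components lowers the total below that of any single part, while duplicating the weakest part raises it sharply, which is the argument for a hot backup of the database.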
But they can do a lot better than that with a little middleware infrastructure. There's no reason they can't replicate transactions to multiple databases -- or even split their databases up so they have lots of little ones handling part of the load rather than One Big Server.
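A toy version of that idea, splitting data across several small stores and writing each record to more than one replica, might look like the following. This is only an in-memory Python sketch under stated assumptions: the class, method names, and keys are hypothetical, plain dicts stand in for databases, and nothing here reflects how eBay's system actually worked.

```python
import zlib


class ShardedStore:
    """Toy stand-in for several small databases, each with replicas."""

    def __init__(self, num_shards, replicas=2):
        # shards[i][j] is replica j of shard i; each replica is a plain dict.
        self.shards = [[{} for _ in range(replicas)] for _ in range(num_shards)]
        self.down = set()  # (shard_index, replica_index) pairs marked as failed

    def shard_for(self, key):
        # Stable hash, so the same key always lands on the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def mark_down(self, shard, replica):
        self.down.add((shard, replica))

    def write(self, key, value):
        shard = self.shard_for(key)
        written = 0
        for r, replica in enumerate(self.shards[shard]):
            if (shard, r) not in self.down:
                replica[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for write")

    def read(self, key):
        shard = self.shard_for(key)
        for r, replica in enumerate(self.shards[shard]):
            if (shard, r) not in self.down and key in replica:
                return replica[key]
        raise KeyError(key)
```

Losing one replica leaves reads working from the other, and losing a whole shard affects only the keys that hash to it. The sketch leaves out the hard part: a replica that was down during a write would need to be resynchronized before rejoining.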
Of course that will take some technology that is a bit beyond the duct-tape-and-baling-wire stuff they're using. It's not rocket science but it's gonna be a bitch to do with CGI.
What it all comes down to is that they bet on an infrastructure design that had a single point of failure and were screwed when it failed. That could have been -- SHOULD have been -- foreseen and protected against.
I could maybe see that being OK in a startup that didn't have the cash for duplicate hardware outlay, but eBay has the cash in spades and they STILL didn't do it. There's a certain level of stupidity at work here.
jim frost
jimf@frostbytes.com
I don't see why people are so surprised with this. The Internet still is not regarded as "acceptable" for real-time work, so why are people so affected by system faults?
"Planning planning planning."
Planning is the key. On personal systems, there are UPS devices, floppy drives, RAID configurations (well, maybe not so often on a PC), Zip drives, CDRs... all sorts of media to guard against the loss of data. Just because a system is online or owned by a large group is no reason to assume it is secure.
-Clump