Ask Slashdot: How Much Did Your Biggest Tech Mistake Cost?
NotQuiteReal writes: What is the most expensive piece of hardware you broke (I fried a $2500 disk drive once, back when 400MB was $2500), or what software bug did you let slip that caused damage? (No comment on the details, but it cost a client about $20K.) Did you lose your job over it? If you worked on the Mars probe that crashed, please try not to be the First Post; that would scare off too many people!
I was in charge of ordering a leak correlation system for a water utility that I work for. The system I chose was not quite what we needed, but it worked. One week after the warranty expired, I dropped the correlation unit, and it has never worked since. I found out the correlator was unrepairable, and we had to order a whole new system.
I unplugged the wrong thing in a datacenter once, which took 20k domains offline. I had traced the cable from the machine to the wall two or three times before pulling it, too.
They didn't have any cable management, and only one border router.
I didn't lose my job. I was a very young sysadmin who was still learning, but good at what I did, and everyone kinda shrugged it off as a lesson learned.
I used a system improperly over the course of a month. It connected to some services that ran up a $50k bill. I was mortified when my boss told me, thought for sure I'd be canned on the spot. I was only 22 and it was my first job out of college, so the amount was nearly double what I was being paid. The boss basically took the heat for not having explained it to me better, and I was not reprimanded in any way.
I don't know what monetary cost they assigned to this, but this is the one I got in the most trouble for.
Frankly, it was something I got blamed for. I guess I can take partial responsibility. You guys tell me.
I was the only UNIX guy at this place. We were moving our Main Internal Server to a newer machine. I had set up a cron job to rsync all user data nightly, so that when we transitioned over, the final rsync would be faster.
So, the big day comes. I come in on a weekend, do the final rsync, change some DNS entries, shut down old machine, bring new machine up. No problem.
Next day everyone is working happily, everything is working smoothly, no worries.
Or so I thought. Turns out the main developer wanted something off the old server, so he turned it back on to copy his files... and then left it up.
So, during the night, the thing automatically rsyncs and overwrites an entire day's work for about 80 people.
Definitely partially my fault for not disabling the cron job, but I was the only one who got in any kind of trouble at all for this (to the extent of almost losing my job, and frankly that was the catalyst for me leaving that place).
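One way to make a pre-migration sync job safe after cutover is to have it check whether its own host has been retired before it pushes anything. The Python sketch below assumes a marker file at a hypothetical path that the admin creates on the old server at cutover; `DECOMMISSION_MARKER`, `should_sync`, and `run_sync` are illustrative names, not part of any existing tool.

```python
"""Hypothetical guard for a nightly pre-migration rsync job.

Assumption: at cutover, the admin creates DECOMMISSION_MARKER on the old
server. If someone later powers the old box back on, cron still fires, but
the script sees the marker and refuses to overwrite the new server.
"""
import subprocess
import sys
from pathlib import Path

# Hypothetical marker path; any agreed-upon location would work.
DECOMMISSION_MARKER = Path("/etc/old-server-decommissioned")


def should_sync(marker: Path = DECOMMISSION_MARKER) -> bool:
    """Sync only while this host has NOT been marked as retired."""
    return not marker.exists()


def run_sync(src: str, dest: str, marker: Path = DECOMMISSION_MARKER) -> int:
    """Push src to dest with rsync unless the host is retired; return exit code."""
    if not should_sync(marker):
        print(f"{marker} exists: host retired, skipping sync", file=sys.stderr)
        return 0
    # -a (archive) preserves permissions and timestamps; rsync transfers only
    # changed files, which is what keeps repeated nightly runs fast.
    return subprocess.run(["rsync", "-a", src, dest]).returncode
```

Simply deleting the crontab entry at cutover does the same job; the marker just moves the check into the script, so a forgotten crontab line fails safe. rsync's own `--update` option, which skips files that are newer on the receiving side, is another partial safeguard.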
In the land of the blind, the one-eyed man is kinky.
On a Saturday night, our group at FedEx released code that I wrote. This was two days before the Apple iPhone 4 shipped. The code worked perfectly; however, despite our repeated warnings that it would nearly double downstream traffic, the downstream systems (like billing and tracking) weren't ready for it.
So, on the day everyone wanted to track their new iPhone, my code shut down all tracking on FedEx for about 12 hours before we could switch the config setting (10 minutes) and the downstream systems could catch up (11+ hours).
The estimated cost was around $2 million in lost time, lost revenue, and extra calls to customer service. Luckily, since I wasn't actually at fault and we had multiple email chains backing up the volume estimates and warnings, we didn't get the axe.
Life, the Universe, and Everything... in my image.
I was hired as a firewall admin at an online trading company, then quickly discovered that the director of IT was insane. He kept management happy by making his numbers, which he did by keeping his team constantly understaffed. I was told to work not just on servers, but also to install Sun servers in racks, run cable, and fix just about anything plugged into the network.
I made the mistake of showing competence in networking, so I was asked to "expand my role" (new title, same salary) and start working on the switches themselves. That included executing an "upgrade" to stacked HP ProCurve switches with VLANs, replacing a hodge-podge of switches from random manufacturers. The actual upgrade went fine, and basic testing (ping) showed everything stable. But as soon as trading opened the next day, everything went to hell: performance dropped through the floor, and customers started calling in about trades timing out. Long story short, it turned out that the Solaris HME cards were unable to negotiate properly with the ProCurve switches, and half the machines were dropping packets due to duplex mismatches. There's a reason people call the Sun interface cards "Happy Meal Ethernet".
It cost the company approximately $180,000 in direct losses and customer exodus, and was likely a factor in its eventual collapse. I wasn't fired, but management never trusted me again. I saw the writing on the wall and quit to do consulting work at a dot-com online supermarket (also doomed).
On the upside, I was able to make thousands in consulting income by installing those same "lock speed to 100 and duplex to full" Solaris scripts on servers for various customers who had also run into performance issues after plugging Sun servers into cheap switches.
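A duplex mismatch is easier to see with a toy model. The Python sketch below is a simplified model of standard 10/100 Ethernet autonegotiation, assuming both ends support all four speed/duplex modes; `Port`, `resolve`, and `duplex_mismatch` are names invented for illustration, and it does not reproduce the HME/ProCurve negotiation problem itself.

```python
"""Simplified model of 10/100 Ethernet speed/duplex resolution.

Assumes both ends support 10/100 full/half and follow standard
autonegotiation; it does NOT model the specific HME/ProCurve bug.
"""
from dataclasses import dataclass

Mode = tuple[int, bool]  # (speed in Mb/s, full_duplex)


@dataclass(frozen=True)
class Port:
    """A NIC or switch port. autoneg=False means speed/duplex are forced."""
    autoneg: bool
    speed: int = 100          # used only when forced
    full_duplex: bool = True  # used only when forced


def resolve(a: Port, b: Port) -> tuple[Mode, Mode]:
    """Return the (speed, full_duplex) mode each end actually runs."""
    if a.autoneg and b.autoneg:
        # Both advertise; the highest common mode wins: 100 Mb/s full duplex.
        return (100, True), (100, True)
    if not a.autoneg and not b.autoneg:
        # Both forced: each end uses its own setting, matching or not.
        return (a.speed, a.full_duplex), (b.speed, b.full_duplex)
    # Exactly one end is forced. The autonegotiating end detects the link
    # speed (parallel detection) but cannot detect duplex, so it falls back
    # to half duplex.
    if not a.autoneg:
        return (a.speed, a.full_duplex), (a.speed, False)
    return (b.speed, False), (b.speed, b.full_duplex)


def duplex_mismatch(a: Port, b: Port) -> bool:
    """True if the two ends disagree about duplex."""
    mode_a, mode_b = resolve(a, b)
    return mode_a[1] != mode_b[1]
```

In this model, forcing only one end (for example, the server at 100/full while the switch port still autonegotiates) leaves a mismatch, which is why the usual advice is to set both ends of the link to the same fixed mode.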
I do not deploy Linux. Ever.
My worst IT disaster was a hard drive failure: the click of death. I had a few days' warning, and I deliberately kept the PC on 24/7, instead of switching it on and off as usual, to make sure the drive stayed alive until its replacement arrived.
Obviously, I had to turn the PC off to change the drive; it was not hot-swappable. When I powered the PC back up, the old hard drive failed and didn't work at all. I was faced with losing all the data on it. I left the drive alone for months, wondering what to do and reading different ideas online, some of them weird.
Eventually I decided to try the least destructive idea first. I put a sheet of paper on the failed drive to make sure the label didn't come off, heated up the clothes iron, and applied the iron directly to the top of the hard drive. When the drive casing was warm enough (but not so hot as to make it hard to carry), I took it to my PC and powered up.
The failed hard drive came to life, and I managed to grab all the files on it onto the new hard drive, uncorrupted.
Out of interest, the drive failed about three months before my scheduled forced drive change, which I do as backup / failure prevention. I got lucky.
Take Nobody's Word For It.
I let a vendor sell me a product without really testing it. Turns out it didn't work (at all) and we lost €50k on license fees for a product we could not use.
I was able to lay the blame on an accountant who had locked us into a 5-year contract in exchange for a minor discount. So I didn't get fired.
Some other fool did not install the panel properly, and left one of the three nuts off. Distinctive nuts, used in only one place.
Someone found it overnight and held it up at the morning meeting: "Anyone know where this goes?" Unfortunately, I did not recognize it as a part from one of my systems.
Aircraft flew; the panel broke off, punching several other holes in the side as it departed.
Training mission aborted. Much sheet metal work needed.
Actual repair cost? Unknown, but easily 5 figures if not more.
Working in IT for a desktop publishing house. Spent just under $4000 on 36-inch flat-panel displays. Accidentally plugged in the printer's power cable. Immediately fried the monitor. My boss was not happy. The internship did not go well for the rest of the summer.
The total cost was actually sweet FA in numbers terms, but I think I put the final nail in the company's coffin.
My first 'job' was a JobBridge internship with a 'small' company. Small enough that I was literally person number three on the employee roster. The company worked in the renewable energy sector, and had been hammered pretty hard over the last few years by The Recession as domestic and corporate purse strings were pulled tighter and tighter.
I was taken on as an engineer, but rapidly found myself wearing a wide range of hats: Sales, Customer Support, System Design, Project Management, web development in PHP, and finally, IT Support.
Because, one day, I managed to figure out why one of my colleagues couldn't log in to the server upstairs, and corrected the problem.
I will say, the Server was the problem.
It was a dinosaur. It was 14 years old - twice as old as the company - and had been bought second hand. It was a monstrous beige tower with a Pentium II processor and God Knows What else inside. It ran Windows 2000 Server and was solely dedicated to serving the company accounts and acting as networked file storage. Inside the case were four HDDs: a pair of 9GB ones for the OS and programs, and a pair of 32GB ones for files. Both pairs were mirrored in RAID 1. It still had a pair of lockable Zip disk drives fitted, though the keys were long lost, along with a floppy drive and a CD drive with no write ability. Or ability to read DVDs.
It creaked as it worked, then fumed, whuffed, whirred and occasionally burped. And it sat there, creaking away for years without thought or consideration for its well-being or security. Until I came along.
By this stage, it was obvious the company was dying - the Titanic had hit the iceberg a long time ago, and everything that was happening was just a desperate attempt to bail it out. We might've slowed the sinking - from two months, out to six, even buying a full year - but the abyss of liquidation always loomed.
So, any suggestion of upgrading the server hardware was met by 'With What Money?'. At the same time, everybody knew the server was the lynchpin. If it broke, that was it - company gone. A suggestion that I use a spare computer from home was quietly discouraged - in case the company went under by surprise and someone decided to liquidate it to pay a creditor rather than give it back to me. Or we turned up to find the doors locked.
The best I could do was schedule a backup of the accounts and a few other critical systems, and have it go somewhere offsite. I asked our web host if we could use our spare space for it, and they were happy to let it happen, provided we didn't cause them problems. So, I set it to run the backup every Sunday morning, 1am or so. Each successive backup would overwrite the previous one, because there just wasn't the spare space to hold two (and no money to pay for it).
I figured even if the server went pop, or we had a building fire or some other catastrophe, at least those copies would survive. I'd figure out what to run them on afterwards.
Someone, somewhere, should see the potential problem in this. In my defence, I am not, nor ever was, an IT professional. The software education I have is more related to the engineering side of things - making machines and robotics work with a view towards industrial automation, rather than the maintenance and setup of IT infrastructure and data security.
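The general weakness of a single, overwritten copy is that it protects against losing the server, but not against backing up data that has already gone bad: the next scheduled run replaces the last good copy. A common remedy is to keep several generations. Below is a minimal Python sketch of that idea; the date-based naming, the `.zip` suffix, and the `keep=4` default are assumptions for illustration, not what was actually set up here.

```python
"""Hypothetical weekly backup rotation that keeps several generations.

Assumptions for this sketch: each run produces one archive file, copies
are named by date, and the offsite directory has room for `keep` copies.
"""
import shutil
from datetime import date
from pathlib import Path


def archive_name(day: date) -> str:
    # ISO dates (YYYY-MM-DD) sort lexicographically in date order.
    return f"backup-{day.isoformat()}.zip"


def archives_to_prune(names: list[str], keep: int) -> list[str]:
    """Return the names that fall outside the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(names)
    return ordered[:-keep] if len(ordered) > keep else []


def rotate(backup_dir: Path, new_archive: Path, day: date, keep: int = 4) -> None:
    """Copy the new archive in under a dated name, then prune the oldest."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(new_archive, backup_dir / archive_name(day))
    names = [p.name for p in backup_dir.glob("backup-*.zip")]
    for name in archives_to_prune(names, keep):
        (backup_dir / name).unlink()
```

This trades space for history: it needs room for `keep` copies (plus the new one while it is being copied in), which was exactly the constraint in this story.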
I just did what I thought I could to keep the Titanic afloat.
So, one Monday morning, I come into the office and am met by the shrill sound of metal screaming against metal at high speed. There's a heart-in-mouth moment as I realise that it's coming from the server cabinet.
But, we have backups, I assured myself. The disks are mirrored in RAID 1, so if one drops out, the other should still be clean and working. If that fails, I've my own little backup too....
Unfortunately - that only works if the damaged disk decides to drop out of the array.
It didn't.
I find th
So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
How many people will refrain from posting because the statute of limitations hasn't run out yet?
Well, I'm certainly not going to admit to the most costly mistake as it appears no one realizes it was me and what I had done. So I'm not gonna do it; wouldn't be prudent.
The most embarrassing mistake was when I inadvertently brought down the client's network (a major hospital) in the middle of the day. I didn't realize what I had done until about three minutes later, when about a dozen IT guys flooded the computer room, paying particular attention to the area I had just been working in. It appears I made an error. To this day I am likely persona non grata in that computer room.
No kidding. I'm glad we didn't. It means I can look at myself in the mirror. Career-wise, I've done okay without it. But it would have been a patent through which CI$ would have raked in millions and millions of dollars. And, as far as I can determine, it would have been completely legal. There was no MySQL, no Postgres; OraPerl had *just* been released and was barely stable on SunOS, and there were no known instances of a CGI / OraPerl gateway on the Internet until Pacific Power & Light asked us if it was possible to connect their consumer-oriented energy savings database to that new thing called "the world wide web."
I over-promised on a time estimate once, or rather, I let myself be convinced to pad the estimate. Not by a vendor, but by the client! One of the client's systems was due for an upgrade, and between myself and the support guys in India, I figured it would be a 19 man-day job. I would run it as a "small project", meaning that I could run it any way I wanted. However, the client asked me: "Can you make the estimate 21 days?" That would make it a "proper" project, run according to the client's methodology, which the client preferred for budgetary reasons. According to the manager, I had nothing to worry about: a PM would be assigned to me to take care of the project formalities. So I agreed.
At the time I was not aware of the unbelievable bureaucracy of large multinationals, and what this would do to my project. Normally I estimate the amount of real work, and add 20% for project management overhead. Maybe another 20% for red tape. But in this case, the PM was more or less forced to involve an ever increasing legion of other teams from various Centers of Excellence in the client's organization. A simple upgrade turned into a project that ran for over half a year. And by agreeing to this approach, I probably cost the client around $300,000. Of course it was mostly their own organization that ran up the cost, and they asked for this in the first place, so they never gave me any grief.
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...