Slashdot Mirror


Ask Slashdot: How Much Did Your Biggest Tech Mistake Cost?

NotQuiteReal writes: What is the most expensive piece of hardware you broke (I fried a $2500 disk drive once, back when 400MB was $2500) or what software bug did you let slip that caused damage? (No comment on the details — but about $20K cost to a client.) Did you lose your job over it? If you worked on the Mars probe that crashed, please try not to be the First Post, that would scare off too many people!

17 of 377 comments

  1. $24,000 by Anonymous Coward · · Score: 2, Interesting

    I was in charge of ordering a leak correlation system for a water utility that I work for. The system I chose was not quite what we needed, but worked. One week after the warranty expired, I dropped the correlator unit and it has never worked since. I found out the correlator was unrepairable and we had to order a whole new system.

  2. Outage.. by steveb3210 · · Score: 4, Interesting

    I unplugged the wrong thing in a datacenter once which took 20k domains offline. Traced the cable from the machine to the wall two or three times before pulling too..

    They didn't have any cable management and only one border router..

    Didn't lose my job, I was a very young sysadmin who was learning but good at what I did.. everyone kinda shrugged it off as a lesson learned.

    1. Re:Outage.. by turbidostato · · Score: 4, Interesting

      "As with most mistakes, it is part of a system that is faulty and awaiting one simple mistake to escalate."

      Couldn't agree more.

      "Chances are there was a culture of trying to save money"

      Sometimes the "cargo cult" is so ingrained that even the techs are unable to see it.

      Anecdote:

      Was in a hiring process, I don't remember if it was Google or Amazon. One of the questions (from a hands-on tech team lead) was about a single server that went crazy and couldn't spawn any more processes, so it was almost impossible to do anything with the computer. It was still offering whatever services it hosted just fine.

      It went more or less like this:
      Me: Has this happened before?
      Recruiter: Nope.
      Me: So... Can I try this, or that, or this other one?
      R: No, because you can't run any new process.
      M: Ok, reboot it (I of course know saying something like that is taboo for a unix/linux sysadmin). Let's look at the boot messages to see if we get some clue and let's monitor it afterwards to see if this happens again. If that's the case, we will be in a better position to diagnose; if not, we will put it on the "computer gnomes" account.
      R: Won't try to diagnose anymore before rebooting?
      M: Nope. My time is valuable and there will surely be more productive things on my to-do list.
      R: But the computer hosts a service that if turned off will cost the company a bazillion!
      M: Nope. If that were the case, the powers-that-be would have engineered the service with high availability in mind, which in turn means we could reboot the server without further hesitation. Since that's not the case, the implication is that the business already considered it not a critical service, so the point above about me costing money still applies.
      R: But, but, but...
      [...]

      Of course, I knew from the very beginning the answer he wanted was to find a way to list the processes without spawning a new process, so after a while I went down that route. I vaguely remembered there was some Bash built-in that would allow me to do it, but not exactly which one. But at that point I wanted to see the culture of that place.

      Needless to say, I wasn't hired. But I didn't want to be hired either. Not within that team at least.

    2. Re:Outage.. by Anonymous Coward · · Score: 3, Interesting

      That reminds me of a cleaner who, for some unknown reason, had keys to every room, including the server room. Around Christmas she needed a wall socket for the Christmas tree. She found one in the server room, among the switches, routers, UPS, backups and aircos, and plugged the Christmas lights into an unused socket between the UPS and the switches. As usual, the Christmas lights didn't work; they short-circuited the network, which shut down the airco power supply. And she just left them plugged in.

      It was winter, and the servers weren't heating up much while idling over the weekend, but they started to heat up once work resumed and they came under heavy load. The servers shut down one after the other, and it was over 50 degrees Celsius in the server room. I was a programmer, but was ordered to help in emergencies, like dragging server hardware in and out of the room. But spare aircos? That's something we didn't have. On top of that, all the airco specialists were on holiday; those bastards could get days off during the end-of-year holidays, while the 'IT guys' always had to be present in case of failure.

      While the system administrators were close to a heart attack, had already pulled out half their hair because they couldn't find the problem, and were sweating like horses (remember, it was over 50 C in that room), I was the one who noticed the Christmas tree, followed the cable that went over the dropped ceiling into the server room, and simply unplugged it. A few moments later the aircos turned on again, one after the other; within half an hour the temperature was back to 26-27 degrees and the system administrators could restart the servers.

      I never told them what I did. I had some sympathy for the cleaner. She was a pretty smart Hungarian woman with a degree in law and philosophy that was useless in our country, and she worked hard (16 hours a day) to give her only son a chance to study in our country and get a decent degree and job. If I had told, she would certainly have been fired right at the time her son needed lots of money for new books for the second semester. I told her of course that she should never enter the server room, and comforted her with the fact that I also was just a worker and didn't tell anyone.

      She was grateful for the whole time I worked there. I was eventually the one who got fired, for not wanting to create a Java Applet to power the client side of a web shop in 2011 (!!!!). Some marketing guy had read some completely outdated books about web shops (probably from the nineties) and decided that we also needed such an advanced Java Applet based web shop.

      They actually wanted to do things in a web client, like editing photos with layers, a mini Photoshop/Gimp, that simply could not be done with a web client (maybe it could be done with some advanced JavaScript, but I was no JavaScript expert, and it would still have been overkill for a simple website). They actually found a fresh-out-of-college 'Java expert' who was willing to pick up the job. The last time I checked their completely outdated web shop, the Java Applet simply could not be loaded because of security problems. The web shop was marketed to their customers so hard that it backfired enormously. Many customers ended up with malware because Oracle/Sun installed the Ask toolbar (most customers didn't have Java yet) and still couldn't run the Java Applet. So recommendations were made like using XP with IE 6 to run the web shop, and that was in 2013 when the web shop was finally ready.

      Ultimately the business went bankrupt because once you go the online service way, customers will find other services when yours sucks.

      My failure in this was that I could not convince marketing people that they were wrong and I was right. I was fired and found a new, more interesting, higher paying job while they ran their business into the ground in jus

  3. Improper use of systems by pierced2x · · Score: 4, Interesting

    I used a system improperly over the course of a month. It connected to some services that ran up a $50k bill. I was mortified when my boss told me, thought for sure I'd be canned on the spot. I was only 22 and it was my first job out of college, so the amount was nearly double what I was being paid. The boss basically took the heat for not having explained it to me better, and I was not reprimanded in any way.

  4. Well... by Jethro · · Score: 4, Interesting

    I don't know what monetary cost they assigned to this, but this is the one I got in the most trouble for.

    Frankly, it was something I got blamed for. I guess I can take partial responsibility. You guys tell me.

    I was the only UNIX guy at this place. We were moving our Main Internal Server to a newer machine. I had set up a cron job to rsync all user data nightly, so that when we transitioned over, the final rsync would be faster.

    So, the big day comes. I come in on a weekend, do the final rsync, change some DNS entries, shut down old machine, bring new machine up. No problem.

    Next day everyone is working happily, everything is working smoothly, no worries.

    Or so I thought. Turns out the main developer wanted something off the old server, so he turned it back on to copy his files... and then left it up.

    So, during the night, the thing automatically rsyncs and overwrites an entire day's work for about 80 people.

    Definitely partially my fault for not disabling the cron job, but I was the only one who got in any kind of trouble at all for this (to the extent of almost losing my job, and frankly that was the catalyst for me leaving that place).
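    One way to guard a nightly sync job like this against a resurrected old server is to have the job check for a "cutover done" marker before it runs. The following is only a minimal sketch under assumptions: the marker file, the function names and the example paths are hypothetical and not from the original setup, and it assumes the nightly job really was a cron-driven rsync. The only rsync flag used is -a (archive mode).

```python
import subprocess
from pathlib import Path


def build_sync_command(src: str, dest: str) -> list[str]:
    """Build the rsync invocation; -a is rsync's archive mode."""
    return ["rsync", "-a", src, dest]


def nightly_sync(src: str, dest: str, cutover_marker: Path, run=subprocess.run):
    """Run the nightly sync unless the (hypothetical) cutover marker exists.

    Returns the command that was run, or None if the sync was skipped.
    The `run` parameter exists only so the logic can be exercised
    without actually invoking rsync.
    """
    if cutover_marker.exists():
        # Migration is finished: the old machine must never push data again.
        return None
    cmd = build_sync_command(src, dest)
    run(cmd, check=True)
    return cmd
```

    The idea would be to create the marker on the old machine as the last step of the migration, so that powering it back on later leaves the nightly job as a no-op. Removing the crontab entry does the same thing more directly; the marker is just a second line of defence.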

    --
    In the land of the blind, the one-eyed man is kinky.
    1. Re:Well... by Jethro · · Score: 4, Interesting

      You know the old saying, "make something idiot-proof and someone will come up with a better idiot."

      They'd have plugged it back in. Again, the guy physically went into the server room and pushed a button.

      I certainly should've disabled the cron job or, better yet (as pointed out by AC down there) have known what rsync actually was and used that - I know I said I did in the original post but in retrospect I couldn't have as it wouldn't have overwritten everything. This was about 20 years ago...

      --
      In the land of the blind, the one-eyed man is kinky.
  5. About $2M -- But not really a mistake... by jnaujok · · Score: 4, Interesting

    Our group at FedEx released code that I wrote on a Saturday night. This was two days before the Apple iPhone 4 shipped. The code worked perfectly, however, despite our repeated warnings about nearly doubling downstream traffic, the downstream systems (like billing and tracking) weren't ready for it.

    So, on the day everyone wanted to track their new iPhone, my code shut down all tracking on FedEx for about 12 hours before we could switch the config setting (10 minutes) and the downstream systems could catch up (11+ hours).

    Estimate of cost was around $2 million in lost time and revenue and extra calls to customer service. Luckily, since I wasn't actually at fault, and we had multiple email chains backing up the volume estimates and warnings, we didn't get the axe.

    --
    Life, the Universe, and Everything... in my image.
  6. Took an online trading company offline for a day by Nonesuch · · Score: 4, Interesting

    I was hired as a firewall admin at an online trading company, then quickly discovered the director of IT was insane, but kept management happy because he made his numbers by keeping his team constantly understaffed; I was told to work on not just servers, but installing Sun servers in racks, running cable, and fixing just about anything plugged into the network.

    I made the mistake of showing competence in networking, so was asked to "expand my role" (new title, same salary), and start working on the switches themselves, including executing an "upgrade" to stacked HP ProCurve switches with VLANs (replacing a hodge-podge of random manufacturer switches). The actual upgrade went fine, basic testing (ping) showed everything stable, but as soon as trading opened the next day, everything went to hell, performance dropped through the floor and customers started calling in about trades timing out. Long story short, it turned out that Solaris HME cards were unable to negotiate properly with ProCurve switches, and half the machines were dropping packets due to duplex mismatches. There's a reason people call the Sun interface cards "Happy Meal Ethernet".

    Cost the company approximately $180,000 in direct and customer exodus losses, and was likely a factor in their eventual collapse. I wasn't fired, but management never trusted me again so I saw the writing on the wall, and quit to do consulting work at an (also doomed) dot-com online supermarket.

    On the upside, I was able to make thousands in consulting income from installing those same "lock speed to 100 and duplex to full" Solaris scripts on servers for various customers who also had performance issues plugging in Sun servers to cheap switches.

  7. Click of death by Wowsers · · Score: 4, Interesting

    My worst IT disaster was suffering a hard drive failure, the click of death. I had a few days' warning, and I deliberately kept the pc on 24/7 instead of the normal switch on/off, to make sure the drive stayed alive until its replacement arrived.

    Obviously I had to turn the pc off to change the drive, it was not hot-swappable. When I powered the pc up, the old hard drive failed, didn't work at all. I was faced with losing all the data on it. I left the drive alone for months wondering what to do, reading different ideas online, some of them weird.

    Eventually I decided to try the least destructive idea first. I put a sheet of paper on the failed drive to make sure the label didn't come off, and heated up the clothes iron, then applied the iron directly onto the top of the hard drive. When the drive casing was warm enough (not so hot as to make it hard to carry), I took it to my pc, and powered up.

    The failed hard drive came to life, and I managed to grab all the files on it onto the new hard drive, uncorrupted.

    Out of interest, the drive failed about three months before the forced drive change I do as a backup / failure-prevention measure. I got lucky.

    --
    Take Nobody's Word For It.
  8. Other way round for me by Anonymous Coward · · Score: 2, Interesting

    I let a vendor sell me a product without really testing it. Turns out it didn't work (at all) and we lost €50k on license fees for a product we could not use.

    I was able to lay the blame on an accountant who had locked us into a 5-year contract in exchange for a minor discount. So I didn't get fired.

  9. F-16 panel flew off in flight by YrWrstNtmr · · Score: 4, Interesting

    Some other fool did not install the panel properly, and left one of the three nuts off. Distinctive nuts, used in only one place.
    Someone found it overnight, and held it up at the morning meeting. "Anyone know where this goes?" Unfortunately, I did not recognize it as a part from one of my systems.

    Aircraft flew, panel broke off, punching several other holes in the side as it departed.
    Training mission aborted. Much sheet metal work needed.

    Actual repair cost? Unknown, but easily 5 figures if not more.

  10. Power cable mistake by Anonymous Coward · · Score: 2, Interesting

    Working for a desktop publishing house in IT. Spent just under $4000 on 36 inch flat panel displays. Accidentally plugged in a printer power cable. Immediately fried the monitor. My boss was not happy. The internship did not go well the rest of the summer.

  11. The Final Nail by Dartz-IRL · · Score: 4, Interesting

    The total cost was actually sweet FA in numbers terms, but I think I put the final nail in the company's coffin.

    My first 'job' was a jobbridge internship with a 'small' company. Small enough that I was literally person number three on the employee roster. The company worked in the renewable energy sector, and had been hammered pretty hard over the last few years by The Recession as domestic and corporate purse strings were pulled tighter and tighter.

    I was taken as an Engineer, but rapidly found myself wearing a wide range of hats from Sales, to Customer Support, to System Design, to Project Management, web development in PHP, and finally, IT Support.

    Because, one day, I managed to figure out why one of my colleagues couldn't log in to the server upstairs, and corrected the problem.

    I will say, the Server was the problem.

    It was a dinosaur. It was 14 years old - twice as old as the company - and had been bought second hand. It was a monstrous beige tower with a Pentium II processor and God Knows What else inside. It ran Windows 2000 Server, and was solely dedicated to serving the company accounts and acting as networked file storage. Inside the case were four HDDs.... A pair of 9GB ones for the OS and programs, and a pair of 32GB ones for files. Both pairs were mirrored in RAID 1. It had a pair of lockable Zip disk drives still fitted, though the keys were long lost, along with a floppy drive and a CD Drive with no write ability. Or ability to read DVDs.

    It creaked as it worked, then fumed, whuffed, whirred and occasionally burped. And it sat there, creaking away for years without thought or consideration to its well being or security. Until I came along.

    By this stage, it was obvious the company was dying - the Titanic had hit the iceberg a long time ago, and everything that was happening was just a desperate attempt to bail it out. We might've slowed the sinking - from two months, out to six, even buying a full year - but the abyss of liquidation always loomed.

    So, any suggestion of upgrading the server hardware was met by 'With What Money?'. At the same time, everybody knew the server was the lynchpin. If it broke, that was it - company gone. A suggestion that I use a spare computer from home was quietly discouraged - in case the company went under by surprise and someone decided to liquidate it to pay a creditor rather than give it back to me. Or we turned up to find the doors locked.

    The best I could do was schedule a backup of the accounts and a few other critical systems, and have it go somewhere offsite. I asked our webhost if we could use our spare space for it, and they were happy to let it happen, provided we didn't cause them problems. So, I set it to run the backup every Sunday morning - 1am or so. Each successive backup would overwrite the previous because there just wasn't the spare space to hold two (No money to pay for it)

    I figured even if the server went pop, or we had a building fire or some other catastrophe, at least those copies would survive. I'd figure out what to run them on afterwards.
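    For comparison with the single overwritten copy described above, a common alternative is to keep a small number of dated generations and only delete the oldest once a new one has been written. This is a minimal sketch, not what was actually run: the file naming scheme, the function names and the example dates are all hypothetical, and it only plans which files to keep or delete rather than touching any storage.

```python
from datetime import date


def backup_name(day: date, prefix: str = "accounts") -> str:
    """Hypothetical naming scheme; ISO dates sort correctly as plain text."""
    return f"{prefix}-{day.isoformat()}.bak"


def plan_rotation(existing: list[str], new_name: str, keep: int) -> tuple[list[str], list[str]]:
    """Return (kept, to_delete) after adding new_name, keeping the newest `keep` files."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    names = sorted(set(existing) | {new_name})
    kept = names[-keep:]        # newest `keep` generations
    to_delete = names[:-keep]   # everything older than that
    return kept, to_delete


# Example with made-up weekly backups and room for two generations.
existing = [backup_name(date(2013, 1, 6)), backup_name(date(2013, 1, 13))]
kept, to_delete = plan_rotation(existing, backup_name(date(2013, 1, 20)), keep=2)
```

    Even with room for only two generations, last week's copy survives until the new one is in place, at the cost of needing twice the storage at the moment of rotation.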

    Someone, somewhere, should see the potential problem in this. In my defence, I am not, nor ever was, an IT professional. The software education I have is more related to the engineering side of things - making machines and robotics work with a view towards industrial automation, rather than the maintenance and setup of IT infrastructure and data security.

    I just did what I thought I could to keep the Titanic afloat.

    So, one Monday morning, I come to the office and am met by the shrill sound of metal screaming against metal at high speed. There's a heart-in-mouth moment as I realise that it's coming from the server cabinet.

    But, we have backups, I assured myself. The disks are mirrored in RAID 1, so if one drops out, the other should still be clean and working. If that fails, I've my own little backup too....

    Unfortunately - that only works if the damaged disk decides to drop out of the array.

    It didn't.

    I find th

    --
    So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
  12. Re:I wonder... by dcollins117 · · Score: 4, Interesting

    How many people will refrain from posting because the statute of limitations hasn't run out yet?

    Well, I'm certainly not going to admit to the most costly mistake as it appears no one realizes it was me and what I had done. So I'm not gonna do it; wouldn't be prudent.

    The most embarrassing mistake was when I inadvertently brought down the client's network (a major hospital) in the middle of the day. Didn't realize what I had done until about three minutes later when about a dozen IT guys flooded the computer room paying particular attention to the area I was just working in. It appears I made an error. To this day I am likely persona non grata in that computer room.

  13. Re: Patent filing missed. by Elf M. Sternberg · · Score: 3, Interesting

    No kidding. I'm glad we didn't. It means I can look at myself in the mirror. Career-wise, I've done okay without it. But it would have been a completely legal patent through which CI$ would have raked in millions and millions of dollars. And, as far as I can determine, it would have been completely legal. There was no MySQL, no Postgres; OraPerl had *just* been released and was barely stable on SunOS, and there were no known instances of a CGI / OraPerl gateway on the Internet until Pacific Power & Light asked us if it was possible to connect their consumer-oriented energy savings database to that new thing called "the world wide web."

  14. Re:I'm retired now by JaredOfEuropa · · Score: 5, Interesting

    I over-promised on a time estimate once, or rather: I let myself be convinced to pad the estimate. Not by a vendor but by the client! One of the client's systems was due for an upgrade, and between myself and the support guys in India I figured it would be a 19 man-day job. I would run it as a "small project" meaning that I could run it any way I wanted. However, the client asked me: "Can you make the estimate 21 days?" That meant it would be a "proper" project run according to the client's methodology, which the client preferred for budgetary reasons. I had nothing to worry about according to the manager, a PM would be assigned to me to take care of the project formalities. So I agreed.

    At the time I was not aware of the unbelievable bureaucracy of large multinationals, and what this would do to my project. Normally I estimate the amount of real work, and add 20% for project management overhead. Maybe another 20% for red tape. But in this case, the PM was more or less forced to involve an ever increasing legion of other teams from various Centers of Excellence in the client's organization. A simple upgrade turned into a project that ran for over half a year. And by agreeing to this approach, I probably cost the client around $300,000. Of course it was mostly their own organization that ran up the cost, and they asked for this in the first place, so they never gave me any grief.

    --
    If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...