Slashdot Mirror


Ask Slashdot: How Much Did Your Biggest Tech Mistake Cost?

NotQuiteReal writes: What is the most expensive piece of hardware you broke (I fried a $2500 disk drive once, back when 400MB was $2500) or what software bug did you let slip that caused damage? (No comment on the details — but about $20K cost to a client.) Did you lose your job over it? If you worked on the Mars probe that crashed, please try not to be the First Post, that would scare off too many people!

377 comments

  1. I'm retired now by Anonymous Coward · · Score: 5, Funny

    But back in the 1960's, I figured we could save a bit of money by only storing the year in our data records. No one would use my program decades later, right? Boy, was I wrong!

    1. Re:I'm retired now by Rei · · Score: 5, Funny

      I don't have anything nearly that bad - my worst only cost me data. A friend taught me (while I was still learning Linux) a trick, how you could play music with dd by outputting the sound to /dev/dsp. But as I said, I was still learning Linux and hadn't quite gotten all of the device names into my head, and I mixed /dev/dsp up with /dev/sda...

      --
      Dear Lord: One of your creatures may be hurt tonight. Please let it be the other creature.
    2. Re:I'm retired now by JMJimmy · · Score: 1

      My worst was pretty tame in comparison. Overpromised on some specs I couldn't deliver on in the end. Cost the client about $4k - oops.

    3. Re:I'm retired now by Anonymous Coward · · Score: 0

      Pulling the wrong cable at the wrong time during a rack equipment upgrade.
      No idea how much it cost, but they probably had a nightmare of a restore since this was a large company.

      rm -rf /some_NFS_directory_here
      cost: $0, went to backups straight away, luckily for a small company.

    4. Re: I'm retired now by Anonymous Coward · · Score: 0

      Never heard of it huh? It's that operating system that runs the majority of the 'net. Everyone uses it daily even though they're not aware.

    5. Re: I'm retired now by Anonymous Coward · · Score: 1

      Playing around as root is hazardous which I'm sure you're well aware of now. :)

    6. Re:I'm retired now by JaredOfEuropa · · Score: 5, Interesting

      I over-promised on a time estimate once, or rather: I let myself be convinced to pad the estimate. Not by a vendor but by the client! One of the client's systems was due for an upgrade, and between myself and the support guys in India I figured it would be a 19 man-day job. I would run it as a "small project" meaning that I could run it any way I wanted. However, the client asked me: "Can you make the estimate 21 days?" That meant it would be a "proper" project run according to the client's methodology, which the client preferred for budgetary reasons. I had nothing to worry about according to the manager, a PM would be assigned to me to take care of the project formalities. So I agreed.

      At the time I was not aware of the unbelievable bureaucracy of large multinationals, and what this would do to my project. Normally I estimate the amount of real work, and add 20% for project management overhead. Maybe another 20% for red tape. But in this case, the PM was more or less forced to involve an ever increasing legion of other teams from various Centers of Excellence in the client's organization. A simple upgrade turned into a project that ran for over half a year. And by agreeing to this approach, I probably cost the client around $300,000. Of course it was mostly their own organization that ran up the cost, and they asked for this in the first place, so they never gave me any grief.

      --
      If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
    7. Re:I'm retired now by Anonymous Coward · · Score: 1

      I'm sure by the time they were done cooking the books they were ahead by $300,000...

    8. Re:I'm retired now by AmiMoJo · · Score: 4, Funny

      I'm writing firmware today that stores the date as a 16 bit unsigned integer giving the number of days since 1/1/2000. When printed it is converted to an 8 bit unsigned year and formatted with %02u (2 digits). I'm well aware that this will fail on 1/1/2100, but... I'll almost certainly be dead and no-one will be running this code in 85 years time, surely...

      I'm starting to feel bad about it now.
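
For reference, the scheme described in this comment can be sketched roughly as follows. This is only an illustration in Python; the firmware's actual code is not shown in the thread. The names `EPOCH`, `encode_days` and `format_year` are hypothetical, and the sketch assumes the stored value counts days from 1 January 2000 and that the printed year is the offset from 2000 held in 8 bits.

```python
from datetime import date, timedelta

# Assumed epoch: the comment says "days since 1/1/2000".
EPOCH = date(2000, 1, 1)


def encode_days(d: date) -> int:
    """Return the day count since EPOCH, checked against a 16-bit unsigned range."""
    days = (d - EPOCH).days
    if not 0 <= days <= 0xFFFF:
        raise ValueError("date does not fit in a 16-bit day counter")
    return days


def format_year(days: int) -> str:
    """Convert a stored day count to the printed year field.

    The year is reduced to an 8-bit unsigned offset from 2000 and printed
    with a minimum width of two digits, as "%02u" would do.
    """
    d = EPOCH + timedelta(days=days)
    year_offset = (d.year - 2000) & 0xFF  # 8-bit unsigned year
    return "%02u" % year_offset


if __name__ == "__main__":
    for sample in (date(2015, 6, 1), date(2099, 12, 31), date(2100, 1, 1)):
        print(sample.isoformat(), "->", format_year(encode_days(sample)))
```

Under these assumptions the 16-bit day counter lasts into 2179 and the 8-bit year offset until 2255, so the two-digit print field is the first thing to give out: from 1/1/2100 the year needs three characters.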

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    9. Re:I'm retired now by Anonymous Coward · · Score: 0

      I figured we could save a bit of money by only storing the year in our data records, and when memory became cheaper it'd be a simple matter of going in and changing it. No one would be so cheap, stupid, and shortsighted as to wait until the last minute to fix the program decades later, right? Boy, was I wrong!

      Slight correction needed.

    10. Re: I'm retired now by khellendros1984 · · Score: 2

      About 14 years ago, I used Linux for the first time, after having used various versions of DOS and Windows starting around 1993. There was so much different about how you use the system, how things get done, and new mindsets to get used to. On top of that, discoverability of device paths, standard Unix utility names, etc is pretty terrible. So yes, "Learning" seems like the appropriate word.

      --
      It is pitch black. You are likely to be eaten by a grue.
    11. Re:I'm retired now by Anonymous Coward · · Score: 0

      Worst mistake, well nothing major, but I have a bunch of little ones.

      Fried a CD-ROM drive by inserting the power cable backwards. (ATX power supply, so it should have been unidirectional; poor design on the CD-ROM manufacturer's part.) About $70 at the time.

      Other than that I have made various little mistakes. Once I accidentally hooked an AVR microcontroller to a 30-volt line: instantly dead. Another time I flashed a pair of them with an invalid clock fuse and bricked the pair.

      Used an integer to hold the exponent of an 80-bit floating-point number when writing my own ASCII-to-float function in assembly. No cost, but I had to rewrite the function later, so maybe an hour's worth of time.

      Probably the funniest was when I was changing a kitchen dome light bulb and the base broke off, and while I was removing the debris my sister walked into the kitchen and of course turned on the light. She saw me there and thought that having light would help... I managed to jump off the ladder without injury but it was a surprising jolt. (Now I tape off the switch or hit the breaker first, sometimes both.)

      Biggest data loss was when I thought I could wait a week for a replacement drive before backing up a hard drive that was making clicking sounds while in use... Lost all data on the drive. (Other than clicking while in use, it worked perfectly, so I thought I had time.)

      I have a much longer list, but these are the ones off the top of my head. I'm pretty sure all of us have quite a few of these.

    12. Re: I'm retired now by Trax3001BBS · · Score: 1

      Never heard of it huh? It's that operating system that runs the majority of the 'net. Everyone uses it daily even though they're not aware.

      I had a three-month contract to install fiber optics and set up a new network. The person in charge of me was in charge of all of the computers. The main computer that accessed the outside (a gateway, if you will) was a Linux box; he had no clue how to work on it, and wouldn't touch it on a bet.

      I didn't see a problem with that :)

    13. Re: I'm retired now by Rei · · Score: 1

      Yep, exact same situation for me. Learned DOS on a 286 and when I was 16-ish a friend started telling me about this neat new operating system which nobody else I knew had heard of, called Linux....

      --
      Dear Lord: One of your creatures may be hurt tonight. Please let it be the other creature.
    14. Re: I'm retired now by Rei · · Score: 1

      Indeed - but I was newly come from DOS / Windows 3.1 / Windows 95, where you're always root, and hadn't yet fully grokked why it was such a big deal to not do everyday activities as root. ;)

      --
      Dear Lord: One of your creatures may be hurt tonight. Please let it be the other creature.
    15. Re:I'm retired now by Anonymous Coward · · Score: 0

      If it makes you feel better, it's not unlikely that they wanted exactly that and knew it would turn out like that - to use up their budget so their allocation was calculated off a higher baseline for the next year.

      It ought to be criminal in my view, I'd fight any case I knew of, but it happens every day.

    16. Re:I'm retired now by Wolfrider · · Score: 1

      --Don't feel too bad, I did a similar thing working on my dad's ancient 500MHz XP PC back in the day. Was trying to DD write to floppy and mistyped it as /dev/sda... Lucky he wasn't using it much, I think we ended up selling it or giving it away to a friend

      --
      .
      == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
    17. Re:I'm retired now by AmazingRuss · · Score: 2

      The moment I hear "Center of Excellence" I run for the exit.

  2. postage by Anonymous Coward · · Score: 0

    $32,000 in paper and postage

  3. $24,000 by Anonymous Coward · · Score: 2, Interesting

    I was in charge of ordering a leak correlation system for a water utility that I work for. The system I chose was not quite what we needed, but worked. One week after the warranty expired, I dropped the correlation unit and it has never worked since. I found out the correlator was unrepairable and we had to order a whole new system.

  4. Outage.. by steveb3210 · · Score: 4, Interesting

    I unplugged the wrong thing in a datacenter once, which took 20k domains offline. Traced the cable from the machine to the wall two or three times before pulling, too..

    They didn't have any cable management and only one border router..

    Didn't lose my job, I was a very young sysadmin who was learning but good at what I did.. everyone kinda shrugged it off as a lesson learned.

    1. Re:Outage.. by Anonymous Coward · · Score: 0

      How did unplugging just one cable do that? Wasn't there a backup route for anything required?

    2. Re:Outage.. by Anonymous Coward · · Score: 4, Informative

      DNS servers on the same subnet. You know, the thing you aren't supposed to do, but everyone does anyway.
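
As a rough illustration of the point about nameserver placement, a check along these lines can flag nameservers that share a subnet. It is a minimal sketch, not anything used by the posters: the function name `shared_subnets`, the /24 prefix length and the example addresses (taken from the reserved documentation ranges) are all assumptions.

```python
import ipaddress


def shared_subnets(nameserver_ips, prefix_len=24):
    """Group nameserver addresses by subnet and return the crowded subnets.

    prefix_len is an assumed subnet size; adjust it to the real network plan.
    Returns a dict mapping subnet -> list of addresses, for subnets holding
    more than one nameserver.
    """
    groups = {}
    for ip in nameserver_ips:
        # strict=False lets a host address stand in for its containing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups.setdefault(net, []).append(ip)
    return {str(net): ips for net, ips in groups.items() if len(ips) > 1}


if __name__ == "__main__":
    print(shared_subnets(["192.0.2.10", "192.0.2.20"]))
    print(shared_subnets(["192.0.2.10", "198.51.100.20"]))
```

An empty result only says the addresses sit in different subnets; it says nothing about shared routers, power or cabling, which is what actually bit the original poster.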

    3. Re:Outage.. by jellomizer · · Score: 2, Insightful

      As with most mistakes, it is part of a system that is faulty and awaiting one simple mistake to escalate.
      Any one human can make a mistake. However, a good system should have built-in methods to protect against this.
      Why wasn't there a backup system? Why didn't it have a failover network/power? Why wasn't there proper labeling?

      Chances are there was a culture of trying to save money: paying for a redundant system costs twice as much, or more, as does having the network guys spend hours cleaning up and reorganizing when they could be working on more profit-driven activities.
      They are so focused on being agile and quick that they let little things slip.

      For 99% of the failures and mistakes that happen it is the fault of the system, and not of the person who happened to make mistakes.

      Organizations need to prioritize these methods and follow up to make sure they work. Not just write them down, post them on some intranet and blame people when they aren't followed. It takes the full organization to make sure checks are in place.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    4. Re:Outage.. by sumdumass · · Score: 1

      Something similar. Took almost an entire ISP down. I had a few servers running BSD, hosting about 200 domains, located at their "data center", which was more like a couple of shelves and a long bench. Anyway, they were supposed to be running a script to verify that two servers were mirroring the other two. I got lazy and stopped checking the logs for it, and eventually they stopped running the backups or the script to verify them. One day a drive failed and about 50 domains were offline. I couldn't remote into any server and started getting the runaround from their techs, so I loaded up all the backup servers I had and a file share with copies of everything and drove the 200 miles to the ISP.

      Turns out one of their techs tried to fix the problem by pulling a good drive from one of the other boxes, but it wasn't the one mirroring the bad drive. This then caused issues in the RAID for the good box, which he tried to rebuild by pulling a drive from the mirroring box, and he ended up breaking all the configs. The worst part is that he thought he had the right tools to fix everything at home and, instead of going to get them, he loaded my servers up and took them home.

      So I show up, realize I have to start from scratch, and set up a couple of makeshift boxes that likely wouldn't survive a month. Then I connected an old NetWare server, enabled SMB on the two new servers and started transferring files from the NetWare server. Next thing I know, someone came in and started rebooting all the routers. I looked and jokingly said a reboot is not a fix.

      Well, this went on for about two hours, with about half a dozen people working on it, making phone calls and claiming they were under some DoS attack. My file transfer finished, I disconnected the NetWare server, and it all magically stopped. I had misconfigured SMB and created a packet storm that their routers and modems gladly repeated and multiplied to the point that it almost melted their network.

      My real servers finally showed back up, so I loaded them up, built new ones and had a T3 run to a commercial building near the house that became their new home. There was a lot of finger-pointing and talk about compensation, but it got dropped when I reminded them that the only reason I had access of that kind was that they had failed to fulfill a contract obligation and then screwed the pooch trying to recover.

    5. Re:Outage.. by jon3k · · Score: 1

      My DNS servers are on the same subnet and there isn't one cable anywhere you could unplug that would take them both offline.

    6. Re:Outage.. by turbidostato · · Score: 4, Interesting

      "As with most mistakes, it is part of a system that is faulty and awaiting one simple mistake to escalate."

      Can't agree any more.

      "Chances are there was a culture of trying to save money"

      Sometimes the "cargo cult" is so ingrained that even the techs are unable to see it.

      Anecdote:

      I was in a hiring process; I don't remember whether it was Google or Amazon. One of the questions (from a hands-on tech team lead) was about a single server that went crazy and couldn't spawn any more processes, so it was almost impossible to do anything with the computer. It was still offering whatever services it hosted just fine.

      It went more or less like this:
      Me: Has this happened before?
      Recruiter: Nope.
      Me: So... Can I try this, or that, or this other one?
      R: No, because you can't run any new process.
      M: Ok, reboot it (I of course know saying something like that is taboo for a unix/linux sysadmin). Let's look at the boot messages to see if we get some clue and let's monitor it afterwards to see if this happens again. If that's the case, we will be in a better position to diagnose; if not, we will put it on the "computer gnomes" account.
      R: Won't try to diagnose anymore before rebooting?
      M: Nope. My time is valuable and there will surely be more productive things on my to-do list.
      R: But the computer hosts a service that, if turned off, will cost the company a bazillion!
      M: Nope. If that were the case, the powers-that-be would have engineered the service with high availability in mind, which in turn means we could reboot the server without further hesitation. Since that's not the case, the implication is that the business already considered it not a critical service, so the point above about me costing money still applies.
      R: But, but, but...
      [...]

      Of course, I knew from the very beginning that the answer he wanted was a way to list the processes without spawning a new process, so after a while I went down that route. I vaguely remember there was some Bash built-in that would allow me to do it, though not exactly which one. But at that point I wanted to see the culture of that place.

      There's no need to say I wasn't hired. But I didn't want to be hired either. Not within that team at least.

    7. Re:Outage.. by JSG · · Score: 2

      My DNS servers are on the same subnet and there isn't one cable anywhere you could unplug that would take them both offline.

      What about:

      * Router misconfig, takes out default gateway for a while for both
      * An extra cable is added and {MR}STP was disabled by accident or something like that.
      * etc etc

      Anyway, your proud boast may one day discover that people do the funniest things. If your DNS servers are in fact the same box with two IPs ...

    8. Re:Outage.. by Anonymous Coward · · Score: 2, Informative

      Be careful about criticizing others. Routers don't have default gateways, they have null routes. They can also be set up to be redundant gateways for others and have many redundant null routes themselves...

      Turning off STP on just one router would never be a problem. There are master and standby root bridges. Even if they both go down, others will step in to take the job. It would require a total network shutdown of all layer three equipment before it would be a problem and even then, ttl limits and excess traffic would cause the routers to drop one of the cables in the loop within seconds.

      This is entry-level networking knowledge.

    9. Re:Outage.. by jon3k · · Score: 1

      Read again carefully:

      there isn't one cable anywhere you could unplug that would take them both offline.

      I didn't say they were invulnerable. Calm down.

    10. Re:Outage.. by Anonymous Coward · · Score: 1

      Domain policies like requiring at least two DNS servers are there as a clue. But as you have illustrated, there are plenty of idiots that will do very stupid things.

      You'd be better off if one of your DNS was running on a DSL connection in your basement than having both on same network. I'll leave it as an exercise to figure out why.

    11. Re:Outage.. by steveb3210 · · Score: 2

      I unplugged the only border router.

    12. Re: Outage.. by Anonymous Coward · · Score: 0

      I wouldn't want to work with you. I just read you as, "not my problem", despite figuring out what went wrong is precisely your job.

      You sound like a dick answering a different question than asked and being smug about it.

    13. Re:Outage.. by Anonymous Coward · · Score: 1

      My DNS servers are on the same subnet and there isn't one cable anywhere you could unplug that would take them both offline.

      That's exactly what steveb3210 used to say....

    14. Re:Outage.. by steveb3210 · · Score: 1

      The problem had nothing to do with DNS servers, this datacenter had only one border router.

    15. Re:Outage.. by Anonymous Coward · · Score: 3, Interesting

      That reminds me of a cleaner who, for some unknown reason, had the keys to every room, including the server room. Around Christmas time she needed to find a wall socket for the Christmas tree. She found one in the server room with the switches/routers/UPS/backups/aircos (why she had a key to the server room, nobody knows) and just plugged the Christmas lights into an unused socket, between the UPS and the switches. Of course, as usual, the Christmas lights didn't work, and they short-circuited the network, which shut down the airco power supply. And she just left it there.

      It was winter, and the servers weren't heating up that much while idling, but they started to heat up when work resumed after the weekend and they came under heavy load. The servers started to shut down one after the other, and it was over 50 degrees Celsius in the server room. I was a programmer, but I was ordered to help in emergencies, like dragging new server hardware in and out of the room. But spare aircos? That's something we didn't have. On top of that, all the airco specialists were on holiday; those bastards could take days off during the end-of-year holidays, while the 'IT guys' always had to be present in case of failure.

      While the system administrators were close to having a heart attack, had already pulled out half of their hair because they couldn't find the problem, and were sweating like horses (remember, it was over 50 C in that room), I was the one who noticed the Christmas tree, followed the cable that went over the dropped ceiling into the server room, and simply unplugged it. A few moments later the aircos turned on again, one after the other, and within half an hour the temperature was back to 26-27 degrees and the system administrators could restart the servers.

      I never told them what I did. I had some sympathy for the cleaner. She was a pretty smart Hungarian woman with a degree in law and philosophy that was useless in our country, and she worked hard (16 hours a day) to give her only son a chance to study in our country and get a decent degree and job. If I had told, she would certainly have been fired right at the time her son needed lots of money for new books for the second semester. I told her, of course, that she should never enter the server room, and comforted her with the fact that I also was just a worker and hadn't told anyone.

      She was grateful for the whole time I worked there. I was eventually the one who got fired, for not wanting to create a Java Applet to power the client side of a web shop in 2011 (!!!!). Some marketing guy had read some completely outdated books about web shops (probably from the nineties) and decided that we also need such an advanced Java Applet based web shop.

      They actually wanted to do things with a web client, like editing photos with layers, like a mini Photoshop/Gimp, that simply could not be done with a web client (maybe it could have been done with some advanced JavaScript, but I was no expert in JavaScript, and it would still have been overkill for a simple website). They actually found a 'Java expert' fresh out of college who was willing to pick up the job. The last time I checked their completely outdated web shop, the Java Applet simply could not be loaded because of security problems. The web shop was marketed to their customers so hard that it backfired enormously. Many customers ended up with malware because Oracle/Sun installed the Ask toolbar (most customers didn't have Java yet) and still couldn't run the Java Applet. So recommendations were made like using XP with IE 6 to run the web shop, and that was in 2013, when the web shop was finally ready.

      Ultimately the business went bankrupt, because once you go the online service way, customers will find other services when yours sucks.

      My failure in this was that I could not convince marketing people that they were wrong and I was right. I was fired and found a new, more interesting, higher paying job while they ran their business into the ground in jus

    16. Re: Outage.. by jellomizer · · Score: 1

      Well, it depends. Sometimes you want someone who will be a cowboy and solve and fix problems on the fly. Other times you want someone who will be proactive and give you a safe solution.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    17. Re: Outage.. by Anonymous Coward · · Score: 0

      Then you didn't create the problem. The moron responsible for that decision did.

    18. Re: Outage.. by turbidostato · · Score: 2

      "I just read you as, "not my problem""

      Yes, that's the case... from a certain point of view.

      I usually respect others' work enough to give them their due credit. In this case, it means I credit the system architect with being able to design the system properly. No high availability means it's not a critical server, so I adapt my procedures accordingly.

      "figuring out what went wrong is precisely your job"

      No, it isn't. My job is to produce the most value for the company within my assigned competencies. Sometimes it means scratching my head for hours to solve a problem. Other times it means rebooting/destroying a server without a second look and then going to the next item on my to-do list. You know, servers are not pets but cattle.

      "You sound like a dick answering a different question than asked"

      In fact, I didn't. I was asked to solve the problem, not to diagnose the problem and solve it without rebooting the server, and I honestly gave the answer I considered to be the most effective. As it turned out, it was not the answer my interviewer expected or wanted, but I'm fine with that: in a hiring process the prospective employee is interviewing the employer just as much as the other way around.

    19. Re: Outage.. by Anonymous Coward · · Score: 1

      The fact that you said "turn off STP on the router" invalidates your entire response.

    20. Re: Outage.. by Anonymous Coward · · Score: 0

      To me it sounds like you know not to join a failing team. Not really a dick but pragmatic in an interview situation.

    21. Re:Outage.. by ultranova · · Score: 2

      Anyway, your proud boast may one day discover that people do the funniest things.

      Hmm...

      1. Create a domain.
      2. Have that domain host a single page saying "Nothing can take down this page."
      3. Have that page and DNS server hosted in a datacenter in an enemy country.
      4. Sit back and watch.

      Weaponized hubris - what could possibly go wrong?

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    22. Re: Outage.. by ultranova · · Score: 1

      figuring out what went wrong is precisely your job

      No, it isn't. My job is to produce the most value for the company within my assigned competencies.

      In theory, companies care only about profits. In practice, corporations are made of living humans, are thus living things themselves, and as such care mostly about homeostasis. Profit only enters the picture as food, and like humans whose imaginations they live in, corporations too tend to ignore long-term consequences for immediate gratification, especially since the law gives their parasitic load - the shareholders - control over their actions.

      So, as far as the company was concerned, you were carrying - and sticking to - dangerous ideas that could have resulted in changes to corporate culture - to homeostasis. You "tasted" wrong, so you were rejected. I wonder if the whole corporate world could be described in the terms of biology more accurately than in the terms of economics, and perhaps improved through its methods?

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    23. Re: Outage.. by jellomizer · · Score: 2

      The Job interview process is actually a two way process.
      The company needs/wants the resource, that is why there are open positions.
      The Person needs/wants a job or a better job, that is why they are applying.

      Now even at the height of the last recession, and it was a big one, average unemployment in America was under 10% of the population. While that created a market where employers had the advantage, it was only an advantage, not supreme power.
      1. The employers wanted people who were currently employed (using the outdated reasoning that if they weren't laid off, they must be good enough to have made it). So while these applicants may be looking for a better job, they have a job currently and are only willing to take a better offer.

      2. If your industry isn't offering the type of work people want to do for the money anymore, then people may make life decisions to go a different route. Go back to school and study a new topic. Use their skills in a different industry.

      3. High turnover: Turnover is really expensive. On average it takes 150% of the salary to deal with an employee's turnover: having to retrain new employees, catch-up time, etc. If your corporate culture is poison, you will have a hard time keeping employees.

      I have been on some job interviews where I lost my temper with the recruiter. One company had a very particular piece of software (so particular that I couldn't find a relevant match with a Google search, except when I added the industry to it, and then it was a few pages deep). The recruiter kept on hounding me about this tool. I asked what it does, so that I could at least give a general abstract answer to the questions. They didn't know either. From this interview I got the following impression. The guy who worked on the software (probably the guy who made it) left the company for a better job. They are trying to find someone with the exact skill sets and pay them as much as the guy who left for a better job. So they let a good resource leave, and they haven't learned from their mistakes and either realize that they will need to lower the requirements, or raise the salary and benefits.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    24. Re: Outage.. by turbidostato · · Score: 1

      "They are trying to find someone with the exact skill sets and pay them as much as the guy who left for a better job. So they let a good resource leave, and they haven't learned from their mistakes and either realize that they will need to lower the requirements, or raise the salary and benefits."

      Yes.

      Going back to the first post on this thread, all this means that, in a company, "the service" is much deeper and wider than thought at first glance and, say, a server breaking can have its root causes very far away from the server room.

      It pays to have a holistic view of the business, but very few companies pay attention to that or are even organized to facilitate such a view.

    25. Re: Outage.. by Wycliffe · · Score: 1

      especially since the law gives their parasitic load - the shareholders - control over their actions.

      I'm not sure you know the definition of a parasite. A parasite can't survive without its host. The shareholders are the investors and can survive just fine without the company, but the company wouldn't even exist without its shareholders/investors. Calling the shareholders/investors parasites is like calling the leaves on a tree parasites. Without the leaves, the tree has no energy (money) and dies.

  5. .07 per pill by Anonymous Coward · · Score: 0

    Was a bit shy to speak to my doctor about my ED, so.... Yeah...

  6. Intel CPU sockets are terrible. by Anonymous Coward · · Score: 0

    Biggest mistake was derping out and going sideways with the CPU while installing it. A bunch of pins in the CPU socket got bent. On a $300 motherboard.

    1. Re:Intel CPU sockets are terrible. by TWX · · Score: 1

      Heh. I sort of miss the days when CPUs had pins and the sockets were just a pattern of holes. The ZIF socket of the nineties worked quite well.

      --
      Do not look into laser with remaining eye.
    2. Re:Intel CPU sockets are terrible. by Anonymous Coward · · Score: 0

      what kind of oddball, obscure CPUs don't have pins?

    3. Re:Intel CPU sockets are terrible. by Bengie · · Score: 3, Informative

      Pretty much all modern Intel CPUs from the past many years.

    4. Re: Intel CPU sockets are terrible. by Anonymous Coward · · Score: 0

      Modern Intel and AMD CPUs use a Land Grid Array - https://en.m.wikipedia.org/wiki/Land_grid_array :

      "Unlike the pin grid array (PGA) interface found on most AMD and older Intel processors, there are no pins on the chip; in place of the pins are pads of bare gold-plated copper that touch protruding pins on the microprocessor's connector on the motherboard."

  7. "I broke Asia" by Anonymous Coward · · Score: 1

    I cost our Asian office a day's work after I failed to verify that a deployment completed successfully.

    The deployment was done on Friday evening US time, which would have been around 1 or 2am UK time. I couldn't be bothered to stay up for that so figured that I'd check in the morning.

    Naturally I forgot to do that.

    Throughout the weekend whenever I was out, I'd suddenly remember and think "I'd better check that when I get back in."

    Naturally, I forgot to do that.

    On Monday morning, I received a lot of phone calls and emails asking where I was and to get into the office ASAP. When I got in, I found out that the deployment had failed and the rollback scripts that I'd asked the team to run had not been run.

    After a lot of frantic phone calls, we found a DBA in the Asia office who still had database access to the Production servers and he rolled the changes back.

    By then however, Asia had lost a whole day of work and I was given a written warning by my manager.

    It's still a running joke amongst my friends that I "took out all of Asia for a day". And if I ever interview and I can see it's going badly, I tell this story in response to the "What's your weakest asset" question, just to see the look on their faces.

    1. Re:"I broke Asia" by tehcyder · · Score: 1

      if I ever interview and I can see it's going badly, I tell this story in response to the "What's your weakest asset" question, just to see the look on their faces.

      Um, I think you're supposed to say something like "I am occasionally impatient with people who are less intelligent and driven than me" not "I wasted our Asian operation a whole day's work and got a written warning for my incompetence".

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    2. Re:"I broke Asia" by HornWumpus · · Score: 1

      I sometimes give stupid answers to stupid questions.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  8. 250K USD by Anonymous Coward · · Score: 0

    Broke SLA: shut down the wrong mainframe. Still have the job.

    1. Re: 250K USD by Anonymous Coward · · Score: 0

      Got naked pics of the boss's wife, eh?

  9. My HD DVD player and collection... by Shabbs · · Score: 1

    Heh - would have to total all that up... sigh... but it still works!

    --
    Mark
  10. Improper use of systems by pierced2x · · Score: 4, Interesting

    I used a system improperly over the course of a month. It connected to some services that ran up a $50k bill. I was mortified when my boss told me, thought for sure I'd be canned on the spot. I was only 22 and it was my first job out of college, so the amount was nearly double what I was being paid. The boss basically took the heat for not having explained it to me better, and I was not reprimanded in any way.

    1. Re:Improper use of systems by Anonymous Coward · · Score: 0

      That is a solid boss. Not too common to find bosses like that. Did you keep in touch?

    2. Re: Improper use of systems by Anonymous Coward · · Score: 0

      You had a good boss. Good shit shield.

  11. doing windows phone by Anonymous Coward · · Score: 0

    uncountable losses, real and opportunity

    1. Re: doing windows phone by Anonymous Coward · · Score: 0

      Hello, Mr. Elop!

  12. around 800 bucks. by Anonymous Coward · · Score: 0

    I dropped a dime into an old AT one time and it hit the controller for a proprietary SCSI controller. It all worked out though. We replaced it with a 100 meg IDE and everyone was happier.

  13. Well... by Jethro · · Score: 4, Interesting

    I don't know what monetary cost they assigned to this, but this is the one I got in the most trouble for.

    Frankly, it was something I got blamed for. I guess I can take partial responsibility. You guys tell me.

    I was the only UNIX guy at this place. We were moving our Main Internal Server to a newer machine. I had set up a cron job to rsync all user data nightly, so that when we transitioned over the final rsync would be faster.

    So, the big day comes. I come in on a weekend, do the final rsync, change some DNS entries, shut down old machine, bring new machine up. No problem.

    Next day everyone is working happily, everything is working smoothly, no worries.

    Or so I thought. Turns out the main developer wanted something off the old server, so he turned it back on to copy his files... and then left it up.

    So, during the night, the thing automatically rsyncs and overwrites an entire day's work for about 80 people.

    Definitely partially my fault for not disabling the cron job, but I was the only one who got in any kind of trouble at all for this (to the extent of almost losing my job, and frankly that was the catalyst for me leaving that place).

    --


    In the land of the blind, the one-eyed man is kinky.
    1. Re:Well... by Anonymous Coward · · Score: 1

      I'd tell the developer to not touch the servers, and stick to his work. He wants to access the files? You start it up, and shut it down. In the loop at all times.

      Or was he someone that was supposed to have unrestricted access to any servers on the premises?

    2. Re:Well... by drinkypoo · · Score: 5, Insightful

      Definitely partially my fault for not disabling the cron job,

      Or pulling the network cable. You have to plan for idiots, because there will be idiots. And odds are, they will outrank you.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:Well... by Jethro · · Score: 1

      Good call... this was about 20 years ago, and it's not likely that I used rsync (not sure I knew how to do that back then).

      My memories of the event are not... perfect. But it's likely that I just used scp to dump entire directories. Couldn't have been using rsync because, as you say, it wouldn't have done as much damage.

      --


      In the land of the blind, the one-eyed man is kinky.
    4. Re:Well... by Jethro · · Score: 2

      They weren't supposed to, but the head developers were like gods at that place. They had the root passwords and I wasn't allowed to restrict them in any way.

      It stemmed from them being among the original 10 people when the company started, and even though the place was now a 200+ employee organisation, in some ways they still ran it like a 10-person operation.

      I did vocally complain about this. They quite often went in and overrode stuff I did.

      --


      In the land of the blind, the one-eyed man is kinky.
    5. Re:Well... by Jethro · · Score: 4, Interesting

      You know the old saying, "make something idiot-proof and someone will come up with a better idiot."

      They'd have plugged it back in. Again, the guy physically went into the server room and pushed a button.

      I certainly should've disabled the cron job or, better yet (as pointed out by AC down there) have known what rsync actually was and used that - I know I said I did in the original post but in retrospect I couldn't have as it wouldn't have overwritten everything. This was about 20 years ago...

      --


      In the land of the blind, the one-eyed man is kinky.
    6. Re:Well... by Anonymous Coward · · Score: 0

      Sry, I'd say mainly your fault... Put a landmine in the middle of the office and hope no-one steps on it...

    7. Re:Well... by Jethro · · Score: 1

      *laughs*

      This was 20 years ago, and in a company that still thought it was very small even though it was medium-sized. The devs were gods. They outranked me in every way and had root access to all my servers.

      --


      In the land of the blind, the one-eyed man is kinky.
    8. Re:Well... by adolf · · Score: 1

      Everyone else has already told you what you did wrong 20 years ago. Here's my take: If you were actually rsync'ing all of the user data, then the developer wouldn't have known the difference and would never have had the inkling to turn the old machine back on.

    9. Re:Well... by Anonymous Coward · · Score: 0

      It is very hard to cast blame without knowing the specifics of how work was done at that place, how the servers were used, who was allowed to switch them on/off, what had been planned/communicated about the transition etc. It will also depend on what type of instructions the poster was given when he worked there: was he just carrying out instructions, or did he have the full responsibility for the whole IT strategy and migration etc.? Was there any review process of his decisions and work in place (or should there have been)? It is ultimately management's responsibility if an employee can single-handedly screw things up. In many places this is a real possibility, but it is then a (hopefully calculated) risk taken by management.

    10. Re:Well... by Jethro · · Score: 1

      As I've mentioned, this was about 20 years ago, so I can't really remember it 100%.

      However, this was one of those shops that started with about 10 employees, and even though by then it was 200+, it still operated as if it was a small, small company. The head devs were part of the original 10, and they were like gods. They had full access to EVERYTHING. Including root access to all the servers. They were basically allowed to do whatever they wanted.

      If something went wrong where they and someone else were involved, it was never their fault.

      --


      In the land of the blind, the one-eyed man is kinky.
    11. Re:Well... by Jethro · · Score: 1

      I believe they were looking for old versions of some files, possibly from directories they never asked to be rsynced.

      And, again, 20 years ago. I have definitely learned my lessons AGES since then (:

      --


      In the land of the blind, the one-eyed man is kinky.
    12. Re:Well... by Kjella · · Score: 1

      Or pulling the network cable. You have to plan for idiots, because there will be idiots. And odds are, they will outrank you.

      Since this was a server, unless he was at the console copying it off to a USB stick, he'd probably hook the server back up to the network so he could copy it to his client.

      --
      Live today, because you never know what tomorrow brings
    13. Re:Well... by barc0001 · · Score: 1

      Uh... doesn't rsync have a flag to only sync files that are newer? If 80 people did their work and saved it on the new box, how did rsyncing their data from the old box overwrite newer files?

    14. Re:Well... by ArcadeMan · · Score: 1

      You have to plan for idiots, because there will be idiots. And odds are, they will outrank you.

      Is that a quote from somewhere? Who said that?

      In any case, I'm adding it to my list of quotes.

    15. Re: Well... by Jethro · · Score: 1

      Yup, and as I said to the other people who said that very sane thing "you are right, and I was likely wrong about using rsync. This was 20 years ago and I probably didn't know how to use rsync yet."

      --


      In the land of the blind, the one-eyed man is kinky.
    16. Re:Well... by Anonymous Coward · · Score: 0

      That was almost entirely your fault. Setting up a system that can wipe 80 man-days of work at the flip of a switch was not a smart thing to do. You didn't think it through, and probably caused a developer some stress. There's also a management problem in that the developer had access to that switch, but you knew your environment.

    17. Re:Well... by R3d+M3rcury · · Score: 4, Funny

      Can't speak for a cost, but I thought this one was funny...

      A company I used to work for used Lotus Notes. For some reason, and I don't remember exactly what the reason was, I set up my e-mail to copy my mail to another account. I think it was just a "hey, I can do this" thing, playing with the e-mail system. Unfortunately, I made a typo in the name of the account to forward to.

      When I came in the next morning, the e-mail system was running really slowly. Everyone was complaining about it. I logged into my e-mail and, lo and behold, there were all sorts of e-mails in my account complaining about how it couldn't send this message to the other account and, of course, the contents of the e-mail was a message that it couldn't send this message to the other account, and the contents of that message was a complaint that...you get the idea.

      I turned off the script and deleted all the e-mails. And, suddenly, from the office next door, I hear, "Hey! E-mail is working again!"

      Shhhhh...
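
      A toy model of the loop described above, as a hedged sketch only: it assumes the mail system reports each failed forward back to the forwarding mailbox, and that the forwarding rule then treats that report like any other incoming message. The function name, the message fields and the account names are hypothetical and are not taken from Lotus Notes.

```python
from collections import deque


def simulate_forwarding(valid_accounts, forward_to, skip_bounces, max_steps=50):
    """Toy model of a mailbox that forwards every incoming message.

    Returns the number of failure reports ("bounces") generated before
    the queue drains or max_steps deliveries have been processed.
    """
    inbox = deque([{"body": "hello", "is_bounce": False}])
    bounces = 0
    steps = 0
    while inbox and steps < max_steps:
        steps += 1
        msg = inbox.popleft()
        if skip_bounces and msg["is_bounce"]:
            continue  # loop guard: never forward a delivery failure
        if forward_to in valid_accounts:
            continue  # forwarded successfully, nothing comes back
        # The forward failed: the failure report lands in the same
        # mailbox, quoting the message that could not be delivered.
        bounces += 1
        inbox.append({"body": "undeliverable: " + msg["body"], "is_bounce": True})
    return bounces


# A typo in the target name keeps generating reports until the step cap.
runaway = simulate_forwarding({"backup"}, "bakcup", skip_bounces=False)
# With the guard, only the first failure is reported.
guarded = simulate_forwarding({"backup"}, "bakcup", skip_bounces=True)
```

      In this model the loop only ends when something refuses to forward failure reports, which is roughly what turning off the script did.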

    18. Re:Well... by radarskiy · · Score: 1, Offtopic

      " And odds are, they will outrank you."

      No, the odds are they *are* you.

    19. Re:Well... by drinkypoo · · Score: 1

      Is that a quote from somewhere? Who said that?

      I'm pretty sure the last part is something I read someplace, if not verbatim then next door, and attached to a similar sentiment. There Will Be Idiots is my motto these days, so it crept in there. I can't find anything, either. Whatever it originally was, I probably read it here.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    20. Re: Well... by Anonymous Coward · · Score: 0

      Sadly some ticket systems do exactly that due to poor ticket watermarking and relying too much on just the subject line of an email chain.

      So some manager decides to cc a new support group address which silently autoreplies and may even taint the subject line with its cryptic ticket number. The original chain sees the message as new and whatever ticket system was already in play fires off a new ack email with its own ticket number. This is sticky because to date nobody has added a way to silently and permanently drop someone off a chain... some random recipient with an old version of the email will reply-all and re-add the bad address.

      The problem will keep going until at least one of the two ticket systems has blacklisted replies from the ticket system on the other end. A few hours later you have to clean up the thousands of new tickets... check that nothing new and important fell through the cracks among all the inbox noise. You also have to sew together the various chain forks as legit responses to the original issue land on various versions of the chain. I am looking at you, Kayako 4.

    21. Re:Well... by petermgreen · · Score: 1

      It does but it isn't always practical to use it.

      If all your users do is create and edit files then sure you can use the --update flag and omit the --delete flag making the rsync operation a lot safer.

      But if your users are more active, that is not so practical. Assuming this storage is used as a work area by developers, they are likely to be doing things like deleting files and sometimes even deleting files and replacing them with a copy of an older file (for example deleting a dirty copy of a source tree and replacing it with a clean one). So to copy all the changes you need to use rsync in a far more aggressive mode, without the --update flag and with the --delete flag.

      It was probably a mistake to put the aggressive rsync in a cronjob. It would almost certainly have sufficed to use a less aggressive rsync in the cronjob and only use the aggressive one manually for the final sync, but I can see how someone inexperienced would fail to think of that.

      It was also of course a mistake not to defuse the old server when decommissioning it. Ideally by BOTH disabling the cronjobs and disabling the credentials that allow the decommissioned server to talk to the active servers.
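
      To make the --update point concrete, here is a minimal sketch that imitates the "skip files that are newer on the receiver" rule for a single file. It is an illustration, not rsync's implementation, and it says nothing about deletions; copy_if_newer, demo and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def copy_if_newer(src, dst):
    """Copy src over dst unless dst exists and is newer than src.

    Rough imitation of the rule behind rsync's --update flag.
    Returns True if the copy happened.
    """
    if os.path.exists(dst) and os.path.getmtime(dst) > os.path.getmtime(src):
        return False  # receiver's copy is newer, leave it alone
    shutil.copy2(src, dst)  # copy2 also carries over the modification time
    return True


def demo():
    """Stale file on the 'old server' versus fresh work on the 'new server'."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "old_server.txt")
        new = os.path.join(tmp, "new_server.txt")
        with open(old, "w") as f:
            f.write("yesterday's version")
        with open(new, "w") as f:
            f.write("today's work")
        os.utime(old, (1_000_000, 1_000_000))  # older modification time
        os.utime(new, (2_000_000, 2_000_000))  # newer modification time
        copied = copy_if_newer(old, new)
        with open(new) as f:
            return copied, f.read()
```

      With a newer-wins rule like this, the nightly job firing from the resurrected old box would leave the day's work on the new box untouched; an unconditional copy would not.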

       

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    22. Re:Well... by JustOK · · Score: 1

      Groo sounds like a mendicant.

      --
      rewriting history since 2109
    23. Re:Well... by kilodelta · · Score: 1

      Developers are the bane of system administrators. I had one developer who hosed the entire crontab, not just on the box but the one in the backup too.

      Then there are stupid user tricks - like jamming an RJ-45 connector into an RJ-11 jack.

      But my best - I was administering a Data General MV9600U running AOS/VS II. They had previously been using async terminals but switched over to an IP stack and Pacer terminal on Macintoshes.

      So one day I'm cleaning out the old async cabling - no need for it anymore. Suddenly I hear the system console beeping - not a good sound to hear. I come around from the back of the system to see the system losing all its data volumes. WTH! I look at the disk array and it's powered down. Cycle the switch, nothing.

      Of course my boss flies into the room, big old knot on his forehead. I trace the power cable from the disk array - it's a Hubbell connector and I guess when I was pulling one of the async cables out it rotated the power plug just enough to break contact. Twisted it back in and the disks all came up. Rebooted the system and all was fine.

    24. Re:Well... by Jethro · · Score: 1

      > Then there are stupid user tricks - like jamming an RJ-45 connector into an RJ-11 jack.

      That's... actually impressive...

      --


      In the land of the blind, the one-eyed man is kinky.
    25. Re:Well... by PatientZero · · Score: 1

      I'm gonna have to go against the chorus and lay the blame at your feet, honestly. You left a booby trap for whoever rebooted that server at any point in the future. Had you removed the hard drive or put the machine into a donation pile, I could understand.

      Say you get hit by a bus the next week and they hire a new sysadmin. A few days later he's asked to set up a new service and decides to repurpose that unused server. He connects it to the network, boots it, installs updates and new software . . . and then gets pulled onto some other task that takes a day. That night disaster strikes. Is it his fault for not ensuring there were no dangerous cron jobs left on the machine?

      Perhaps, but it's much easier to disarm bombs you've designed rather than force the job onto some poor, unsuspecting sap. :)

      --
      Freedom to fear. Freedom from thought. Freedom to kill.
      I guess the War on Terror really is about freedom!
    26. Re:Well... by Jethro · · Score: 1

      > I'm gonna have to go against the chorus and lay the blame at your feet, honestly.

      That pretty much HAS been the chorus.

      And I never said I wasn't (at least) partially to blame - I definitely had a blind-spot.

      Also, had a new sysadmin been hired, he'd have no reason to turn the old machine on. Other people WERE aware of what was going on, including the people who would've trained a new guy. What's more, that machine was leased and would have been returned within a week or two, so he couldn't have repurposed it. And he couldn't have pulled any storage from it because that was part of the lease. And even if he could, it was a Sun box and the new one was an AIX box, so stuff wouldn't just run.

      And here's another thing... say the new guy gets hired, never touches that machine because he's been told it's being returned in a few days. And then one of the devs turns it on and a week's work gets erased. They would've still blamed the new sysadmin even though he had nothing to do with it.

      If you want funny, I actually knew the guy who ended up replacing me through a local Linux user's group. I know he was plagued by the same kind of crap. He tried to update the remote connections to use ssh rather than telnet and almost got fired for THAT.

      This was not a good SysAdmin environment.

      --


      In the land of the blind, the one-eyed man is kinky.
    27. Re: Well... by Anonymous Coward · · Score: 0

      Sorry about your meltdown, Knight.

    28. Re:Well... by well_in_theory · · Score: 1

      You rsync without --update (-u)?

      If you were expecting your post-transition rsync to be faster then I presume you were doing something like either --ignore-existing or --update, in which case the files wouldn't have been overwritten, right?

      What happened?

    29. Re:Well... by Jethro · · Score: 1

      Like I said in response to many... many other comments, this was about 20 years ago, and I'm likely wrong about using rsync - it's just that that's what I'd (obviously) use now. Chances are I just didn't know about it back then and was doing a straight scp dump.

      --


      In the land of the blind, the one-eyed man is kinky.
    30. Re:Well... by well_in_theory · · Score: 1

      for file in /* ; do scp local remote ; done

      Yeah, that one's going to bite you hard.

      The sad part is that you even remember this event 20 years later. I bet the guy who booted up the old machine to do what he wanted doesn't.

    31. Re:Well... by Jethro · · Score: 1

      I probably went "scp -r directory new_server:directory/"

      Yeah, the devs had full run of that place. In fact, a LOT of people at that place had root access. To the point where I would go around the place when I'd stay late and pull the damn post-its with the root password off people's cube walls.

      And yeah, I remember it. Because, first, it WAS a defining moment. And second, I kinda remember a LOT of stuff.

      --


      In the land of the blind, the one-eyed man is kinky.
  14. Patent filing missed. by Elf+M.+Sternberg · · Score: 2

    In 1993, I failed to file the US Patent on "A means of accessing a relational database via the Internet." If we'd known we could do it, CompuServe might still be around.

    1. Re: Patent filing missed. by Anonymous Coward · · Score: 1

      And that would have been a bullshit patent, Elf.

    2. Re: Patent filing missed. by TheReaperD · · Score: 1

      Yea, it would have been a bullshit patent. Could have still made millions from it though. Sadly, that's how our patent system works(?).

      --
      "Be particularly skeptical when presented with evidence confirming what you already believe." -
    3. Re: Patent filing missed. by Elf+M.+Sternberg · · Score: 3, Interesting

      No kidding. I'm glad we didn't. It means I can look at myself in the mirror. Career-wise, I've done okay without it. But it would have been a completely legal patent through which CI$ would have raked in millions and millions of dollars. And, as far as I can determine, it would have been completely legal. There was no MySQL, no Postgres; OraPerl had *just* been released and was barely stable on SunOS, and there were no known instances of a CGI / OraPerl gateway on the Internet until Pacific Power & Light asked us if it was possible to connect their consumer-oriented energy savings database to that new thing called "the world wide web."

  15. Around $2 Trillion by Anonymous Coward · · Score: 1, Funny

    About $2 Trillion.

    I worked for the Florida Electoral Commission back around 2000.

  16. Is it purely your mistake. by jellomizer · · Score: 1

    I have been part of a large mistake costing hundreds of thousands of dollars.
    However, most mistakes are part of a chain of little mistakes that combine into a big one. For example, if someone happens to trip over a plug and unplugs a production server, the questions become: why was the cable out where it could be tripped over, and who decided that it wasn't worth the money and time to get a better system of cable management...

    Normally a person will get fired for a mistake if it was due to intentional misconduct, or if it gets political and someone is needed to take the blame. If that happens, be sure to put the blame back on the system (not an individual), and then follow up to fix the system so it doesn't happen again.

    The most expensive mistakes are often due to a long chain of events. A good system should be in place to stop a simple mistake from escalating into a big one.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    1. Re:Is it purely your mistake. by Anonymous Coward · · Score: 0

      The way it's supposed to work -

      Every incident has an investigation meeting. If a problem was not wilful, it is recorded along with a recommendation on avoidance (or acceptance that this is luck and within the risk profile of the application we're running). If there are any negative findings about individuals, they are included as monitoring on their performance and are therefore private to their boss. If the manager of a team accepts team responsibility he can provide a process or monitoring improvement to mitigate.

      What really happens - senior managers who should know better ask for a head.

  17. $480 Phone bill by Anonymous Coward · · Score: 2, Funny

    When I was 12 years old and hanging out on BBSs in 1989, I didn't realize dialing Gilroy from San Jose was long distance (Both were 408 area code). My parents were not pleased at the nearly $500 phone bill.

    1. Re:$480 Phone bill by ITRambo · · Score: 1

      At least your $500 downloaded porn was virus free back in the day.

    2. Re:$480 Phone bill by Anonymous Coward · · Score: 0

      Ditto, dialing Sydney from Melbourne back in the early 90s. The bill was roughly $400.

    3. Re:$480 Phone bill by Anonymous Coward · · Score: 0

      Beat ya! There was a BBS in Darwin and another in Sydney - both were STD phone calls. I got a bill for $2k in the early 80s.

  18. Well... by Anonymous Coward · · Score: 1

    As High Proctor of Fahz, I once led my whole species into unrelenting suicidal despair when during the Chinz-Rahl celebration I passed our Ultron onto Chief Groo, who was not prepared to hold such a heavy object and dropped it.

    My Mask of Ultimate Embarrassment and Shame is not enough to express the deep chasm of depression into which I sink.

  19. Tech mistake by hcs_$reboot · · Score: 1

    I maneuvered downward the left button of the mouse attached to the computer I was working on which pointer was right on a small gif saying "Send" that technically sent a message I should never have sent. Cost me a lot.

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  20. Whole computer. by o_ferguson · · Score: 1

    Not me, but a friend. In high school the best computer in the school was a 386SX. They decided to upgrade it to a DX by adding a maths co-processor to the main board. So they ordered one, and when it arrived, they gave it to my friend to install for some reason. Now, the chip had one corner cut, which you are supposed to line up with the cut corner on the socket, so you know it's seated the right way. Of course, my friend put it in completely backwards (because it fit in any direction). So he tries to boot up the computer and nothing happens. So he looks at it again, and realizes the chip is in backwards. So he turns the box off, pulls out the co-processor, rotates it 180 degrees and puts it back in the socket. Unfortunately, firing it up in the wrong orientation had toasted the chip completely, and when he put it into the socket in the correct orientation, the socket locked itself shut, as it's supposed to do. But, since the chip was fried, this effectively locked the motherboard in an unbootable configuration with a dead chip. Sigh.

    --
    - In Soviet Korea, only old people loose all their bases to Natalie Portman's petrified hot grits overlords.
    1. Re:Whole computer. by tehcyder · · Score: 1

      Since when did schools let their pupils perform hardware upgrades?

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    2. Re:Whole computer. by o_ferguson · · Score: 1

      Right? He was in a special needs class, and I think they thought it would make him feel good about himself (which it totally didn't). You have to remember that this was an era when they had only one computer tech for the whole 1500 person school, and he was also a shop/electronics teacher, but there were tons of kids running around who knew a lot about computers.

      --
      - In Soviet Korea, only old people loose all their bases to Natalie Portman's petrified hot grits overlords.
  21. $40k SGS by gpmidi · · Score: 1

    Dropped and broke a $40k USD Symantec Gateway Security Appliance

    1. Re:$40k SGS by Anonymous Coward · · Score: 0

      You did the company a favor.

    2. Re:$40k SGS by Anonymous Coward · · Score: 0

      I know of the SGS you're talking about. Grrrr.... The corner was all bent up. Damn you guys! We thought it was just poorly packaged. Hah! You didn't think you'd meet the guy you pissed off on here??

    3. Re:$40k SGS by gpmidi · · Score: 1

      Can't say that I did. Lol. Suppose I shouldn't be surprised though.

    4. Re:$40k SGS by gpmidi · · Score: 1

      As someone who worked with the damn things way more than I care to admit, yes, you're 100% right.

    5. Re:$40k SGS by KGIII · · Score: 1

      I hope you capitalized on it by setting it alight and dancing naked around the blaze. It is the only correct thing to do at that point.

      --
      "So long and thanks for all the fish."
    6. Re:$40k SGS by gpmidi · · Score: 1

      We did end up burying one when we finally stopped supporting them.

    7. Re:$40k SGS by KGIII · · Score: 1

      Definitely close enough. However, immediately setting it ablaze in the workplace would have made a much more interesting story. I suppose you have more of a point on this planet than doing things for my amusement though. It would have made a hell of a funny story. Burying it is pretty good as well.

      --
      "So long and thanks for all the fish."
  22. $10k. .... per day by Anonymous Coward · · Score: 2

    I made a calculation error that cost $10k per day. Took 9 months to straighten things out.

    I later won an award for outstanding work.

    1. Re:$10k. .... per day by binarylarry · · Score: 1

      Oh the joys of working at IBM.

      --
      Mod me down, my New Earth Global Warmingist friends!
    2. Re:$10k. .... per day by yakumo.unr · · Score: 1

      I guess you're a banker..

  23. Software bugs by nodan · · Score: 2

    Some bugs I've been responsible for, although it's hard to tell exactly what they did cost:
    - rounding error when programming a timer in an embedded system, resulting in a baud rate that was 10% off, causing problems with several units shipped to customers
    - overflow of an 8-bit counter, resulting in a serial protocol failing
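
    Both bug classes are easy to reproduce. The sketch below is illustrative only: the post gives no details, so the UART model (baud = clock / (16 * divisor)), the 8 MHz clock, the 57600 baud target and all function names are assumptions.

```python
def baud_error(f_clk, target_baud, rounded):
    """Relative baud rate error for an integer divisor.

    Assumes a UART where baud = f_clk / (16 * divisor).
    """
    step = 16 * target_baud
    if rounded:
        divisor = (f_clk + step // 2) // step  # round to nearest integer
    else:
        divisor = f_clk // step  # truncate, the classic mistake
    actual = f_clk / (16 * divisor)
    return (actual - target_baud) / target_baud


def next_seq(seq):
    """Advance an 8-bit sequence counter; 255 wraps around to 0."""
    return (seq + 1) & 0xFF


def is_newer(new, old):
    """Wrap-safe 'new comes after old' test for 8-bit sequence numbers."""
    return 0 < ((new - old) & 0xFF) < 128


truncated = baud_error(8_000_000, 57_600, rounded=False)
nearest = baud_error(8_000_000, 57_600, rounded=True)
```

    With these assumed numbers the truncated divisor lands about 8.5% fast, well outside what a typical serial link tolerates, while rounding to nearest roughly halves the error. For the counter, a plain `new > old` comparison says 0 does not follow 255, which is the kind of check that breaks a protocol at the wrap; the modular comparison in is_newer handles it.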

    Plus tons of other errors I forgot or haven't been aware of. Total damage for sure thousands of Euros. However, that's probably little for a 25+ year career mostly in software development.

  24. Most expensive mistake ever. by Anonymous Coward · · Score: 0

    I failed to found Facebook before Zuckerberg did. Cost me billions.

    1. Re:Most expensive mistake ever. by SharpFang · · Score: 1

      Do you happen to work for RIAA? They tend to sue people for causing them losses like these.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  25. A Photographic Slide by trabby · · Score: 2

    Lost a slide for a 3rd party client that was to be featured in a skateboarding magazine.
    I think one of the coworkers stole it as I did not get along with them.

    Insurance claims for that kind of thing can involve the cost of setting up the shoot again, whatever that entails.
    Was fired not long after.

  26. About $2M -- But not really a mistake... by jnaujok · · Score: 4, Interesting

    Our group at FedEx released code that I wrote on a Saturday night. This was two days before the Apple iPhone 4 shipped. The code worked perfectly, however, despite our repeated warnings about nearly doubling downstream traffic, the downstream systems (like billing and tracking) weren't ready for it.

    So, on the day everyone wanted to track their new iPhone, my code shut down all tracking on FedEx for about 12 hours before we could switch the config setting (10 minutes) and the downstream systems could catch up (11+ hours).

    Estimate of cost was around $2 million in lost time and revenue and extra calls to customer service. Luckily, since I wasn't actually at fault, and we had multiple email chains backing up the volume estimates and warnings, we didn't get the axe.

    --
    Life, the Universe, and Everything... in my image.
    1. Re:About $2M -- But not really a mistake... by radarskiy · · Score: 0

      "since I wasn't actually at fault"

      You a) released code to production on a Saturday night, b) doubled traffic, and c) issued an ultimatum to infrastructure teams instead of *negotiating* an SLA change.

      Why do you think none of the fault lies with you?

    2. Re:About $2M -- But not really a mistake... by Tablizer · · Score: 3, Informative

      The poster was not the boss. The boss calls the final shots. The technician's job is to present the risks (trade-offs) as accurately and clearly as possible. If the boss(es) then choose to ignore the risk warnings, the blame falls on them. If you usurp their power, you are out the door (unless it's a legal matter).

      Incidentally, I was in a somewhat similar situation where marketing planned to release about 30 websites for satellite offices all at once along with a press release about the new sites. I pointed out our "budget-oriented" infrastructure may not be able to handle such a sudden load, and suggested staggering the releases. Other technicians agreed with my warning, but the marketing chief was really disappointed, saying something like, "It's better P/R to have one big release. Staggering the releases takes the punch out of it."

      I was tempted to respond, "30 crashed sites is not good P/R either", but smartly bit my tongue (based on prior experience with "reality" statements). He was a true P-H-B, always looking for a cheap short-sighted shortcut, but tried to blame us when his paper tigers got eaten. He drove one guy to retire early. Later he was under investigation for giving contracts to his buddies instead of basing them on merit. Not surprising, his buddies were also idiots.

  27. Two incidents... by Anonymous Coward · · Score: 1

    First one, I was lucky... there wasn't a switchover to a new database yet, and I made sure to schedule a large downtime window, because I try to do like Scotty... take the time I think will fix something at the worst, then double it. If the PHB gripes, start into detail. A side effect is that users tend to be happy when stuff is back up earlier than planned.

    Well, this was a two node HA cluster back in the day where a certain vendor had a passive node and an active node configuration selling for an insane amount. They were connected via serial connections for heartbeats.

    Well, it was time to do a simple update of the machines. I staked out 24 hours, just because I wanted to do backups first.

    Well, I did the sysbacks, so I had two tapes of the entire boxes.

    Ran one set of updates on both machines, rebooted... all fine. Noticed there was a drive array microcode update... just a 0.0.x update. Well, I tossed that on and rebooted... Well, both boxes blew their kernels. All the data on their drives was gone, because the microcode patch got the array in such a state that one machine started writing garbage to all drives.

    At least I was able to restore both machines and build the shared data from the tapes.

    The second one would have been just as bad. I was cleaning out a source code tree of .o files and executables... came to find one dev had libraries that were only present in binary-only format, and whose only backup was in the tree (where the backup program excluded all binaries for space's sake). Thankfully, the tree was on a NetApp, and a simple copy from a snapshot fixed everything. Were it on another server, I'd have had Hell to pay.

  28. Fried an early... by michael_cain · · Score: 2

    digital signal processing chip from TI. The $750 (in 1986 dollars) wasn't the big deal. That the parts had serial numbers hand-lettered on them and I had to go back on the waiting list to get a replacement was.

  29. $40,000 - $60,000 by GovCheese · · Score: 1

    A long time ago on mainframes. IBM 3083's and VAX's. I was running analysis on some waveform data, took probably about 20 reels of mag tape. Fucking marine seismic data. I sent the big deck of cards down to the floor on a Friday. 1st thing Monday, I had to go to the VP's office. He explained that Monday morning, the fucking job was still running. Turns out, instead of sampling the data every 4ms, I accidentally sampled it every 2ms. Back then, you didn't own your mainframes, IBM leased them to you. The VP explained that I cost the company anywhere from $40-60k. Nice guy actually. Texas engineer, cowboy boots and a suit. He politely asked me, "Son, you probably won't be making this mistake again, will you?" I stuck around for another couple of years. Goddamn it took an army to process data back then.

    --
    "He's using a quantum encryption scheme! That'll take hours to break!"
    1. Re:$40,000 - $60,000 by dbIII · · Score: 1

      Funny thing is today someone is probably reprocessing the data from the area next door at 2ms and happy they don't need to redo your stuff. There is a lot of reprocessing of old data going on and some of it is even off the original reels because nobody has format shifted it.
      Interesting how seismic data from the 1970s can be read with current software but MS Office documents only a few versions back have problems.

    2. Re:$40,000 - $60,000 by Anonymous Coward · · Score: 0

      The punch card girls finished a massive database and everything was loaded up and transferred to tape.
      My job was to check the data on a terminal referring to the original forms as verification. Everything was ok and verified, then I inadvertently entered the code to end the session.
      Then 2 days later I was interviewed as the tapes were blank and all data lost. Evidently I sent the delete code instead of the end code.

    3. Re:$40,000 - $60,000 by NicBenjamin · · Score: 1

      Tells you a lot about the design goals of the people who make the program.

      MS wants to sell you a new version of Office, so the file format is always in flux and your buddy with a brand new machine makes documents you can't read until you upgrade.

      Seismologists need to do really long term studies, so they wouldn't even consider making a program that couldn't read the old format perfectly, and they'd probably stubbornly resist a new data format even if it was a good idea.

    4. Re:$40,000 - $60,000 by dbIII · · Score: 1

      You again? First point, yes, but you are incorrect with the second point - it's all about published standards to get things done (eg. SEGD) and new standards DO come up all the time and they are not "stubbornly resisted" because they are ALSO published standards and can be easily included in the software along with the old formats.

    5. Re:$40,000 - $60,000 by KGIII · · Score: 1

      Somewhere on this planet there needs to be a "Greybeard Bar & Grill." Unfortunately, that place would probably end up being somewhere in Silicon Valley.

      --
      "So long and thanks for all the fish."
  30. Didn't break but helped to fix... by Anonymous Coward · · Score: 0

    ...a VERY large (but nameless here) grocery chain here in the US after an EMC engineer decided it was perfectly fine to stick his hand inside the array that supported ALL the chain's warehouses WITHOUT an anti-static wristband.

    One 36 hour conference call later and we were all finally back online. I've no clue what the overall cost was but it was measured not only in hardware and manpower, but in lost sales as NONE of the 2,000+ stores could be resupplied while the requisite warehouses were down.

    And yes, this was a MAJOR chain here but many many years ago.

    1. Re:Didn't break but helped to fix... by KGIII · · Score: 1

      I only know of two such instances where this happened or something similar happened. One was only about five years ago and the other was longer - it made the news. Assuming it was the latter then that grocery store chain either begins with an S or a K? I can not recall which one it is but I do recall hearing about a computer mishap that took out warehouse access for a major grocery chain. The more recent one was due to a malware infection that spread across their network (as I recall) and its primary goal had been collecting credit card data but it had spread much further. That one was covered in eWeek and noted, by me, simply due to its proximity to me.

      --
      "So long and thanks for all the fish."
    2. Re:Didn't break but helped to fix... by Anonymous Coward · · Score: 0

      Assuming it was the latter then that grocery store chain either begins with an S or a K?

      You may think that but I couldn't possibly comment... :-)

    3. Re:Didn't break but helped to fix... by KGIII · · Score: 1

      It is all good. I can not blame you for not commenting. You may well still work there or still be covered by some sort of contract such as an NDA. I wouldn't recommend violating any such things - a job is not worth losing for idle banter with random pixels nor are said random pixels worth a court case.

      --
      "So long and thanks for all the fish."
  31. Lost opportunity by Anonymous Coward · · Score: 1

    Long before Amazon was ever more than a bookseller in the mid 1990s, a friend and I had this idea of a website that would allow for comparison shopping pulling data from other sites allowing folk to buy the cheapest electrical items possible

    We never progressed because we couldn't see any way for it to make money. We had no idea that was the absolute last thing we should have cared about.

    So now I'm here, an anonymous coward posting about our total lack of foresight and imagination, and not some rich fecker who owns real-estate like /Slashdot

    1. Re: Lost opportunity by Anonymous Coward · · Score: 0

      Is your first name Siggy by chance?

  32. Took an online trading company offline for a day by Nonesuch · · Score: 4, Interesting

    I was hired as a firewall admin at an online trading company, then quickly discovered the director of IT was insane, but kept management happy because he made his numbers by keeping his team constantly understaffed; I was told to work on not just servers, but installing Sun servers in racks, running cable, and fixing just about anything plugged into the network.

    I made the mistake of showing competence in networking, so was asked to "expand my role" (new title, same salary), and start working on the switches themselves, including executing an "upgrade" to stacked HP ProCurve switches with VLANs (replacing a hodge-podge of random manufacturer switches). The actual upgrade went fine, basic testing (ping) showed everything stable, but as soon as trading opened the next day, everything went to hell, performance dropped through the floor and customers started calling in about trades timing out. Long story short, turned out that Solaris HME cards were unable to negotiate properly with ProCurve switches, half the machines were dropping packets due to duplex mismatches. There's a reason people call the Sun interface cards "Happy Meal Ethernet"

    Cost the company approximately $180,000 in direct and customer exodus losses, and was likely a factor in their eventual collapse. I wasn't fired, but management never trusted me again so I saw the writing on the wall, and quit to do consulting work at a (also doomed) dot-com online supermarket.

    On the upside, I was able to make thousands in consulting income from installing those same "lock speed to 100 and duplex to full" Solaris scripts on servers for various customers who also had performance issues plugging in Sun servers to cheap switches.

  33. coleco vision by known_coward_69 · · Score: 1

    i used to insert the cartridges too hard and broke it to the point where i had to spend 15 minutes playing with it every time i wanted to play a game

  34. I killed three networks, but that was planned. by swschrad · · Score: 2

    obsolescence, I got the task to shut 'em down. I also forced a worldwide recall of PC card disk drives in the switches that were the backbone of the Internet when we kept the vendor engineering on the phone all day for a failed switch... and read the duty cycle of the drives to them, like 5 minutes a shot, 10 minutes an hour, when they were running read/write continuously.

    but I got a haircut indeed when we had to get our stuff out of a colocate that was shutting down. built a mirror data system for that in the new place, had the trunks up, costed over the traffic. then it was time to demanage and power down the old shelf. telcordia assigned a code to the new unit that was one letter different than the old one.

    the good news is I got the new one back up in 20 minutes and they didn't stake me out over an anthill.

    --
    if this is supposed to be a new economy, how come they still want my old fashioned money?
  35. My $5 million bug by llib_xoc · · Score: 2

    We were writing a Unix program to parse transactions from some specialized terminals that read customer invoices and the checks that accompanied them, writing the transactions to digital tape to carry over to the mainframe system. During testing our tapes were compared to tapes generated by the legacy IBM system. Our team lead got a call from the customer liaison *early* one morning saying "Do you realize one of your batches was 5 MILLION DOLLARS SHORT?" - yes, she was shouting. Turns out that the $5 million transaction was the largest we'd ever tested with so far. All others were less than $999,999. It was my bug - I'd put the sign nybble (half a byte) on top of the most-significant digit of the packed-decimal payment-amount field on the test tape, dropping that digit from the field. Trivial fix - I had just been auditing the relevant code the previous day.
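
Editor's sketch of the bug described above, under stated assumptions. The original code and tape layout are not given, so the 9-digit field width, the amounts stored in cents, the reader that treats a non-digit nibble as 0, and every function name here are hypothetical. The only convention taken as given is that packed decimal stores one digit per nibble with the sign in the last nibble (0xC for positive).

```python
FIELD_DIGITS = 9      # assumed: 7 integer digits + 2 decimal places
SIGN_POSITIVE = 0xC   # conventional packed-decimal positive sign nibble


def to_nibbles(cents):
    """Digit nibbles, most significant first, followed by the sign nibble."""
    digits = str(cents).rjust(FIELD_DIGITS, "0")
    if len(digits) > FIELD_DIGITS:
        raise ValueError("amount does not fit the field")
    return [int(d) for d in digits] + [SIGN_POSITIVE]


def pack(nibbles):
    """Pack an even number of nibbles into bytes, two per byte."""
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def pack_correct(cents):
    """Sign nibble stays in the last position."""
    return pack(to_nibbles(cents))


def pack_buggy(cents):
    """The described bug: the sign nibble overwrites the top digit."""
    nibbles = to_nibbles(cents)
    nibbles[0] = SIGN_POSITIVE
    return pack(nibbles)


def unpack(packed):
    """Read the digits back; assumes a non-digit nibble is read as 0."""
    nibbles = []
    for byte in packed:
        nibbles.extend((byte >> 4, byte & 0x0F))
    digits = [n if n <= 9 else 0 for n in nibbles[:-1]]
    return int("".join(str(d) for d in digits))


if __name__ == "__main__":
    for cents in (99_999_999, 500_000_000):
        print(cents, unpack(pack_correct(cents)), unpack(pack_buggy(cents)))
```

Under these assumptions every amount below $1,000,000 has a leading 0 in the top nibble, so clobbering it changes nothing, which would explain why the bug stayed hidden until the first seven-figure payment came through.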

    1. Re:My $5 million bug by baegucb · · Score: 1

      If this was in the 1970s and involved the International Travel Association iirc, I was probably the person who discovered this.

  36. I wonder... by waspleg · · Score: 4, Insightful

    How many people will refrain from posting because the statute of limitations hasn't run out yet?

    1. Re: I wonder... by Anonymous Coward · · Score: 0

      A lot.

    2. Re:I wonder... by Anonymous Coward · · Score: 0

      :raises hand:

    3. Re:I wonder... by dcollins117 · · Score: 4, Interesting

      How many people will refrain from posting because the statute of limitations hasn't run out yet?

      Well, I'm certainly not going to admit to the most costly mistake as it appears no one realizes it was me and what I had done. So I'm not gonna do it; wouldn't be prudent.

      The most embarrassing mistake was I inadvertently brought down the clients' network (a major hospital) during the middle of the day. Didn't realize what I had done until about three minutes later when about a dozen IT guys flooded the computer room paying particular attention to the area I was just working in. It appears I made an error. To this day I am likely persona non grata in that computer room.

    4. Re:I wonder... by Anonymous Coward · · Score: 0

      Here I have about a year or two to go

    5. Re:I wonder... by Anonymous Coward · · Score: 0

      I'm fairly certain I don't have to worry about that, so let me give you a few from over the last decade that I ran into (not strictly my own work/fault).

      The first one was when I was just out of college, a young electrical engineer that they somehow put in charge of doing some maintenance work at an external facility. We go there in our van, with about 400 000 USD of test and measurement equipment in the back. (This is normally the part where a 24 year old shouldn't be driving I suppose.) We manage to get there without any accidents none the less. So we enter the production building and notice the elevator is out of service, so we start dragging everything up there by hand. We noticed that the facility's cabling and piping was a bit of a minefield on the floor, so we grab a few lights and plug them in everywhere thinking: problem solved. Then a rather clumsy colleague fell over a pipe, and managed to break the cable running between a clean room's A/C system and its PLC controllers by dropping a heavy steel box on said cable and on his feet. (We're still wondering to this day how he managed that one.) An ambulance and a quick fix by yours truly later, the estimated cost was around 40 000 USD in lost time and about 10 000 USD more to get everything fixed (colleague's toes included). New guideline: Take a health and safety person along to every new site.

      A year or two later I was working for another company. I had to service a low power radar system which had been acting up, my background being high frequency electronics this was a job I was looking forward to. Straight away we noticed the issue was a noise source in the receive chain, so we go to our truck and grab our rather expensive spectrum analyser. I go to the adjacent building, uncouple everything of the transmitter to avoid blowing up our equipment. We climb the tower and hook it up to the receiver. I forgot the schematics in the adjacent building, so I send the technician in training accompanying me to grab them while I started tracing down the source of the noise with a can of freeze spray. (Turned out to be a small $200 amplifier that was causing the issue.) He calls me over the radio that he probably found the issue, "some idiot unplugged everything". Before I had the chance to tell him not to, he managed to get the transmitter going again, resulting in one very fried spectrum analyser and a slightly microwaved electrical engineer rapidly descending down a ladder to shout at someone. Total cost ranged somewhere past $100 000, luckily no-one got fired and we managed to fix it using the old machine we still had in the truck as backup.

      Later at the same company I finally managed to destroy something myself, while working on a RIE etching machine's driver board I managed to kill an oscilloscope of $30 000 because the differential probe I was using wasn't rated for the voltage I tried to probe. They never really made a problem about that one for some reason, mostly because they managed to charge it to the customer who was happy someone was still willing to fix the machine.

      Being sick of doing repairs I switched to digital hardware design in VHDL a couple of years after that incident. We were drawing up an ASIC for "some security application" (not allowed to go into more detail), we had to finish layout and send it to the fab by the end of the week. So we cut a few corners during the DRC because insufficient time was available to run the full sets of tests. We sent the data to the fab and got back our chips a while later, instantly when we looked on the probing station we noticed something was very very wrong. Turns out that we managed to mix up metal layers on the IC while talking to the folks from the fab. Against all astronomical odds this faulty build-up managed to pass every sort of DRC the fab ran on the design files they received. Other than missing a deadline, and costing us a huge sum of money (couple of hundred thousand) everyone had a good laugh about it, we switched the metal layers to their correct positions, and we went into product

    6. Re:I wonder... by AK+Marc · · Score: 1

      The only mistake I made that cost money, nobody ever knew about. Had to put $5k on a personal credit card to re-buy an ISDN card. I was out of the country doing an install (I ordered the gear) and found out the hard way that the world is not uniform in ISDN standard. S/T vs U. Oops. Buy card locally, expense card. Leave both cards installed in router. Nobody noticed or cared enough to ever say anything about it. Relatively minor, but was a direct cost to a mistake.

  37. 2400 (thanks HP) by Anonymous Coward · · Score: 0

    My biggest mistake was buying a hp 4020i cd burner that was so flawed i understand americans received compensation for it. Traditionally a big fuck you went to europeans

    1. Re:2400 (thanks HP) by Pubstar · · Score: 1

      Can't be worse than the Kenwood TrueX DVD-ROM drives. Those things were fast as hell, but notorious for dying.

  38. Click of death by Wowsers · · Score: 4, Interesting

    My worst IT disaster was suffering from a hard drive failure, click of death. I had warning of a few days of it, and I deliberately kept the pc on 24/7 instead of normal switch on/off, to make sure the drive stayed alive until its replacement arrived.

    Obviously I had to turn the pc off to change the drive, it was not hot-swappable. When I powered the pc up, the old hard drive failed, didn't work at all. I was faced with losing all the data on it. I left the drive alone for months wondering what to do, reading different ideas online, some of them weird.

    Eventually I decided to try the least destructive idea first. I put a sheet of paper on the failed drive to make sure the label didn't come off, and heated up the clothes iron, then applied the iron directly onto the top of the hard drive. When the drive casing was warm enough (not so hot as to make it hard to carry), I took it to my pc, and powered up.

    The failed hard drive came to life, and I managed to grab all the files on it onto the new hard drive, uncorrupted.

    Out of interest, the failed drive failed about three months before I do forced drive change as a backup / failure prevention. I got lucky.

    --
    Take Nobody's Word For It.
    1. Re:Click of death by BlackPignouf · · Score: 2

      Wait, what?

    2. Re:Click of death by Anonymous Coward · · Score: 2, Informative

      Heating it up causes the metal to expand which can unjam a stuck head in some circumstances.

    3. Re:Click of death by Anonymous Coward · · Score: 1

      I just had a laptop with a dying drive, needed payroll data off it brought it home power up nothing happens, gave er a good whack right above the drive she hummed to life and i managed to copy... we'll see how the restore goes

    4. Re:Click of death by Anonymous Coward · · Score: 0

      He smoothed out the bad bits with a clothes iron.

  39. Not sure how much $$$ by minimum · · Score: 2

    I used to work as a SDH/DWDM admin. In the early 2000's, my colleague screwed up a major firmware update on a STM1/4 ADM and I, as senior (haha - I was in the 1st half of my 20ies) admin, had to drive up to site (since the affected node was unresponsive to management system). After many unsuccessful attempts to recover it, at about 3 am, I decided to hard reboot the node, which caused it to boot up from the corrupt firmware bank (it had two of those); which in turn just erased all the configuration, including traffic connections (which is built very robust btw). Since the site was on a (relatively small) island and had only 2 ADM's at the time, I more or less cut off the entire communication with mainland. By morning, I had managed to get my colleagues to ferry me another, fully fitted ADM (our last resort backup scenario was to replace entire node) - but as it turned out, it was in a hurry fitted with cards with different firmware (entire network was in middle of upgrade process) which resulted in the same kind of useless "brick" I had already at hand. Although it was very cool to fly ~200km/h to port and back in my sporty car, to pick up the spare (not many police on the island and I had a very good excuse). By the afternoon, my higher-up manager had mobilized a helicopter to personally deliver me a fully functional ADM, which we promptly swapped in and restored configuration from backup. I still have a copy of the local newspaper's front page, praising how our company heroically saved the day to restore connection with outer world.
    At that time I was already able to make up excuses that would have made BOFH proud, which saved my ass.

  40. Other way round for me by Anonymous Coward · · Score: 2, Interesting

    I let a vendor sell me a product without really testing it. Turns out it didn't work (at all) and we lost €50k on license fees for a product we could not use.

    I was able to lay the blame on an accountant who had locked us into a 5-year contract in exchange for a minor discount. So I didn't get fired.

  41. F-16 panel flew off in flight by YrWrstNtmr · · Score: 4, Interesting

    Some other fool did not install the panel properly, and left one of the three nuts off. Distinctive nuts, used in only one place.
    Someone found it overnight, and held it up at the morning meeting. "Anyone know where this goes?" Unfortunately, I did not recognize it as a part of one of my systems.

    Aircraft flew, panel breaks off, punching several other holes in the side as it departs.
    Training mission aborted. much sheet metal work needed.

    Actual repair cost? Unknown, but easily 5 figures if not more.

    1. Re:F-16 panel flew off in flight by Tablizer · · Score: 1

      Was it too late to re-inspect when mentioned in the meeting? You perhaps could have said, "I don't recognize that nut, but I'm willing to go in and look around."

    2. Re:F-16 panel flew off in flight by Tablizer · · Score: 1

      Look on the bright side: if it were an F-35, the panel would have 13 nuts instead of 3, and all be different.

      http://tech.slashdot.org/story...

    3. Re:F-16 panel flew off in flight by Anonymous Coward · · Score: 0

      AC for obvious reasons, but military leaves room for some costly mistakes. Mine was in international headlines and required a visit from SecState plus a call from POTUS to smooth over.

  42. Power cable mistake by Anonymous Coward · · Score: 2, Interesting

    Working for a desktop publishing house in IT. Spent just under $4000 on 36 inch flat panel displays. Accidentally plugged in printer power cable. Immediately fried monitor. My boss was not happy. The internship did not go well the rest of the summer.

  43. McAfee -$12.6K by Anonymous Coward · · Score: 1

    McAfee on a mass spectrometer data acquisition system. System control would be periodically lost. Cost over $12.6K in lost instrument time and labour to determine that McAfee was blocking serial comms to the instrument (but only when it felt like it).

    Lesson learned: never run McAfee or Norton on a mission-critical data system.

  44. 2 million by krray · · Score: 1

    I let an upgrade bug slip by me during a software upgrade for the accounting software. In retrospect it should have been caught before it got out of hand. It got out of hand in about 3-4 seconds and had a cascading effect bringing down the whole datacenter for the company.

    It happened when a "guaranteed" bid was due for a 2 million dollar job. We had nothing. Not so guaranteed...

    Fortunately (?) I had an ownership stake in the company; so I also screwed myself too. Figuring ~12% profit on the job was typical and 10% of that was mine ... it cost me personally over $20K on that mistake.

    Ooops.

  45. ~$60k by fox1324 · · Score: 2

    I was working as a Jr. Network admin, helping to install some new cisco PoE switches to facilitate our building's move to VoIP phones. I aligned a brand new 48-port poe switch slightly off when inserting it into the chassis, and bent the insanely-complex connector at the back of the card, rendering it unusable. Fortunately, we had a ridiculous service agreement with cisco, and a new card arrived at our office within 4 hours. I distinctly remember buying burritos and beer for me and the Sr. admin to help make up for the fact that neither of us got to sleep that night.

  46. moved a computer and swapped power cables by Anonymous Coward · · Score: 0

    Two wall warts used on this computer had the same size and shape plug, but very different voltages. I did not know this, and I put the wrong one into the powered usb hub. The computer had a ton of USB items on it, slide scanners, wireless keyboards (early days) flatbed scanners, a really nice giant ink jet printer, a 11x17 laser printer, serial interfaces, I could go on. Shorted the hub, and fried everything plugged into it. Thankfully I did this before I plugged it all into the computer, the smell of smoke was the dead giveaway. About $1500 worth of damage.

  47. NASA ouchie by CrudPuppy · · Score: 1

    I was on the NASA Genesis price team. Only a few hundred million lost on that one when it crashed into Earth...

    --
    A year spent in artificial intelligence is enough to make one believe in God.
    1. Re:NASA ouchie by Anonymous Coward · · Score: 0

      Feh. I was on the software team that implemented "throttle up" for Challenger.

  48. Worst was not a technical mistake by Anonymous Coward · · Score: 0

    It was publishing it on Slashdot and costing the company a lot of canceled orders the next month as word went around.

  49. 8*200*7 by Anonymous Coward · · Score: 0

    I printed out an older draft of an IF-RF board spec. We were developing a high data rate fixed point to multi-point RF comms system. We hooked up the bench power supplies, set the voltages right and the IF CPU did stupid stuff. 8 engineers spent 7 days at 200 UK pounds per hour, trying to fix it, combing over every detail.
    I printed out the up to date spec and saw the CPU rail spec had changed to be 0.5 volts higher. We (myself and one other engineer) turned the knob and the system sprang to life. We agreed to claim victory and not go into too many details about what was wrong.

  50. Just my time by corychristison · · Score: 2

    Six or so years ago I was using a (fairly cheap) Virtual Private Server as a dev/testing box for a pet project of mine.

    The VPS company was bought by a larger company, and prices were to double on the next billing period. I hastily chose a new provider without doing any research. I paid for 3 months of service in advance, got the container set up the way I like, migrated all of my data over, and was up and running.

    2 months in the new provider vanished, along with all of my data. I wasn't very concerned about the months worth of money I had lost by not getting the 3 months I had paid for, I think it was only about $15. "Okay," I thought. I'll just pull my data out of my nightly backups and move on. It turns out I forgot to adjust my local cron script that pulled the data over rsync to the new IP address. My backups had not been pulled in over 2 months.

    Luckily it wasn't very important, as it didn't make me any money and was mostly just for fun. I ended up starting over from scratch and ended up with a better system anyway.

    I learned my lesson, though.
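
Editor's sketch of one way to catch the failure described above, where a backup job keeps running but stops pulling new data. Nothing here comes from the poster's setup: the directory layout, the two-day threshold, and the function names are all assumptions. The idea is only to check the age of the newest file in the local backup directory rather than trusting that the cron job is doing its work.

```python
import os
import time


def newest_mtime(directory):
    """Return the most recent modification time under directory, or None if empty."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def backup_is_stale(directory, max_age_seconds=2 * 24 * 3600, now=None):
    """True if no file under directory was modified within max_age_seconds."""
    now = time.time() if now is None else now
    newest = newest_mtime(directory)
    return newest is None or (now - newest) > max_age_seconds


if __name__ == "__main__":
    # Hypothetical path; point this at wherever the backups actually land.
    print("stale:", backup_is_stale("/var/backups/vps"))
```

A check like this could run from cron next to the backup itself and send mail when it reports stale, so a silently broken pull gets noticed in days rather than months.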

  51. $1 BILLION DOLLARS (Puts pinkie to mouth) by perry64 · · Score: 1

    Not me, but my thesis adviser became the Technical Director for JSIMS, which ran through +/- $1B before the pentagon pulled the plug. He is not shy about mentioning that fact.

    http://www.nationaldefensemaga...

  52. Biggest tech mistake by rossdee · · Score: 0

    In 1985 I bought a "Fat" Mac for NZ$10K
    (The kiwi $ was worth about 44cents US at the time)

    1. Re:Biggest tech mistake by Anonymous Coward · · Score: 0

      In 1991, my desktop cost $5K US... screaming 12MHz Turbo 386 with a bitchin' 15" color monitor. 2 months salary that was - also wasn't a mistake, easily increased my productivity $10K over a few months.

    2. Re:Biggest tech mistake by KGIII · · Score: 1

      I spent about 32,000 USD upgrading to CD-Rs in ca. 1995. The worst part is that only covered eight of the computers in the office. At the end of the year there was an offering from HP that was under 1,000 USD. By the following summer they were half that. At the end of that year they were half again. Then, not more than a year and a half after that I could find SCSI CD-Rs for near 125 USD. Blank CDs were something like eight bucks when you bought in bulk... My mistake was adopting the tech that early. We were using large data sets (for the time) and the idea was portability. It worked, it *sort of* paid for itself. It would have paid much nicer to wait. I can not say that it lost us money but I can say it sure as hell did not make us any.

      --
      "So long and thanks for all the fish."
  53. Re:Took an online trading company offline for a da by Anonymous Coward · · Score: 1

    Oh yea, the "HME lock speed and duplex to full scripts". Knew some admins at a financial services company that didn't remember to run that on building the servers. Servers made it through testing, got turned on in production. The next day was ugly until we looked at the change management book (was really a paper book) and saw the new servers. 5 ethernet cable disconnects later we were back up to our original capacity until they sorted it out.

  54. The Final Nail by Dartz-IRL · · Score: 4, Interesting

    The total cost was actually sweet FA in numbers terms, but I think I put the final nail in the company's coffin.

    My first 'job' was a jobbridge internship with a 'small' company. Small enough that I was literally person number three on the employee roster. The company worked in the renewable energy sector, and had been hammered pretty hard over the last few years by The Recession as domestic and corporate purse strings were pulled tighter and tighter.

    I was taken as an Engineer, but rapidly found myself wearing a wide range of hats from Sales, to Customer Support, to System Design, to Project Management, web development in PHP, and finally, IT Support.

    Because, one day, I managed to figure out why one of my colleagues couldn't log in to the server upstairs, and corrected the problem.

    I will say, the Server was the problem.

    It was a dinosaur. It was 14 years old - twice as old as the company - and had been bought second hand. It was a monstrous beige tower with a Pentium II processor and God Knows What else inside. It ran Windows Server 2000, and was solely dedicated to serving the company accounts and acting as a networked file storage. Inside the case were four HDDs.... A pair of 9GB ones for the OS and programs, and a pair of 32GB ones for files. Both pairs were mirrored in RAID 1. It had a pair of lockable Zip disk drives still fitted though the keys were long lost, along with a floppy drive and a CD Drive with no write ability. Or ability to read DVDs.

    It creaked as it worked, then fumed, whuffed, whirred and occasionally burped. And it sat there, creaking away for years without thought or consideration to its well being or security. Until I came along.

    By this stage, it was obvious the company was dying - the Titanic had hit the iceberg a long time ago, and everything that was happening was just a desperate attempt to bail it out. We might've slowed the sinking - from two months, out to six, even buying a full year - but the abyss of liquidation always loomed.

    So, any suggestion of upgrading the server hardware was met by 'With What Money?'. At the same time, everybody knew the server was the lynchpin. If it broke, that was it - company gone. A suggestion that I use a spare computer from home was quietly discouraged - in case the company went under by surprise and someone decided to liquidate it to pay a creditor rather than give it back to me. Or we turned up to find the doors locked.

    The best I could do was schedule a backup of the accounts and a few other critical systems, and have it go somewhere offsite. I asked our webhost if we could use our spare space for it, and they were happy to let it happen, provided we didn't cause them problems. So, I set it to run the backup every Sunday morning - 1am or so. Each successive backup would overwrite the previous because there just wasn't the spare space to hold two (No money to pay for it)

    I figured even if the server went pop, or we had a building fire or some other catastrophe, at least those copies would survive. I'd figure out what to run them on afterwards.

    Someone, somewhere, should see the potential problem in this. In my defence, I am not, nor ever was, an IT professional. The software education I have is more related to the engineering side of things - making machines and robotics work with a view towards industrial automation, rather than the maintenance and setup of IT infrastructure and data security.

    I just did what I thought I could to keep the Titanic afloat.

    So, one Monday morning, I come to the office and am met by the shrill sound of metal screaming against metal at high speed. There's a heart-in-mouth moment as I realise that it's coming from the server cabinet.

    But, we have backups, I assured myself. The disks are mirrored in RAID 1, so if one drops out, the other should still be clean and working. If that fails, I've my own little backup too....

    Unfortunately - that only works if the damaged disk decides to drop out of the array.

    It didn't.

    I find th

    --
    So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
    1. Re:The Final Nail by drinkypoo · · Score: 2

      There's a clawing feeling that it was somehow 'My Fault'.... and it probably was. With hindsight, maybe I should've set it to run the backup while we were in the building, rather than at home over the weekend. I could've used an external drive to keep one locally too. There were probably a dozen things that I could've done that'd stop it.

      Only one thing which really mattered... verifying your backups. If you don't do that, there's almost no point in making any. (It gives you something to pray for...)
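
      One cheap form of the verification described above is to compare a checksum of the backup copy against the file it was made from. The following is a minimal sketch, not the setup from the story: the function names and the choice of SHA-256 are assumptions made for illustration. It only proves the copy matches the source, so it would not catch a source that was already corrupt; a periodic test restore is still needed for that.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(source: Path, backup: Path) -> bool:
    """True only if the backup exists and matches the source byte for byte."""
    # Hypothetical helper: a missing or truncated backup counts as a failure.
    return backup.is_file() and sha256_of(source) == sha256_of(backup)
```

      Running a check like this right after each backup job, and alerting when it returns False, turns a silent failure into a visible one.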

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:The Final Nail by Anonymous Coward · · Score: 0

      There's a clawing feeling that it was somehow 'My Fault'.... and it probably was. With hindsight, maybe I should've set it to run the backup while we were in the building, rather than at home over the weekend. I could've used an external drive to keep one locally too. There were probably a dozen things that I could've done that'd stop it.

      Only one thing which really mattered... verifying your backups. If you don't do that, there's almost no point in making any. (It gives you something to pray for...)

      Heh..the only verified backups being done at my current place of employ are the ones I'm doing of my work..I'm not the IT guy there, but when I found out what the SOP was apropos backups I made damn sure I had verified onsite and offsite copies of my stuff..I'm not about to lose five years' work in a hurry.

    3. Re:The Final Nail by Tablizer · · Score: 3, Informative

      Databases should be backed up with a text-dump (such as an SQL INSERT list), not the actual database file, because of the internal pointers that are fragile. A text-dump "flattens" the pointers. If you do use the actual database file as a backup, shut all DB writing off first, during the backup. And keep multiple generations.
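
      As a rough illustration of the text-dump idea with multiple generations, here is a sketch that assumes an SQLite database (the accounts package in the story is unknown, so this is not its actual backup mechanism). The function name `dump_database`, the `dump-*.sql` naming, and the default of seven generations are all hypothetical choices.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def dump_database(db_path: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Write a plain-SQL dump of db_path and keep only the newest `keep` dumps."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped names sort chronologically, so older generations sort first.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"dump-{stamp}.sql"
    conn = sqlite3.connect(db_path)
    try:
        with target.open("w", encoding="utf-8") as out:
            # iterdump() yields CREATE/INSERT statements, i.e. a text dump.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()
    # Prune everything except the newest `keep` generations.
    for old in sorted(backup_dir.glob("dump-*.sql"))[:-keep]:
        old.unlink()
    return target
```

      Because each run writes a new file instead of overwriting the last one, a dump taken from an already damaged database does not destroy the previous good generation.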

    4. Re:The Final Nail by Dartz-IRL · · Score: 1

      I wasn't the IT guy either. I just had enough of a head to google shit that borked and try figure it out and make it work again.

      --
      So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
    5. Re:The Final Nail by Dartz-IRL · · Score: 2

      I honestly had no idea how it actually backed up, it was a function within the accounts application itself to generate the backup. Which it did, to a local disk. I then had an automatic scheduled upload of that backup to the server.

      Ultimately, like I said, I'm not really an IT guy - I was the one with google and enough patience to fuck about until things worked again. We didn't have one. We did pay one company a hundred quid a month for a while in case something went TU, but we stopped paying him six months before the final death just to make the dead plane glide those few hundred yards further.

      The most IT thing I've done is run a simple website off my own desktop at home, and maybe the whole "make a datalogger work with remote internet access" thing.

      --
      So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
    6. Re:The Final Nail by Tablizer · · Score: 1

      I'm not blaming you personally, it was just some "side tech". If an org or situation puts people into positions outside of their specialty, bleep is likely. That's just the way it is.

    7. Re:The Final Nail by Dartz-IRL · · Score: 1

      Never said you were. And such is the way in small companies. You have to do work outside your specialty. That's part of the fun.

      --
      So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
    8. Re:The Final Nail by KGIII · · Score: 1

      That was beautiful. I chuckled in the real world. Lessons learned and, really, no harm done. It was also well written. Even though I suspected the ending it was still enjoyable to read all the way through it. It read like an original BOFH type of story only you did not cause anyone any harm and, well, he would have been making fun of you.

      --
      "So long and thanks for all the fish."
    9. Re:The Final Nail by Anonymous Coward · · Score: 0

      A colleague at a different site managed to do the same with Exchange. He diligently backed up Exchange every night. Overwriting the previous night's backup, so there was only ever one backup.

      Of course, as with you, a drive failed and, because he was using RAID 0, took down the whole array. And the backup regime was - wipe backup, do backup, verify backup.

      We then went through a round of "redundancies", one of my techs* was kicked out and my role was given to this guy** - even though I kept the job title (and pay) - my responsibilities went to him as he was on "more money" and better situationally placed, so I had to become the site technician.

      * Don't feel sorry for him, the company screwed up the redundancy process, identifying the person, allocating new positions with no consultation so that it was obvious what was going to happen.
      ** Yea organisational change - no it can't be you, no it's not for you, no it's going to the other guy. Oh, we have to consult? Of course you have a chance of getting the job, and we'll make sure it's completely fair and above board. Yea, you didn't get the job er because er he's had more experience of server 2012 (even though he'd never used it ...)

  55. Not my mistake, but my boss' by whoever57 · · Score: 2

    Not selling the company for $250M because he wanted $300M during the dot-com boom. My boss personally owned about 30% of the company at this point.

    --
    The real "Libtards" are the Libertarians!
    1. Re:Not my mistake, but my boss' by Anonymous Coward · · Score: 0

      Tom, Dennis, Keith, Rich, was that one of you?

  56. I didn't get some contractors fired soon enough by plopez · · Score: 1

    Two totally incompetent twits from a populous south Asia country. Cost about $32k in salary and 4 month schedule slippage. Another contractor, who is competent, said she suspected they gave 'ghost' interviews, a common practice in her country. I heard managers say the same thing, that the two who showed up for work were not the ones they phone interviewed. They did not know command line basics in either bash or Windows, how to use remote desktop, Java, unit tests, and other things we required.

    Oddly enough of the 4 foreign contractors we used recently the two women have been competent, the two men useless.

    --
    putting the 'B' in LGBTQ+
    1. Re:I didn't get some contractors fired soon enough by NicBenjamin · · Score: 1

      It doesn't surprise me.

      A woman who has gotten through college and gotten a job in a male-dominated culture has done so by being really really smart, and if she comes to the US it's probably partly because she's sick of saying a smart thing in a meeting, and being ignored till some guy repeats her. So you're almost certainly dealing with someone who knows what she's doing and wants to be helpful.

      Guys, OTOH, are much more likely to be in it for the paycheck and the "I worked in America" resume line.

  57. Bug by Anonymous Coward · · Score: 1

    Haven't caused errors with a quantifiable dollar-amount loss. But have been involved with several errors in various systems, as I suspect is the case for developers who write code that actually goes to production ;)

    For an embedded hardware/firmware module for use by a backend application, I made a bug causing the module to reboot if a given parameter passed from the application was missing in certain circumstances where it was supposed to be present. The application wasn't supposed to call with this combination of parameters, and unfortunately the test harness didn't test for this case either. And in fact the application didn't usually call with the wrong parameters. But due to a database crash and associated data integrity error (which turned out to be a bug in the DB software itself which was later fixed) the column corresponding to the parameter in question actually became NULL for a few users in the database. And since the application didn't check the validity of parameters but just passed on whatever it got from the DB, this resulted in the firmware receiving the illegal NULL value, thus causing a reboot whenever one of these users logged in. The module brought itself up quickly after each reboot and there was redundancy so there wasn't any user impact, but a lot of warnings and alarms went off every time and it took some time to figure out how the error could happen.

  58. Killed a project by ordering a code audit by Dracos · · Score: 1

    I was brought onto a small web startup project as a co-lead. By this time the project was already 2.5 years old and had been rewritten at least three times by progressively less lousy developers. The final iteration was built on CodeIgniter (MVC framework), a decent choice in 2013.

    My first day I'm browsing the codebase to see what's what, and a grep finds something like "UPDATE my_table set foo=" . $_POST['bar']. Not in a controller... not in a model... in a view.
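
    For anyone wondering why that line is alarming: concatenating raw request data into SQL lets the submitter rewrite the query. The usual remedy is a parameterized query. The project was PHP on CodeIgniter, so the sketch below is only an analogy in Python with SQLite; the `id` column, the WHERE clause, and the function name are assumptions added for illustration.

```python
import sqlite3


def update_foo(conn: sqlite3.Connection, row_id: int, bar: str) -> None:
    """Store user-supplied `bar` via placeholders instead of string concatenation."""
    # The driver passes the values separately from the SQL text, so quotes or
    # semicolons inside `bar` are stored as data and never parsed as SQL.
    conn.execute("UPDATE my_table SET foo = ? WHERE id = ?", (bar, row_id))
    conn.commit()
```

    With this shape, a hostile value such as `x'; DROP TABLE my_table; --` simply ends up stored in the `foo` column.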

    So I immediately told the other leads that we needed to do a security audit on the entire codebase; it took a few days for the owners to consent. The audit revealed three different mechanisms for database queries (the standard CI driver and two other crude home-grown libraries, all used inconsistently) and that one of the devs, who not coincidentally had resisted the audit, was actually AFK for 20%-50% of the hours he billed every week. It took two months to do the audit and resolve the redundant code (no one was full time, mind you). Finally the owners told us "give us two weeks to decide whether or not we want to proceed". After six weeks of silence they pulled the plug and abandoned it entirely.

    1. Re:Killed a project by ordering a code audit by radarskiy · · Score: 1

      What part of this was a mistake that you made?

    2. Re:Killed a project by ordering a code audit by khallow · · Score: 1

      Existing in this reality apparently.

  59. Multiple multi-million dollar satellites. by GrantRobertson · · Score: 1

    I had a friend whose job it was to find a way to break satellites. She said she was quite often successful.

    (Hey, the OP didn't say it had to be an accident.)

    1. Re:Multiple multi-million dollar satellites. by bunratty · · Score: 1

      So once she tried to break a satellite and she fixed it by mistake? Oops!

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    2. Re:Multiple multi-million dollar satellites. by Greyfox · · Score: 5, Funny
      Funnily enough at the satellite company I worked for that one time, one of the older guys there mentioned how he almost lost a satellite once by logging in to his own account and issuing a maneuver command to the satellite. Problem was the satellite was expecting times in GMT and got them in MST. Took them days to get it oriented correctly again.

      Now the programmers in the audience could probably think of like 10 different specific things that could be coded into the system to prevent that from happening, but this company didn't. Which really isn't too surprising. I asked one of the devs on the ground systems team if the ground systems was using GMT or UTC. His answer was "What's the difference?" I was able to infer from his answer that it was most likely GMT, and that did appear to be the case. Somewhere deep in the bowels of the system there was presumably some piece of code written by an Indian contractor with a math degree adjusting times for leap seconds, but it wasn't in any code that anyone knew about.
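
      One of those "10 different specific things" might look like the sketch below: refuse any command time that does not carry an explicit time zone, and convert everything to UTC before it goes anywhere. This is purely illustrative and says nothing about the company's real ground system; `to_utc`, the fixed UTC-7 offset used for MST, and the sample date are assumptions. Python's datetime also ignores leap seconds, so it does not address that part of the story.

```python
from datetime import datetime, timedelta, timezone

# Mountain Standard Time as a fixed offset (UTC-7), for illustration only.
MST = timezone(timedelta(hours=-7), "MST")


def to_utc(command_time: datetime) -> datetime:
    """Convert an operator-supplied time to UTC, rejecting zone-less input."""
    if command_time.tzinfo is None or command_time.utcoffset() is None:
        # A bare wall-clock time is ambiguous: local time or GMT?
        raise ValueError("naive timestamp: attach an explicit time zone")
    return command_time.astimezone(timezone.utc)


# Noon MST is 19:00 UTC, a seven hour difference if the zone is silently dropped.
maneuver_utc = to_utc(datetime(2015, 1, 1, 12, 0, tzinfo=MST))
```

      The point of raising on naive input is that the mistake surfaces at the operator's keyboard instead of at the satellite.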

      The early history of that company read like a Monty Python sketch. The first satellite exploded on the launch pad. The second satellite fell over and then exploded. The third satellite burned down, fell over, exploded and then sank into the swamp. The fourth satellite got into orbit and was promptly bricked by sending the wrong version of Windows(!) to it. To be fair they only had to do that because they launched it with the wrong version of Windows(!!) in the first place. One would think that ANY version of Windows would be the wrong version of Windows to shoot into space, but that's why you're not the head of a billion dollar satellite company.

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    3. Re: Multiple multi-million dollar satellites. by GrantRobertson · · Score: 2

      Wow. Just, wow.

    4. Re: Multiple multi-million dollar satellites. by bitingduck · · Score: 2

      I talked to someone recently who lost a day of science data from a UAV because the Windows system driving the instrument decided to auto update while in the air with something like a 56kbps data rate.

      I recently built a field instrument and made it Linux based specifically to prevent things like that, as well as to keep power and latency down by being able to kill unnecessary background tasks.

    5. Re: Multiple multi-million dollar satellites. by freeze128 · · Score: 1

      Issuing a kill command in a UAV may have a completely different effect than what you expect.

  60. We let contractors fuck up all the time by fustakrakich · · Score: 1

    We get big discounts that way.

    --
    “He’s not deformed, he’s just drunk!”
  61. Cost me my JOB! by Anonymous Coward · · Score: 0

    I developed a system at work to transfer specifications from the customer to the software engineers that bypassed me.

  62. Rain by ouachiski · · Score: 1

    I left the cover off of a $40,000 stabilized vsat antenna in a rainstorm once. That did about 10k in damage to the electronics inside. That's nothing compared to what our customers do though. Let's just say communications systems don't belong IN the ocean.

    --
    sorry for my comments, I'm drunk
  63. 1.2m usd by Anonymous Coward · · Score: 0

    Not my own personal screw up, but did watch a coworker torch the version control system that about 1000 people were depending on. It was such an epic torching that it was down for 3 days - bug was also deleting the backups the moment they restored. It turns out that 1000 engineers playing solitaire for 24hrs is a huge bill.

  64. Powerpoint presentation by Anonymous Coward · · Score: 1

    I prepared a powerpoint presentation, where we could see small black dots. These were dirt marks on the lens of the camera.

    But I thought it was missiles with nuclear warheads or chemical weapons, and presented that theory to a bunch of idiots. Next thing I knew, we were invading Iraq!

    - Colin

  65. Not me, but I got fired over it by Badlight · · Score: 1

    I got hired with a local ISP/network service group, and my first assignment was to go install a new frac-t1 router in a new client's office (yea, this was ~15 years ago, cheap t1 routers were still ~$1k). So the boss takes me back into the storeroom, digs out a router from a pile, and grabs a random power supply by comparing the size of the plug to the hole in the router. I actually bother to check the rating, and find that the power supply is 24V, and the router wants 18V. The boss tells me to plug it in.

    Me: "Um, I don't think this is the right power supply."

    Boss: "It'll work, come on, we're in a hurry."

    Me: "But this is a 24V supply, and the router wants 18V"

    Boss: "I said plug it in, what are you, deaf?"

    Me: "OK..."

    BANG! Fizzle-smoke-spark!

    Boss: "What did you do that for?"

    Shortest job I've ever had.

    1. Re:Not me, but I got fired over it by ganjadude · · Score: 1

      sounds like it was a good thing you got fired.

      --
      have you seen my sig? there are many others like it but none that are the same
  66. $80k-$100k by Anonymous Coward · · Score: 0

    Not sure how much the repaired metal layer cost. Totally my fault, I busted a key piece of a metal rom mask. About 200k of code got shifted late by 4k bytes (a late req from a team member) and all that code was worthless as it was linked for the original layout map.

    I did not get fired but the processes were tightened up. I wrote a tool to confirm an elf32 file's preloaded contents were where they should be. Never had this issue since.

  67. BGP4 by Bookwyrm · · Score: 1

    During an acquisition, the company being acquired helpfully passed along the list of AS they used in their BGP4 configurations in their core routers.

    They helpfully had included the ones from other networks they provided connectivity to as well, but just had sent the AS numbers over in one big list, unlabeled, along with the AS their network originated: "Do these."

    So during the network integration I dutifully entered the entire list of AS into the core routers as AS to be originated. Needless to say, hilarity ensued.

    So perhaps not entirely my fault - though I should, in hindsight, have asked for more clarification or done more investigation rather than blindly trusting the information I had been given. This was a couple decades ago, and I was not cynical enough yet.

    1. Re:BGP4 by Anonymous Coward · · Score: 0

      A good friend of mine was at the center of AS7007. (Oh, just Google it, you damn kids.)

  68. Surrendered three letter .COM domain by west · · Score: 1

    Got this domain "hsa.com" in the *very* early days of the Internet (pre-web). Decided that since we were a Canadian company, we should have a Canadian domain, and surrendered it and got hsa.on.ca. (we weren't allowed to have hsa.ca, since all our offices were in Ontario...)

    A three letter .com address would probably have been the most valuable asset of the company :-).

    1. Re:Surrendered three letter .COM domain by aaarrrgggh · · Score: 1

      I can one-up you there... CIO let a 2-letter domain name expire in 2010, due to a merger and re-branding. Helped sign up for it in '95.

    2. Re:Surrendered three letter .COM domain by greenreaper · · Score: 1

      Even more so since HSAs are now the equivalent of health-oriented RRSPs in the USA. Man, that could have been golden. Of course it's just parked now because nobody wants to pay.

    3. Re:Surrendered three letter .COM domain by Trax3001BBS · · Score: 1

      Cyber squatting, did I ever miss an opportunity...

    4. Re:Surrendered three letter .COM domain by Anonymous Coward · · Score: 0

      Ouch! I had a two-letter .COM domain that I registered in '93, and sold in 2011.
      The taxes alone were well into five digits left of the decimal point.

    5. Re:Surrendered three letter .COM domain by west · · Score: 1

      In 2010 ??

      Ow!

      At least I let mine go before they had any commercial value.

  69. Not mistakes at all. by Anonymous Coward · · Score: 0

    I've found 99 times out of 100 catastrophes were caused by deliberate acts. Usually there is one or more arrogant sysadmins, who are much better at writing root cause analysis. Of course, if you are a senior architect who has designed your system to allow one of these bozos to actually touch it, you're screwed. Then of course you have users. If you have users, no amount of planning can stop the carnage about to be unleashed.

  70. Out-of-sync DB entries for CC payments by Zapotek · · Score: 1

    Worst thing (so far) has been formatting a PHP date() DB timestamp wrong for entries associating users and payments. I think it was something like accidentally using 'M' for both month and minute.
    At the same time, there was a bug somewhere that periodically caused only one of the 2 tables to be written to, when we noticed that the tables were out-of-sync we immediately jumped to the timestamps to make some sense of the situation, which of course didn't work in this case.
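
    For reference, PHP's date() uses 'm' for the numeric month, 'M' for the short month name, and 'i' for minutes. Python's strftime has a close cousin of the same trap, with %m for month and %M for minute, so the sketch below reproduces the slip there and adds a tiny lint for repeated directives. It is an analogy rather than the original PHP code; the format strings, the sample date, and the helper name are assumptions.

```python
import re
from collections import Counter
from datetime import datetime

GOOD_FORMAT = "%Y-%m-%d %H:%M:%S"
BAD_FORMAT = "%Y-%M-%d %H:%M:%S"  # %M (minute) typed where %m (month) was meant


def repeated_directives(fmt: str) -> list:
    """Return strftime directive letters that appear more than once in fmt."""
    # "%%" is a literal percent sign, so it is matched and then ignored.
    found = [d for d in re.findall(r"%(%|[A-Za-z])", fmt) if d != "%"]
    counts = Counter(found)
    return sorted(d for d, n in counts.items() if n > 1)


sample = datetime(2015, 3, 9, 14, 27, 5)
good = sample.strftime(GOOD_FORMAT)  # "2015-03-09 14:27:05"
bad = sample.strftime(BAD_FORMAT)    # "2015-27-09 14:27:05": minute in the month slot
```

    A timestamp format rarely needs the same field twice, so `repeated_directives(BAD_FORMAT)` returning `['M']` is a cheap warning sign that a unit test can assert on.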

    Took only a few hours to sort out since we could use other available information to fix it, but it was my 1st or 2nd real job at around 18 so I figured I was canned; I wasn't though, it was one of those "lesson learned, watch out for it next time" situations -- my boss was really frustrated though.

  71. During a planned power outage ..... by liamo · · Score: 1

    ... plugging a kettle into your 6-hour UPS is not a recommended way to make a cup of tea. This, however, is exactly what I did a long time ago. 10 or so seconds later, I had a still-cold kettle of water and an entirely drained UPS. Oooops !

  72. About $1,000,000 by Anonymous Coward · · Score: 0

    Bug in the routing protocol in a custom X.25 network for a major stock exchange in the 1980s. Killed the entire net for a day. Client estimated the cost at a minimum of a million dollars. Found and fixed in 24 hours. The client actually thanked us because their earlier custom network vendor had done much worse.

  73. Payroll by Anonymous Coward · · Score: 0

    Back in the pre-Y2K era I designed and wrote a system used by a major transcription company. The transcriptionists were hired as contractors and paid by the word. My system was supposed to do a word count, then pay the transcriptionist on the number of words. Easy, right? Only the word count was more than a little selective. Headings, for instance, were not a part of the count. Certain phrases were exempted from the paid word count, as were other characters and whole words.

    Well, in the middle of all this I screwed up the word count process and ended up over-paying the transcriptionists by about 15%. The system ran for over a year before I accidentally caught the error. I think the phrase, "Oh Holy shit, look at this crap." came to mind. After updating my resume, I took the whole mess in to the general manager's office along with a bottle of bourbon and told him what was going on. I finished up by telling him I wasn't going to say anything about it to anyone.

    He agreed. I quietly fixed the problem, being careful to drop the cost a point at a time over a three month period so that no one would notice, then left it alone.

  74. Slightly over 12 million USD - for couple hours by Anonymous Coward · · Score: 0

    My first summer job during studies - 20 years ago.

    We were developing and testing a new payment transfer system at one of the largest global banks. I had access to production for monitoring and to test for testing.

    Somehow I made a mistake and ran a test batch on the production instance.

    The batch was the monthly payment of a large airline for one of the larger airports - slightly over 12 million USD.

    In 2 hours reports in mgmt went red and my manager got a call from higher management.

    My boss was able to negotiate with the airport to reverse the transaction - he knew the manager at the airport as we were training their staff as well.

    Nothing happened to me - I just got chastised - my boss got some heat but everything ended fine.

    1. Re:Slightly over 12 million USD - for couple hours by Anonymous Coward · · Score: 0

      You could have simply deleted those rows the same way you inserted them?

    2. Re:Slightly over 12 million USD - for couple hours by jonwil · · Score: 1

      Not if that database insertion caused money to be moved somewhere else and database entries existed on a system belonging to someone else.

  75. Well... by JustAnotherOldGuy · · Score: 1

    I once forgot to open a water valve before turning on a laser in the lab.

    The low-pressure safety switch for sensing water flow had been bypassed (not by me) and the laser tube immediately cracked and broke due to the instant heat buildup. Total cost, about $4000.

    --
    Just cruising through this digital world at 33 1/3 rpm...
  76. Big Report by Anonymous Coward · · Score: 0

    I once mis-printed a 50000 page sales report.

  77. Mars '98 by supertall · · Score: 1

    I worked on both the Mars Climate Orbiter and the Mars Polar Lander, though not on software related to the failures. I did fry a $12k damper during testing though due to a misunderstanding with the thermal engineers on hardware placement (I didn't lose my job, I was fresh out of college). Due to the fact that the capillary pumped loop heat pipe thermal system didn't work, they ended up cutting it off and adding extra heaters/sensors at the last minute. Looming launch deadlines make for crazy times ...

  78. Whoopsies! by Anonymous Coward · · Score: 0

    Rebooted the wrong IBM mainframe - twice!
    No idea of true "damage" cost but frustrated a lot of users. Figure $100k or so given the number of users and their hourly salary.
    Didn't lose job but boy was I made fun of for a couple of weeks . . . Then made a systems programmer.
    Go figure.

  79. $0 by Anonymous Coward · · Score: 0

    I stripped the thread from my pedals by incorrectly using a cotterless crank puller.
    I think that was the biggest tech mistake I ever made.
    It didn't cost anything since I haven't had to replace the cranks on that bike yet.

  80. $980,000 in about 20 seconds by Anonymous Coward · · Score: 0

    I work in finance and there was a confluence of a bug, a human error, and a poor process that led me to lose almost a million dollars in about 20 seconds after an economic event. I had been working on the front desk for about a month at the time and I was literally seeing fog for the rest of the day trying to put together the tick by tick scenario of what happened. I wasn't fired, and cleaned up next time the event came along. But I was scared I had just ruined the best job I ever had.

  81. Crashed the Uni Mainframe Once by Greyfox · · Score: 1
    Was curious what an apparently undocumented feature on the login page did. Turns out what it did was crash the mainframe. Go figure. You'd think they'd take that shit off the login page, but apparently no one had ever been so curious as to explore it before. Which says a lot about that uni, now that I think about it. Also, once trash talked a uni in a story on a news blag website. Yeah, those were the days...

    Mostly I make my career out of fixing other people's tech mistakes. Which is not something that uni taught me how to do. Man I'm glad I got out of that place before I ran up any significant student debt. Did I mention I trash talked a uni on a news blag website?

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    1. Re:Crashed the Uni Mainframe Once by rrohbeck · · Score: 1

      LOL. When I started our Uni mainframe was, umm, not very secure (ICL 1906 with GEORGE 4, yay.) We crashed that thing every few weeks. Whenever you did something naughty and the terminal displayed a flashing status at the bottom saying it was waiting for a reply it was time to run from the terminal room because two minutes later one of the operators would come in and look who sat at terminal number X.

  82. Mmmmm. Tastes like by Anonymous Coward · · Score: 0

    burnt electronics.

  83. $180K mistake by Anonymous Coward · · Score: 0

    I worked at a company in the valley and had to correct another engineer's mistake in an offline log parser that calculated per-customer bandwidth usage of our platform from httpd server logs. Customers got billed on a combination of things, with one component being bandwidth usage (X GB per month free then additional charges per GB). This surfaced when a customer complained that they were being overbilled. Turns out that the log parser was not distinguishing HEAD requests from GET requests, so if a HEAD request for a 1MB file came in and resulted in an HTTP 200, the code incremented "bandwidth used" for that customer by 1MB. I fixed the bug, then we had to pull the previous 2+ years of logs from S3 and run a massive offline job to compute how much we overbilled. For many customers it was a non-issue, but there were a few that had been overbilled by a lot. We ended up issuing about $180K of refunds (which was a lot for a company with a few million in revenue per quarter).

    The engineer that wrote the original code only worked there for about 6 months and was terminated after failing a PIP, but nobody ever did a thorough audit of his code after he left, so it was used for the next 2 years until the complaints made us look at it.
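
    A minimal sketch of the accounting rule described above, in Python. The log format, field order, and function name are assumptions made for illustration, not the parser from the story. The one point it shows is that a HEAD request returns headers only, so it should add nothing to billable bandwidth even when it answers 200.

```python
import re
from collections import defaultdict

# Hypothetical simplified log line: "<customer> <METHOD> <path> <status> <bytes>"
LINE = re.compile(r"^(\S+) (\S+) (\S+) (\d{3}) (\d+|-)$")


def bandwidth_by_customer(lines):
    """Sum response bytes per customer, ignoring requests that carry no body."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        customer, method, _path, _status, size = match.groups()
        if method == "HEAD":
            continue  # headers only: nothing billable was transferred
        if size == "-":
            continue  # no byte count was logged for this request
        totals[customer] += int(size)
    return dict(totals)
```

    Fed one GET and one HEAD for the same 1MB file, this bills the customer for 1MB, not 2MB.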

    1. Re:$180K mistake by greenreaper · · Score: 1

      I don't think that counts: a) it wasn't your mistake; and b) the company should never have had that revenue in the first place, so it wasn't a "loss" but a restitution.

  84. FinTech by Anonymous Coward · · Score: 0

    I always liked FinTech places' ability to forgive these things. When I had my million dollar loss, I thought for sure I was going to get fired, but they just wanted accountability, to know what happened EXACTLY, to have the issue fixed, and new procedures in place to make sure it never happened again. I too was praised for being able to do the intricate debugging, picking through packet captures, etc...

  85. Several million USD... by Anonymous Coward · · Score: 0

    Fucked up the coordinates on a fuel rod matrix/grid for a nuclear plant and it had to be scrapped. (a huge part several meters in diameter precision milled out of a single piece) I did not get appointed employee of the month that month.

  86. Should also include near misses by Anonymous Coward · · Score: 0

    At a startup - I was the new and first sys-admin they had had, they had two developers running amok before I got there. So I had implemented nightly backups and gotten things under control - on the backup server, which was also the development environment, I wanted to hook up the CD audio cable so I could play music... so rather than waiting to do a clean shutdown and interrupt the developers I decided to do it hot. Needless to say I bumped the SCSI cable and put the RAID into a wedged state. It took a call to the manufacturer of the box and one of their systems engineers to get the magic command to force the hardware RAID card to a "good" state, and the day was saved.

  87. About three days work, but PITA by Kjella · · Score: 1

    Basically a loading tool with a bug I knew from testing: you could set it correctly once in production, but if you set it twice every user was f*cked up and could only be fixed from the web interface by about 5 clicks per user, no programmatic solution. And of course we had an error in the production setup. I altered that part - which I could - but forgot to take out the "you can run this only once" settings. Hundreds of users borked and the vendor support would take forever or claim there's no other way, what do?

    This was a consulting company, trying to bill this would look bad on both our vendor and ourselves and it pretty much broke everything so we gave a benched consultant the assignment from hell. Click here, here, browse, pick, save in this somewhat less than instant web interface. Now do that all day, every day for all users until you're done. Personally I'd be ready to jump off the roof after an hour, but apparently she stuck to it for three days and finished. I don't think we won any popularity points with her though.

    --
    Live today, because you never know what tomorrow brings
    1. Re:About three days work, but PITA by Anonymous Coward · · Score: 0

      Out of curiosity, why didn't you own up to your mistake and do it yourself?

  88. Melted down 200K worth of power supplies by Anonymous Coward · · Score: 1

    Melted down a couple of LARGE high end power supplies (worth about 200K - I think the repair was about 50K). Did I lose my job? Nope, not even really called on the carpet. I had a triple redundant fail safe system, approved by management (in writing), and reviewed by both levels of client, and ALL THREE systems failed! (1 software, one independently developed firmware, one mechanical). Failure analysis on just the last one (the mechanical) was it was a once in over a million chance of it failing (yes, we did a failure analysis). Something (surge?) fried the computer, the firmware controller, AND welded the mechanical contactor closed (LOW duty cycle - close at start of test, open at end, 3x safety factor on ratings, something welded them during the test - aka I watched them close, visually inspected, and went home for the night, as per SOP)
    One of those freak things, but we changed to a carbon contactor so it could not weld, and changed the firmware unit to a more robust unit, and did some other isolation. As far as I know, never happened again

  89. a bug i found once by superwiz · · Score: 1

    was created by my boss. I fixed the bug instead of reporting it. The boss was incompetent and was costing the company millions in missed opportunities and in increased turn over of really good people. He couldn't see when his successes were pure accidents and when his mistakes were entirely foreseeable and preventable. I had a few opportunities to get him fired when fixing his messes. I wasn't ruthless. It cost a number of good smart people their jobs and cost the company millions (in fixes, unnecessary delays and missed opportunities). I'd put the dollar figure at around $10mil. But it may be much larger if some of those missed opportunities were first-to-market.

    --
    Any guest worker system is indistinguishable from indentured servitude.
    1. Re:a bug i found once by Anonymous Coward · · Score: 0

      That sounds like the CEO at my work, although his boss in the parent company is his father so when he fucks up, he tells his dad that someone else did it and we cop the blame. Some of us (i.e. me) were fired over this guy's ego (he felt that $1 more than minimum wage was suitable for 2IC in charge of a company worth a few million).

  90. On the plus side, it discovered life... by Minupla · · Score: 2

    ... too bad it was here :)

    --
    On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
  91. Six figure accidents by sectokia · · Score: 1

    The two biggest I have seen:
    - Comms card slips out of box while being carried over to submarine. Worth about $220,000, fell into the water and had to be recovered by divers for security.
    - Electrician didn't test circuit was isolated, he went to disconnect 3 phase circuit and decided to start with neutral. He lifted the neutral off, putting up to 400v where there should have been 230v. This destroyed over $300,000 in components, and cost another $200,000 due to lost operations.

  92. $60K by Anonymous Coward · · Score: 0

    No one knows yet,
    I had hardcoded the path for database backup, so one customer had db1, db2, db3, ..., and only the 1st one got backed up. They deleted everything to install the systems fresh. No one knows why backups didn't work. FWIW, I had specifically told them to ask me if they ever wanted to reinstall the systems fresh. The bug was fixed in the newer version of the code.
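
    For illustration only, a Python sketch of the shape of a fix: derive one backup destination per database from the list of databases instead of hardcoding a single path, and check afterwards that every expected file exists. The function names and the ".dump" naming are hypothetical; this is not the code from the story.

```python
from pathlib import Path


def plan_backups(db_names, backup_root):
    """Return one (database, destination) pair per database name."""
    if not db_names:
        raise ValueError("no databases to back up")
    root = Path(backup_root)
    # One destination per database, so db2, db3, ... are never silently skipped.
    return [(name, root / f"{name}.dump") for name in db_names]


def missing_backups(db_names, backup_root):
    """List databases whose backup file is absent or empty."""
    missing = []
    for name, dest in plan_backups(db_names, backup_root):
        if not dest.is_file() or dest.stat().st_size == 0:
            missing.append(name)
    return missing
```

    Running the second function after every backup job would have flagged db2 and db3 long before anyone wiped the systems.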

  93. Brought a Tandem Non-stop to a halt.. by scsirob · · Score: 1

    Back in the 80's I worked for a field service organisation, fixing and maintaining PDP11 and VAX systems, but also CDC-9766 removable disk systems. Big 14" removable disk packs like you see them in old scifi movies. One of my customers had a string of 10 or so attached to a five-node Tandem Non-stop system.

    Each week they brought two out of ten off-line for me to work on. I cleaned the heads, then used a servo disk pack to realign those heads.
    To do this, I needed to remove the control cable from the string, and plug in an exerciser. One day I forgot to pull the control cable. So instead of moving the heads of my offline drive to a specific track, I moved the heads of *ALL* disks in the string! Without the O/S knowing about it.

    Believe me, that will bring a Tandem Non-stop to a grinding halt. That was my last time on the floor for that customer, but I didn't lose my job. Cost? I don't know. Perhaps a weekend of data recovery for the operators?

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB
    1. Re: Brought a Tandem Non-stop to a halt.. by Anonymous Coward · · Score: 0

      You stupid fucker. Tandem are never supposed to halt!

      (10 year Tandem developer)

  94. IBM mainframe disaster by send2cbd · · Score: 1

    Late 70's. Central datacenter for a state not to be mentioned. I modified the JES2 startup JCL. Our mean-time-to-reboot was typically 2 weeks. Because of important state business, we didn't get a chance to reboot for 3 weeks. So, we reload and JES2 dies for JCL error. Then, we realized that all of our daily backups have the same error. And our last 2-week backup has same problem. Our next backup, monthly, is stored at a site that is 1.5 hours away. Meanwhile, programs like AFDC and prison support apps are not up. Governor starts getting calls from important folk - where's the system? Governor calls DP director - where's the system? I see the end of my career looming. Fortunately, my boss had an old SVS system on tape that was just enough to allow us to edit the JES2 deck. After this, we changed our backup policy and put in stricter rules on modifying production systems. I just retired after 46 years in computer industry. Still remember the fear on that day.

  95. Posted this before by Oligonicella · · Score: 2

    But it's worth repeating in this context. Thankfully, it wasn't me.

    When I worked at a KC bank, we had a Wire Transfer team manager who loved golf. He was supposed to come in Saturday and test a firmware/OS upgrade, then restore. Nice, sunny day Saturday, so he decided golfing would be better.

    Came in Sunday. Installed firmware/OS upgrade. Tested fine. Forgot to reinstall previous firmware and powered up old OS.
    Incompatible. Froze the machine solid. He panicked and tried for maybe four hours to fix things himself. No go. Finally called Cupertino for help 4+ PM.

    The techs had to be found, gathered and flown out from CA to disassemble said machine and reassemble. No wires until 1 or 2 PM Monday. Much money loss for all customers.

    To answer the obvious question, no - beyond my understanding, he wasn't fired or even demoted.

  96. $$$ unknown by Anonymous Coward · · Score: 0

    Posting AC for obvious reasons:

    I was an electronic tech in the military. I dropped a blast proof panel onto a nuke and left a dent about the size of a watermelon. Cost unknown, but I should have been wearing Depends.

    1. Re:$$$ unknown by Anonymous Coward · · Score: 0

      You're not a moron for dropping it, you're a moron for not knowing, then or now, that you can't set off a nuke that way.

  97. $8m UPS modification. by thegarbz · · Score: 2

    One of my first engineering jobs out of uni involved modifying a UPS. This UPS had a massive battery bank that was quite dangerous to load test and didn't have an automatic load testing function. I came up with a small design involving a contactor and some minor wiring changes and we were part way through implementing it on every UPS at this site.

    This UPS was part of a redundant pair that fed an emergency shutdown system at an oil refinery. In between the UPSs and the ESD system were about 120 circuit breakers, two for each circuit, and one of them was off. We modified the first UPS without issue then started the process for the second one. After calling the control room to let them know they would receive an alarm, I switched off the UPS and was suddenly met with a stream of profanities over the radio.

    We lost power to 80 field instruments, which triggered a fail safe action on the shutdown system, tripping 4 units at the refinery. One of them was the FCCU, which is core to a lot of refinery processes. To add insult to injury, the unit was unable to be hot restarted because of a stuck valve and then thermally contracted, breaking off large chunks of coke from the overhead line which blocked the internal cyclones. The FCCU was down for repair for roughly 10 days. I had made a name for myself and was asked to display the cock-up award (a giant dildo mounted on a plaque) on my desk.

    Total cost of the outage was about $8million. Fortunately only partially my fault.

  98. My biggest mistake by Anonymous Coward · · Score: 0

    I used to work at a cloud infrastructure provider and we were in the process of decommissioning a zone (datacenter). We were using CloudStack as our orchestration at the time and it worked fairly well. One of the sysadmins noticed it was going to TAKE FOREVER to delete all of these templates (used for spinning up VMs) via the GUI. So I volunteered to write a quick little script to delete the templates through the API.

    The sysadmin promptly gives me all of the template ID's that needed to be deleted. I plugged them into my script and away it went. I forgot one crucial detail when I wrote that script... I forgot to specify which zone these templates were to be deleted from. So it deleted the templates from all datacenters...

    Didn't cause any downtime but about $20,000 in hours to rebuild all of the templates... and no, we didn't have any backups... that was something management was too cheap to pay for 0_0. I didn't lose my job over it and there was a silver lining. We now had all of our templates up to date with the latest security patches :P.
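
    A hedged Python sketch of the guard that was missing: build the delete requests so that a zone is mandatory, and refuse to produce any request without one. The function name is hypothetical, and the "deleteTemplate", "id" and "zoneid" keys reflect my understanding of the CloudStack API parameters; treat them as assumptions and check them against the API reference for the version in use.

```python
def build_delete_requests(template_ids, zone_id):
    """Return one request dict per template, always scoped to a single zone."""
    if not zone_id:
        # An unscoped delete is exactly the mistake described above, so fail loudly.
        raise ValueError("refusing to delete templates without an explicit zone")
    # Parameter names are assumed, not verified against a live CloudStack.
    return [
        {"command": "deleteTemplate", "id": template_id, "zoneid": zone_id}
        for template_id in template_ids
    ]
```

    Printing the returned list and reading it before sending anything is a cheap dry run.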

  99. Not mine but I fixed it by Anonymous Coward · · Score: 0

    Test tech smokes an 80Mbyte hard disk when they cost $1500 each. Replaces and smokes the next one, hmmm something is wrong. Leaves that machine, goes to the next one, smokes a hard disk, replaces and smokes that one. Hmmm, two machines in a row. Calls in the Engineering tech (me). I quickly determine the cable sets were wired wrong, 12V where 5V should be and vice versa. He never thought to look at that, took me about 30 seconds to figure out the problem.

  100. Fried Unibus adapter card on VAX by Anonymous Coward · · Score: 1

    In the very early 80's, I was tasked with getting a VAX 11/780 onto an internal Ethernet network using a proprietary Ethernet Unibus card (one guess where I worked). This VAX had a Unibus backplane in a separate cabinet cabled to a Unibus adapter board on the system bus in the "main" cabinet. The Unibus adapter backplane was wirewrapped and since this Ethernet card did DMA (it's been a long time, but I think that was why), it needed control of a bus line which was normally jumpered on the backplane bypassing each slot so "dumb" cards didn't have to deal with passing the signal along. Therefore, I had to snip this jumper on the backplane of the slot I was installing the card in.

    The VAX wasn't used by our group but was used by other departments during the workweek for some fairly important stuff and there was no backup system. The machine was given to me on a Saturday morning and I was admonished it absolutely had to be up by 8AM (IIRC) on Monday morning. No problem as I had studied the problem and had been in email communication with someone at another site who had performed exactly the same procedure. I had never physically touched a VAX before in person but there really wasn't anyone to help me with the task locally so I was on my own (in retrospect, maybe that wasn't the smartest decision) but, being young and brash, that didn't bother me.

    It didn't take me long to find the VAX once I got into the data center -- after all there was only one of them. I shut it down cleanly from the console. I set the switch on the main cabinet front panel to the OFF position (I don't actually recall how it was labeled), the lights on the front panel went off and I could hear the area around me got a little quieter as fans spun down (although there was a lot of other hardware around, so it just reduced the din slightly). I was well prepared and had just the perfect pair of wire cutters to do the job. I opened the Unibus adapter cabinet and put the card in. I then accessed the backplane, carefully identified, double checked, and triple checked, the slot and jumper that I needed to cut. In retrospect, maybe I should have paid attention to a rather obvious condition that was staring me in my face, but I had rehearsed this work flow in my mind and proceeded onward. I confidently stuck the wirecutters into the maze of wires, snipped the relevant wire, and everything was going very well.

    Then I withdrew the wirecutters from among the wire-wrap posts and was more than a little surprised as sparks arced from the wirecutters to wirewrap posts that they brushed against. Nearly simultaneously with the arcing, I noticed one little detail that I should have noticed earlier -- the fan in PDU or power supply in the bottom of the cabinet was still whirring away happily and the light showing it was powered on was clearly glaring at me. Ooops...

    Well, I thought, hopefully, no harm done and I closed the cabinet. It was around then that I noticed a very concerned look on the faces of a couple of FEs who were working on an adjacent machine. I walked over to them and their concerns quickly became mine -- turns out they were "downwind" of the VAX and the distinctive odor of scorched electrical bits was strong around them. I guess I made someone happy that day though - they were very relieved that it was my machine, not theirs, that was emitting that lovely unmistakable fragrance.

    Unfortunately, although the VAX seemed to boot, a bunch of stuff didn't work... Ooops...

    We had 7/24 support with DEC so I called service out and watched a completely incompetent service guy (he was our PDP-11 repair guy who apparently was stuck on call supporting hardware he knew nothing about) fumble around for hours and concluded that the Unibus backplane had been fried and initiated getting a new one counter-to-countered to us (fortunately, that got blocked by someone who knew what they were doing somewhere). The guy didn't even know how to run diagnostics on the VAX and refused to attempt to do so.

    In the end, the machine was not up a

  101. Sega Dreamcast HW devkits were easy to kill by Anonymous Coward · · Score: 0

    Back in the day, the HW development kit for Sega Dreamcast cost upwards of $10k and would fry if you plugged a receiver into the RCA video-out port while the system was on. We had a dev do this. So we had a good laugh, shipped the unit back, got a replacement, gave it back to the dev and the first thing he did was plug his receiver into the unit while it was running...

  102. Regexs..... by kc8apf · · Score: 1

    I missed one character in a regex in a monitoring system that would cause it to think all the hard drives in a machine had failed when the machine was booted. Since it only happens on boot, it wasn't noticed until there was maintenance work that powered off an entire datacenter. When they turned the power back on, ~5000 machines all decided their hard drives had failed simultaneously. Took 2 days to clean up the mess.

    --
    kc8apf
  103. Largest IT Mistake I ever witnessed by Anonymous Coward · · Score: 0

    The largest IT mistake I ever witnessed happened at a healthcare software company I used to work for not too long ago. The Database team was going through a lot of change at that time and they were bringing on a lot of new people. The senior DBA was out wandering some mountains in South America for his vacation (the dude is awesome). So it was left to one of the more mid-range DBAs to hold the fort for the next month.

    So the perfect storm happens: The senior is out, the backups had been failing due to lack of equipment and a place to house them, a new team of dbas in the environment, and developers who saw their chance to push things at a faster rate because there wasn't someone there with enough experience and balls to tell them no.

    The dbas and developers spot checked the script and everything looked green. They ran the script in prod and accidentally dropped over a terabyte of data in the company's largest database, which housed data for their most profitable product. The dba saw that something was wrong within a fraction of a second and tried to cancel it but it took time for ctrl-c to register. Dropped over 400 tables in a 1200 table database. The dba just sat there, speechless with water in his eyes because he knew how bad this was.

    It took two days to get the thing back up, the company lost over 2.5 million dollars from it and they tried to fire the dba who ran the script. The senior, who was in the mountains, cut his vacation short and flew back to Pittsburgh. The senior threatened to quit if they fired the other dba and pointed out all of the flaws in the process. He also showed them all the documented attempts to get management to purchase the equipment and infrastructure necessary to back up the company's most profitable data.

    They didn't fire him then but they definitely wanted to. They treated all of the new dbas like children after that, which eventually led to THE ENTIRE database team leaving the company. Management would have been hard pressed to have handled that situation any worse than they did.

  104. Millions by maiden_taiwan · · Score: 1

    About 15 years ago, a QA engineer in my office (a large Wall Street financial firm) placed a fake trade for 1,000,000 shares of company stock in one of our test systems. The test order somehow got out to the New York Stock Exchange and actually moved the market. Backing out that trade was reportedly quite expensive.

    The engineer didn't get fired, because he had done everything correctly. The system infrastructure had been set up wrong; it wasn't his fault.

    1. Re:Millions by Anonymous Coward · · Score: 0

      This happened at a place I worked about five years back. One of the ops delisted a large cap from our production server rather than QA. A client transacted about £100m on the ticker and moved the market. The client lost a lot of money, but no one got fired.

      At least we didn't cause the flash crash.

  105. Underestimating users ingenuity by Anonymous Coward · · Score: 0

    I rewrote a system for checking out clients, upgrading the backend code (Things like an actual relational database and events to pass messages between systems for viewing updated data). But I had to maintain the work flow and one of the things the system had was everything is put in a current invoices table, at the end of each shift, they'd shut down every computer and run the end of day, it would print out a list of transactions for them to count their drawer and move everything to old invoices.

    The old system did no checking to ensure they didn't have the system up, so I added a tool to lock the system and prevent any new machines from coming up, then added a tool to send an event to all connected machines asking whether they were still on. Connected machines would reply and abort the end of day; if there were no replies, it proceeded. Eventually the users figured out there was no check that the system was locked, so they started launching the end of day without locking it. While it waited for them to confirm that the invoice report had printed properly, they'd leave it sitting there and pull up other machines. The system was built to roll back if there were any inconsistencies in terms of database relations, but it also used exclusive access for the end of day, so another system coming up would cause data loss. I had only tested against inconsistent data, since the other outcome was impossible, right? Users were instructed to do it this way, and the system asked if anyone was on, so it shouldn't be possible, right?

    Well, they did that once and called me in. I look at it: they've already closed the error and the entire system. I tell them I need the transaction number it gave them so I can roll it back, and they go, "Well, it put something on the screen about a transaction number, but I didn't think it was important so I closed it." So they have to stay late and re-enter the day's worth of sales.
    A week later I haven't yet figured out what they're doing to cause it and they run into it again, and they've closed the error message again, and once again they have to re-enter the whole day's sales.
    A week later it happens again. They call me, as luckily I'm in the building. I run to the machine, and as I get there the receptionist is moving the mouse to the cancel button. I tell her to stop; she says, "But it won't let me do anything else if I don't." I tell her to stop anyway. I sit down and go about unrolling the transaction, and when I look over, someone has the system up. I ask her why she has it up during the end of day, and she says she started it after the error. I pull the logs and see when the end of day started and when she pulled up her machine, and I know when I sat down in the chair: she pulled it up well before I sat down (5-10 minutes or so) but well after the end of day started.

    They still try to find tricks to get around it. I now have a daemon that monitors the end of day, looks for them trying to get around the lockout, and kicks them off if they do.
    Everyone argues nonstop that it's their God-given right to screw stuff up however they please because they don't see how it's bad. I told one girl 3 times, and when I caught her the 4th time she said, "Well, I don't see how it affects me so I don't care."
    It's literally everyone at the counter, so firing offenders isn't an option. I'm working on a training regimen and going to put in policies to make it their problem (i.e. the shift that screws it up is the one that has to fix it, no leaving it for the afternoon shift),
    and working on a replacement system that maintains a similar work flow but more or less simulates the whole end of day thing so they can't mess it up anymore.
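
    The missing precondition can be sketched in a few lines of Python. Everything here (the names, the way the lock state and the connected machines are passed in) is a hypothetical stand-in for the real system; the point is only that the end of day verifies the lock itself instead of trusting that the users took it.

```python
class EndOfDayError(RuntimeError):
    """Raised when the end of day must not start."""


def run_end_of_day(system_locked, connected_machines, move_invoices):
    """Move invoices only if the system is locked and nobody is connected."""
    if not system_locked:
        # The original gap: the lock existed but nothing checked that it was held.
        raise EndOfDayError("system is not locked; refusing to start end of day")
    if connected_machines:
        names = ", ".join(sorted(connected_machines))
        raise EndOfDayError("machines still connected: " + names)
    move_invoices()  # caller-supplied step that moves current invoices to old invoices
    return True
```

    Checking both conditions again immediately before the exclusive-access step would narrow the window further, since a machine can still come up after the first check.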

  106. Please don't whip me by Tablizer · · Score: 0

    I confessed, I worked on Slashdot Beta ;-P

  107. Warship Anyone? by GumphMaster · · Score: 1

    Mid 90's. Spent a lovely weekend below the waterline on a frigate updating the ship's maintenance system with a new data picture of its systems. All went wonderfully well and I walked ashore late afternoon on Sunday and flew back to my home city. Fast forward to 4pm Monday and we get a call from the ship at sea saying the maintenance system no longer functioned: get your butt out here and unf*ck it. So, in the car, 3 hours drive to where the ship anchored for the night, RHIB ride out to the ship, up the rope ladder, about 10PM... fix it, you have until 6AM or you are sailing with us (for a week). That, my friends, is great motivation to work fast. To cap it off, there was a small fuel leak in the space outside the computer room: wonderful aroma to deal with. Tried to work out the obscure linkage between existing maintenance jobs and the system description that was causing the issue. Ultimately had to roll the database back to the pre-update state. Off the ship at 6 along with many bags of oil-soaked rags used on the fuel leak. Ship lost a few days of data and a day at sea: captain not happy... and we had to do the whole exercise again later.

    Tape for data, $100, Airfare and accommodation, $600, warship all at sea, priceless.

    Not entirely my doing (what is these days) but I was the man that delivered the fun. No names, no pack drill over this.

    --
    Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
  108. 80% of the source code.... by Anonymous Coward · · Score: 0

    Upgrading from Netware 2.something to 3.1. Did backups, did more backups, tested the backups, retested the backups. Went ahead with the upgrade. That process wiped out everything to re-install. Started reloading from the backups. 1st one failed at the end of a volume, 2nd one failed at the same place. Started worrying. Re-installed everything. Backups still failed. Never managed to restore anything past the end of the volume.

    After much to and fro with Novell, we were told that we had found a new bug. A file over 256 megs as the last file on a volume corrupted the rest of the backups.

    Handed in my resignation shortly thereafter and NEVER EVER worked with any shit from Novell....

  109. Red Ring of Death by Sarusa · · Score: 1

    No, not me, but it's worth noting that the XBox 360 Red Ring of Death was (according to EE Times) caused by someone at MS who thought he could save a couple million bucks by doing the graphics ASIC work in-house instead of paying someone with experience like ATI to do it. That cost $1.3 billion. As far as I know nobody involved in deciding that or doing the ASIC work has ever been named (and I wouldn't blame the poor ASIC guys), but I can only imagine what it would be like to know that was you.

    1. Re:Red Ring of Death by Anonymous Coward · · Score: 0

      Sorry for being so jaded with cynicism, but someone has got to tell you the truth.
      If it hadn't been done in-house, it would have been ATI's green drive-led of eternal happiness instead.
      There is no such thing as competence when deadlines are coming.
      People are lazy and they get lazier when they are told to work harder.
      The amazing people who do 3 point shots while typing code at inhuman speeds while some Indian manager yells ship ship ship at them?
      They are responsible for Google's KitKat crapware.
      Valar Morghulis.
      Valar Stupidis.

  110. Fried voice coil by peterofoz · · Score: 1

    I fried a voice coil on a fairly expensive Hitachi 2.2 GB optical drive back in the late 1980's with a QA stress test while working for FileNet. This led to engineering improvements and I got to keep the burnt out coil as a trophy.

  111. responsible for a new building by Anonymous Coward · · Score: 0

    Not me, but a former boss worked for ACDelco at his first job in the late 1970s. He worked on the first generation of CPU-based engine controllers for General Motors. Apparently he had a bug that was discovered just as it was entering full production. GM was making so many of these that when they discovered the problem they couldn't just stop production and throw away all the existing modules, so they constructed a new building just to house the defective parts until a workaround was found. And apparently for the first few years their ECUs were electrically and RFI-wise very noisy, and GM got the reputation of having car radios with terrible reception as a result. Ironically, less technologically advanced competitors without digital ECUs were favored by consumers as a result.

  112. Spending $400 instead of $4,000. by aaarrrgggh · · Score: 1

    Bought a Buffalo TeraStation. Went on vacation a year later to a country with limited internet access. On the trip, the one-year warranty expired and it died the next day, taking all data with it.

    Fortunately, I had a copy of the server with me on a portable hard drive, so I could work remotely. That was our only backup. Sending the accounting database back to the office via GPRS was a lot of fun, but mailing that drive back to the office (after duplicating it of course) scared me to death.

    The solution at the time was the right one; we didn't have the money for anything more. Ever since, we have had a hot backup server synchronized to the primary, for a small business. Like most screw-ups, what is important is how you move forward.

  113. This one time... by wbr1 · · Score: 1
    In 1998 I was working for an ISP in their NOC. One of our main AIX servers was filling. It housed home directories (and hence mail stores) for most of our customers. The engineers added a new array. I was supposed to write a script to move the directories to the new drive and change the home path in the passwd file.

    I flubbed the script and while there was no data loss, I, by myself on the night shift, broke about 25k email accounts. I had a long night fixing it.

    I still remember the frantic calls from the help desk as I was in panic mode trying to find out how bad it was.

    --
    Silence is a state of mime.
  114. Bricked a Samsung Galaxy by hambone142 · · Score: 1

    Bought it on eBay. Had crappy Verizon firmware on it that wouldn't allow any kind of audio streaming (web page streaming or TuneIn). Loaded Cyanogen on it and it worked fine but still wouldn't stream due to some remnants of Verizon FW.

    Backdated Cyanogen to older mod and that mod was corrupt. It destroyed the boot loader so I couldn't flash another copy of non-corrupt OS.

    I still have the phone but no way to get an OS on it without a boot loader on it.

  115. Re:Took an online trading company offline for a da by AmiMoJo · · Score: 2

    I knew a guy who did support for a multi million pound company. They had many problems, mostly due to the fact that he was too scared to reboot their servers because he did all the support remotely and it would be a 100 mile trip up to their office if the machine didn't come back up. They insisted that he do maintenance in the evenings or at weekends to avoid disrupting their work.

    So their terminal server was still running IE 7, because he was too afraid to update to IE 9 as it required a reboot. Someone actually got fired because they infected the server with a drive-by. Their mail server had a dodgy network card, but it took nearly a year to diagnose because he was terrified of updating the driver in case it didn't come back up, so that was just intermittently not responding or dropping incoming connections for over a year. The driver update fixed it in the end.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  116. not me but i worked on it by Anonymous Coward · · Score: 0

    The company I worked for was contracted to work for a company that dealt with credit cards - not the big banks who ran their own services but everybody else with a Visa or MasterCard. We did this at a knock-down rate, looking for more work at the company.

    We were certainly the first in the UK to automate balance transfers via the software the call workers were using. Massive commercial advantage. Every couple of days I'd see some sniffers from big banks looking around and asking questions. Things were looking good

    until the company who originally contracted us went... uh uh uh! By the terms of this contract we own the IP. And it's true. The moron development manager probably had no idea what he was signing. I certainly lost confidence in him when I asked for a pay rise and he couldn't work out what it would actually cost him with the agency fee on top...

  117. Threw away the wrong phone by Theovon · · Score: 1

    Well, one time, I had a problem with my land line, and I erroneously accused the wrong phone and threw that one out instead of the one that was causing the problem. Then I ended up throwing away two phones.

    Since then I've solved the problem more generally by not having a land line anymore.

    1. Re:Threw away the wrong phone by Trax3001BBS · · Score: 1

      Well, one time, I had a problem with my land line, and I erroneously accused the wrong phone and threw that one out instead of the one that was causing the problem.

      I went for a swim with MyTouch cell phone in my back pocket. You could use it as a level by the water inside the glass.

      I immediately went out and purchased a new phone, calls being so important that I couldn't miss one that might come my way.

      Wrapping the insides of the MyTouch in toilet paper, and shoving it into the middle of a pan of dry rice for a few days fixed it right up.

  118. The biggest tech mistake I ever saw... by VAXcat · · Score: 1

    Wasn't mine, but it's too good not to share. Back in the mid 80s, I was working at (let's call it) SuperBigCorp's IT department. There was a fellow there who maintained the programs that handled the savings elections for employee 401K funds. One day, while making some changes to the COBOL programs that sent which funds to what investment vehicles....he made a little mistake. He got confused in a conditional statement, and all the funds that should have gone to stable investment selections went to the highly speculative vehicles, and vice versa. Even more unfortunately, this area of activity was not supervised and audited half as well as it should have been....by the time it was noticed, several months had gone by, and the stock market had suffered a bit of a setback. Millions of dollars were lost by SuperBigCorp getting it straightened out. They had to let the poor fellow go, in disgrace. The Chief of IT was reported to have said, that if the market had just moved the other way, the programmer would have been a hero...

    --
    There is no God, and Dirac is his prophet.
  119. Rounded down GPAs by Anonymous Coward · · Score: 0

    I worked for a college and recent graduate job search site. Users who hadn't logged in in a long time were archived. The code I wrote that re-activated those users if they happened to log in truncated (rounded down) their GPAs. Those users went from having a GPA of, say, 3.9 to 3.0, for several hours. When I found out I restored the GPAs from backup, but the intervening time was very harrowing, thinking that I could have potentially impacted the careers of lots of people.
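
    The post doesn't say how the truncation happened, so the following is only a sketch of one common way it can: converting the stored value to an integer on the way back in. The function names and the string storage format are assumptions made for illustration.

```python
from decimal import Decimal


def restore_gpa_buggy(stored: str) -> int:
    # Hypothetical bug: int() drops the fractional part, so "3.9" becomes 3.
    return int(float(stored))


def restore_gpa(stored: str) -> Decimal:
    # Keeping the value as a Decimal preserves the digits exactly as stored.
    return Decimal(stored)


print(restore_gpa_buggy("3.9"))
print(restore_gpa("3.9"))
```

    A round-trip check (archive, re-activate, compare to the original) in a test would likely catch this kind of loss before it reaches users.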

  120. The story of David Alexander. by VAXcat · · Score: 1

    Back in the mid 80s, I was fortunate enough to get my first programming job. I worked with an incredibly capable programmer, let's call him Dr. Bob. I learned a great deal about programming from kindly Dr. Bob - he was a whiz at PDP11 and VAX assembly coding, and a great mentor. One day we came back from lunch and he picked up his mail and messages from the department secretary on the way to his desk. He opened one of the envelopes he'd gotten, read the letter within briefly, then started cursing like a sailor and threw the letter in the trash. He stalked off in a rage. I retrieved the letter and saw it was a page from a phone book, with the name "David Alexander" circled. After a couple hours, when Dr. Bob had calmed down, I told him he had to tell me what was going on. It turns out that his very first assembly language programming gig had been at the local University. It involved managing the data for a planned 50 year long psychology experiment, tracking the names, addresses, and project info for all of the participants over time. Now this was the mid 70s, so there was no database, just a bunch of tape files and MACRO programs to do the updating and reporting. Dr. Bob really liked the work, and the folks in the Psych Dept were really friendly, it was a great atmosphere. One day, Bob made....the Big Mistake. Due to some typos, he inadvertently replaced the name and address info in every record in the files with the data from the first record....David Alexander's. This was a tape database and it only went back a few tapes worth....by the time it got noticed it was too late - all the good data was gone. The long range experiment was totally destroyed since they couldn't track the participants. He had to quit in disgrace - he said what really upset him was the way the Psych Dept folks were so nice about it and didn't want to fire him.
    Anyway, that's bad enough...but when his "friends" caught wind of it, they started popping up David Alexander references everywhere they could - they'd leave him phone messages from David Alexander, they'd get mailings sent to his address to David Alexander, and so forth. By the time this event I saw occurred, it had been going on for years (for all I know it still is). Anyway, due to kindly Dr. Bob's David Alexander mistake, I always check my code just a leetle more carefully than I otherwise might be bothered to - I personally don't ever want to make my Big Mistake....

    --
    There is no God, and Dirac is his prophet.
  121. interesting synchronicity by epine · · Score: 2

    Just fifteen minutes ago I realized that my script to refactor the primary file server (newly converted to ZFS) into more sensible datasets had an irritating detail wrong (a path element was being duplicated in some paths).

    I said to myself "oh, I'll just roll that whole thing back to the snapshot I made 30 minutes ago".

    Then I go "zfs list -t snapshot" and discover that my snapshot was holding onto 0 GB because I forgot the -r switch to make the snapshot recursive.

    Oh, well. By some impossible-to-separate mixture of good management and good fortune, it turns out I had a set of (different) snapshots from the last two days covering all datasets in question. I lost very little work (only scripts were executed against these datasets and I still have all the scripts).
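
    For anyone who wants a belt-and-braces check for the missing -r, here is a rough sketch. It only builds the command lines (it does not run them) and compares a snapshot listing against the datasets you expected to be covered. The dataset names and the "pre-refactor" label are made up; the flags used are the standard "zfs snapshot -r" and "zfs list -H -r -t snapshot -o name".

```python
def snapshot_command(dataset: str, label: str, recursive: bool = True) -> list[str]:
    # -r snapshots the dataset and all of its descendants.
    cmd = ["zfs", "snapshot"]
    if recursive:
        cmd.append("-r")
    cmd.append(f"{dataset}@{label}")
    return cmd


def list_snapshots_command(dataset: str) -> list[str]:
    # -H gives script-friendly output; -o name prints only snapshot names.
    return ["zfs", "list", "-H", "-r", "-t", "snapshot", "-o", "name", dataset]


def datasets_missing_snapshot(datasets: list[str], snapshot_names: list[str], label: str) -> list[str]:
    # Collect the datasets that actually have a snapshot with this label.
    covered = {
        name.split("@", 1)[0]
        for name in snapshot_names
        if name.endswith("@" + label)
    }
    return [d for d in datasets if d not in covered]


# Hypothetical listing after forgetting -r: only the parent was snapshotted.
missing = datasets_missing_snapshot(
    ["tank/home", "tank/home/alice"],
    ["tank/home@pre-refactor"],
    "pre-refactor",
)
print(missing)
```

    Running the check right after taking the snapshot, rather than when you need the rollback, is the whole point.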

    My real screw up?

    Back in my second co-op workterm job, I managed not to notice that a system I was backing up changed the order of the listed drives between two very similar screen requests that I made almost immediately one after the other. Unfortunately, on the second pass I selected the active system drive as the recipient of the system backup, picking from the position in the menu where the desired destination drive had appeared moments before.

    I had become accustomed to my home system being deterministic in the order it listed things. My bad.

    This is back at the very beginnings of the 4.77 MHz era, so my PC was actually not yet what we now know as a "PC" (its father had an S-100, and its mother had an itty-bitty CRT).

    Thirty years later I still can't type dd of=/dev/ada3 without making three trips to the metaphorical bathroom.

    Whenever I type a disk-level dd command, I leave the sudo off, until after the third proof-read and several console consultations in which at least two different programs give me the same view of the drive name.

    In dollar costs I couldn't say. In psychic cost, it's indelibly etched onto my permanent record.

    I had a co-worker once (EEng) who claimed that as a junior intern during the late 1990s back when laser gear for fiber optics was all the rage, he routinely fried extremely delicate $2000 DUTs while the old hands just shrugged their shoulders. Dotcom dollars. Who really gave a fuck? It was considered barely worse than ruining a nice chair.

  122. does crashing the work car count? by Anonymous Coward · · Score: 0

    Crashed the work car, got fired.

  123. $22M - 6 hrs of downtime by Anonymous Coward · · Score: 1

    $22M - 6 hrs of downtime for 1 application due to a corrupted DB. I typed what the vendor told me to type into sqlplus. The vendor was clueless, obviously. Took about an hour to determine the root cause, took another hour to find a real DBA (on staff), then some more time to bring him up to speed and restore from daily backups.

    Over 20K workers couldn't do anything that day.

    The lead technical architect (hired gun), my team, and the direct business clients who knew protected me. S-VPs in the client organization all wanted to fire someone - me. They never found out who to fire. However, I've been stuck in the same position the last 8 yrs. No promotion since.

    1. Re:$22M - 6 hrs of downtime by retchdog · · Score: 1

      sounds like that $22M is a total bullshit figure, unless those 20K workers were each costing ~$300K/year and working solid days without wasting any time on coffee breaks, web browsing, etc.
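
      A quick back-of-the-envelope check of that, using only the numbers in the parent post plus one assumption of mine: roughly 2,000 working hours per year.

```python
TOTAL_COST = 22_000_000      # dollars, from the parent post
WORKERS = 20_000             # from the parent post
OUTAGE_HOURS = 6             # from the parent post
WORK_HOURS_PER_YEAR = 2_000  # assumption: about 50 weeks of 40 hours


def implied_costs(total, workers, outage_hours, hours_per_year):
    """Return the implied cost per worker, per worker-hour, and per worker-year."""
    per_worker = total / workers
    per_hour = per_worker / outage_hours
    per_year = per_hour * hours_per_year
    return per_worker, per_hour, per_year


per_worker, per_hour, per_year = implied_costs(
    TOTAL_COST, WORKERS, OUTAGE_HOURS, WORK_HOURS_PER_YEAR
)
print(f"per worker: ${per_worker:,.0f}")
print(f"per worker-hour: ${per_hour:,.2f}")
print(f"annualized: ${per_year:,.0f}")
```

      Under those assumptions the figure implies every idle worker was worth about $1,100 for the outage, which annualizes to several hundred thousand dollars each.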

      get another job and quit falling for bullshit.

      --
      "They were pure niggers." – Noam Chomsky
  124. RF Wattmeter Sensor by Anonymous Coward · · Score: 0

    First job out of school, working as a line tech testing/tuning RF amplifiers. Forgot to put in a 2nd 30 dB attenuator in front of an HP 435 watt meter. Put 1000 watts into a 1 mW (0 dBm) sensor. I felt terrible - the sensor was $1500 at the time, about 2 weeks of my pay. I offered to pay, but my boss said, just don't let it happen again. (I worked there for 5 years, and I never let it happen again). I wish all of my bosses were like him (I was spoiled early on in my career). Thanks, John!
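
    For readers who don't do RF, the decibel arithmetic can be sketched like this. It assumes the first 30 dB attenuator was in place and only the second was forgotten, which is my reading of the post; the function names are illustrative.

```python
import math


def watts_to_dbm(watts: float) -> float:
    # dBm is power relative to 1 mW on a log scale.
    return 10 * math.log10(watts * 1000)


def dbm_to_watts(dbm: float) -> float:
    return 10 ** (dbm / 10) / 1000


def power_at_sensor_dbm(source_watts: float, attenuators_db: list[float]) -> float:
    # Each attenuator subtracts its rating in dB from the signal level.
    return watts_to_dbm(source_watts) - sum(attenuators_db)


with_both = power_at_sensor_dbm(1000, [30, 30])
with_one = power_at_sensor_dbm(1000, [30])
print(with_both, dbm_to_watts(with_both))
print(with_one, dbm_to_watts(with_one))
```

    With both pads, 1000 W (60 dBm) arrives as 0 dBm, or 1 mW. With only one, the sensor sees 30 dBm, or 1 W, a thousand times its rating.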

  125. Dropped a RAID by Anonymous Coward · · Score: 0

    Back in the day when drives were expensive I dropped a 16-drive FC RAID. It was supposed to lock when I pulled it out of the rack but it didn't. At least I jumped back quickly enough so it didn't drop on my feet. Must have been between 20 and 30k.

  126. Fun with lasers by NormalVisual · · Score: 1

    My personal best was when I was writing the firmware for a customer's laser marker system. It was a big industrial machine that moved the laser head on a very expensive gantry using 15-pound servos that could generate ungodly amounts of torque. I had a bug in the code that drove the servos, and I issued a command to home the gantry, after which the X-axis went zipping across as fast as it would go. Wouldn't have been a problem except there was a faulty limit switch on that end of the axis, so the 25-pound laser head got slammed into the stops at what we estimated was about 100 inches per second. Totally destroyed the laser head (there's nothing more disheartening to hear than the tinkling of broken steering mirrors and seeing a cracked flat field lens as a bonus), and caused some severe mechanical damage to the rest of the assembly. Fortunately the motors shut down automatically when the temperature sensor tripped, but it wasn't fun explaining to the boss that we had to replace about $30,000 of hardware.

    My favorites are those I thankfully had nothing at all to do with - where I am now, we write and maintain the warehouse management software for a very, very large snack food vendor, and we have a VPN link to all of the plants to maintain and monitor what's going on. It's happened before where co-workers haven't paid close enough attention and have connected to live plants instead of the test systems, and accidentally shut down the warehouse, which means production gets shut down too since there's nowhere to put those thousands and thousands of bags of chips until the warehouse system comes back up, and it takes them hours to get stuff restarted and settled once that happens. I don't know how much it costs, but it can't be cheap. I'm also not sure why we don't have some kind of two-factor system with a unique key for each plant to keep that from happening. [shrug]

    --
    Please stand clear of the doors, por favor mantenganse alejado de las puertas
    1. Re: Fun with lasers by Anonymous Coward · · Score: 0

      Why is this connected to the internet??????

  127. hundreds.... by Anonymous Coward · · Score: 0

    of millions of $US dollars, who knows how much world-wide. Huge mistake;
    I'm sure it claimed the lives of many and even some domestic cats as well...

    I was the person responsible for green-lighting the Windows 8.0 release for x86_64 processors...

  128. I nearly cost my company millions by PhilHibbs · · Score: 2

    I nearly cost my employer several million by fixing a bug.

    The first task I was given in my new job was to look at an old system that printed labels to be put on containers of car parts. A message would come in on a serial cable saying what part was going to be needed within a few hours at a car assembly line, the parts were packed into stillages (a frame designed to hold a certain number of a certain part, like bonnets, bumpers, door panels, etc.) and when a stillage was full, or when a certain amount of time had passed since the first part was picked, then a label was printed, applied to the stillage, and it was dispatched over the road to the factory.

    Every time the serial number rolled over 9999 to 0001, the system would go wrong and stop working. This happened about once a month, and the help desk had a sheet of instructions on how to fix the problem. Some of the staff knew the fix off by heart.

    I looked at the code, found a roll-over bug, and fixed it. Everything was fine, and a couple of years went by with no problems.

    Then, at 3 in the morning, the help desk called me and said that it had happened again. They didn't have the sheet of paper any more, and no-one could remember how to fix it. I rubbed the sleep from my eyes, and tried to get my brain into gear and remember what to do. It took me about an hour talking with a couple of help desk people, and between us we figured out what the fix was, and they called the warehouse and talked them through it.

    The next day I talked with my colleagues, and found out that we had come within a few minutes of triggering a penalty clause for halting the production line that could have run into millions of pounds. This was back in the '90s when millions of pounds were a lot of money!

    I looked back over the code, and found that there were actually two very similar bugs in the code, one of which happened fairly regularly, and one which happened much more infrequently, but the same fix worked for both of them.
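
    The original code isn't shown here, so this is only a sketch of what a 9999-to-0001 roll-over looks like when handled explicitly. The function name and the four-digit label format are assumptions for illustration.

```python
MAX_SERIAL = 9999
FIRST_SERIAL = 1


def next_serial(current: int) -> int:
    # Wrap from 9999 back to 1; 0 is never issued.
    if not FIRST_SERIAL <= current <= MAX_SERIAL:
        raise ValueError(f"serial out of range: {current}")
    return FIRST_SERIAL if current == MAX_SERIAL else current + 1


def label(serial: int) -> str:
    # Zero-padded to four digits, e.g. 0001.
    return f"{serial:04d}"


print(label(next_serial(9998)))
print(label(next_serial(9999)))
```

    A test that walks the counter through the wrap point on every code path that touches the serial number is the kind of thing that would have flushed out a second, rarer roll-over bug.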

    Back when I first started working in IT, my boss told me, "One day, you will probably make your million pound mistake. In our business, we build systems that, over the course of our careers, will save millions of pounds in lots of small ways. Eventually you will make a mistake, and one of those systems will go wrong, and it might cost millions. Your employer will bear the cost of it, which is why we don't earn those millions ourselves. You have to be prepared for that eventuality. If it happens while you're working for me then I will kick your arse, and maybe I will fire you, but I'd be wrong to do so, that's just the nature of the business that we are in."

  129. 3k to replace a motherboard by Trax3001BBS · · Score: 1

    Not sure if it counts as it was an Amiga 3000 and they came to my house to fix it for free.

    I had a "friend" who brought over a new hard drive to get working on the Amiga. I did my best, then the system just quit. He then says, "Yep, did the same thing to mine."

  130. sudo poweroff by Anonymous Coward · · Score: 1

    Oops! Wrong terminal!

    I was ssh'd into a production server and did a poweroff. Meant to run it on my own box. I didn't have authority with our host to ask them to turn it back on and those who did had already left for the day. Probably didn't cost the company much since it was a small SaaS product, but if I pulled that stupidity elsewhere it could have.
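
    One cheap guard against the wrong-terminal problem is to make a wrapper refuse to act unless the machine's hostname matches the one you meant. A minimal sketch, with made-up host names; it only decides whether to proceed and does not call poweroff itself.

```python
import socket


def safe_to_power_off(expected_host, current_host=None):
    # Compare short hostnames case-insensitively; default to this machine.
    if current_host is None:
        current_host = socket.gethostname()
    return current_host.split(".")[0].lower() == expected_host.split(".")[0].lower()


# Hypothetical names: "mybox" is the workstation, "prod-db1" the server.
print(safe_to_power_off("mybox", current_host="mybox.example.com"))
print(safe_to_power_off("mybox", current_host="prod-db1"))
```

    A loud, distinct shell prompt on production machines gets at the same problem from the other side.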

  131. Back in the 1970s by Anonymous Coward · · Score: 0

    Coworker was using a dialup (110 baud) modem to a computing service to do his university homework with the boss's permission. This was the way smaller businesses used computers back in the 1970s. It was also very expensive: I did a simulation of air flow through a roof structure for energy recovery to heat the building, and for 3 hours of coding, compiling and running the simulation the bill was 40 hours' pay. Coworker's use usually cost less than 1 hour's pay, but the program did not stop when he logged out, so it kept running until completion and the bill was 2 years' pay.

  132. tl;dr: tore up a ptz with an infinite loop by Anonymous Coward · · Score: 0

    My first job out of college involved work for a private company performing work for the Naval Research Laboratory on a secret project that is now declassified. We were working on a pan-tilt-zoom camera system with thermal imaging to track a Navy SEAL diving in heavy fog conditions. The team at the NRL attempted to track him with hydrophonic sensors (Optical fiber acoustical) to detect the popping of microscopic bubbles in closed circuit rebreather equipment. We were competing against big companies like Northrop Grumman for the contract, and we were working on a demo to show off our capabilities.

    Our first demo to them of our camera system destroyed itself spectacularly by spinning uncontrollably at a high speed until it ripped its worm gear apart and caused untold amounts of financial damage to the several hundred thousand dollar unit. I was new to threading at the time and our senior developer was not very good at it either, and we had no unit testing or continuous build system of any kind.

    He had decided to use a lock with a loop that had a condition that never released the lock in the laser range finder (LRF) code that tracks distance to target. I had warned him I didn't think we needed the lock as long as we allocated new memory for the data and passed it back on the main message queue, which was true. At the time I thought it was a minor inconvenience that he insisted on adding that piece of code. How wrong I was.
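
    The original code and language aren't given, so this is just a sketch in Python of the lock-free approach described above: the range-finder thread allocates a fresh object per reading and hands it over on a queue, so no shared buffer needs a long-held lock. All names are hypothetical, and the consumer uses timeouts so a stalled worker cannot hang the main loop.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeReading:
    distance_m: float  # a new immutable object per measurement


def range_finder_worker(samples, out_queue):
    # Stand-in for the LRF driver: publish each reading, then a sentinel.
    for sample in samples:
        out_queue.put(RangeReading(distance_m=sample))
    out_queue.put(None)


def collect_readings(samples, timeout_s=1.0):
    q = queue.Queue()
    worker = threading.Thread(
        target=range_finder_worker, args=(samples, q), daemon=True
    )
    worker.start()
    readings = []
    while True:
        try:
            item = q.get(timeout=timeout_s)
        except queue.Empty:
            break  # worker stalled; give up instead of blocking forever
        if item is None:
            break
        readings.append(item.distance_m)
    worker.join(timeout=timeout_s)
    return readings


print(collect_readings([10.0, 12.5, 11.8]))
```

    The queue still locks internally, but only for the instant it takes to hand over one item, which is a very different thing from holding a lock across a loop.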

    It was tragically the only part of the entire program that he wrote. I ended up writing the entire software, which was silly considering I was right out of college. And, being inexperienced at the time I didn't identify that his code would have seized up our entire system. We had no chance to test his piece of code before the demo because he took too long to finish it. He added it in at the very last moment.

    In retrospect I should have designed the plugin framework so that an infinite loop in one of the plugins wouldn't have been able to grind everything to a halt. But, as it was, we were pressured for time and I was inexperienced. We went back to the drawing board, fixed the issue, resubmitted our demo to the NRL and won the contract because the Northrop Grumman system failed every demonstration. However, my company ended up asking me for my resignation instead of firing me in the end because I criticized management. That salaried job lasted me only a year and it was a lesson to me in more ways than one.

  133. Fumbling around by lucm · · Score: 1

    USB connectors also fit neatly in RJ45 ports, and this too can lead to interesting side-effects.

    --
    lucm, indeed.
  134. Those were the days by Anonymous Coward · · Score: 0

    In 1996 I spilt a pint on top of a running 486 computer with an open case! Cost me about a grand which was my entire worth. :(

  135. Lightning fried network switches $8k by zerofoo · · Score: 1

    I temporarily ran a copper network cable out of a window to another building while our building to building fiber was being installed.

    Over a weekend we had huge lightning storms. The voltages induced in the unshielded twisted pair cable hanging outside 3 floors up fried both switches on either end of the cable.

    That was an $8000 mistake.

  136. dropped stores of retailer from database by Anonymous Coward · · Score: 0

    Some half-experimental but in-production code (back in them cowboy coder days).. had a little "logical fault" one morning and dropped a significant number of stores' data from a retailer's database. Fortunately easily fixed, but confidence was shaken and all the morning reports were screwed as I only recovered the data around lunchtime. Cost ... ?? $20k maybe?

  137. Hmm yes. by Falconhell · · Score: 1

    Back in the early 80's, I took off a little too fast in my company station wagon, and a $10k DTS Data Terminal hit the road hard. Ooops.

  138. Telecoms classic by Falconhell · · Score: 1

    Not me this one, but a classic.

    One Friday afternoon Telecoms tech was checking a remote unmanned exchange, one of the checks was to measure the levels on the analog multiplexer for the trunks to the main exchange, which acted as the brains for the dumb remote.

    The procedure was to plug a 6.5 mm phone jack, attached to a large fixed meter, into each channel in turn. Unfortunately, this chap grabbed the wrong hanging jack, this one having 50 V exchange battery on it. He then proceeded to plug into each channel of the carrier system, and was mystified when there were no readings. As he plugged in the last channel, the exchange went totally silent. Whole exchange was down for 2 days.

  139. That's nothing... by Anonymous Coward · · Score: 1

    What about the guy who sold Slashdot to Dice? :)

  140. Everyone makes $1,000,000 mistakes by NothingWasAvailable · · Score: 2

    During a panel discussion with very senior technical leads, the question came up: "How many of you have made a $1,000,000 mistake?"

    Every single one raised their hand. This was a very large semiconductor company, and everyone had been involved in at least one instance where bad masks were made because a check was skipped or a step was botched in the design flow.

    I worked on a chip design where it took six design revs to get clean masks. All five of the prior revs had avoidable (human) errors during the design and build process.

    Pay me now (in time running checks) or pay me later (in nre: non-recoverable expense) for bad hardware.

    1. Re:Everyone makes $1,000,000 mistakes by Macman408 · · Score: 1

      SIX revisions? Hopefully only metal layers, or were some a full base spin too?

      Where I work, we usually go into production on the second revision. Occasionally, the first one is good enough (usually if it's similar to a previous chip). The one I worked on most recently was a brand new design from the ground up with a new team of people, so we shipped the 3rd version (both spins were just metal layers). We (almost) never change the base layer - the case I heard about was when somebody in Marketing told someone in Engineering that there was no way they'd ever want to market a specific part to use >n MB of memory (probably 512 or so), because it was a low-end part. So they put enough address bits on the part for 512 MB - and then not too long after making it, Marketing decided that they needed a 1 GB version too. Then it just became a question of "is it worth a million dollars to be able to sell it with 1 GB?"
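
      To make the address-bit point concrete, here is a small sketch of the arithmetic. It assumes a flat byte-addressed space, which is a simplification of mine; real memory interfaces multiplex row and column addresses, so the actual pin count differs.

```python
MIB = 1024 * 1024


def address_bits(size_bytes: int) -> int:
    # Number of bits needed to address every byte from 0 to size_bytes - 1.
    if size_bytes < 1:
        raise ValueError("size must be positive")
    return (size_bytes - 1).bit_length()


print(address_bits(512 * MIB))
print(address_bits(1024 * MIB))
```

      Going from 512 MB to 1 GB needs exactly one more address bit, which is trivial in a spec and a new base layer in silicon.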

      I'm in verification, so my whole job is to make sure we haven't made any million dollar mistakes. I produce no useful output, other than a thumbs up to management right before they start producing wafers. Some mistakes still get past us, but when a million dollars is on the line, some creative changes (often just in software) can help us keep the problem at bay.

      And any time a big mistake gets by, another item gets added to our checklist. Being the first guy to make a particular mistake is usually professionally survivable; everybody makes mistakes sometimes. But being the second guy to make the same mistake does not bode well for your future...

  141. 30K in 30s by VictorTango · · Score: 1

    I once wrote a temperature monitoring system for a cargo airline flying 747s. The system would read the loadplan to determine if there was temperature-sensitive cargo onboard, then after takeoff, would send an ACARS message to an aircraft asking the ECS what the temperature was in each section of the aircraft. The rules table could be set to a different frequency of monitoring based on the exact cargo, so AVI (live animals) would be monitored every 5 minutes, pharmaceuticals every 10, etc. Once the temperature report came back, the system would compare that to determine if the temperature was within limits of the cargo onboard. Anyway, accidentally put zero in the frequency table, and basically DOSd 5 aircraft that were in-air carrying perishables. Realized the error pretty quickly when the monitoring system freaked out, but the data charges alone were about 30k in 30 seconds. ARINC was very nice and waived the fees though - thanks guys!
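
    The real rules table isn't shown, so this is only a sketch of the sort of validation that would reject a zero interval before it reaches the scheduler. The table layout, the one-minute floor, and the cargo codes other than AVI are assumptions.

```python
MIN_INTERVAL_MINUTES = 1  # hypothetical floor; pick whatever the data link can bear


def validated_interval(cargo: str, minutes: int) -> int:
    # Zero or negative intervals would mean polling as fast as possible.
    if minutes < MIN_INTERVAL_MINUTES:
        raise ValueError(
            f"{cargo}: interval {minutes} min is below the "
            f"{MIN_INTERVAL_MINUTES} min minimum"
        )
    return minutes


def load_frequency_table(raw: dict[str, int]) -> dict[str, int]:
    # Validate every row up front so a bad entry fails at load time.
    return {cargo: validated_interval(cargo, m) for cargo, m in raw.items()}


table = load_frequency_table({"AVI": 5, "PHARMA": 10})
print(table)
```

    Failing at load time costs a rejected config change; failing in flight costs data charges by the second.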

  142. I blew up a Prime mainframe by Anonymous Coward · · Score: 0

    Back in the mid to late eighties I was taking a basic higher education qualification, they were teaching us COBOL using hard copy data sheets which would be entered by data entry clerks. We learned a little Pascal, some DB2. I was already coding in 6502 assembly language at this point so I thought it was a little backward. I was writing self modifying code and they wanted me to write out programs with a pencil on data entry sheets.

    Second year of the course we got a two week work placement, they put me with a financial services company that specialised in COBOL. They wouldn't let me anywhere near the mainframe or the code, but they were magnanimous enough to let me read the report outputs but not the actual results. I made a mean cup of tea and fetched a lot of lunch till they asked me to wire a plug for an extension. I had never wired a plug before and, long story short, I wired it wrong and blew all the fuses on the mainframe and most of the IT section. I was sent home and asked to never return.

  143. Multiple... by xploraiswakco · · Score: 1

    Warranty work: In the late 90's I was repairing a beige desktop Mac (early PPC), I needed to remove the logic board, and while attempting to pry up the logic board I slipped with the screwdriver, which ripped off a resistor in the process. As it was warranty work on behalf of the manufacturer (I was working for a service agent), all parties agreed it was a mistake that could have happened to any technician, so it continued to be covered.

    Destroyed keyboard: I once spilt a Fanta on a white Apple keyboard, the clear plastic base with the full height keys, the last of its kind before the current flat aluminium keyboards came in.

    Almost lost data: I was click happy once during the process of backing up a laptop for a staff member (planning to upgrade the OS), and instead of hitting backup, I hit erase. I was able to restore the data thanks to hard drive erasing only modifying the first block or two on the disk, instead of going to the time and trouble of erasing the entire disk.

  144. $45,000 in downtime at a paper mill by Anonymous Coward · · Score: 0

    Clicked on a remote desktop shortcut that started a second session on a paper grading server. Software on this server crashed brutally when two instances of it were running. The resulting crash blanked setpoints in the control system for the paper machine causing it to go down hard. The control system is designed to feed from the grading system to maintain a consistent quality of paper.

    Three hours downtime from the crash and resulting startup issues. What made it worse was that earlier that day we'd had a similar failure and I'd given explicit instructions to other people not to do what I did.

  145. Not my mistake... by Anonymous Coward · · Score: 0

    But my boss's boss's boss.

    Wasn't gonna make his numbers ($$), so he decided that attrition was his only hope for a $250k bonus. So he decided to encourage it at his third largest site, by moving everyone there (180 people, 165 programmers) to other sites (one on the East coast, one on the West).

    Miscalculations: (1) two of his three most profitable products were centered there. (2) Most of the programmers were married... due to company rules, spouses would have had to transfer one to each coast. (3) A large fraction of the staff had voluntarily moved to this site for quality-of-life reasons.

    Result: 155 people (including me) quit and engineering continuity was lost on several products, including the two most profitable.

    I went off and found a job at another company. Two years later, my new employer was bought by my old employer because they no longer had a competitive product. The week we were acquired, the old boss 'left to pursue other opportunities'.

    Cost? Certainly more than $20M. Probable cost? > $200M, and the company has long since been bought out.

  146. Powerful mistake by gtarthur · · Score: 2

    Back in the 70's when I was still a junior electrical design engineer working for a distribution transformer company, we used algorithms loaded into TI calculators to compute the electrical, heat, and mechanical stresses. I later got the task of modernizing those codes and merging them with a FORTRAN code that another engineer had written and abandoned because it was too expensive to run. Things went well at first, we saved a lot of time and used that as any good engineer would to optimize our designs using different parameters to reduce cost and improve efficiency, both very important to my company and its customers. Then one day we got a limiting case which we didn't recognize at the time. As usual, one of our engineering assistants used the computer generated design and the old methods to validate the design. The engineer always takes responsibility for the design. After the build, the unit, a 3 phase unit that had 76,000 volt inputs, was tested in our "hi pot" chamber - a voltage pulse of the rated voltage but with reduced current and only for a short pulse. The center core winding turned into shards of copper spaghetti in the 8 foot tall tank. It cost $25,000 to repair, and delayed delivery for 3 weeks. My heart rate hit about 200 when the engineering manager called me and my supervisor into his office. Then he explained that he had run the calculations also, and discovered that our methods had a flaw in the prediction of the axial forces on the center coil. It was a very subtle mistake, and he said it could have been much worse. We were able to revise the code within a few hours, and that incident led to further improvements in methods and automation. It also taught me my most important lesson about computers - human error is the greatest risk. Real tests of your code sometimes do "blow up".

    --
    Every change is not progress, but there is no progress without change.
  147. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  148. I broke ebay.com once by Anonymous Coward · · Score: 0

    Several years ago, I was deploying a new interface monitor to all of eBay's Solaris database servers. In the code I did a "netstat -i" in order to enumerate the interfaces available to collect statistics from.

    Turns out, "netstat -i" reads the entire netstat table AND tries to do reverse lookups on all the IP addresses, and then just spits out the interface names. Oops.

    Within 5 minutes of pushing out the new monitor, DNS for the entire company had rolled over, and all the application servers were no longer able to connect back to the DBs. It took a good 15 minutes to figure out to change the command to "netstat -in", roll up the change and push it out. Took another 10-15 mins for things to clear up.

    Total cost of the outage: Approximately $800k in lost revenue.

    The moral of the story: Always leave a note.
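    The fix in the story was the -n flag, which makes netstat print numeric addresses instead of resolving names. As a hedged alternative sketch (not what the poster ran), a monitor written in Python could enumerate interfaces through the standard library, which asks the operating system directly and never touches DNS. The function name list_interfaces is made up for illustration.

```python
import socket


def list_interfaces():
    """Return local network interface names without any DNS lookups."""
    # socket.if_nameindex() returns (index, name) pairs straight from the OS;
    # no addresses are resolved, so the resolver is never involved.
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    for name in list_interfaces():
        print(name)
```

    The general point is the same as the poster's: a monitor that runs on every server should not depend on a shared service such as DNS just to list local interfaces.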

  149. got a router to hang ... by oneiros27 · · Score: 1

    I managed to flood it with enough data that it locked up, and required a manual reset. The second and third time that I did it, the network admins were getting much faster about fixing it, but my boss told me to stop doing it.

    I have no idea how much it cost ... but it was the router that fed NASA Goddard's active missions, and I was told that the Hubble folks were getting upset when it kept happening.

    I didn't get fired, as I was testing to ensure that we had sufficient bandwidth for SDO data transfers. (we didn't ... and I probably didn't need to run the additional tests to prove it). It did convince them to move us over to an isolated network when we moved offices, though.

    --
    Build it, and they will come^Hplain.
  150. Got Lucky by Drethon · · Score: 1

    I dropped a 50k sensor on the ground but it tested out fine afterward. It was used for development so if there was hidden damage it didn't really matter.

  151. You guys are all pikers by Anonymous Coward · · Score: 0

    In 1978, I made a programming error on a server for a bank's teller network. The day the problem was discovered the bank's internal cash control accounts with amounts larger than about 2 billion dollars suddenly started displaying apparently random negative balances. I was late to work that morning which caused my bosses to suspect that I'd somehow stolen about 20 billion dollars (that was back when a billion dollars was real money). When I finally showed up around 10AM, my coworkers were trying to figure out how I'd done it and whether I had flown to Costa Rica like Robert Vesco.
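    The poster does not say what the bug was. A threshold of "about 2 billion" is, however, what one would expect if balances were held in a signed 32-bit integer, whose maximum is 2,147,483,647. Under that assumption (it is only an assumption), the sketch below shows how a larger balance wraps around to a negative number. The helper to_int32 and the sample balance are illustrative.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def to_int32(value):
    """Interpret an integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF      # keep only the low 32 bits
    if value > INT32_MAX:    # high bit set means the value reads as negative
        value -= 2**32
    return value


if __name__ == "__main__":
    balance = 2_500_000_000  # hypothetical balance in whole dollars
    print(to_int32(balance))  # a negative number once the value wraps
```

    Balances at or below the limit pass through unchanged, which would fit the report that only the largest accounts looked wrong.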

  152. Mixup for bill of materials by Anonymous Coward · · Score: 0

    One of my job duties besides software development for financial systems is the design of the hardware infrastructure on which the systems are going to be deployed. I also end up doing the initial bill of materials, in which I have to give a first estimate of the final cost. Mind that these are always >$500K bills of materials and most of the time >$1mil. A lot of the projects happen in developing countries where the technical expertise is almost zero, where there are a lot of changes requested by the client or by the party who finances the project (USAID, World Bank, BERD etc.). We have to integrate the new hardware in the existing environment of the client so there are a lot of restrictions. There is always an army of "consultants" that always fucks things up with last-minute changes or a badly made initial description of the existing environment. It happened several times that I forgot to mention some hardware in the initial bill of materials, or added some incompatible hardware that had to be replaced later because some "consultant" made a copy-paste document from another project listing non-existent equipment at the client site. You cannot imagine the sheer amount of money that gets wasted this way in these big infrastructure projects. My record of "additional" costs that ended up being covered on a project by my company is around $100K. In total I think I exceeded some amount more than my last 10 years' salary.

  153. I grounded a 747... by Anonymous Coward · · Score: 0

    ..for two days. The irate engineer on the phone told me it cost a million pounds a day. And all for the lack of a version check in the online maintenance manuals that we were delivering.

    AC, I think.

  154. Clicking hard drive by Anonymous Coward · · Score: 0

    I used to work for a small development company that did not have a proper backup policy. One day, the main hard drive of the development server started making clicking sounds. We took a backup onto another server in the network as soon as we heard the noise for the first time. The replacement disk arrived two days later. I took a fresh backup onto an empty drive that I had connected to the machine, removed the failing disk, connected the replacement disk and got ready to start transferring the contents of the backup. The supplier of the replacement disk took the clicking hard drive with them. Then it happened: instead of formatting the new hard drive, I formatted the drive that contained the fresh backup... I was facing the prospect of losing 1.5 days of the development team's work. It was pretty scary.
    I quickly rang the guy who had taken the old hard drive away and asked him if he still had it. Fortunately, he had not done anything with it yet. He brought back the disk like right away. I transferred the data once more and we were back on track one hour later. This all happened around lunch time, so none of my bosses were there to stress me out while the drama unfolded. By the time they came back in the office everything had been sorted. I did tell them anyway. They were a little shocked but I wasn't told off or anything. The backup policy remained unchanged. :(

  155. Too bad, I burn hardware regularly by Anonymous Coward · · Score: 0

    It's all the hardware designers' fault. If your hardware doesn't include perfectly labeled crocodile clip sized pins and above everything else places direct-to-CPU pins next to the power supply, it's good as fried.

    I appreciate that the final hardware has to be tiny but don't expect the software guy working on the prototype boards to be some sort of manual ability prodigy.

    If he were, he'd be building premium automatic clocks instead of software to work around your cheap components.

    So, as someone tells you the software dude has driven 12V into the UART pin, *again*, be sure to remember it's all your damn fault.

  156. $100K in cabling that went unused by carlos92 · · Score: 1

    I was tasked with a fiber cabling project for a new upstream connection at a small ISP. I documented the requirements, placed a purchase order, interviewed contractors, recommended one of them and went ahead with the project. My boss was downsized during this process, and when I informed my new boss that the cabling was completed and that his signature was required in some document in order for the contractor to be paid, he said something along the lines of "did nobody tell you that the upstream connection will not use that kind of fiber?" I wanted to die at that moment, but the fact was that it wasn't my fault - it was a consequence of the massive layoffs, the resulting chaos, and the deficient flow of information.

    1. Re:$100K in cabling that went unused by Anonymous Coward · · Score: 0

      What happened to the cabling? They could have sold it on eBay or something : ?

  157. Valar Morghulis. by Anonymous Coward · · Score: 0

    Valar Idiotis.

  158. Convinced my boss not to use ColdFusion by Shag · · Score: 1

    ...by writing a simple page and putting it under load on a Sun E4500... which was the front end of our dot-com's website. We were only invisible to the rest of the world for a few minutes, thankfully...

    --
    Village idiot in some extremely smart villages.
  159. when copying 2Tb of .pst files to a usb drive by Anonymous Coward · · Score: 0

    /PURGE on your diskcopy is a bad choice when you've just sat the disk on top of an unstable rack shelf.... because *I* would be the only one in there and *I* would know that it's there and *I* won't forget about it or *I* won't trip over the cable.. ummm... I was wrong... about a whole 1Tb of wrong.. :( data irrecoverable even by professional recovery crew.. not sacked just self-shamed. (did get budget to fix the problem - the usb was a "patch" because "there was no money".. hummm)

  160. Gatwick airport by Anonymous Coward · · Score: 0

    Around 1997 I was working for a subsidiary of British Airways at Gatwick airport, UK. The lab they had there was a token ring network and the topology was completely flat - nothing was segmented.

    So it was a Friday afternoon, and we had received these printer adapters that you plug into the network that turn a normal parallel printer into a network printer. We started plugging them in and went to configure them - but lo and behold they had already picked up an IP address (no bootp or dhcp at work though). So we were all mystified as to how they got configured - but didn't think too much of it and all went home for the weekend.

    Come Monday.......

    I rock up to work and there's this big hoo haa going on - apparently our little adapter things had managed to pick the IP addresses of the baggage handling systems at Gatwick. This caused delays of a few hours to all the flights - think this cost them in the region of several million pounds.

  161. New to SQL Server 7 by Anonymous Coward · · Score: 0

    Query Analyzer open all set to run a "DELETE FROM TABLE WHERE CONDITION", but didn't realize that the "DELETE FROM TABLE" was highlighted. Turns out SQL Server runs the highlighted part only. Nuked the sales table. Oops. Boss was good about it. He took half the blame for not having a decent backup routine. Set up a heluva backup after that.
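    One common guard against a DELETE that loses its WHERE clause is to run it inside a transaction and roll back if it touches more rows than expected. The sketch below illustrates the idea with Python's built-in sqlite3 module standing in for SQL Server; the function guarded_delete, the row limit and the sample sales table are all made up for the example.

```python
import sqlite3


def guarded_delete(conn, sql, params=(), max_rows=1):
    """Run a DELETE in a transaction; roll back if it hits too many rows."""
    cur = conn.execute(sql, params)   # sqlite3 opens a transaction for DML
    if cur.rowcount > max_rows:
        conn.rollback()               # undo the delete before anyone sees it
        raise RuntimeError(
            f"refusing to delete {cur.rowcount} rows (limit {max_rows})"
        )
    conn.commit()
    return cur.rowcount


def demo():
    """Try a DELETE with no WHERE clause and report how many rows survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (amount) VALUES (?)", [(10.0,), (20.0,), (30.0,)]
    )
    conn.commit()
    try:
        guarded_delete(conn, "DELETE FROM sales")  # the WHERE clause got lost
    except RuntimeError:
        pass
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

    In an interactive tool the equivalent habit is to type BEGIN TRANSACTION first, check the reported row count, and only then commit.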

  162. A hastily applied patch by Anonymous Coward · · Score: 0

    Cost $120,000 (AUD but it was about at USD parity at that time). All in just a few minutes. It did influence me directly in that my bonus was smaller, but markets were good and I made most of that back over a month, so didn't hurt too bad.

    I learned my lesson that thorough testing applies in every circumstance, even if the change was small.

  163. Oops... by Anonymous Coward · · Score: 0

    I once blanked out one of the big boards at a major US exchange in the middle of the trading day. I had no idea until I started getting angry phone calls from floor traders. Not sure the monetary loss, but I don't think it was too bad - I didn't get fired at least.

  164. Subprime Mortgage crash by EmperorOfCanada · · Score: 1

    I read an article years ago about a guy who developed the software that made transacting CDOs (Collateralized debt obligations) much easier. Basically that led to the entire sub-prime mortgage industry which led to 2008. So I think that he wins this whole discussion.

  165. I'm not sure by Anonymous Coward · · Score: 0

    Once upon a time, I worked for a defense contractor, which did some work for the NSA.

    Obviously it wasn't just me working there, and we weren't the only ones working on this project and related projects. Still, I'm pretty sure that in my small way, I cost the civilized world a lot.

  166. Upgrading syslog by Anonymous Coward · · Score: 0

    I was tasked with updating rsyslog to rsyslog5 on a whole environment of RHEL based systems. Procedurally, I was not allowed auditing access to most of the machines targeted for the upgrade, and was assured that the test environment had the same basic deployment as the working servers.

    *Hah*. To update rsyslog to rsyslog5, you need to delete the old "rsyslog" package. On RHEL 5, if you don't happen to have 'sysklogd' installed, yum removes *every single package* dependent on the "syslog" metadependency, so it takes out "yum" itself. Hilarity ensues, because it also takes out the daemons which have "syslog" dependencies. It never showed up in testing because most of the servers had been hand installed or imaged by the "architect" who absolutely refused to document *any* of his procedures, because "last time he bothered, the Wiki blew up".

    It took down over a hundred servers and led to a lot of panicked restoration work.

  167. DNS for 10,000 systems due to bad source control by Anonymous Coward · · Score: 0

    I ran into the "Perforce symlink" error. A company, who shall remain unnamed, had a very large network. The clown writing the DNS for all of this insisted on using a single large text file, whitespace instead of tab separated fields, with no verification step. When working with this file, I used it in a build system with symlinks to development or live code, as needed, so I could test it with other components.

    All well and good, but Perforce lied about symlink changes. I'm not sure if it still lies: when you *changed* a symlink in Perforce, the local copy would be changed, but it wouldn't get altered in the actual upstream source control. So if you checked out that workspace again, well, you had the original symlink. The only way to reliably change the symlink was to delete the link, commit that, then make a new link and commit the new one. The result was that I made a dev workspace, checked out a clean copy and edited what I thought were files in dev, but wound up editing files in production. And with absolutely no verification tools available for production, I made a mistake, and it got pushed to prod, and I got screamed at for touching the production code.
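    A check that would catch this kind of surprise is to resolve every path before editing and confirm it still lives inside the workspace you think you are in. The sketch below is one way to do that in Python; resolves_inside, the dev/prod directory names and zones.txt are hypothetical and have nothing to do with Perforce itself.

```python
import os
import tempfile


def resolves_inside(path, root):
    """True if path, after following symlinks, lives under root."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_path, real_root]) == real_root


def demo():
    """Build a throwaway dev/prod layout and check a symlink and a plain file."""
    with tempfile.TemporaryDirectory() as tmp:
        dev = os.path.join(tmp, "dev")    # hypothetical workspace names
        prod = os.path.join(tmp, "prod")
        os.mkdir(dev)
        os.mkdir(prod)
        target = os.path.join(prod, "zones.txt")
        open(target, "w").close()
        link = os.path.join(dev, "zones.txt")
        os.symlink(target, link)          # dev entry that really points at prod
        local = os.path.join(dev, "local.txt")
        open(local, "w").close()
        return resolves_inside(link, dev), resolves_inside(local, dev)
```

    Here demo() reports that the symlinked file escapes the dev tree while the plain file does not, which is exactly the distinction the stale symlink hid.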

  168. Whoops by Anonymous Coward · · Score: 0

    Forgive me, father, for I have sinned.

    I was a small child in a local university's SUMMER FUN COMPUTER CAMP, eighteen or nineteen years ago. Most of the classes were taught by undergrads feigning enthusiasm over HTML and embedded Java applets, but I noticed that the computers (400 MHz, if I remember correctly! State of the art!) had weird breadboards hooked up to a COM port. After the supervisors had all quit the lab for the day, I decided I was going to figure out how those breadboards actually worked, so I booted into DOS. There was no assembler anywhere to be found on the computer, so I started messing around with QBASIC's IN and OUT commands. After half an hour of effort, I managed to get some LEDs on the breadboard to flash. Success!

    Then the computer shut down. Disappointed but not surprised, I tried starting it back up. Didn't work. I unplugged the computer and plugged it back in. Didn't work. Then I noticed a smell that I would, a few years later, learn to be the scent of magic smoke, so I very calmly stood up, pushed my chair in, and walked out of the lab so I could find a payphone and call my mom for a ride home. I have never told anyone this story before.

  169. I'm very lucky by Anonymous Coward · · Score: 0

    We launched a new version of a popular site at 4am and everything was working fine. It seemed a tiny bit slower than in testing, but nothing worth worrying about. However, as users woke up and started using the site, it began running slower and slower. Once things were in full swing around 9am, it was taking over 10 seconds for each page load. Our first thought was that the server couldn't hold up under the load, but it turned out it wasn't really stressed in the least.

    I can't actually remember how we discovered the problem, but it turned out I had left the wrong database connection string. It was pointing to the development database, which was hosted in a different company's data center than the live site, on the other side of the country, but fortunately had the exact same data in it that the live server had. This wouldn't have been a major problem in and of itself, but someone else on the project had left about a dozen queries on each page without WHERE clauses.

    So, every single page load was sending about 20 MB of data between data centers, combined with about 50,000 users hitting refresh repeatedly because pages weren't loading. Transfer overages would have been something like $5000, and we had to rush and fix dozens of queries on a live site, and then figure out how to solve the actual problem of cloning the dev database over the live one (now that users had started filling it with real data) and start using the correct server. At that point I had already been awake for 24 hours.

    Fortunately, there were actually provisions in the hosting provider that allowed for misconfigurations in the first few days of service, so we didn't end up paying anything. The client was only a little bit upset that the first morning of the launch went poorly, but nothing major.

    Beyond that, any major errors have only really cost me my own time (fairly frequently). I haven't been involved in anything that actually caused financial harm to anyone except myself, so I count myself quite lucky, despite having to pull a few overnighters at my own expense.
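    The wrong connection string in this story is the kind of thing a startup check can catch. The sketch below assumes the connection string is a URL and that each environment has a known list of database hosts; the host names, the ALLOWED_DB_HOSTS table and check_db_target are invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical host names, for illustration only.
ALLOWED_DB_HOSTS = {
    "production": {"db.live.example.com"},
    "development": {"db.dev.example.com"},
}


def check_db_target(environment, connection_url):
    """Raise if the connection URL's host is not allowed for environment."""
    host = urlparse(connection_url).hostname
    if host not in ALLOWED_DB_HOSTS.get(environment, set()):
        raise ValueError(
            f"{environment} must not connect to database host {host!r}"
        )
    return host
```

    Calling check_db_target("production", url) once at launch would have refused to start the live site against the development database, whatever the queries looked like.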

  170. Wasted Day by Hardhead_7 · · Score: 1

    I worked for a 3PL (third party logistics) company. Years ago, they'd decided they were going to make $$$ with SaaS, basically selling our services to others. A huge undertaking had been embarked upon to make our system usable for other companies. They got a grand total of one client.

    A few years later I was working there, and we got a second client! Bad news was, literally no one was still working there that had been when the first SaaS client had been set up. So there was a lot of guesswork trying to recreate it. I was a Junior Developer at the time, and was tracking down why some data loading wasn't working right. I knew the issue was almost definitely a trigger in the database, so that day I made some changes, loaded the day's data import into the Test DB, and checked if my fix worked. It didn't, so I cleared out the load, made another change, and did it again. OK, now it was kind of fixed, but there was a problem somewhere else. Wash, rinse, repeat.

    I'm sure you see where things went wrong.

    About the sixth or seventh time I did this, I accidentally ran it against production. I distinctly remember the panic that gripped me the moment I hit the F5 key to execute that SQL statement - I realized what I'd done immediately. The drivers (this was a logistics company, remember?) had been out on the road for about two hours at this point, and all of a sudden all their handheld devices just stopped working. Where's the next stop? As far as their handheld was concerned they didn't even have a route, much less anything on the truck. This happened for all of the Office Depot drivers in Florida. And we couldn't just reload the day either. After the initial import happened at around 1:00 am a lot of virtual paperwork was done by humans to optimize routes and such, work that couldn't be easily duplicated.

    I spun around in my cubicle and told him what I'd done immediately (I was told later I looked white as a sheet) and he assured me it'd be OK. An hourly snapshot was taken by the database. We'd lose a bit of data, but it wasn't the end of the world. He went to talk to the DB Admin.

    Those snapshots? It turned out six weeks ago they'd just stopped running. Why? I don't think we ever figured out for sure, but either way they weren't there. Now everyone was panicking a bit. This was a new client we'd just picked up and we didn't want to screw the pooch. In the end, they ended up doing an emergency purchase of some software that allowed them to roll the database back using the transaction logs. Fun times.

  171. Beware the killall command in AIX by supremebob · · Score: 1

    I was trying to fix a broken backup process on an AIX box, and found that there were a ton of stuck Legato processes on the system. Rather than kill each one individually, I entered the killall command to get the correct syntax to kill all of the processes with legato in the name.

    In Linux, entering killall gives you the syntax on how the killall command works. In the old version of AIX this system was using, it killed EVERYTHING with no warning and basically rebooted the box. That's usually not a big deal, except that this was the primary SAP database server for a Fortune 500 company. It took the DBAs about a day to clean up the mess.

    The system was clustered, thankfully, but it probably cost about 10K in labor to clean up the mess.

  172. Netware was evil by leonbev · · Score: 1

    I once built a Windows NT 4 system image that used an older version of a Novell Netware driver that was incompatible with the newer version of Netware that the file servers were using.

    It seemed to work fine on the master system that I built, but after that image got deployed to 50 classroom computers it flooded the network with garbage traffic and caused the entire University network (about 500 computers at the time) to crash. It took the network team about two days to figure out what the problem was.

  173. Fried two computers for the price of 1 by p0larity · · Score: 1

    When I was 12 I put the BIOS chip from one motherboard (it was still the kind of EEPROM with pins) into another in an experiment.

    Sadly I didn't know what the orientation of the pins was or what the little dot meant (pin 1) so I must have reversed them.

    Put the BIOS chips back but I had fried both boards.

  174. One-line classic Cisco network outage by Lorens · · Score: 1

    Working on Cisco command line, I was in the habit of typing "no " and doing a double-click-middle-click on the line I wanted to delete. Worked very well except for
    (IIRC)

            redistribute bgp 100 metric 100 metric-type 1 subnets route-map BGP2OSPF

    In this specific case, copying the entire line after "no " does not remove the line, it just removes the route-map limitation, and hey presto I was redistributing our full BGP into OSPF. Clincher was that it took some 20 minutes for the network to actually stop working, so by that time I had totally forgotten about it. It took an hour to find out what the problem was and to correct it, during which my ISP was basically off the network.

  175. A few interesting ones by nukeade · · Score: 1

    *I had an off-by-one error in a TopCoder problem (I used > instead of >= in a loop) that I didn't catch that cost me $3000 in prize money and a trip to the finals.

    *I was working at an observatory on campus and left the huge, Peltier-cooled CCD for the telescope on a table but still plugged into a computer and left for the day. When I came back, I found that someone had tripped over the cable, smashing the CCD on the floor. They then sat the broken CCD next to the computer without a note or anything. $7000 CCD destroyed.

    *Another time I was working with an AFM in a basement of the university, and left for the day. It stormed really hard that night, and when I came back the next day the basement had 6 inches of water in it. It turns out that the water had come from a leak directly above the AFM. I guess the AFM didn't like getting a shower in filthy storm water and it cost $20-$30K to replace.

    *However, my biggest save was probably more important than all of that combined. Without divulging too many details, I was writing some tests and caught a serious data-loss bug in production before any customers were affected by it. The bug actually made the news: http://www.theregister.co.uk/2...
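    The first item above, using > where >= was needed, is the classic off-by-one. The actual TopCoder code is not shown, so the sketch below is only a generic illustration: a loop that walks a list backwards and, with the wrong comparison, never visits index 0. Both function names are invented.

```python
def sum_down_exclusive(values):
    """Buggy variant: `i > 0` stops before index 0 is visited."""
    total, i = 0, len(values) - 1
    while i > 0:
        total += values[i]
        i -= 1
    return total


def sum_down_inclusive(values):
    """Fixed variant: `i >= 0` also visits index 0."""
    total, i = 0, len(values) - 1
    while i >= 0:
        total += values[i]
        i -= 1
    return total
```

    The two agree whenever the first element is zero, which is why a bug like this can survive casual testing.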

  176. Approximately $80,000 by Funksaw · · Score: 1

    I'm the amateur programmer who first programmed the code for Lawrence Lessig's Mayday PAC. I don't know if you remember this, but the site went down on May 2, for about 8 hours, when we were raising roughly $10,000/hr. I had built everything on a LAMP stack and sent everything through a single MySQL database, which just didn't scale. (I was - and still am - an amateur). Luckily, pro developers stepped up and staunched the bleeding, and eventually we moved onto a Ruby-on-Rails system for the front-end and a NodeJS/Google App Engine solution for the backend.

  177. Expensive Mistakes by villageelder1 · · Score: 1

    Back when Linux was much more primitive I had to set the video monitor parameters by hand coding configuration files. And, by accidentally over-specifying the maximum sync rates, I "smoked" the flyback (horizontal output) transformer in a new 21" Sun monitor in short order. I typed in one wrong number and $$$.

  178. Packard Bell - Not even once by stolidobserver · · Score: 1

    Somebody unhooked the cable from inside a cabinet to a spectrum analyzer I was trying to use to monitor a signal I was setting up to a satellite. I thought something was broken and was messing around with the controls to see if anything happened. I finally found the cable wasn't connected about the same time the satellite controller came across screaming that I was about to burn out the satellite. I didn't, but it was a very close almost. When I plugged in that cable there was a huge spike on the screen.

  179. The billion dollar mistake that nearly killed UAL by Nonesuch · · Score: 1

    Three people, working independently, made errors in programming and website updates which nearly bankrupted United Airlines when the errors came together on September 8, 2008. "Shares fell to about $3 from more than $12 in less than an hour, wiping more than $1 billion in value before trading was halted."

    When the market first opened that Monday, United Airlines was trading at over $12 a share. The public summary of the events states that the Chicago Tribune re-indexed their archives, resulting in a six-year-old story about United Airlines' bankruptcy being re-posted on the Web site of The South Florida Sun-Sentinel without a date. Google picked up the "new" article, saw the missing date, and inserted the current date of 9/8/2008. That article was picked up by a research firm, Income Securities Advisers, which then posted a link to it on a page on Bloomberg News, which sent a news alert based on the old article. The news alert triggered automated trading systems to issue sell orders. Nasdaq finally ordered a halt in trading the stock at 11:08 a.m., but the damage had been done: United Airlines stock had lost 75% of its value.

  180. underestimates... by Creepy · · Score: 2

    Underestimating time needed happens all the time in the software industry. It probably is worse in the gaming industry where publishing deadlines often get set 6 months or more in advance, but I still get hit with guaranteed release dates for customer commitments at my job now where I've put in ~100 hour weeks to fulfill (telecommuting many of these probably saved my marriage, as I would work 4 hours after my wife went to bed). Still, it is nothing like the 160 hour weeks in the office for a game release crunch (and no, that isn't all work - I slept on beanbag chairs in the testing room and they catered in meals, but at some point you're just so burned out and stinking of feet that you need a night sleeping at home and a long shower).

    I can't think of any instance where I've cost a project, but I'm sure they exist. OTOH, I did have a workaround for a $5 million contract where the customer was going to reject our Linux port due to a bug I found and reported. The developer and pubs person assigned the defect were laid off after 9/11 so the defect slipped through to the customer. Fortunately, I overheard a sales person talking about it and supplied the workaround, saving the contract.

  181. I closed the borders of my country for 30 minutes by Anonymous Coward · · Score: 0

    A number of years ago someone else used my PC and opened a window to the production server using my test server background colours.

    I dropped an index on the production passport server by mistake.

    I panicked, and ran the re-creation script.

    That locked the table.

    For 30 minutes they closed all the customs and security stations in all the international airports in my country.

    I didn't get fired, but I did learn to lock my PC every time I stood up instead of waiting for the screensaver to do it for me.

  182. Only $250,000 by Anonymous Coward · · Score: 0

    I work for one of those well known companies with many, many computers. I managed to turn about 130,000 servers into glorified space heaters for about a day.

    We fixed the problem, identified the real system issue that allowed a simple mistake to have such catastrophic effects, and wrote code to prevent it from happening again. Because that's what mature companies and teams do instead of firing the one who just happened to be the one pushing buttons that day.

    My second big one at a different shop was losing the private key for our internal CA for our financial software package. I had emphasized the consequences of losing the key, but failed to follow up with making sure my team took appropriate steps to protect it, including putting backup copies in appropriate places. Oops.

  183. Broke Google ads - bad ACL blocked BGP by Anonymous Coward · · Score: 0

    About 15 minute outage at roughly $75,000 revenue/minute. So about a $11.2 million dollar typo. Result - still working there and several promotions.

  184. Empty file by Anonymous Coward · · Score: 0

    When forced to pull new features on an all-nighter for the demo taking place the next morning, I accidentally committed an empty file to version control around 3am or so, which broke half of the application in front of the potential customer later that day.

    The potential customer didn't sign the 30k contract.

    I wasn't fired but left some time after that for a job with fewer all-nighters.

  185. Exploding Spaceship, $630 million in 2014 dollars by Anonymous Coward · · Score: 0

    So somebody forgot a hyphen in the "computer code instructions" back in 1962 and it cost NASA $80 mill back then, equal to $630 mill or so today. According to this site:

    http://priceonomics.com/the-ty...

  186. Hardware Identification Troubles by Anonymous Coward · · Score: 0

    I applied for and was accepted to my dream job. I had wanted to work there since as early as I can remember. I was told to follow the orders of the higher ups and to never rock the boat. My first team assignment was to help find a piece of missing equipment. We split up into groups and searched around asking people if they had seen it. We saw two guys with machines matching the descriptions and stopped to ask them. I was convinced they were the correct machines but my more senior co-worker talked with the older man and assured me they weren't. I thought he was wrong, but it was my first day so what did I know? In my defense, the protective gear we had to wear was very constrictive. I couldn't hear their conversation, but I could see the documentary crew across the street and didn't want to look foolish.

    To make a long story short, I eventually learned those were the droids we were looking for. I don't know the cost, but my mistake led to the entire collapse of the Empire. All I can say now is that I'm really glad I was wearing the protective gear. The video was released but no one could identify me. The work assignments were later destroyed when the server overheated. Some idiot forgot to put a grill in front of the exhaust pipe to prevent any back-flow. I don't think anyone can identify me, but I've tried to stay out of sight anyway, that's why I'm posting AC.

  187. Two embarrassments... by zaywot · · Score: 1

    When I was a junior programmer working on a mainframe, I was given a problem ticket for an intermittent issue. I stuck diagnostics into the code, but because my disk quota was far too small, I sent the output to a virtual printer that I looped back to my account. Unfortunately, after I got the whole testcase set up (couple hours) the mainframe crashed and I went for coffee along with the rest of the 300 users on the system, for the 10 mins it took to restart. After several days where I hadn't been able to make progress because of the suddenly frequent mainframe crashes, I got a message from the operator asking me to delete my large spool files, since the mainframe was crashing due to a lack of spool space. That's when the penny dropped that my testcase had been exhausting the system spool space, crashing the mainframe about 8 times. Probably $100,000 in lost labour.


    Years later, working on extending some high reliability software, I found some bugs in pre-existing code. The system had some internal checks and watchdog timers that would force a restart if it thought some code was taking too long. Both bugs would trigger the restart system by making something take too long and triggering the watchdog timer. One was in very complicated code, but explained some intermittent issues we'd seen over the years. The other was in a newly released, still unused utility, that didn't work properly on old HW, but would need to be re-written to fix. I only had time to fix and test one bug before going on a month long vacation, so I fixed the complicated one. While I was on vacation, an alpha release of the product went out, and promptly started crashing intermittently with stack corruption issues. I got back, to find six such tickets on my desk. In the meantime, the broken utility had acquired some users, so I decided to spend a couple of days fixing the utility.

    It turned out that the stack corruption issue was holding up the production release, worth many millions of dollars.

    Of course, I wasn't able to reproduce the intermittent stack corruption.

    I spent 3 weeks looking everywhere, trying anything to reproduce it, resorting to rebuilding the alpha load where I could sometimes reproduce it, but not if I loaded my diagnostics.

    Meanwhile, management was getting very antsy about the revenue implications.

    My boss was very good, and shielded me from the flames, but I didn't like seeing him getting fried, as the release date kept getting pushed.

    I tried hunting around to see if anyone had been changing code in that area of the system, but of course, there were only my updates. I asked anyone I could find for suggestions, and nobody had any ideas until one person said it reminded them of one very old issue they'd worked on, and described the problem they'd had.

    I went back and checked my archived output. Sure enough, I'd been a bit careless testing the broken utility before fixing it. I only checked that my testcase triggered a restart, not why. It turned out that long before it could trigger the watchdog timer, the utility corrupted the stacks of other processes.

    I'd just spent 3 weeks holding up an important release, because I didn't realize I'd already fixed the bug.

  188. Re:Took an online trading company offline for a da by AK+Marc · · Score: 1

    Most people don't realize that with 100/full forced on one side and Auto on the other, the link is supposed to come up as 100/full on the forced side and 100/half on the Auto side, which is a duplex mismatch. I've seen that problem many times.
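
    For anyone who hasn't run into it, here is a rough Python sketch of why that happens. It is a simplified model, not a real driver or switch API: the function names and the "speed/duplex" strings are made up for illustration, and it assumes both ends are capable of 100/full. The one real rule it encodes is that a port left on Auto can sense the speed of a forced peer but cannot learn its duplex, so it falls back to half.

```python
def resolve_link(side_a, side_b, best_common="100/full"):
    """Return the (a, b) settings each end actually runs with.

    Each side is either "auto" or a forced "speed/duplex" string such as
    "100/full". Hypothetical model for illustration only.
    """
    def settle(own, peer):
        if own != "auto":
            return own                  # forced port: takes no part in negotiation
        if peer == "auto":
            return best_common          # both ends advertise, pick the best common mode
        speed = peer.split("/")[0]      # speed is sensed from the peer's signalling
        return speed + "/half"          # duplex cannot be sensed, so default to half
    return settle(side_a, side_b), settle(side_b, side_a)


def duplex_mismatch(side_a, side_b):
    """True when the two ends end up with different duplex settings."""
    a, b = resolve_link(side_a, side_b)
    return a.split("/")[1] != b.split("/")[1]


if __name__ == "__main__":
    for pair in [("100/full", "auto"), ("auto", "auto"), ("100/half", "auto")]:
        print(pair, "->", resolve_link(*pair), "mismatch:", duplex_mismatch(*pair))
```

    The practical upshot is the usual advice: either force both ends or leave both on Auto, never one of each.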

  189. Re:Took an online trading company offline for a da by Anonymous Coward · · Score: 0

    Why was there no ILO/BMC/etc? Easy to fix remotely.

  190. 800 German Marks for pushing a button.. by MoarSauce123 · · Score: 1

    ...a second too early. Worked as broadcasting engineer and cut short a commercial by one second. Lucky me, that was in the middle of the night, so the damage was not that bad. As you may have guessed, that was in Germany and quite a while ago. When I watch commercials on US TV they get cut off constantly, seems as if the ad customers are more forgiving here. Working as broadcasting engineer was awesome except for the craptastic hours and the constant stress of not being allowed to make even a tiny mistake.

  191. Drilling Rig by RockDoctor · · Score: 1
    After 21 days on the job (24x7 cover, typically 20 hour working day) I had to identify one sample of dark grey claystone from one of two possible other types of dark grey claystone. I decided one way, then went to pack my bags to crew-change with my relief.

    The implications of deciding one way not the other were a million dollars worth of ironmongery (9.925in OD liner pipe) being run and cemented into the hole. That operation occupied a rig crew of 90-odd people for 8 days while I was on leave. When we drilled ahead, it became clear that I had been wrong. Total unnecessary cost was about 2 million dollars.

    These days, I don't lose sleep for less than ten million. The fact that I still do work for the client suggests that they figure it's better to have me around than not.

    A couple of years ago I got some grief for pointing out a problem on day 10 of a job, which people upstairs from me decided wasn't likely to be a problem. So they shelved the problem, told me in writing to shut up, and continued with the well. 3 months of work later, we'd made a beautifully-tuned geo-steered well ... and had to wait on weather for a major storm. And when we came back on location, the problem I'd been making a fuss about had come back to haunt us and forty million dollars worth of ironmongery and effort was junk. Several embarrassed faces upstairs, but all my fellow contractors knew who had said "We need to deal with this problem, now." when we were five million into the project. Who needs advertising?

    --
    Birds are not dinosaur descendants; birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
  192. Bug Hunting by Koutarou · · Score: 1

    Found a gaping goatse-sized security vulnerability in a package that had been outsourced, with the original contractors long since gone.

    It was less expensive to just kill the product which we had been selling for about 3 years than to re-engineer the thing from the ground up with new staff.

  193. ticket printers, porn by Anonymous Coward · · Score: 0

    two stories actually:

    1. Half a class C was supposed to be scanned, a whole class C was scanned - knocked down ALL the printers in ___ National Railways. I took the phone call, she was hot, demanded her money back, they didn't get it. Needless to say, they did not renew.

    2. Someone downloaded porn using a customer's network. I took the call. He was fired before he got back.

  194. I've done the opposite by phorm · · Score: 1

    I've never heard of somebody *heating* a drive to recover a stuck head, but I've done the opposite.

    Many a drive has been recovered by a day or two's stint in the freezer in a deflated ziplock bag. I'd imagine the principle is the same.
    With cooling, you do have to watch out for condensation build-up as the drive defrosts. With the heating I'd worry about damaging the data on the disk (magnets in general do not like heat, so I'd imagine magnetic storage would similarly be a gamble).

  195. About a grand by Anonymous Coward · · Score: 0

    Assumed RAID 5 meant backups were unnecessary.

    I was wrong... ate humble pie and company paid around a grand for data recovery.

    Live and learn....

  196. Fried my family's APCO PC by Anonymous Coward · · Score: 0

    Back in the very early 80's I was the technical one in my family, and my father bought us a very early edition APCO pc (Apple Compatible IIe) which had 64k of memory. We even upgraded it with more memory (which came in a plastic tube full of 16 pin chips).

    I mistakenly decided one day to open it up and remove one of the expansion cards without powering it down first. Shorted out the entire motherboard in the process, and lost all the BASIC and INTBASIC coding I had done for the past year (and with no one else owning anything compatible near me, this work was lost forever). Probably lost the family a good $2500 (not sure what the list price was back then).

    Needless to say, I was very mad (and as a 9 year old, this led to some very sad weeks) but my father came through and next month, we were upgraded to an IBM PC, and that was the last Apple product I owned until 2009.

  197. Therac-25 by Anonymous Coward · · Score: 0

    I am wondering why nobody has posted about the Therac-25 radiotherapy machine. Its software calculated the radiation doses wrongly. Result: six people were killed.

  198. Dropped a table in production database by Anonymous Coward · · Score: 0

    I once worked a co-op job at a company that had a mammoth database system that literally drove their entire business. Literally every line of business, from HR through to procurement, was custom built into a single mammoth database. And this was no small business, we're talking thousands of employees across 20-30 locations.

    Anyways, somehow, me, the lowly co-op student, managed to accidentally log into the production database, and DROP a table.

    Needless to say, this threw me into a panic once I saw what I had done, and I immediately owned up to the error and ran to the DBA team to tell them what I had done. Thankfully, the company had a good DBA team, and they were able to perform some wizardry to undo the damage.

    I wasn't fired, and they actually praised the DBA team for being able to recover the issue so quickly. Needless to say, security was tightened up, and I was never allowed near a production asset again lol

  199. Firmware date code by Anonymous Coward · · Score: 0

    I'm writing firmware today that stores the date as a 16 bit unsigned integer giving the number of days since 1/1/2000. When printed it is converted to an 8 bit unsigned year and formatted with %02u (2 digits). I'm well aware that this will fail on 1/1/2100, but... I'll almost certainly be dead and no-one will be running this code in 85 years time, surely...

    I'm starting to feel bad about it now.

    I would not be so sure about the code not still running in 2100. Way back in the 70's I was updating existing code and realized that the coding for the date would not work in year 2000, so I modified the code to work in 2000, even though I thought it would be long retired by 2000, which seemed forever away then. Well the code was still running in Y2K and after. Also I happened to have the job of monitoring that code as well as many other programs on Y2K up all night, and it worked. I was glad I had taken the time to fix it.

    I am retired now though.
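
    The date scheme quoted at the top of this post is easy to model. Below is a small Python sketch of it, assuming the 8 bit year is simply the number of years since 2000 and that the C format %02u is used as described. The function names are invented here, and the real firmware is presumably C, so treat this as an illustration rather than the actual code. Python's % operator accepts %u as an alias for %d, so the format string can be reused as-is.

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)   # day 0 of the counter, per the post


def encode_days(d):
    """Days since 2000-01-01, which must fit in a 16 bit unsigned integer."""
    days = (d - EPOCH).days
    if not 0 <= days <= 0xFFFF:
        raise ValueError("date does not fit in 16 bits")
    return days


def printed_year(days):
    """Mimic the printout: years since 2000 held in 8 bits, formatted with %02u."""
    year_offset = ((EPOCH + timedelta(days=days)).year - 2000) & 0xFF
    return "%02u" % year_offset   # 02 is a minimum width, not a maximum


def last_storable_date():
    """Last calendar date the 16 bit day counter can represent."""
    return EPOCH + timedelta(days=0xFFFF)


if __name__ == "__main__":
    for d in (date(2015, 5, 1), date(2099, 12, 31), date(2100, 1, 1)):
        print(d.isoformat(), "->", printed_year(encode_days(d)))
    print("counter runs out during", last_storable_date().year)
```

    Under those assumptions the stored value is fine well past 2100, since 65535 days reaches into 2179. What breaks first is only the printout, which grows to three digits once the year offset hits 100. That at least makes it a cheaper fix than the 1960's version of the same mistake.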

  200. Computer Numerically-Controlled Machine Tools by david_thornley · · Score: 1

    Since I write software that writes software for machine tools, I have extra opportunities to break things.

    There's a technology called Electrical discharge machining, which means putting stuff close together in a fluid, running current through them, and having sparks burn off little pieces of material until you've got what you want. One manufacturer makes machines that have sophisticated programming, but it's not at all safe. Once, with the support guy from the company we got these from looking over my shoulder, I made a slight mistake that caused the arm of the EDM machine to slam against the metal we were machining, for a $16K repair.

    Another time, a variable contained a Z level (height) that was used for two different things, but for everything we'd done up to then the two different things shared the same value. I was the guy who made the change that made the difference significant, and so some of our CNC mills thought the metal being machined was significantly lower than it was, so the setup moves for the machining that assumed the endmill was moving through air tried slamming through the metal. Some of the results were spectacular, although I never did find the cost.

    Fortunately, at least for my self-esteem, people more experienced than me were supervising each of these mistakes, so I didn't feel too stupid, and my colleagues were very understanding.

    --
    "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  201. 'Coolest' mistake ever by RingDev · · Score: 1

    A co-worker of mine had just finished implementing a new caching system for a legacy app that interfaced between multiple systems and the mainframe to track progress and shipping of pilot production runs. Due to a bug in his code, in a very specific use case, one of the cached systems would not get flushed. This was identified a few days after the production release when the company (a multi-billion dollar food sciences multi-national corporation) received a phone call from a Pastor in BFE, Minnesota asking why we had sent him almost 500 gallons of ice cream. Apparently, his church's address was in the system from some charity event we had sponsored. Since the ID and business type didn't flush from the previous transaction, when the pilot plant told the software to print labels for the next order, it pulled the shipping address from the wrong database and the ID just happened to collide.

    The cost of shipping the ice cream back for disposal was ridiculous. So the company told the Pastor to have a huge ice cream social.

    The responsible developer was not fired, but there were running gags about him being the Ice Cream Man for the next year.

    -Rick

    --
    "Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
  202. Disconnect LAN by Anonymous Coward · · Score: 0

    While examining a new client's server and checking the network my hand slipped at just the wrong moment and I disabled the LAN on the server.

    Lost about an hour's work times 3 or 4 web-designers.

  203. my dad has me beat cold on this by Anonymous Coward · · Score: 0

    Dad never made good computer purchases:

    1. Bought a used TRS-80 with dual floppy drives and a lot of "software" - the drives were NOT TANDY drives and eventually failed, and all that software was labeled blank disks - the seller kept all the original disks

    2. For his 50,000 sq foot retail store, which he ran under his major product vendor's trade name, he decided on that vendor's advice to buy a new computer system so he could do just in time inventory management, back when that was all the rage in retail. He ends up with the register systems with laser scanner wands like JC Penney was using at the time. The computer was a WANG minicomputer of some kind and he contracted and paid $10 grand to a local programmer to code the software to make it all work together, including the interfacing to his vendor's ordering system. You guessed it: the coder never got the software finished, and in about 4-5 years a higher end PC could do the work of the minicomputer for a few grand off the shelf.

    3. After retirement I gifted the parents a PC I built from castoff parts from my own gaming pc, which dad promptly re-gifts to a local charity, then proceeds to buy a new pc with Win8 on it, and the tile screen acts like kryptonite on him and he can't get past the tile screen

  204. deleted VM instead of its snapshot by Anonymous Coward · · Score: 0

    Instead of deleting an old snapshot of a virtual machine I deleted the actual virtual machine datastore (they were named the same) - lost a day's worth of accounting data. The chief accountant wanted to kill me but I survived.