Ask Slashdot: How Much Did Your Biggest Tech Mistake Cost?

I'm retired now by Anonymous Coward · 2015-07-04 05:04 · Score: 5, Funny

But back in the 1960's, I figured we could save a bit of money by only storing the year in our data records. No one would use my program decades later, right? Boy, was I wrong!

Re:I'm retired now by Rei · 2015-07-04 05:11 · Score: 5, Funny

I don't have anything nearly that bad - my worst only cost me data. A friend taught me (while I was still learning Linux) a trick, how you could play music with dd by outputting the sound to /dev/dsp. But as I said, I was still learning Linux and hadn't quite gotten all of the device names into my head, and I mixed /dev/dsp up with /dev/sda...
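For what it's worth, the two names differ in kind as well as in spelling: /dev/dsp is the OSS audio character device, while /dev/sda is normally a whole disk exposed as a block device. A minimal sketch of a pre-flight check along those lines, written in Python; the helper names `is_block_device` and `safe_to_stream_to` are made up for illustration and are not from the post:

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """Return True if path exists and is a block device (e.g. a disk)."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except FileNotFoundError:
        # A missing path is not a disk; writing to it would create a plain file.
        return False


def safe_to_stream_to(path: str) -> bool:
    """Hypothetical guard: refuse to stream raw data onto block devices."""
    return not is_block_device(path)
```

This would not catch every mistake (a wrong regular file would still be overwritten), but it would refuse to write to a disk.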

--
Dear Lord: One of your creatures may be hurt tonight. Please let it be the other creature.
Re:I'm retired now by JMJimmy · 2015-07-04 05:42 · Score: 1

My worst was pretty tame in comparison. Over-promised on some specs I couldn't deliver on in the end. Cost the client about $4k - oops.
Re: I'm retired now by Anonymous Coward · 2015-07-04 06:35 · Score: 1

Playing around as root is hazardous which I'm sure you're well aware of now. :)
Re:I'm retired now by JaredOfEuropa · 2015-07-04 09:53 · Score: 5, Interesting

I over-promised on a time estimate once, or rather: I let myself be convinced to pad the estimate. Not by a vendor but by the client! One of the client's systems was due for an upgrade, and between myself and the support guys in India I figured it would be a 19 man-day job. I would run it as a "small project", meaning that I could run it any way I wanted. However, the client asked me: "Can you make the estimate 21 days?" That meant it would be a "proper" project run according to the client's methodology, which the client preferred for budgetary reasons. According to the manager I had nothing to worry about: a PM would be assigned to me to take care of the project formalities. So I agreed.

At the time I was not aware of the unbelievable bureaucracy of large multinationals, and what this would do to my project. Normally I estimate the amount of real work, and add 20% for project management overhead. Maybe another 20% for red tape. But in this case, the PM was more or less forced to involve an ever increasing legion of other teams from various Centers of Excellence in the client's organization. A simple upgrade turned into a project that ran for over half a year. And by agreeing to this approach, I probably cost the client around $300,000. Of course it was mostly their own organization that ran up the cost, and they asked for this in the first place, so they never gave me any grief.

--
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
Re:I'm retired now by Anonymous Coward · 2015-07-04 10:15 · Score: 1

I'm sure by the time they were done cooking the books they were ahead by $300,000...
Re:I'm retired now by AmiMoJo · 2015-07-04 10:40 · Score: 4, Funny

I'm writing firmware today that stores the date as a 16 bit unsigned integer giving the number of days since 1/1/2000. When printed it is converted to an 8 bit unsigned year and formatted with %02u (2 digits). I'm well aware that this will fail on 1/1/2100, but... I'll almost certainly be dead and no-one will be running this code in 85 years time, surely...
I'm starting to feel bad about it now.
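The scheme described above can be sketched roughly as follows. This is a Python model of the arithmetic, not the actual firmware; the function names are invented for illustration, and it assumes the printed year is the offset from 2000 truncated to 8 bits.

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # day 0, per the description above


def encode_days(d: date) -> int:
    """Store a date as a 16-bit unsigned count of days since EPOCH."""
    days = (d - EPOCH).days
    if not 0 <= days <= 0xFFFF:
        raise ValueError("date does not fit in 16 bits")
    return days


def format_year(days: int) -> str:
    """Mimic printing an 8-bit year offset with a two-digit field (C's %02u)."""
    year_offset = ((EPOCH + timedelta(days=days)).year - 2000) & 0xFF
    return "%02d" % year_offset
```

Under these assumptions the 16-bit day counter itself still has room in 2100; it is the two-digit print field that overflows, since an offset of 100 comes out as three characters.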

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re: I'm retired now by khellendros1984 · 2015-07-04 12:35 · Score: 2

About 14 years ago, I used Linux for the first time, after having used various versions of DOS and Windows starting around 1993. There was so much different about how you use the system, how things get done, and new mindsets to get used to. On top of that, discoverability of device paths, standard Unix utility names, etc. is pretty terrible. So yes, "Learning" seems like the appropriate word.

--
It is pitch black. You are likely to be eaten by a grue.
Re: I'm retired now by Trax3001BBS · 2015-07-04 13:23 · Score: 1

Never heard of it huh? It's that operating system that runs the majority of the 'net. Everyone uses it daily even though they're not aware.
I had a three-month contract to install fiber optics and set up a new network. The person in charge of me was in charge of all of the computers; the main computer that accessed the outside (a gateway, if you will) was a Linux box. He had no clue how to work on it, and wouldn't touch it on a bet.
I didn't see a problem with that :)
Re: I'm retired now by Rei · 2015-07-04 19:48 · Score: 1

Yep, exact same situation for me. Learned DOS on a 286 and when I was 16-ish a friend started telling me about this neat new operating system which nobody else I knew had heard of, called Linux....

Re: I'm retired now by Rei · 2015-07-04 19:50 · Score: 1

Indeed - but I was newly come from DOS / Windows 3.1 / Windows 95, where you're always root, and hadn't yet fully grokked why it was such a big deal to not do everyday activities as root. ;)

Re:I'm retired now by Wolfrider · 2015-07-05 14:47 · Score: 1

--Don't feel too bad, I did a similar thing working on my dad's ancient 500MHz XP PC back in the day. Was trying to DD write to floppy and mistyped it as /dev/sda... Lucky he wasn't using it much, I think we ended up selling it or giving it away to a friend

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:I'm retired now by AmazingRuss · 2015-07-05 16:25 · Score: 2

The moment I hear "Center of Excellence" I run for the exit.

$24,000 by Anonymous Coward · 2015-07-04 05:08 · Score: 2, Interesting

I was in charge of ordering a leak correlation system for a water utility that I work for. The system I chose was not quite what we needed, but it worked. One week after the warranty expired, I dropped the correlation unit and it has never worked since. I found out the correlator was unrepairable and we had to order a whole new system.

Outage.. by steveb3210 · 2015-07-04 05:09 · Score: 4, Interesting

I unplugged the wrong thing in a datacenter once, which took 20k domains offline. Traced the cable from the machine to the wall two or three times before pulling, too..

They didn't have any cable management and only one border router..

Didn't lose my job, I was a very young sysadmin who was learning but good at what I did.. everyone kinda shrugged it off as a lesson learned.

Re:Outage.. by Anonymous Coward · 2015-07-04 05:27 · Score: 4, Informative

DNS servers on the same subnet. You know, the thing you aren't supposed to do, but everyone does anyway.
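A rough way to check for this, sketched in Python with the standard `ipaddress` module. The addresses in the usage note are documentation-range placeholders and the /24 prefix length is an assumption; a real check would need the network's actual netmask.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int = 24) -> bool:
    """Return True if both addresses fall in the same network of the given size."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network
```

For example, `same_subnet("192.0.2.10", "192.0.2.20")` is true, while `same_subnet("192.0.2.10", "198.51.100.5")` is false. Passing this check says nothing about shared upstream hardware, such as a single border router.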
Re:Outage.. by jellomizer · 2015-07-04 05:32 · Score: 2, Insightful

As with most mistakes, it is part of a system that is faulty and awaiting one simple mistake to escalate.
Any one human can make a mistake. However a good system should have built in methods to protect against this.
Why wasn't there a backup system? Why didn't it have a failover network/power? Why wasn't there proper labeling?
Chances are there was a culture of trying to save money: paying for a redundant system costs twice as much, or more, and having the network guys spend hours cleaning up and reorganizing takes them away from more profit-driven activities.
They are so focused on being agile and quick that they let little things slip.
For 99% of the failures and mistakes that happen, it is the fault of the system, and not of the person who happened to make the mistake.
Organizations need to prioritize these methods and follow up to make sure they work, not just write them down, post them on some intranet and blame people when they aren't followed. It takes the full organization to make sure checks are in place.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:Outage.. by sumdumass · 2015-07-04 05:57 · Score: 1

Something similar. Took almost an entire ISP down. I had a few servers running BSD, with about 200 domains, located at their "data center", which was more like a couple of shelves and a long bench. Anyway, they were supposed to be running a script to verify that two servers were mirroring the other two. I got lazy and stopped checking the logs for it, and eventually they stopped running the backups or the script to verify them. One day a drive failed and about 50 domains were offline. I couldn't remote into any server and started getting the runaround from their techs, so I loaded up all the backup servers I had and a file share with copies of everything and drove the 200 miles to the ISP.
Turns out one of their techs tried to fix the problem by pulling a good drive from one of the other boxes, but it wasn't the one mirroring the bad drive. This then caused issues in the RAID for the good box, which he tried to rebuild by pulling a drive from the mirroring box, and he ended up breaking all the configs. The worst part is that he thought he had the right tools to fix everything at home and, instead of going to get them, he loaded my servers up and took them home.
So I show up, realize I have to start from scratch, and set up a couple of makeshift boxes that likely wouldn't survive a month; then I connected an old NetWare server. I enabled SMB on the two new servers and started transferring files from the NetWare server. Next thing I know, someone came in and started rebooting all the routers. I looked over and jokingly said a reboot is not a fix.
Well, this went on for about two hours with about half a dozen people working on it, making phone calls and claiming they were under some DoS attack. When my file transfer finished, I disconnected the NetWare server, and it all magically stopped. I had misconfigured SMB and created a packet storm that their routers and modems gladly repeated and multiplied to the point it almost melted their network.
My real servers finally showed back up, so I loaded them up, built new ones, and had a T3 run to a commercial building near the house that became their new home. There was a lot of finger pointing and talk about compensation, but it got dropped when I reminded them that the only reason I had access of that kind was that they failed to fulfill a contract obligation and then screwed the pooch trying to recover.
Re:Outage.. by jon3k · 2015-07-04 07:06 · Score: 1

My DNS servers are on the same subnet and there isn't one cable anywhere you could unplug that would take them both offline.
Re:Outage.. by turbidostato · 2015-07-04 07:07 · Score: 4, Interesting

"As with most mistakes, it is part of a system that is faulty and awaiting one simple mistake to escalate."
Couldn't agree more.
"Chances are there was a culture of trying to save money"
Sometimes the "cargo cult" is so ingrained that even the techs are unable to see it.
Anecdote:
I was in a hiring process; I don't remember whether it was Google or Amazon. One of the questions (from a hands-on tech team lead) was about a single server that went crazy and couldn't spawn any more processes, so it was almost impossible to do anything with the computer. It was still offering whatever services it hosted just fine.
It went more or less like this:
Me: Has this happened before?
Recruiter: Nope.
Me: So... Can I try this, or that, or this other one?
R: No, because you can't run any new process.
M: OK, reboot it (of course I know that saying something like that is taboo for a Unix/Linux sysadmin). Let's look at the boot messages to see if we get some clue, and let's monitor it afterwards to see if this happens again. If it does, we will be in a better position to diagnose; if not, we will put it on the "computer gnomes" account.
R: Won't try to diagnose anymore before rebooting?
M: Nope. My time is valuable and there will surely be more productive things on my to-do list.
R: But the computer hosts a service that, if turned off, will cost the company a bazillion!
M: Nope. If that were the case, the powers-that-be would have engineered the service with high availability in mind, which in turn means we could reboot the server without further hesitation. Since that's not the case, the implication is that the business already considers it a non-critical service, so the point above about my time costing money still applies.
R: But, but, but...
[...]
Of course, I knew from the very beginning that the answer he wanted was a way to list the processes without spawning a new process, so after a while I went down that route. I vaguely remembered there was some Bash built-in that would allow me to do it, though not exactly which one. But at the time I wanted to see the culture of that place.
Needless to say, I wasn't hired. But I didn't want to be hired either. Not within that team, at least.
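The general idea behind that expected answer, reading the process table straight from procfs instead of running `ps`, can be sketched like this. It is not the answer given in the interview: it assumes Linux with /proc mounted and an interpreter that is already running (starting Python would itself need a new process), and `list_processes` is a made-up name. In a shell the same trick would rely on built-ins only, for example `echo /proc/[0-9]*`.

```python
import os


def list_processes(proc_root: str = "/proc"):
    """Return sorted (pid, name) pairs by reading procfs; nothing is forked."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            continue  # process exited in the meantime, or is unreadable
        found.append((int(entry), name))
    return sorted(found)
```

The `proc_root` parameter exists only so the function can be pointed at a fake directory tree when trying it out.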
Re:Outage.. by JSG · 2015-07-04 07:42 · Score: 2

My DNS servers are on the same subnet and there isn't one cable anywhere you could unplug that would take them both offline.
What about:
* Router misconfig, takes out default gateway for a while for both
* An extra cable is added and {MR}STP was disabled by accident or something like that.
* etc etc
Anyway, your proud boast may one day discover that people do the funniest things. If your DNS servers are in fact the same box with two IPs ...
Re:Outage.. by Anonymous Coward · 2015-07-04 08:16 · Score: 2, Informative

Be careful about criticizing others. Routers don't have default gateways, they have null routes. They can also be set up to be redundant gateways for others and have many redundant null routes themselves...
Turning off STP on just one router would never be a problem. There are master and standby root bridges. Even if they both go down, others will step in to take the job. It would require a total network shutdown of all layer three equipment before it would be a problem and even then, ttl limits and excess traffic would cause the routers to drop one of the cables in the loop within seconds.
This is entry-level networking knowledge.
Re:Outage.. by jon3k · 2015-07-04 09:13 · Score: 1

Read again carefully:

there isn't one cable anywhere you could unplug that would take them both offline.
I didn't say they were invulnerable. Calm down.
Re:Outage.. by Anonymous Coward · 2015-07-04 09:27 · Score: 1

Domain policies like requiring at least two DNS servers are there as a clue. But as you have illustrated, there are plenty of idiots who will do very stupid things.
You'd be better off with one of your DNS servers running on a DSL connection in your basement than with both on the same network. I'll leave it as an exercise to figure out why.
Re:Outage.. by steveb3210 · 2015-07-04 10:59 · Score: 2

I unplugged the only border router.
Re:Outage.. by Anonymous Coward · 2015-07-04 11:26 · Score: 1

My DNS servers are on the same subnet and there isn't one cable anywhere you could unplug that would take them both offline.
That's exactly what steveb3210 used to say....
Re:Outage.. by steveb3210 · 2015-07-04 12:06 · Score: 1

The problem had nothing to do with DNS servers, this datacenter had only one border router.
Re:Outage.. by Anonymous Coward · 2015-07-04 12:06 · Score: 3, Interesting

That reminds me of a cleaner who, for some unknown reason, had keys to every room, including the server room. Around Christmas time she needed to find a wall socket for the Christmas tree. She found one in the server room with the switches/routers/UPS/backups/aircos and just plugged the Christmas lights into an unused socket, between the UPS and the switches. Of course, as usual, the Christmas lights didn't work; they short-circuited the network, which shut down the airco power supply. And she just left it there.
It was winter, and the servers weren't heating up that much while just idling, but they started to heat up when work resumed after the weekend and they came under heavy load. The servers started to shut down one after the other, and it was over 50 degrees Celsius in the server room. I was a programmer, but was ordered to help in emergencies, like dragging new server hardware in and out of the room, but spare aircos? That's something we didn't have. On top of that, all the airco specialists were on holiday; those bastards could get days off during the end-of-year holidays, while the 'IT guys' always had to be present in case of failure.
While the system administrators were close to having a heart attack, had already pulled out half of their hair because they couldn't find the problem, and were sweating like horses (remember, it was over 50 C in that room), I was the one who noticed the Christmas tree, followed the cable that went over the dropped ceiling into the server room, and simply unplugged it. A few moments later the aircos turned on again, one after the other, and within half an hour the temperature went back to 26-27 degrees and the system administrators could restart the servers.
I never told them what I did. I had some sympathy for the cleaner: she was a pretty smart Hungarian woman with a degree in law and philosophy that was useless in our country, and she worked hard (16 hours a day) to give her only son a chance to study in our country and get a decent degree and job. If I had told, she would certainly have been fired right at the time her son needed lots of money for new books for the second semester. Of course I told her that she should never enter the server room, and reassured her that I was also just a worker and hadn't told anyone.

She was grateful for the whole time I worked there. I was eventually the one who got fired, for not wanting to create a Java Applet to power the client side of a web shop in 2011 (!!!!). Some marketing guy had read some completely outdated books about web shops (probably from the nineties) and decided that we also needed such an advanced Java Applet based web shop.
They actually wanted to do things with a web client, like editing photos with layers, like a mini Photoshop/Gimp, that simply could not be done with a web client (maybe it could be done with some advanced JavaScript, but I was no JavaScript expert, and it would still have been overkill for a simple website). They actually found a fresh-out-of-college 'Java expert' who was willing to pick up the job. The last time I checked their completely outdated web shop, the Java Applet simply could not be loaded because of security problems. The web shop was marketed to their customers so hard that it backfired enormously. Many customers ended up with malware because Oracle/Sun installed the Ask toolbar (most customers didn't have Java yet) and still couldn't run the Java Applet. So recommendations were made, like using XP with IE 6 to run the web shop, and that was in 2013, when the web shop was finally ready.
Ultimately the business went bankrupt, because once you go the online service way, customers will find other services when yours sucks.
My failure in this was that I could not convince marketing people that they were wrong and I was right. I was fired and found a new, more interesting, higher paying job while they ran their business into the ground in jus
Re: Outage.. by jellomizer · 2015-07-04 13:15 · Score: 1

Well, it depends. Sometimes you want someone who will be a cowboy and solve and fix problems on the fly. Other times you want someone who will be proactive and give you a safe solution.

Re: Outage.. by turbidostato · 2015-07-04 15:41 · Score: 2

"I just read you as, "not my problem""
Yes, that's the case... from a certain point of view.
I usually respect others' work enough to give them their due credit. In this case, it means I credit the system architect with being able to design the system properly. No high availability means it's not a critical server, so I adapt my procedures accordingly.
"figuring out what went wrong is precisely your job"
No, it isn't. My job is to produce the most value for the company within my assigned competencies. Sometimes that means scratching my head for hours to solve a problem. Other times it means rebooting/destroying a server without a second look and then going to the next item on my to-do list. You know, servers are not pets but cattle.
"You sound like a dick answering a different question than asked"
In fact, I didn't. I was asked to solve the problem, not to diagnose the problem and solve it without rebooting the server, and I honestly gave the answer I considered to be the most effective. As it turned out, it was not the answer my interviewer expected or wanted, but I'm fine with that: in a hiring process the prospective employee is interviewing the employer just as much as the other way around.
Re: Outage.. by Anonymous Coward · 2015-07-04 18:27 · Score: 1

The fact that you said "turn off STP on the router" invalidates your entire response.
Re:Outage.. by ultranova · 2015-07-04 22:03 · Score: 2

Anyway, your proud boast may one day discover that people do the funniest things.

Hmm...
1. Create a domain.
2. Have that domain host a single page saying "Nothing can take down this page."
3. Have that page and DNS server hosted in a datacenter in an enemy country.
4. Sit back and watch.
Weaponized hubris - what could possibly go wrong?

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re: Outage.. by ultranova · 2015-07-04 22:37 · Score: 1

figuring out what went wrong is precisely your job

No, it isn't. My job is to produce the most value for the company within my assigned competencies.

In theory, companies care only about profits. In practice, corporations are made of living humans, are thus living things themselves, and as such care mostly about homeostasis. Profit only enters the picture as food, and like humans whose imaginations they live in, corporations too tend to ignore long-term consequences for immediate gratification, especially since the law gives their parasitic load - the shareholders - control over their actions.
So, as far as the company was concerned, you were carrying - and sticking to - dangerous ideas that could have resulted in changes to corporate culture - to homeostasis. You "tasted" wrong, so you were rejected. I wonder if the whole corporate world could be described more accurately in terms of biology than in terms of economics, and perhaps improved through its methods?

Re: Outage.. by jellomizer · 2015-07-05 00:31 · Score: 2

The Job interview process is actually a two way process.
The company needs/wants the resource, that is why they are open positions.
The Person needs/wants a job or a better job, that is why they are applying.
Now even at the height of the last recession, and it was a big one, average unemployment in America was under 10% of the population. While that created a market where employers had the advantage, it was only an advantage, not supreme power.
1. The employers wanted people who were currently employed (using the outdated reasoning that if they weren't laid off then they must be good enough to have made it). So while these applicants may be looking for a better job, they have a job currently and are only willing to take a better offer.
2. If your industry isn't offering the type of work people want to do for the money anymore, then people may make life decisions to go a different route. Go back to school and study a new topic. Use their skills in a different industry.
3. High turnover: turnover is really expensive. On average it takes 150% of the salary to deal with an employee's turnover: having to retrain new employees, catch-up time, etc. If your corporate culture is poison, then you will have a hard time keeping employees.
I have been on some job interviews where I lost my temper with the recruiter. One company had a very particular piece of software (so particular I couldn't find a relevant match for it with a Google search, except when I added the industry, and then it was a few pages deep). The recruiter kept on hounding me about this tool. I asked what it does, so that I could at least give a general abstract answer to the questions. They didn't know either. From this interview I got the following impression: the guy who worked on the software (probably the guy who made it) left the company for a better job. They are trying to find someone with the exact skill sets and pay them as much as the guy who left for a better job. So they let a good resource leave, and they haven't learned from their mistakes and either realize that they will need to lower the requirements, or raise the salary and benefits.

Re: Outage.. by turbidostato · 2015-07-05 01:50 · Score: 1

"They are trying to find someone with the exact skill sets and pay them as much as the guy who left for a better job. So they let a good resource leave, and they haven't learned from their mistakes and either realize that they will need to lower the requirements, or raise the salary and benefits."
Yes.
Going back to the first post on this thread, all this means that, in a company, "the service" is much deeper and wider than thought at first glance and, say, a server breaking can have its root causes very far away from the server room.
It pays to have a holistic view of the business, but very few companies pay attention to that or are even organized to facilitate such a view.
Re: Outage.. by Wycliffe · 2015-07-07 04:30 · Score: 1

especially since the law gives their parasitic load - the shareholders - control over their actions.
I'm not sure you know the definition of a parasite. A parasite can't survive without its host. The shareholders are the investors and can survive just fine without the company, but the company wouldn't even exist without its shareholders/investors. Calling the shareholders/investors parasites is like calling the leaves on a tree parasites. Without the leaves, the tree has no energy (money) and dies.

"I broke Asia" by Anonymous Coward · 2015-07-04 05:12 · Score: 1

I cost our Asian office a day's work after I failed to verify that a deployment completed successfully.

The deployment was done on Friday evening US time, which would have been around 1 or 2am UK time. I couldn't be bothered to stay up for that so figured that I'd check in the morning.

Naturally I forgot to do that.

Throughout the weekend whenever I was out, I'd suddenly remember and think "I'd better check that when I get back in."

Naturally, I forgot to do that.

On Monday morning, I received a lot of phone calls and emails asking where I was and to get into the office ASAP. When I got in, I found out that the deployment had failed and the rollback scripts that I'd asked the team to run had not been run.

After a lot of frantic phone calls, we found a DBA in the Asia office who still had database access to the Production servers and he rolled the changes back.

By then however, Asia had lost a whole day of work and I was given a written warning by my manager.

It's still a running joke amongst my friends that I "took out all of Asia for a day". And if I ever interview and I can see it's going badly, I tell this story in response to the "What's your weakest asset" question, just to see the look on their faces.

Re:"I broke Asia" by tehcyder · 2015-07-06 01:53 · Score: 1

if I ever interview and I can see it's going badly, I tell this story in response to the "What's your weakest asset" question, just to see the look on their faces.
Um, I think you're supposed to say something like "I am occasionally impatient with people who are less intelligent and driven than me" not "I wasted our Asian operation a whole day's work and got a written warning for my incompetence".

--
To have a right to do a thing is not at all the same as to be right in doing it
Re:"I broke Asia" by HornWumpus · 2015-07-06 05:15 · Score: 1

I sometimes give stupid answers to stupid questions.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

My HD DVD player and collection... by Shabbs · 2015-07-04 05:14 · Score: 1

Heh - would have to total all that up... sigh... but it still works!

--
Mark

Improper use of systems by pierced2x · 2015-07-04 05:14 · Score: 4, Interesting

I used a system improperly over the course of a month. It connected to some services that ran up a $50k bill. I was mortified when my boss told me, thought for sure I'd be canned on the spot. I was only 22 and it was my first job out of college, so the amount was nearly double what I was being paid. The boss basically took the heat for not having explained it to me better, and I was not reprimanded in any way.

Well... by Jethro · 2015-07-04 05:16 · Score: 4, Interesting

I don't know what monetary cost they assigned to this, but this is the one I got in the most trouble for.

Frankly, it was something I got blamed for. I guess I can take partial responsibility. You guys tell me.

I was the only UNIX guy at this place. We were moving our Main Internal Server to a newer machine. I had set up a cron job to rsync all user data nightly, so that when we transition over the rsync would be faster.

So, the big day comes. I come in on a weekend, do the final rsync, change some DNS entries, shut down old machine, bring new machine up. No problem.

Next day everyone is working happily, everything is working smoothly, no worries.

Or so I thought. Turns out the main developer wanted something off the old server, so he turned it back on to copy his files... and then left it up.

So, during the night, the thing automatically rsyncs and overwrites an entire day's work for about 80 people.

Definitely partially my fault for not disabling the cron job, but I was the only one who got in any kind of trouble at all for this (to the extent of almost losing my job, and frankly that was the catalyst for me leaving that place).
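One way to make this kind of nightly job safe to leave behind is to have it check a cutover marker before copying anything. A minimal sketch in Python, assuming a hypothetical marker file that the admin creates at cutover; the function names are invented for illustration and the `sync` callable stands in for whatever copy command the real job ran:

```python
import os


def should_sync(marker_path: str) -> bool:
    """The nightly copy may run only while the cutover marker is absent."""
    return not os.path.exists(marker_path)


def nightly_job(marker_path: str, sync) -> str:
    """Run sync() unless the migration has already been cut over."""
    if not should_sync(marker_path):
        return "skipped"  # old server was switched back on after cutover
    sync()
    return "synced"
```

With a guard like this, creating the marker on cutover day would have turned the accidental overnight run into a no-op, even with the old machine powered back on.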

--

In the land of the blind, the one-eyed man is kinky.

Re:Well... by Anonymous Coward · 2015-07-04 05:26 · Score: 1

I'd tell the developer to not touch the servers, and stick to his work. He wants to access the files? You start it up, and shut it down. In the loop at all times.
Or was he someone that was supposed to have unrestricted access to any servers on the premises?
Re:Well... by drinkypoo · 2015-07-04 05:31 · Score: 5, Insightful

Definitely partially my fault for not disabling the cron job,
Or pulling the network cable. You have to plan for idiots, because there will be idiots. And odds are, they will outrank you.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Well... by Jethro · 2015-07-04 05:41 · Score: 1

Good call... this was about 20 years ago, and it's not likely that I used rsync (not sure I knew how to do that back then).
My memories of the event are not... perfect. But it's likely that I just used scp to dump entire directories. Couldn't have been using rsync because, as you say, it wouldn't have done as much damage.

Re:Well... by Jethro · 2015-07-04 05:44 · Score: 2

They weren't supposed to, but the head developers were like gods at that place. They had the root passwords and I wasn't allowed to restrict them in any way.
It stemmed from them being among the original 10 people when the company started, and even though the place was now a 200+ employee organisation, in some ways they still ran it like a 10-person operation.
I did vocally complain about this. They quite often went in and overrode stuff I did.

Re:Well... by Jethro · 2015-07-04 05:46 · Score: 4, Interesting

You know the old saying, "make something idiot-proof and someone will come up with a better idiot."
They'd have plugged it back in. Again, the guy physically went into the server room and pushed a button.
I certainly should've disabled the cron job or, better yet (as pointed out by AC down there) have known what rsync actually was and used that - I know I said I did in the original post but in retrospect I couldn't have as it wouldn't have overwritten everything. This was about 20 years ago...

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by Jethro · 2015-07-04 06:00 · Score: 1

*laughs*
This was 20 years ago, and in a company that still thought it was very small even though it was medium-sized. The devs were gods. They outranked me in every way and had root access to all my servers.

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by adolf · 2015-07-04 06:08 · Score: 1

Everyone else has already told you what you did wrong 20 years ago. Here's my take: If you were actually rsync'ing all of the user data, then the developer wouldn't have known the difference and would never have had the inkling to turn the old machine back on.

--
Kid-proof tablet..
Re:Well... by Jethro · 2015-07-04 06:18 · Score: 1

As I've mentioned, this was about 20 years ago, so I can't really remember it 100%.
However, this was one of those shops that started with about 10 employees, and even though by then it was 200+, it still operated as if it was a small, small company. The head devs were part of the original 10, and they were like gods. They had full access to EVERYTHING. Including root access to all the servers. They were basically allowed to do whatever they wanted.
If something went wrong where they and someone else were involved, it was never their fault.

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by Jethro · 2015-07-04 06:20 · Score: 1

I believe they were looking for old versions of some files, possibly from directories they never asked to be rsynced.
And, again, 20 years ago. I have definitely learned my lessons AGES since then (:

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by Kjella · 2015-07-04 06:26 · Score: 1

Or pulling the network cable. You have to plan for idiots, because there will be idiots. And odds are, they will outrank you.
Since this was a server, unless he was at the console copying it off to a USB stick, he'd probably hook the server back up to the network so he could copy it to his client.

--
Live today, because you never know what tomorrow brings
Re:Well... by barc0001 · 2015-07-04 07:50 · Score: 1

Uh... doesn't rsync have a flag to only sync files that are newer? If 80 people did their work and saved it on the new box, how did rsyncing their data from the old box overwrite newer files?
Re:Well... by ArcadeMan · 2015-07-04 08:04 · Score: 1

You have to plan for idiots, because there will be idiots. And odds are, they will outrank you.
Is that a quote from somewhere? Who said that?
In any case, I'm adding it to my list of quotes.

--
Get free satoshi (Bitcoin) and Dogecoins
Re: Well... by Jethro · 2015-07-04 08:37 · Score: 1

Yup, and as I said to the other people who said that very sane thing "you are right, and I was likely wrong about using rsync. This was 20 years ago and I probably didn't know how to use rsync yet."

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by R3d+M3rcury · 2015-07-04 08:51 · Score: 4, Funny

Can't speak for a cost, but I thought this one was funny...
A company I used to work for used Lotus Notes. For some reason, and I don't remember exactly what the reason was, I set up my e-mail to copy my mail to another account. I think it was just a "hey, I can do this" thing, playing with the e-mail system. Unfortunately, I made a typo in the name of the account to forward to.
When I came in the next morning, the e-mail system was running really slowly. Everyone was complaining about it. I logged into my e-mail and, lo and behold, there were all sorts of e-mails in my account complaining about how it couldn't send this message to the other account and, of course, the contents of the e-mail was a message that it couldn't send this message to the other account, and the contents of that message was a complaint that...you get the idea.
I turned off the script and deleted all the e-mails. And, suddenly, from the office next door, I hear, "Hey! E-mail is working again!"
Shhhhh...
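
For anyone who hasn't watched one of these loops happen, here is a rough sketch of the mechanism in Python. It is a toy model, not how Lotus Notes actually works: the function name, the `forward_bounces` switch and the step cap are all made up for illustration. Each message that lands in the inbox gets copied to the mistyped address, the copy fails, and the failure notice lands back in the same inbox.

```python
def simulate(max_messages, forward_bounces):
    """Count inbox messages when every copy goes to a bad address (toy model)."""
    inbox = 0
    pending = [False]  # one ordinary incoming mail; the flag means "is a bounce"
    while pending and inbox < max_messages:
        is_bounce = pending.pop()
        inbox += 1
        if is_bounce and not forward_bounces:
            continue  # guard: never forward delivery-failure notices
        # The forward to the mistyped address fails, so a bounce wrapping
        # the original message comes back to the same inbox.
        pending.append(True)
    return inbox

runaway = simulate(1000, forward_bounces=True)   # only stops at the cap
guarded = simulate(1000, forward_bounces=False)  # original mail plus one bounce
```

Without the guard the only thing that stops the loop is the artificial cap (or, in real life, someone turning off the script).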
Re:Well... by radarskiy · 2015-07-04 09:08 · Score: 1, Offtopic

" And odds are, they will outrank you."
No, the odds are they *are* you.
Re:Well... by drinkypoo · 2015-07-04 09:32 · Score: 1

Is that a quote from somewhere? Who said that?
I'm pretty sure the last part is something I read someplace, if not verbatim then next door, and attached to a similar sentiment. There Will Be Idiots is my motto these days, so it crept in there. I can't find anything, either. Whatever it originally was, I probably read it here.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Well... by petermgreen · 2015-07-04 15:33 · Score: 1

It does but it isn't always practical to use it.
If all your users do is create and edit files then sure you can use the --update flag and omit the --delete flag making the rsync operation a lot safer.
But if your users are more active, that is not so practical. Assuming this storage is used as a work area by developers, they are likely to be doing things like deleting files and sometimes even deleting files and replacing them with a copy of an older file (for example, deleting a dirty copy of a source tree and replacing it with a clean one). So to copy all the changes you need to use rsync in a far more aggressive mode, without the --update flag and with the --delete flag.
It was probably a mistake to put the aggressive rsync in a cronjob. It would almost certainly have sufficed to use a less aggressive rsync in the cronjob and only use the aggressive one manually for the final sync, but I can see how someone inexperienced would fail to think of that.
It was also, of course, a mistake not to defuse the old server when decommissioning it, ideally by BOTH disabling the cronjobs and disabling the credentials that allow the decommissioned server to talk to the active servers.
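
To make the difference between the two modes concrete, here is a small Python sketch. It does not call rsync; it is a toy model in which a "tree" is a dict of path -> (mtime, content), and the `sync` function and the sample data are invented for illustration. It only imitates what --update (skip files that are newer on the receiver) and --delete (remove receiver files missing on the sender) are documented to do.

```python
def sync(src, dst, update=False, delete=False):
    """Return the destination tree after syncing src into dst (toy model)."""
    result = dict(dst)
    for path, (src_mtime, content) in src.items():
        if update and path in dst and dst[path][0] > src_mtime:
            continue  # --update: skip files that are newer on the receiver
        result[path] = (src_mtime, content)
    if delete:
        # --delete: drop receiver files that no longer exist on the sender
        for path in dst:
            if path not in src:
                del result[path]
    return result

# Stale data on the old box, a day's work on the new one (made-up values).
old_server = {"report.txt": (100, "yesterday's draft")}
new_server = {"report.txt": (200, "today's work"), "notes.txt": (210, "new file")}

aggressive = sync(old_server, new_server, update=False, delete=True)
cautious = sync(old_server, new_server, update=True, delete=False)
```

In this model the aggressive run replaces today's work with yesterday's draft and removes the new file, while the cautious run leaves the new server untouched.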

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Well... by JustOK · 2015-07-04 23:15 · Score: 1

Groo sounds like a mendicant.

--
rewriting history since 2109
Re:Well... by kilodelta · 2015-07-05 04:24 · Score: 1

Developers are the bane of system administrators. I had one developer who hosed the entire crontab not just on the box but the one in the backup too.

Then there are stupid user tricks - like jamming an RJ-45 connector into an RJ-11 jack.

But my best - I was administering a Data General MV9600U running AOS/VS II. They had previously been using async terminals but switched over to an IP stack and Pacer terminal on Macintoshes.

So one day I'm cleaning out the old async cabling - no need for it anymore. Suddenly I hear the system console beeping - not a good sound to hear. I come around from the back of the system to see the system losing all its data volumes. WTH! I look at the disk array and it's powered down. Cycle the switch, nothing.

Of course my boss flies into the room, big old knot on his forehead. I trace the power cable from the disk array - it's a Hubbell connector and I guess when I was pulling one of the async cables out it rotated the power plug just enough to break contact. Twisted it back in and the disks all came up. Rebooted the system and all was fine.
Re:Well... by Jethro · 2015-07-05 04:27 · Score: 1

> Then there are stupid user tricks - like jamming an RJ-45 connector into an RJ-11 jack.
That's... actually impressive...

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by PatientZero · 2015-07-05 04:43 · Score: 1

I'm gonna have to go against the chorus and lay the blame at your feet, honestly. You left a booby trap for whoever rebooted that server at any point in the future. Had you removed the hard drive or put the machine into a donation pile, I could understand.
Say you get hit by a bus the next week and they hire a new sysadmin. A few days later he's asked to set up a new service and decides to repurpose that unused server. He connects it to the network, boots it, installs updates and new software . . . and then gets pulled onto some other task that takes a day. That night disaster strikes. Is it his fault for not ensuring there were no dangerous cron jobs left on the machine?
Perhaps, but it's much easier to disarm bombs you've designed rather than force the job onto some poor, unsuspecting sap. :)
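
One cheap way to disarm that kind of bomb, sketched in Python under stated assumptions: the scheduled job checks for a marker file before doing anything. The marker name `DECOMMISSIONED` and the `nightly_sync` wrapper are hypothetical, and this is no substitute for actually removing the cron entry and revoking the credentials.

```python
import tempfile
from pathlib import Path

def nightly_sync(marker: Path, do_sync) -> bool:
    """Run do_sync() unless the decommission marker exists; report whether it ran."""
    if marker.exists():
        return False  # host is retired: refuse to touch the live servers
    do_sync()
    return True

runs = []
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "DECOMMISSIONED"   # hypothetical marker file
    ran_before = nightly_sync(marker, lambda: runs.append("synced"))
    marker.touch()                          # the admin retires the box
    ran_after = nightly_sync(marker, lambda: runs.append("synced"))
```

After the marker is in place, booting the old machine and letting cron fire does nothing.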

--
Freedom to fear. Freedom from thought. Freedom to kill.
I guess the War on Terror really is about freedom!
Re:Well... by Jethro · 2015-07-05 04:53 · Score: 1

> I'm gonna have to go against the chorus and lay the blame at your feet, honestly.
That pretty much HAS been the chorus.
And I never said I wasn't (at least) partially to blame - I definitely had a blind-spot.
Also, had a new sysadmin been hired, he'd have no reason to turn the old machine on. Other people WERE aware of what was going on, including the people who would've trained a new guy. What's more, that machine was leased and would have been returned within a week or two, so he couldn't have repurposed it. And he couldn't have pulled any storage from it because that was part of the lease. And even if he could, it was a Sun box and the new one was an AIX box, so stuff wouldn't just run.
And here's another thing... say the new guy gets hired, never touches that machine because he's been told it's being returned in a few days. And then one of the devs turns it on and a week's work gets erased. They would've still blamed the new sysadmin even though he had nothing to do with it.
If you want funny, I actually knew the guy who ended up replacing me through a local Linux user's group. I know he was plagued by the same kind of crap. He tried to update the remote connections to use ssh rather than telnet and almost got fired for THAT.
This was not a good SysAdmin environment.

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by well_in_theory · 2015-07-05 16:06 · Score: 1

You rsync without --update (-u)?
If you were expecting your post-transition rsync to be faster then I presume you were doing something like either --ignore-existing or --update, in which case the files wouldn't have been overwritten, right?
What happened?
Re:Well... by Jethro · 2015-07-05 16:15 · Score: 1

Like I said in response to many... many other comments, this was about 20 years ago, and I'm likely wrong about using rsync - it's just that that's what I'd (obviously) use now. Chances are I just didn't know about it back then and was doing a straight scp dump.

--

In the land of the blind, the one-eyed man is kinky.
Re:Well... by well_in_theory · 2015-07-05 16:17 · Score: 1

for file in /* ; do scp local remote ; done
Yeah, that one's going to bite you hard.
The sad part is that you even remember this event 20 years later. I bet the guy who booted up the old machine to do what he wanted doesn't.
Re:Well... by Jethro · 2015-07-05 16:25 · Score: 1

I probably went "scp -r directory new_server:directory/"
Yeah, the devs had full run of that place. In fact, a LOT of people at that place had root access. To the point where I would go around the place when I'd stay late and pull the damn post-its with the root password off people's cube walls.
And yeah, I remember it. Because, first, it WAS a defining moment. And second, I kinda remember a LOT of stuff.

--

In the land of the blind, the one-eyed man is kinky.

Patent filing missed. by Elf+M.+Sternberg · 2015-07-04 05:17 · Score: 2

In 1993, I failed to file the US Patent on "A means of accessing a relational database via the Internet." If we'd known we could do it, CompuServe might still be around.

Re: Patent filing missed. by Anonymous Coward · 2015-07-04 05:38 · Score: 1

And that would have been a bullshit patent, Elf.
Re: Patent filing missed. by TheReaperD · 2015-07-04 07:28 · Score: 1

Yea, it would have been a bullshit patent. Could have still made millions from it though. Sadly, that's how our patent system works(?).

--
"Be particularly skeptical when presented with evidence confirming what you already believe." -
Re: Patent filing missed. by Elf+M.+Sternberg · 2015-07-04 09:31 · Score: 3, Interesting

No kidding. I'm glad we didn't. It means I can look at myself in the mirror. Career-wise, I've done okay without it. But it would have been a completely legal patent through which CI$ would have raked in millions and millions of dollars. And, as far as I can determine, it would have been completely legal. There was no MySQL, no Postgres; OraPerl had *just* been released and was barely stable on SunOS, and there were no known instances of a CGI / OraPerl gateway on the Internet until Pacific Power & Light asked us if it was possible to connect their consumer-oriented energy savings database to that new thing called "the world wide web."

Around $2 Trillion by Anonymous Coward · 2015-07-04 05:19 · Score: 1, Funny

About $2 Trillion.

I worked for the Florida Electoral Commission back around 2000.

Is it purely your mistake. by jellomizer · 2015-07-04 05:19 · Score: 1

I have been part of a large mistake costing hundreds of thousands of dollars.
However, most mistakes are part of a chain of little mistakes that all combine into a big mistake. For example, suppose someone happens to trip over a plug and unplugs a production server. Then come the questions: why was the cable out where it could be tripped over, who decided that it wasn't worth the time and money to get a better system of cable management...

Normally a person will get fired for a mistake if it was due to intentional misconduct or it happens to get political and needs someone to blame, however if it happens you need to be sure that you put the blame back on the system (not an individual), then you will need to follow up to fix the system so it doesn't happen again.

The most expensive mistakes are often due to a huge chain of events. A good system should be in place to stop a simple mistake from escalating into a big one.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.

$480 Phone bill by Anonymous Coward · 2015-07-04 05:21 · Score: 2, Funny

When I was 12 years old and hanging out on BBSs in 1989, I didn't realize dialing Gilroy from San Jose was long distance (Both were 408 area code). My parents were not pleased at the nearly $500 phone bill.

Re:$480 Phone bill by ITRambo · 2015-07-04 06:44 · Score: 1

At least your $500 downloaded porn was virus free back in the day.

Well... by Anonymous Coward · 2015-07-04 05:21 · Score: 1

As High Proctor of Fahz, I once led my whole species into unrelenting suicidal despair when during the Chinz-Rahl celebration I passed our Ultron onto Chief Groo, who was not prepared to hold such a heavy object and dropped it.

My Mask of Ultimate Embarrassment and Shame is not enough to express the deep chasm of depression into which I sink.

Tech mistake by hcs_$reboot · 2015-07-04 05:22 · Score: 1

I maneuvered downward the left button of the mouse attached to the computer I was working on which pointer was right on a small gif saying "Send" that technically sent a message I should never have sent. Cost me a lot.

--
Slashdot, fix the reply notifications... You won't get away with it...

Whole computer. by o_ferguson · 2015-07-04 05:24 · Score: 1

Not me, but a friend. In high school the best computer in the school was a 386SX. They decided to upgrade it to a DX by adding a maths co-processor to the main board. So they ordered one, and when it arrived, they gave it to my friend to install for some reason. Now, the chip had one corner cut, which you are supposed to line up with the cut corner on the socket, so you know it's seated the right way. Of course, my friend put it in completely backwards (because it fit in any direction.) So he tries to boot up the computer and nothing happens. So he looks at it again, and realizes the chip is in backwards. So he turns the box off, pulls out the co-processor, rotates it 180 degrees and puts it back in the socket. Unfortunately, powering it up in the wrong orientation had toasted the chip completely, and when he put it into the socket in the correct orientation, the socket locked itself shut, as it's supposed to do. But, since the chip was fried, this effectively locked the motherboard in an unbootable configuration with a dead chip. Sigh.

--
- In Soviet Korea, only old people loose all their bases to Natalie Portman's petrified hot grits overlords.

Re:Whole computer. by tehcyder · 2015-07-06 02:02 · Score: 1

Since when did schools let their pupils perform hardware upgrades?

--
To have a right to do a thing is not at all the same as to be right in doing it
Re:Whole computer. by o_ferguson · 2015-07-06 03:34 · Score: 1

Right? He was in a special needs class, and I think they thought it would make him feel good about himself (Which it totally didn't.) You have to remember that this was an era when they had only one computer tech for the whole 1500 person school, and he was also a shop/electronics teacher, but there were tons of kids running around who knew a lot about computers.

--
- In Soviet Korea, only old people loose all their bases to Natalie Portman's petrified hot grits overlords.

$40k SGS by gpmidi · 2015-07-04 05:26 · Score: 1

Dropped and broke a $40k USD Symantec Gateway Security Appliance

Re:$40k SGS by gpmidi · 2015-07-04 05:58 · Score: 1

Can't say that I did. Lol. Suppose I shouldn't be surprised though.
Re:$40k SGS by gpmidi · 2015-07-04 05:58 · Score: 1

As someone who worked with the damn things way more than I care to admit, yes, you're 100% right.
Re:$40k SGS by KGIII · 2015-07-05 01:55 · Score: 1

I hope you capitalized on it by setting it alight and dancing naked around the blaze. It is the only correct thing to do at that point.

--
"So long and thanks for all the fish."
Re:$40k SGS by gpmidi · 2015-07-05 02:06 · Score: 1

We did end up burying one when we finally stopped supporting them.
Re:$40k SGS by KGIII · 2015-07-05 16:48 · Score: 1

Definitely close enough. However, immediately setting it ablaze in the workplace would have made a much more interesting story. I suppose you have more of a point on this planet than doing things for my amusement though. It would have made a hell of a funny story. Burying it is pretty good as well.

--
"So long and thanks for all the fish."

$10k. .... per day by Anonymous Coward · 2015-07-04 05:27 · Score: 2

I made a calculation error that cost $10k per day. Took 9 months to straighten things out.

I later won an award for outstanding work.

Re:$10k. .... per day by binarylarry · 2015-07-04 05:38 · Score: 1

Oh the joys of working at IBM.

--
Mod me down, my New Earth Global Warmingist friends!
Re:$10k. .... per day by yakumo.unr · 2015-07-04 05:55 · Score: 1

I guess you're a banker..

Software bugs by nodan · 2015-07-04 05:28 · Score: 2

Some bugs I've been responsible for, although it's hard to tell exactly what they did cost:
- rounding error when programming a timer in an embedded system, resulting in a baud rate that was 10% off, causing problems with several units shipped to customers
- overflow of an 8-bit counter, resulting in a serial protocol failing
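
Both kinds of bug are easy to reproduce on paper. The Python sketch below uses assumed numbers (an 8 MHz timer clock, a 57,600 baud target, 16x oversampling); none of them come from the original hardware. It shows how truncating the divisor instead of rounding it moves the real baud rate, and how an 8-bit counter wraps.

```python
def actual_baud(clock_hz, divisor, oversample=16):
    """Baud rate the hardware really produces for an integer divisor."""
    return clock_hz / (oversample * divisor)

def error_percent(clock_hz, target_baud, divisor, oversample=16):
    """Signed deviation from the target baud rate, in percent."""
    return 100.0 * (actual_baud(clock_hz, divisor, oversample) - target_baud) / target_baud

CLOCK_HZ = 8_000_000   # assumed timer clock
TARGET = 57_600        # assumed target baud rate

# The ideal divisor is about 8.68, but the register only holds integers.
truncated = CLOCK_HZ // (16 * TARGET)                 # integer division rounds down
rounded = (CLOCK_HZ + 8 * TARGET) // (16 * TARGET)    # add half the denominator first

err_truncated = error_percent(CLOCK_HZ, TARGET, truncated)
err_rounded = error_percent(CLOCK_HZ, TARGET, rounded)

def increment_u8(counter):
    """Increment a counter stored in 8 bits; 255 wraps around to 0."""
    return (counter + 1) & 0xFF
```

With these assumed values the truncated divisor is further from the target than the rounded one, and a sequence counter that reaches 255 silently restarts at 0 unless the protocol expects that.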

Plus tons of other errors I forgot or haven't been aware of. Total damage for sure thousands of Euros. However, that's probably little for a 25+ years career mostly in software development.

A Photographic Slide by trabby · 2015-07-04 05:31 · Score: 2

Lost a slide for a 3rd-party client that was to be featured in a skateboarding magazine.
I think one of the coworkers stole it as I did not get along with them.

Insurance claims for that kind of thing can involve the cost of setting up the shoot again, whatever that entails.
Was fired not long after.

About $2M -- But not really a mistake... by jnaujok · 2015-07-04 05:35 · Score: 4, Interesting

Our group at FedEx released code that I wrote on a Saturday night. This was two days before the Apple iPhone 4 shipped. The code worked perfectly, however, despite our repeated warnings about nearly doubling downstream traffic, the downstream systems (like billing and tracking) weren't ready for it.

So, on the day everyone wanted to track their new iPhone, my code shut down all tracking on FedEx for about 12 hours before we could switch the config setting (10 minutes) and the downstream systems could catch up (11+ hours).

Estimate of cost was around $2 million in lost time and revenue and extra calls to customer service. Luckily, since I wasn't actually at fault, and we had multiple email chains backing up the volume estimates and warnings, we didn't get the axe.

--
Life, the Universe, and Everything... in my image.

Re:About $2M -- But not really a mistake... by Tablizer · 2015-07-04 10:52 · Score: 3, Informative

The poster was not the boss. The boss calls the final shots. The technician's job is to present the risks (trade-offs) as accurately and clearly as possible. If the boss(es) then choose to ignore the risk warnings, the blame falls on them. If you usurp their power, you are out the door (unless it's a legal matter).
Incidentally, I was in a somewhat similar situation where marketing planned to release about 30 websites for satellite offices all at once along with a press release about the new sites. I pointed out our "budget-oriented" infrastructure may not be able to handle such a sudden load, and suggested staggering the releases. Other technicians agreed with my warning, but the marketing chief was really disappointed, saying something like, "It's better P/R to have one big release. Staggering the releases takes the punch out of it."
I was tempted to respond, "30 crashed sites is not good P/R either", but smartly bit my tongue (based on prior experience with "reality" statements). He was a true P-H-B, always looking for a cheap short-sighted shortcut, but tried to blame us when his paper tigers got eaten. He drove one guy to retire early. Later he was under investigation for giving contracts to his buddies instead of basing them on merit. Not surprising, his buddies were also idiots.

--
Table-ized A.I.

Two incidents... by Anonymous Coward · 2015-07-04 05:37 · Score: 1

First one, I was lucky... there wasn't a switchover to a new database yet, and I made sure to schedule a large downtime window, because I try to do like Scotty... take the time I think will fix something at the worst, then double it. If the PHB gripes, start into detail. A side effect is that users tend to be happy when stuff is back up earlier than planned.

Well, this was a two node HA cluster back in the day where a certain vendor had a passive node and an active node configuration selling for an insane amount. They were connected via serial connections for heartbeats.

Well, it was time to do a simple update of the machines. I staked out 24 hours, just because I wanted to do backups first.

Well, I did the sysbacks, so I had two tapes of the entire boxes.

Ran one set of updates on both machines, rebooted... all fine. Noticed there was a drive array microcode update... just a 0.0.x update. Well, I tossed that on and rebooted... Well, both boxes blew their kernels. All the data on their drives was gone, because the microcode patch got the array in such a state that one machine started writing garbage to all drives.

At least I was able to restore both machines and build the shared data from the tapes.

The second one would have been just as bad. I was cleaning out a source code tree of .o files and executables... came to find one dev had libraries that were only present in binary-only format, and whose only backup was in the tree (where the backup program excluded all binaries for space's sake.) Thankfully, the tree was on a NetApp, and a simple copy from a snapshot fixed everything. Were it on another server, I'd have had hell to pay.

Fried an early... by michael_cain · 2015-07-04 05:37 · Score: 2

digital signal processing chip from TI. The $750 (in 1986 dollars) wasn't the big deal. That the parts had serial numbers hand-lettered on them and I had to go back on the waiting list to get a replacement was.

$40,000 - $60,000 by GovCheese · 2015-07-04 05:38 · Score: 1

A long time ago on mainframes. IBM 3083's and VAX's. I was running analysis on some waveform data, took probably about 20 reels of mag tape. Fucking marine seismic data. I sent the big deck of cards down to the floor on a Friday. 1st thing Monday, I had to go to the VP's office. He explained that Monday morning, the fucking job was still running. Turns out, instead of sampling the data every 4ms, I accidentally sampled it every 2ms. Back then, you didn't own your mainframes, IBM leased them to you. The VP explained that I cost the company anywhere from $40-60k. Nice guy actually. Texas engineer, cowboy boots and a suit. He politely asked me, "Son, you probably won't be making this mistake again, will you?" I stuck around for another couple of years. Goddamn it took an army to process data back then.

--
"He's using a quantum encryption scheme! That'll take hours to break!"

Re:$40,000 - $60,000 by dbIII · 2015-07-04 15:31 · Score: 1

Funny thing is today someone is probably reprocessing the data from the area next door at 2ms and happy they don't need to redo your stuff. There is a lot of reprocessing of old data going on and some of it is even off the original reels because nobody has format shifted it.
Interesting how seismic data from the 1970s can be read with current software but MS Office documents only a few versions back have problems.
Re:$40,000 - $60,000 by NicBenjamin · 2015-07-04 18:18 · Score: 1

Tells you a lot about the design goals of the people who make the program.
MS wants to sell you a new version of Office, so the file format is always in flux and your buddy with a brand new machine makes documents you can't read until you upgrade.
Seismologists need to do really long term studies, so they wouldn't even consider making a program that couldn't read the old format perfectly, and they'd probably stubbornly resist a new data format even if it was a good idea.
Re:$40,000 - $60,000 by dbIII · 2015-07-04 19:11 · Score: 1

You again? First point, yes, but you are incorrect with the second point - it's all about published standards to get things done (eg. SEGD) and new standards DO come up all the time and they are not "stubbornly resisted" because they are ALSO published standards and can be easily included in the software along with the old formats.
Re:$40,000 - $60,000 by KGIII · 2015-07-05 02:03 · Score: 1

Somewhere on this planet there needs to be a "Greybeard Bar & Grill." Unfortunately, that place would probably end up being somewhere in Silicon Valley.

--
"So long and thanks for all the fish."

Lost opportunity by Anonymous Coward · 2015-07-04 05:46 · Score: 1

Long before Amazon was ever more than a bookseller in the mid 1990s, a friend and I had this idea of a website that would allow for comparison shopping pulling data from other sites allowing folk to buy the cheapest electrical items possible

We never progressed because we couldn't see any way for it to make money. We had no idea that was the absolute last thing we should have cared about.

So now I'm here, an anonymous coward posting about our total lack of foresight and imagination, and not some rich fecker who owns real-estate like /Slashdot

Took an online trading company offline for a day by Nonesuch · 2015-07-04 05:47 · Score: 4, Interesting

I was hired as a firewall admin at an online trading company, then quickly discovered the director of IT was insane, but kept management happy because he made his numbers by keeping his team constantly understaffed; I was told to work on not just servers, but installing Sun servers in racks, running cable, and fixing just about anything plugged into the network.

I made the mistake of showing competence in networking, so was asked to "expand my role" (new title, same salary), and start working on the switches themselves, including executing an "upgrade" to stacked HP ProCurve switches with VLANs (replacing a hodge-podge of random manufacturer switches). The actual upgrade went fine, basic testing (ping) showed everything stable, but as soon as trading opened the next day, everything went to hell: performance dropped through the floor and customers started calling in about trades timing out. Long story short, it turned out that Solaris HME cards were unable to negotiate properly with ProCurve switches, and half the machines were dropping packets due to duplex mismatches. There's a reason people call the Sun interface cards "Happy Meal Ethernet".

Cost the company approximately $180,000 in direct and customer exodus losses, and was likely a factor in their eventual collapse. I wasn't fired, but management never trusted me again so I saw the writing on the wall, and quit to do consulting work at a (also doomed) dot-com online supermarket.

On the upside, I was able to make thousands in consulting income from installing those same "lock speed to 100 and duplex to full" Solaris scripts on servers for various customers who also had performance issues plugging in Sun servers to cheap switches.

--

I do not deploy Linux. Ever.

coleco vision by known_coward_69 · 2015-07-04 05:48 · Score: 1

i used to insert the cartridges too hard and broke it to the point where i had to spend 15 minutes playing with it every time i wanted to play a game

I killed three networks, but that was planned. by swschrad · 2015-07-04 05:49 · Score: 2

obsolescence, I got the task to shut 'em down. I also forced a worldwide recall of PC card disk drives in the switches that were the backbone of the Internet when we kept the vendor engineering on the phone all day for a failed switch... and read the duty cycle of the drives to them, like 5 minutes a shot, 10 minutes an hour, when they were running read/write continuously.

but I got a haircut indeed when we had to get our stuff out of a colocate that was shutting down. built a mirror data system for that in the new place, had the trunks up, costed over the traffic. then it was time to demanage and power down the old shelf. telcordia assigned a code to the new unit that was one letter different than the old one.

the good news is I got the new one back up in 20 minutes and they didn't stake me out over an anthill.

--
if this is supposed to be a new economy, how come they still want my old fashioned money?

My $5 million bug by llib_xoc · 2015-07-04 05:49 · Score: 2

We were writing a Unix program to parse transactions from some specialized terminals that read customer invoices and the checks that accompanied them, writing the transactions to digital tape to carry over to the mainframe system. During testing our tapes were compared to tapes generated by the legacy IBM system. Our team lead got a call from the customer liaison *early* one morning saying "Do you realize one of your batches was 5 MILLION DOLLARS SHORT?" - yes, she was shouting. Turns out that the $5 million transaction was the largest we'd ever tested with so far. All others were less than $999,999. It was my bug - I'd put the sign nybble (half a byte) on top of the most-significant digit of the packed-decimal payment-amount field on the test tape, dropping that digit from the field. Trivial fix - I had just been auditing the relevant code the previous day.
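
For readers who haven't met packed decimal: each byte holds two decimal digits, one per nybble, and the last nybble holds the sign (0xC for positive, 0xD for negative). The Python sketch below is a guess at the shape of the problem, not the original code. The 5-byte field (nine digits plus sign), the amounts in cents, and the helper names are all assumptions. With that layout, every amount under $1,000,000 has a zero in the top digit, so losing that digit goes unnoticed until a bigger payment comes along.

```python
def pack_decimal(value, num_bytes):
    """Encode an integer as packed decimal: BCD digits plus a trailing sign nybble."""
    capacity = num_bytes * 2 - 1              # digits that fit beside the sign
    digits = str(abs(value)).rjust(capacity, "0")
    if len(digits) > capacity:
        raise OverflowError("value does not fit in the field")
    nybbles = [int(d) for d in digits] + [0xD if value < 0 else 0xC]
    return bytes((nybbles[i] << 4) | nybbles[i + 1] for i in range(0, len(nybbles), 2))

def unpack_decimal(data):
    """Decode a packed-decimal field back into an integer."""
    nybbles = []
    for byte in data:
        nybbles += [byte >> 4, byte & 0x0F]
    sign = nybbles.pop()
    value = int("".join(str(n) for n in nybbles))
    return -value if sign == 0xD else value

def drop_leading_digit(cents, digits=9):
    """Model of the bug's effect: the most-significant digit is lost."""
    return cents % 10 ** (digits - 1)

small = 99_999_999      # $999,999.99 in cents: the top digit is already 0
large = 500_000_000     # $5,000,000.00 in cents: the top digit is 5

packed_large = pack_decimal(large, 5)
shortfall = large - drop_leading_digit(large)
```

Under these assumptions the small amount survives the bug unchanged and the large one comes out exactly $5 million short, which matches the story.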

Re:My $5 million bug by baegucb · 2015-07-05 06:01 · Score: 1

If this was in the 1970s and involved the International Travel Association iirc, I was probably the person who discovered this.

I wonder... by waspleg · 2015-07-04 05:50 · Score: 4, Insightful

How many people will refrain from posting because the statute of limitations hasn't run out yet?

Re:I wonder... by dcollins117 · 2015-07-04 07:45 · Score: 4, Interesting

How many people will refrain from posting because the statute of limitations hasn't run out yet?
Well, I'm certainly not going to admit to the most costly mistake as it appears no one realizes it was me and what I had done. So I'm not gonna do it; wouldn't be prudent.
The most embarrassing mistake was I inadvertently brought down the client's network (a major hospital) during the middle of the day. Didn't realize what I had done until about three minutes later when about a dozen IT guys flooded the computer room paying particular attention to the area I was just working in. It appears I made an error. To this day I am likely persona non grata in that computer room.
Re:I wonder... by AK+Marc · 2015-07-05 16:37 · Score: 1

The only mistake I made that cost money, nobody ever knew about. Had to put $5k on a personal credit card to re-buy an ISDN card. I was out of the country doing an install (I ordered the gear) and found out the hard way that the world is not uniform in ISDN standard. S/T vs U. Oops. Buy card locally, expense card. Leave both cards installed in router. Nobody noticed or cared enough to ever say anything about it. Relatively minor, but was a direct cost to a mistake.

--
Learn to love Alaska

Click of death by Wowsers · 2015-07-04 05:51 · Score: 4, Interesting

My worst IT disaster was suffering from a hard drive failure, click of death. I had warning of a few days of it, and I deliberately kept the pc on 24/7 instead of normal switch on/off, to make sure the drive stayed alive until its replacement arrived.

Obviously I had to turn the pc off to change the drive, it was not hot-swappable. When I powered the pc up, the old hard drive failed, didn't work at all. I was faced with losing all the data on it. I left the drive alone for months wondering what to do, reading different ideas online, some of them weird.

Eventually I decided to try the least destructive idea first. I put a sheet of paper on the failed drive to make sure the label didn't come off, and heated up the clothes iron, then applied the iron directly onto the top of the hard drive. When the drive casing was warm enough (not so hot as to make it hard to carry), I took it to my pc, and powered up.

The failed hard drive came to life, and I managed to grab all the files on it onto the new hard drive, uncorrupted.

Out of interest, the drive failed about three months before the forced drive change I do as a backup / failure prevention measure. I got lucky.

--
Take Nobody's Word For It.

Re:Click of death by BlackPignouf · 2015-07-04 08:53 · Score: 2

Wait, what?
Re:Click of death by Anonymous Coward · 2015-07-04 12:32 · Score: 2, Informative

Heating it up causes the metal to expand which can unjam a stuck head in some circumstances.
Re:Click of death by Anonymous Coward · 2015-07-04 13:05 · Score: 1

I just had a laptop with a dying drive and needed payroll data off it. Brought it home, powered up, nothing happened. Gave 'er a good whack right above the drive, she hummed to life, and I managed to copy... we'll see how the restore goes

Not sure how much $$$ by minimum · 2015-07-04 05:52 · Score: 2

I used to work as an SDH/DWDM admin. In the early 2000s, my colleague screwed up a major firmware update on an STM1/4 ADM, and I, as senior admin (haha - I was in the first half of my 20s), had to drive up to the site, since the affected node was unresponsive to the management system. After many unsuccessful attempts to recover it, at about 3 am I decided to hard reboot the node, which caused it to boot up from the corrupt firmware bank (it had two of those), which in turn just erased all the configuration, including traffic connections (which are built very robust, btw). Since the site was on a (relatively small) island and had only 2 ADMs at the time, I more or less cut off the entire communication with the mainland. By morning, I had managed to get my colleagues to ferry me another, fully fitted ADM (our last-resort backup scenario was to replace the entire node) - but as it turned out, it had been fitted in a hurry with cards with different firmware (the entire network was in the middle of an upgrade process), which resulted in the same kind of useless "brick" I already had at hand. Although it was very cool to fly ~200km/h to the port and back in my sporty car to pick up the spare (not many police on the island and I had a very good excuse). By the afternoon, my higher-up manager had mobilized a helicopter to personally deliver me a fully functional ADM, which we promptly swapped in, restoring the configuration from backup. I still have a copy of the local newspaper's front page, praising how our company heroically saved the day to restore the connection with the outer world.
At that time I was already able to make up excuses that would have made BOFH proud, which saved my ass.

Other way round for me by Anonymous Coward · 2015-07-04 05:53 · Score: 2, Interesting

I let a vendor sell me a product without really testing it. Turns out it didn't work (at all) and we lost €50k on license fees for a product we could not use.

I was able to lay the blame on an accountant who had locked us into a 5-year contract in exchange for a minor discount. So I didn't get fired.

F-16 panel flew off in flight by YrWrstNtmr · 2015-07-04 05:54 · Score: 4, Interesting

Some other fool did not install the panel properly, and left one of the three nuts off. Distinctive nuts, used in only one place.
Someone found it overnight, and held it up at the morning meeting. "Anyone know where this goes?" Unfortunately, I did not recognize it as a part of one of my systems.

Aircraft flew, panel breaks off, punching several other holes in the side as it departs.
Training mission aborted. much sheet metal work needed.

Actual repair cost? Unknown, but easily 5 figures if not more.

Re:F-16 panel flew off in flight by Tablizer · 2015-07-04 10:31 · Score: 1

Was it too late to re-inspect when mentioned in the meeting? You perhaps could have said, "I don't recognize that nut, but I'm willing to go in and look around."

--
Table-ized A.I.
Re:F-16 panel flew off in flight by Tablizer · 2015-07-04 11:23 · Score: 1

Look on the bright side: if it were an F-35, the panel would have 13 nuts instead of 3, and all be different.
http://tech.slashdot.org/story...

--
Table-ized A.I.

Power cable mistake by Anonymous Coward · 2015-07-04 05:55 · Score: 2, Interesting

Working for a desktop publishing house in IT. Spent just under $4000 on 36 inch flat panel displays. Accidentally plugged in printer power cable. Immediately fried monitor. My boss was not happy. The internship did not go well the rest of the summer.

McAfee -$12.6K by Anonymous Coward · 2015-07-04 05:55 · Score: 1

McAfee on a mass spectrometer data acquisition system. System control would be periodically lost. Cost over $12.6K in lost instrument time and labour to determine that McAfee was blocking serial comms to the instrument (but only when it felt like it).

Lesson learned: never run McAfee or Norton on a mission-critical data system.

2 million by krray · 2015-07-04 05:58 · Score: 1

I let an upgrade bug slip by me during a software upgrade for the accounting software. In retrospect it should have been caught before it got out of hand. It got out of hand in about 3-4 seconds and had a cascading effect bringing down the whole datacenter for the company.

It happened when a "guaranteed" bid was due for a 2 million dollar job. We had nothing. Not so guaranteed...

Fortunately (?) I had an ownership stake in the company, so I screwed myself too. Figuring ~12% profit on the job was typical and 10% of that was mine ... it cost me personally over $20K on that mistake.

Ooops.

~$60k by fox1324 · 2015-07-04 05:58 · Score: 2

I was working as a Jr. Network admin, helping to install some new Cisco PoE switches to facilitate our building's move to VoIP phones. I aligned a brand new 48-port PoE switch slightly off when inserting it into the chassis, and bent the insanely-complex connector at the back of the card, rendering it unusable. Fortunately, we had a ridiculous service agreement with Cisco, and a new card arrived at our office within 4 hours. I distinctly remember buying burritos and beer for me and the Sr. admin to help make up for the fact that neither of us got to sleep that night.

NASA ouchie by CrudPuppy · 2015-07-04 06:00 · Score: 1

I was on the NASA Genesis probe team. Only a few hundred million lost on that one when it crashed into Earth...

--
A year spent in artificial intelligence is enough to make one believe in God.

Just my time by corychristison · 2015-07-04 06:02 · Score: 2

Six or so years ago I was using a (fairly cheap) Virtual Private Server as a dev/testing box for a pet project of mine.

The VPS company was bought by a larger company, and prices were to double on the next billing period. I hastily chose a new provider without doing any research. I paid for 3 months of service in advance, got the container set up the way I like, migrated all of my data over, and was up and running.

2 months in, the new provider vanished, along with all of my data. I wasn't very concerned about the month's worth of money I had lost by not getting the 3 months I had paid for; I think it was only about $15. "Okay," I thought, "I'll just pull my data out of my nightly backups and move on." It turns out I forgot to adjust my local cron script that pulled the data over rsync to the new IP address. My backups had not been pulled in over 2 months.

Luckily it wasn't very important, as it didn't make me any money and was mostly just for fun. I ended up starting over from scratch and ended up with a better system anyway.

I learned my lesson, though.
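
A cron job that silently pulls nothing is hard to notice, so one cheap guard is a second job that complains when the newest file in the backup directory is too old. A rough Python sketch, assuming the backups land as ordinary files under one directory. The function names and the two-day threshold are illustrative choices, not anything from the setup described above.

```python
import time
from pathlib import Path


def newest_backup_age_days(backup_dir, now=None):
    """Age in days of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 86400


def backup_is_stale(backup_dir, max_age_days=2.0, now=None):
    """True if there is no backup at all or the newest one is older than the threshold."""
    age = newest_backup_age_days(backup_dir, now)
    return age is None or age > max_age_days
```

Run from cron and wired to an email or a non-zero exit code, a check like this would flag a stalled pull after days rather than months.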

$1 BILLION DOLLARS (Puts pinkie to mouth) by perry64 · 2015-07-04 06:04 · Score: 1

Not me, but my thesis adviser became the Technical Director for JSIMS, which ran through +/- $1B before the pentagon pulled the plug. He is not shy about mentioning that fact.

http://www.nationaldefensemaga...

Re:Took an online trading company offline for a da by Anonymous Coward · 2015-07-04 06:11 · Score: 1

Oh yea, the "HME lock speed and duplex to full scripts". Knew some admins at a financial services company that didn't remember to run that on building the servers. Servers made it through testing, got turned on in production. The next day was ugly until we looked at the change management book (was really a paper book) and saw the new servers. 5 ethernet cable disconnects later we were back up to our original capacity until they sorted it out.

The Final Nail by Dartz-IRL · 2015-07-04 06:16 · Score: 4, Interesting

The total cost was actually sweet FA in numbers terms, but I think I put the final nail in the company's coffin.

My first 'job' was a jobbridge internship with a 'small' company. Small enough that I was literally person number three on the employee roster. The company worked in the renewable energy sector, and had been hammered pretty hard over the last few years by The Recession as domestic and corporate purse strings were pulled tighter and tighter.

I was taken as an Engineer, but rapidly found myself wearing a wide range of hats from Sales, to Customer Support, to System Design, to Project Management, web development in PHP, and finally, IT Support.

Because, one day, I managed to figure out why one of my colleagues couldn't log in to the server upstairs, and corrected the problem.

I will say, the Server was the problem.

It was a dinosaur. It was 14 years old - twice as old as the company - and had been bought second hand. It was a monstrous beige tower with a Pentium II processor and God Knows What else inside. It ran Windows Server 2000, and was solely dedicated to serving the company accounts and acting as networked file storage. Inside the case were four HDDs.... A pair of 9GB ones for the OS and programs, and a pair of 32GB ones for files. Both pairs were mirrored in RAID 1. It had a pair of lockable Zip disk drives still fitted, though the keys were long lost, along with a floppy drive and a CD Drive with no write ability. Or ability to read DVDs.

It creaked as it worked, then fumed, whuffed, whirred and occasionally burped. And it sat there, creaking away for years without thought or consideration to its well being or security. Until I came along.

By this stage, it was obvious the company was dying - the Titanic had hit the iceberg a long time ago, and everything that was happening was just a desperate attempt to bail it out. We might've slowed the sinking - from two months, out to six, even buying a full year - but the abyss of liquidation always loomed.

So, any suggestion of upgrading the server hardware was met by 'With What Money?'. At the same time, everybody knew the server was the lynchpin. If it broke, that was it - company gone. A suggestion that I use a spare computer from home was quietly discouraged - in case the company went under by surprise and someone decided to liquidate it to pay a creditor rather than give it back to me. Or we turned up to find the doors locked.

The best I could do was schedule a backup of the accounts and a few other critical systems, and have it go somewhere offsite. I asked our webhost if we could use our spare space for it, and they were happy to let it happen, provided we didn't cause them problems. So, I set it to run the backup every Sunday morning - 1am or so. Each successive backup would overwrite the previous because there just wasn't the spare space to hold two (No money to pay for it)

I figured even if the server went pop, or we had a building fire or some other catastrophe, at least those copies would survive. I'd figure out what to run them on afterwards.

Someone, somewhere, should see the potential problem in this. In my defence, I am not, nor ever was, an IT professional. The software education I have is more related to the engineering side of things - making machines and robotics work with a view towards industrial automation, rather than the maintenance and setup of IT infrastructure and data security.

I just did what I thought I could to keep the Titanic afloat.

So, one Monday morning, I come to the office and am met by the shrill sound of metal screaming against metal at high speed. There's a heart-in-mouth moment as I realise that it's coming from the server cabinet.

But, we have backups, I assured myself. The disks are mirrored in RAID 1, so if one drops out, the other should still be clean and working. If that fails, I've my own little backup too....

Unfortunately - that only works if the damaged disk decides to drop out of the array.

It didn't.

I find th

--
So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.

Re:The Final Nail by drinkypoo · 2015-07-04 06:32 · Score: 2

There's a clawing feeling that it was somehow 'My Fault'.... and it probably was. With hindsight, maybe I should've set it to run the backup while we were in the building, rather than at home over the weekend. I could've used an external drive to keep one locally too. There were probably a dozen things that I could've done that'd stop it.
Only one thing which really mattered... verifying your backups. If you don't do that, there's almost no point in making any. (It gives you something to pray for...)
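
A first step toward "verifying" is at least confirming that the copy exists, is non-empty, and matches what was sent. A small Python sketch with hypothetical function names, assuming both files are reachable from the machine running the check:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, backup):
    """True only if the backup exists, is non-empty, and matches the source byte for byte."""
    backup = Path(backup)
    if not backup.is_file() or backup.stat().st_size == 0:
        return False
    return sha256_of(source) == sha256_of(backup)
```

This only proves the copy is faithful. If the source was already corrupt when it was copied, the hashes match and the backup is still useless, so a periodic test restore into a scratch location is the check that really counts.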

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:The Final Nail by Tablizer · 2015-07-04 10:06 · Score: 3, Informative

Databases should be backed up with a text-dump (such as an SQL INSERT list), not the actual database file, because of the internal pointers that are fragile. A text-dump "flattens" the pointers. If you do use the actual database file as a backup, shut all DB writing off first, during the backup. And keep multiple generations.
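
As a concrete illustration of the text-dump idea, here is a sketch using SQLite from Python's standard library. SQLite is only a stand-in: the accounts package in the story is unknown, and most database servers ship their own dump tool. The function names are hypothetical.

```python
import sqlite3


def dump_to_sql(conn):
    """Return the whole database as a SQL script (CREATE TABLE and INSERT statements)."""
    return "\n".join(conn.iterdump())


def restore_from_sql(script):
    """Rebuild a fresh in-memory database from a dump script, as a restore test."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn
```

Restoring the dump into a scratch database and running a few sanity queries doubles as the backup verification step, and dated dump files give you the multiple generations.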

--
Table-ized A.I.
Re:The Final Nail by Dartz-IRL · 2015-07-04 11:12 · Score: 1

I wasn't the IT guy either. I just had enough of a head to google shit that borked and try figure it out and make it work again.

--
So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
Re:The Final Nail by Dartz-IRL · 2015-07-04 11:16 · Score: 2

I honestly had no idea how it actually backed up, it was a function within the accounts application itself to generate the backup. Which it did, to a local disk. I then had an automatic scheduled upload of that backup to the server.
Ultimately, like I said, I'm not really an IT guy - I was the one with google and enough patience to fuck about until things worked again. We didn't have one. We did pay one company a hundred quid a month for a while in case something went TU, but we stopped paying him six months before the final death just to make the dead plane glide those few hundred yards further.
The most IT thing I've done is run a simple website off my own desktop at home, and maybe the whole make a datalogger work with remote internet access.

--
So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
Re:The Final Nail by Tablizer · 2015-07-04 16:05 · Score: 1

I'm not blaming you personally, it was just some "side tech". If an org or situation puts people into positions outside of their specialty, bleep is likely. That's just the way it is.

--
Table-ized A.I.
Re:The Final Nail by Dartz-IRL · 2015-07-04 21:58 · Score: 1

Never said you were. And such is the way in small companies. You have to do work outside your specialty. That's part of the fun.

--
So there I was, scribbling down some notes off the PC screen by hand, when I reached for the keyboard and Ctrl-S'd.
Re:The Final Nail by KGIII · 2015-07-05 02:57 · Score: 1

That was beautiful. I chuckled in the real world. Lessons learned and, really, no harm done. It was also well written. Even though I suspected the ending it was still enjoyable to read all the way through it. It read like an original BOFH type of story only you did not cause anyone any harm and, well, he would have been making fun of you.

--
"So long and thanks for all the fish."

Not my mistake, but my boss' by whoever57 · 2015-07-04 06:19 · Score: 2

Not selling the company for $250M because he wanted $300M during the dot-com boom. My boss personally owned about 30% of the company at this point.

--
The real "Libtards" are the Libertarians!

I didn't get some contractors fired soon enough by plopez · 2015-07-04 06:21 · Score: 1

Two totally incompetent twits from a populous south Asia country. Cost about $32k in salary and 4 month schedule slippage. Another contractor, who is competent, said she suspected they gave 'ghost' interviews, a common practice in her country. I heard managers say the same thing, that the two who showed up for work were not the ones they phone interviewed. They did not know command line basics in either bash or Windows, how to use remote desktop, Java, unit tests, and other things we required.

Oddly enough of the 4 foreign contractors we used recently the two women have been competent, the two men useless.

--
putting the 'B' in LGBTQ+

Re:I didn't get some contractors fired soon enough by NicBenjamin · 2015-07-04 18:40 · Score: 1

It doesn't surprise me.
A woman who has gotten through college and gotten a job in a male-dominated culture has done so by being really really smart, and if she comes to the US it's probably partly because she's sick of saying a smart thing in a meeting, and being ignored till some guy repeats her. So you're almost certainly dealing with someone who knows what she's doing and wants to be helpful.
Guys, OTOH, are much more likely to be in it for the paycheck and the "I worked in America" resume line.

Bug by Anonymous Coward · 2015-07-04 06:21 · Score: 1

Haven't caused errors with a quantifiable dollar-amount loss. But have been involved with several errors in various systems, as I suspect is the case for developers who write code that actually goes to production ;)

For an embedded hardware/firmware module for use by a backend application, I made a bug causing the module to reboot if a given parameter passed from the application was missing in certain circumstances where it was supposed to be present. The application wasn't supposed to call with this combination of parameters, and unfortunately the test harness didn't test for this case either. And in fact the application didn't usually call with the wrong parameters. But due to a database crash and associated data integrity error (which turned out to be a bug in the DB software itself which was later fixed) the column corresponding to the parameter in question actually became NULL for a few users in the database. And since the application didn't check the validity of parameters but just passed on whatever it got from the DB, this resulted in the firmware receiving the illegal NULL value, thus causing a reboot whenever one of these users logged in. The module brought itself up quickly after each reboot and there was redundancy so there wasn't any user impact, but a lot of warnings and alarms went off every time and it took some time to figure out how the error could happen.

Killed a project by ordering a code audit by Dracos · 2015-07-04 06:23 · Score: 1

I was brought onto a small web startup project as a co-lead. By this time the project was already 2.5 years old and had been rewritten at least three times by progressively less lousy developers. The final iteration was built on CodeIgniter (MVC framework), a decent choice in 2013.

My first day I'm browsing the codebase to see what's what, and a grep finds something like "UPDATE my_table set foo=" . $_POST['bar']. Not in a controller... not in a model... in a view.

So I immediately told the other leads that we needed to do a security audit on the entire codebase; it took a few days for the owners to consent. The audit revealed three different mechanisms for database queries (the standard CI driver and two other crude home-grown libraries, all used inconsistently) and that one of the devs, who not coincidentally had resisted the audit, was actually AFK for 20%-50% of the hours he billed every week. It took two months to do the audit and resolve the redundant code (no one was full time, mind you). Finally the owners told us "give us two weeks to decide whether or not we want to proceed". After six weeks of silence they pulled the plug and abandoned it entirely.
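
For anyone wondering why that grep hit was alarming: concatenating request data straight into SQL lets the request rewrite the query. The usual fix is bound parameters. A minimal sketch in Python with SQLite rather than the project's PHP/CodeIgniter stack; `my_table` and `foo` come from the snippet above, while the `id` column and the function name are assumptions added for illustration.

```python
import sqlite3


def update_foo(conn, row_id, bar):
    """Update one row, passing user input as bound parameters, never as SQL text."""
    # The driver sends the values separately from the statement, so quotes or
    # SQL keywords inside `bar` are stored as data instead of being executed.
    conn.execute("UPDATE my_table SET foo = ? WHERE id = ?", (bar, row_id))
    conn.commit()
```

With this shape, a hostile value such as `x' WHERE 1=1; --` simply ends up stored in the `foo` column of the one targeted row.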

Re:Killed a project by ordering a code audit by radarskiy · 2015-07-04 09:17 · Score: 1

What part of this was a mistake that you made?
Re:Killed a project by ordering a code audit by khallow · 2015-07-04 10:37 · Score: 1

Existing in this reality apparently.

Multiple multi-million dollar satellites. by GrantRobertson · 2015-07-04 06:30 · Score: 1

I had a friend whose job it was to find a way to break satellites. She said she was quite often successful.

(Hey, the OP didn't say it had to be an accident.)

Re:Multiple multi-million dollar satellites. by bunratty · 2015-07-04 06:32 · Score: 1

So once she tried to break a satellite and she fixed it by mistake? Oops!

--
What a fool believes, he sees, no wise man has the power to reason away.
Re:Multiple multi-million dollar satellites. by Greyfox · 2015-07-04 08:15 · Score: 5, Funny

Funnily enough at the satellite company I worked for that one time, one of the older guys there mentioned how he almost lost a satellite once by logging in to his own account and issuing a maneuver command to the satellite. Problem was the satellite was expecting times in GMT and got them in MST. Took them days to get it oriented correctly again.
Now the programmers in the audience could probably think of like 10 different specific things that could be coded into the system to prevent that from happening, but this company didn't. Which really isn't too surprising. I asked one of the devs on the ground systems team if the ground systems was using GMT or UTC. His answer was "What's the difference?" I was able to infer from his answer that it was most likely GMT, and that did appear to be the case. Somewhere deep in the bowels of the system there was presumably some piece of code written by an Indian contractor with a math degree adjusting times for leap seconds, but it wasn't in any code that anyone knew about.
The early history of that company read like a Monty Python sketch. The first satellite exploded on the launch pad. The second satellite fell over and then exploded. The third satellite burned down, fell over, exploded and then sank into the swamp. The fourth satellite got into orbit and was promptly bricked by sending the wrong version of Windows(!) to it. To be fair they only had to do that because they launched it with the wrong version of Windows(!!) in the first place. One would think that ANY version of Windows would be the wrong version of Windows to shoot into space, but that's why you're not the head of a billion dollar satellite company.
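
On the GMT-versus-MST maneuver mix-up, one of those "10 different specific things" is simply refusing any timestamp that doesn't say which zone it is in. A Python sketch under stated assumptions: the function name is invented, MST is modeled as a fixed UTC-7 offset, and nothing here reflects the company's actual ground system.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for Mountain Standard Time (an assumption for this example).
MST = timezone(timedelta(hours=-7), "MST")


def to_utc(ts: datetime) -> datetime:
    """Convert an aware datetime to UTC; reject naive ones instead of guessing."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError("timestamp has no timezone; refusing to guess")
    return ts.astimezone(timezone.utc)
```

So `to_utc(datetime(2015, 7, 4, 8, 15, tzinfo=MST))` comes out as 15:15 UTC, while a bare `datetime(2015, 7, 4, 8, 15)` raises instead of being silently treated as UTC. Note that Python's `datetime` does not model leap seconds, so this does nothing for the UTC-versus-GMT question.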

--
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Re: Multiple multi-million dollar satellites. by GrantRobertson · 2015-07-04 09:21 · Score: 2

Wow. Just, wow.
Re: Multiple multi-million dollar satellites. by bitingduck · 2015-07-04 10:42 · Score: 2

I talked to someone recently who lost a day of science data from a UAV because the Windows system driving the instrument decided to auto update while in the air with something like a 56kbps data rate.
I recently built a field instrument and made it Linux based specifically to prevent things like that, as well as to keep power and latency down by being able to kill unnecessary background tasks.
Re: Multiple multi-million dollar satellites. by freeze128 · 2015-07-05 06:24 · Score: 1

Issuing a kill command in a UAV may have a completely different effect than what you expect.

We let contractors fuck up all the time by fustakrakich · 2015-07-04 06:41 · Score: 1

We get big discounts that way.

--
“He’s not deformed, he’s just drunk!”

Re:Intel CPU sockets are terrible. by TWX · 2015-07-04 06:51 · Score: 1

Heh. I sort of miss the days when CPUs had pins and the sockets were just a pattern of holes. The ZIF socket of the nineties worked quite well.

--
Do not look into laser with remaining eye.

Rain by ouachiski · 2015-07-04 07:04 · Score: 1

I left the cover off of a $40,000 stabilized vsat antenna in a rainstorm once. That did about 10k in damage to the electronics inside. That's nothing compared to what our customers do though. Let's just say communications systems don't belong IN the ocean.

--
sorry for my comments, I'm drunk

Powerpoint presentation by Anonymous Coward · 2015-07-04 07:06 · Score: 1

I prepared a PowerPoint presentation, where we could see small black dots. These were dirt marks on the lens of the camera.

But I thought they were missiles with nuclear warheads or chemical weapons, and presented that theory to a bunch of idiots. Next thing I knew, we were invading Iraq!

- Colin

Not me, but I got fired over it by Badlight · 2015-07-04 07:15 · Score: 1

I got hired with a local ISP/network service group, and my first assignment was to go install a new frac-t1 router in a new client's office (yea, this was ~15 years ago, cheap t1 routers were still ~$1k). So the boss takes me back into the storeroom, digs out a router from a pile, and grabs a random power supply by comparing the size of the plug to the hole in the router. I actually bother to check the rating, and find that the power supply is 24V, and the router wants 18V. The boss tells me to plug it in.

Me: "Um, I don't think this is the right power supply."

Boss: "It'll work, come on, we're in a hurry."

Me: "But this is a 24V supply, and the router wants 18V"

Boss: "I said plug it in, what are you, deaf?"

Me: "OK..."

BANG! Fizzle-smoke-spark!

Boss: "What did you do that for?"

Shortest job I've ever had.

Re:Not me, but I got fired over it by ganjadude · 2015-07-04 09:05 · Score: 1

sounds like it was a good thing you got fired.

--
have you seen my sig? there are many others like it but none that are the same

BGP4 by Bookwyrm · 2015-07-04 07:17 · Score: 1

During an acquisition, the company being acquired helpfully passed along the list of AS they used in their BGP4 configurations in their core routers.

They helpfully had included the ones from other networks they provided connectivity to as well, but just had sent the AS numbers over in one big list, unlabeled, along with the AS their network originated: "Do these."

So during the network integration I dutifully entered the entire list of AS into the core routers as AS to be originated. Needless to say, hilarity ensued.

So perhaps not entirely my fault - though I should, in hindsight, have asked for more clarification or done more investigation rather than blindly trusting the information I had been given. This was a couple decades ago, and I was not cynical enough yet.

Surrendered three letter .COM domain by west · 2015-07-04 07:19 · Score: 1

Got this domain "hsa.com" in the *very* early days of the Internet (pre-web). Decided that since we were a Canadian company, we should have a Canadian domain, and surrendered it and got hsa.on.ca. (we weren't allowed to have hsa.ca, since all our offices were in Ontario...)

A three letter .com address would probably have been the most valuable asset of the company :-).

Re:Surrendered three letter .COM domain by aaarrrgggh · 2015-07-04 10:14 · Score: 1

I can one-up you there... CIO let a 2-letter domain name expire in 2010, due to a merger and re-branding. Helped sign up for it in '95.
Re:Surrendered three letter .COM domain by greenreaper · 2015-07-04 12:21 · Score: 1

Even more so since HSAs are now the equivalent of health-oriented RRSPs in the USA. Man, that could have been golden. Of course it's just parked now because nobody wants to pay.
Re:Surrendered three letter .COM domain by Trax3001BBS · 2015-07-04 14:07 · Score: 1

Cyber squatting, did I ever miss an opportunity...
Re:Surrendered three letter .COM domain by west · 2015-07-05 02:27 · Score: 1

In 2010 ??
Ow!
At least I let mine go before they had any commercial value.

Out-of-sync DB entries for CC payments by Zapotek · 2015-07-04 07:20 · Score: 1

Worst thing (so far) has been formatting a PHP date() DB timestamp wrong for entries associating users and payments. I think it was something like accidentally using 'M' for both month and minute.
At the same time, there was a bug somewhere that periodically caused only one of the 2 tables to be written to, when we noticed that the tables were out-of-sync we immediately jumped to the timestamps to make some sense of the situation, which of course didn't work in this case.

Took only a few hours to sort out since we could use other available information to fix it, but it was my 1st or 2nd real job at around 18 so I figured I was canned; I wasn't though, it was one of those "lesson learned, watch out for it next time" situations -- my boss was really frustrated though.
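
The nasty part of this class of bug is that nothing errors out. In PHP's date(), `m` is the numeric month, `i` is minutes and `M` is the three-letter month name; Python's strftime has the same trap with `%m` (month) and `%M` (minute). A Python sketch of the analogous slip, with hypothetical helper names, not the original PHP:

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # %m = month, %M = minute


def format_timestamp(ts):
    """Format a datetime for storage using the one agreed-upon format string."""
    return ts.strftime(TIMESTAMP_FORMAT)


def parse_timestamp(text):
    """Parse a stored timestamp back into a datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


sample = datetime(2015, 7, 4, 7, 5, 0)
# The mix-up: minute used where the month belongs. No exception is raised,
# and the result looks like a perfectly valid date in May.
wrong = sample.strftime("%Y-%M-%d %H:%M:%S")
print(format_timestamp(sample))  # 2015-07-04 07:05:00
print(wrong)                     # 2015-05-04 07:05:00
```

Keeping the format string in one constant and having a test that formats a known date and compares against a hand-written expected string is about the cheapest way to catch it.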

During a planned power outage ..... by liamo · 2015-07-04 07:21 · Score: 1

... plugging a kettle into your 6-hour UPS is not a recommended way to make a cup of tea. This, however, is exactly what I did a long time ago. 10 or so seconds later, I had a still-cold kettle of water and an entirely drained UPS. Oooops !

Well... by JustAnotherOldGuy · 2015-07-04 07:36 · Score: 1

I once forgot to open a water valve before turning on a laser in the lab.

The low-pressure safety switch for sensing water flow had been bypassed (not by me) and the laser tube immediately cracked and broke due to the instant heat buildup. Total cost, about $4000.

--
Just cruising through this digital world at 33 1/3 rpm...

Mars '98 by supertall · 2015-07-04 07:41 · Score: 1

I worked on both the Mars Climate Orbiter and the Mars Polar Lander, though not on software related to the failures. I did fry a $12k damper during testing though due to a misunderstanding with the thermal engineers on hardware placement (I didn't lose my job, I was fresh out of college). Due to the fact that the capillary pumped loop heat pipe thermal system didn't work, they ended up cutting it off and adding extra heaters/sensors at the last minute. Looming launch deadlines make for crazy times ...

Re:Intel CPU sockets are terrible. by Bengie · 2015-07-04 07:49 · Score: 3, Informative

Pretty much all modern Intel CPUs from the past many years.

Crashed the Uni Mainframe Once by Greyfox · 2015-07-04 08:03 · Score: 1

Was curious what an apparently undocumented feature on the login page did. Turns out what it did was crash the mainframe. Go figure. You'd think they'd take that shit off the login page, but apparently no one had ever been so curious as to explore it before. Which says a lot about that uni, now that I think about it. Also, once trash talked a uni in a story on a news blag website. Yeah, those were the days...

Mostly I make my career out of fixing other people's tech mistakes. Which is not something that uni taught me how to do. Man I'm glad I got out of that place before I ran up any significant student debt. Did I mention I trash talked a uni on a news blag website?

--

I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

Re:Crashed the Uni Mainframe Once by rrohbeck · 2015-07-04 11:35 · Score: 1

LOL. When I started, our Uni mainframe was, umm, not very secure (ICL 1906 with GEORGE 4, yay.) We crashed that thing every few weeks. Whenever you did something naughty and the terminal displayed a flashing status at the bottom saying it was waiting for a reply, it was time to run from the terminal room, because two minutes later one of the operators would come in and look at who sat at terminal number X.

--
thegodmovie.com - watch it

About three days work, but PITA by Kjella · 2015-07-04 08:13 · Score: 1

Basically a loading tool with a bug I knew from testing: you could set it correctly once in production, but if you set it twice every user was f*cked up and could only be fixed from the web interface by about 5 clicks per user, no programmatic solution. And of course we had an error in the production setup. I altered that part - which I could - but forgot to take out the "you can run this only once" settings. Hundreds of users borked and the vendor support would take forever or claim there's no other way, what do?

This was a consulting company, trying to bill this would look bad on both our vendor and ourselves and it pretty much broke everything so we gave a benched consultant the assignment from hell. Click here, here, browse, pick, save in this somewhat less than instant web interface. Now do that all day, every day for all users until you're done. Personally I'd be ready to jump off the roof after an hour, but apparently she stuck to it for three days and finished. I don't think we won any popularity points with her though.

--
Live today, because you never know what tomorrow brings

Melted down 200K worth of power supplies by Anonymous Coward · 2015-07-04 08:14 · Score: 1

Melted down a couple of LARGE high end power supplies (worth about 200K - I think the repair was about 50K). Did I lose my job? Nope, not even really called on the carpet. I had a triple redundant fail safe system, approved by management (in writing), and reviewed by both levels of client, and ALL THREE systems failed! (1 software, one independently developed firmware, one mechanical). Failure analysis on just the last one (the mechanical) put its chance of failing at less than one in a million (yes, we did a failure analysis). Something (surge?) fried the computer, the firmware controller, AND welded the mechanical contactor closed (LOW duty cycle - close at start of test, open at end, 3x safety factor on ratings, something welded them during the test - aka I watched them close, visually inspected, and went home for the night, as per SOP).
One of those freak things, but we changed to a carbon contactor so it could not weld, and changed the firmware unit to a more robust unit, and did some other isolation. As far as I know, it never happened again.

a bug i found once by superwiz · 2015-07-04 08:20 · Score: 1

was created by my boss. I fixed the bug instead of reporting it. The boss was incompetent and was costing the company millions in missed opportunities and in increased turnover of really good people. He couldn't see when his successes were pure accidents and when his mistakes were entirely foreseeable and preventable. I had a few opportunities to get him fired when fixing his messes. I wasn't ruthless. It cost a number of good smart people their jobs and cost the company millions (in fixes, unnecessary delays and missed opportunities). I'd put the dollar figure at around $10mil. But it may be much larger if some of those missed opportunities were first-to-market.

--
Any guest worker system is indistinguishable from indentured servitude.

On the plus side, it discovered life... by Minupla · 2015-07-04 08:28 · Score: 2

... too bad it was here :)

--
On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before

Six figure accidents by sectokia · 2015-07-04 08:31 · Score: 1

The two biggest I have seen:
- Comms card slips out of box while being carried over to submarine. Worth about $220,000, fell into the water and had to be recovered by divers for security.
- Electrician didn't test that the circuit was isolated. He went to disconnect a 3 phase circuit and decided to start with neutral. He lifted the neutral off, putting up to 400 V where there should have been 230 V. This destroyed over $300,000 in components, and cost another $200,000 due to lost operations.
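
The 400 V and 230 V figures fit a 230/400 V three-phase supply, where the line-to-line voltage is sqrt(3) times the phase-to-neutral voltage. As a rough sketch of why lifting the neutral is so destructive, the star point of the loads can be treated as floating and its potential found with Millman's theorem. The load resistances below are invented for illustration, and the model assumes purely resistive, star-connected loads.

```python
import cmath
import math

V_PHASE = 230.0  # assumed nominal phase-to-neutral voltage

def load_voltages(resistances):
    """Voltage magnitude across each of three star-connected resistive
    loads when the neutral conductor is disconnected."""
    # Source phasors 120 degrees apart.
    phasors = [cmath.rect(V_PHASE, math.radians(a)) for a in (0, -120, 120)]
    admittances = [1.0 / r for r in resistances]
    # Millman's theorem: potential of the floating star point.
    v_star = sum(v * y for v, y in zip(phasors, admittances)) / sum(admittances)
    return [abs(v - v_star) for v in phasors]

line_to_line = math.sqrt(3) * V_PHASE

# Balanced loads: the star point stays at 0 V, so nothing changes.
balanced = load_voltages([10.0, 10.0, 10.0])

# Hypothetical heavy imbalance: two light loads and one very heavy one.
unbalanced = load_voltages([1000.0, 1000.0, 1.0])

print(round(line_to_line, 1))
print([round(v, 1) for v in balanced])
print([round(v, 1) for v in unbalanced])
```

With balanced loads nothing happens, but the more unbalanced the loads are, the closer the lightly loaded phases get to the full line-to-line voltage.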

Brought a Tandem Non-stop to a halt.. by scsirob · 2015-07-04 08:57 · Score: 1

Back in the 80's I worked for a field service organisation, fixing and maintaining PDP11 and VAX systems, but also CDC-9766 removable disk systems. Big 14" removable disk packs like you see them in old scifi movies. One of my customers had a string of 10 or so attached to a five-node Tandem Non-stop system.

Each week they brought two out of ten off-line for me to work on. I cleaned the heads, then used a servo disk pack to realign those heads.
To do this, I needed to remove the control cable from the string, and plug in an exerciser. One day I forgot to pull the control cable. So instead of moving the heads of my offline drive to a specific track, I moved the heads of *ALL* disks in the string! Without the O/S knowing about it.

Believe me, that will bring a Tandem Non-stop to a grinding halt. That was my last time on the floor for that customer, but I didn't lose my job. Cost? I don't know. Perhaps a weekend of data recovery for the operators?

--
To Terminate, or not to Terminate, that's the question - SCSIROB

IBM mainframe disaster by send2cbd · 2015-07-04 08:58 · Score: 1

Late 70's. Central datacenter for a state not to be mentioned. I modified the JES2 startup JCL. Our mean-time-to-reboot was typically 2 weeks. Because of important state business, we didn't get a chance to reboot for 3 weeks. So, we reload and JES2 dies for JCL error. Then, we realized that all of our daily backups have the same error. And our last 2-week backup has the same problem. Our next backup, monthly, is stored at a site that is 1.5 hours away. Meanwhile, programs like AFDC and prison support apps are not up. Governor starts getting calls from important folk - where's the system? Governor calls DP director - where's the system? I see the end of my career looming. Fortunately, my boss had an old SVS system on tape that was just enough to allow us to edit the JES2 deck. After this, we changed our backup policy and put in stricter rules on modifying production systems. I just retired after 46 years in the computer industry. Still remember the fear on that day.

Posted this before by Oligonicella · 2015-07-04 09:01 · Score: 2

But it's worth repeating in this context. Thankfully, it wasn't me.

When I worked at a KC bank, we had a Wire Transfer team manager who loved golf. He was supposed to come in Saturday and test a firmware/OS upgrade, then restore. Nice, sunny day Saturday, so he decided golfing would be better.

Came in Sunday. Installed firmware/OS upgrade. Tested fine. Forgot to reinstall previous firmware and powered up old OS.
Incompatible. Froze the machine solid. He panicked and tried for maybe four hours to fix things himself. No go. Finally called Cupertino for help 4+ PM.

The techs had to be found, gathered and flown out from CA to disassemble said machine and reassemble. No wires until 1 or 2 PM Monday. Much money loss for all customers.

To answer the obvious question, no - beyond my understanding, he wasn't fired or even demoted.

$8m UPS modification. by thegarbz · 2015-07-04 09:13 · Score: 2

One of my first engineering jobs out of uni involved modifying a UPS. This UPS had a massive battery bank that was quite dangerous to load test and didn't have an automatic load testing function. I came up with a small design involving a contractor and some minor wiring changes and we were part way through implementing it on every UPS at this site.

This UPS was part of a redundant pair that fed an emergency shutdown system at an oil refinery. In between the UPSs and the ESD system were about 120 circuit breakers, two for each circuit, and one of them was off. We modified the first UPS without issue then started the process for the second one. After calling the control room to let them know they would receive an alarm, I switched off the UPS and was suddenly met with a stream of profanities over the radio.

We lost power to 80 field instruments, which triggered a fail safe action on the shutdown system, tripping 4 units at the refinery. One of them was the FCCU, which is core to a lot of refinery processes. To add insult to injury, the unit was unable to be hot restarted because of a stuck valve and then thermally contracted, breaking off large chunks of coke from the overhead line which blocked the internal cyclones. The FCCU was down for repair for roughly 10 days. I had made a name for myself and was asked to display the cock-up award (a giant dildo mounted on a plaque) on my desk.

Total cost of the outage was about $8million. Fortunately only partially my fault.

Fried Unibus adapter card on VAX by Anonymous Coward · 2015-07-04 09:17 · Score: 1

In the very early 80's, I was tasked with getting a VAX 11/780 onto an internal Ethernet network using a proprietary Ethernet Unibus card (one guess where I worked). This VAX had a Unibus backplane in a separate cabinet cabled to a Unibus adapter board on the system bus in the "main" cabinet. The Unibus adapter backplane was wirewrapped and since this Ethernet card did DMA (it's been a long time, but I think that was why), it needed control of a bus line which was normally jumpered on the backplane bypassing each slot so "dumb" cards didn't have to deal with passing the signal along. Therefore, I had to snip this jumper on the backplane of the slot I was installing the card in.

The VAX wasn't used by our group but was used by other departments during the workweek for some fairly important stuff and there was no backup system. The machine was given to me on a Saturday morning and I was admonished it absolutely had to be up by 8AM (IIRC) on Monday morning. No problem as I had studied the problem and had been in email communication with someone at another site who had performed exactly the same procedure. I had never physically touched a VAX before in person but there really wasn't anyone to help me with the task locally so I was on my own (in retrospect, maybe that wasn't the smartest decision) but, being young and brash, that didn't bother me.

It didn't take me long to find the VAX once I got into the data center -- after all there was only one of them. I shut it down cleanly from the console. I set the switch on the main cabinet front panel to the OFF position (I don't actually recall how it was labeled), the lights on the front panel went off and I could hear the area around me get a little quieter as fans spun down (although there was a lot of other hardware around, so it just reduced the din slightly). I was well prepared and had just the perfect pair of wire cutters to do the job. I opened the Unibus adapter cabinet and put the card in. I then accessed the backplane, carefully identified, double checked, and triple checked, the slot and jumper that I needed to cut. In retrospect, maybe I should have paid attention to a rather obvious condition that was staring me in my face, but I had rehearsed this work flow in my mind and proceeded onward. I confidently stuck the wirecutters into the maze of wires, snipped the relevant wire, and everything was going very well.

Then I withdrew the wirecutters from among the wire-wrap posts and was more than a little surprised as sparks arced from the wirecutters to wirewrap posts that they brushed against. Nearly simultaneously with the arcing, I noticed one little detail that I should have noticed earlier -- the fan in the PDU or power supply in the bottom of the cabinet was still whirring away happily and the light showing it was powered on was clearly glaring at me. Ooops...

Well, I thought, hopefully, no harm done and I closed the cabinet. It was around then that I noticed a very concerned look on the faces of a couple of FEs who were working on an adjacent machine. I walked over to them and their concerns quickly became mine -- turns out they were "downwind" of the VAX and the distinctive odor of scorched electrical bits was strong around them. I guess I made someone happy that day though - they were very relieved that it was my machine, not theirs, that was emitting that lovely unmistakable fragrance.

Unfortunately, although the VAX seemed to boot, a bunch of stuff didn't work... Ooops...

We had 7/24 support with DEC so I called service out and watched a completely incompetent service guy (he was our PDP-11 repair guy who apparently was stuck on call supporting hardware he knew nothing about) fumble around for hours. He concluded that the Unibus backplane had been fried and initiated getting a new one counter-to-countered to us (fortunately, that got blocked by someone who knew what they were doing somewhere). The guy didn't even know how to run diagnostics on the VAX and refused to attempt to do so.

In the end, the machine was not up a

Regexs..... by kc8apf · 2015-07-04 09:29 · Score: 1

I missed one character in a regex in a monitoring system that would cause it to think all the hard drives in a machine had failed when the machine was booted. Since it only happens on boot, it wasn't noticed until there was maintenance work that powered off an entire datacenter. When they turned the power back on, ~5000 machines all decided their hard drives had failed simultaneously. Took 2 days to clean up the mess.
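
The post doesn't show the actual regex or the monitoring output, so the patterns and log lines below are entirely invented. They are only meant to illustrate how a single missing character (here, a trailing `$` anchor) can turn a narrow rule into one that fires on routine boot-time messages.

```python
import re

# Hypothetical patterns and log lines, invented for illustration only.
# Intended rule: flag a drive only when the status line ends in "failed".
FIXED = re.compile(r"^sd[a-z]+: .*failed$")

# One character short: without the trailing "$", any line that merely
# mentions "failed" matches, including harmless boot-time messages.
BUGGY = re.compile(r"^sd[a-z]+: .*failed")

boot_lines = [
    "sda: self-test started, 0 sectors failed so far",
    "sdb: self-test started, 0 sectors failed so far",
]
real_failure = "sdc: SMART overall-health self-assessment failed"

def flagged(pattern, lines):
    """Return the lines that the given pattern would report as failures."""
    return [line for line in lines if pattern.search(line)]

print(flagged(BUGGY, boot_lines))      # every booting drive is reported
print(flagged(FIXED, boot_lines))      # nothing is reported
print(flagged(FIXED, [real_failure]))  # the genuine failure still is
```

A test that feeds the pattern a captured boot log, not just steady-state output, would catch this sort of bug before a datacenter-wide power cycle does.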

--
kc8apf

Millions by maiden_taiwan · 2015-07-04 09:33 · Score: 1

About 15 years ago, a QA engineer in my office (a large Wall Street financial firm) placed a fake trade for 1,000,000 shares of company stock in one of our test systems. The test order somehow got out to the New York Stock Exchange and actually moved the market. Backing out that trade was reportedly quite expensive.

The engineer didn't get fired, because he had done everything correctly. The system infrastructure had been set up wrong.. wasn't his fault.

Warship Anyone? by GumphMaster · 2015-07-04 09:50 · Score: 1

Mid 90's. Spent a lovely weekend below the waterline on a frigate updating the ship's maintenance system with a new data picture of its systems. All went wonderfully well and I walked ashore late afternoon on Sunday and flew back to my home city. Fast forward to 4pm Monday and we get a call from the ship at sea saying the maintenance system no longer functioned: get your butt out here and unf*ck it. So, in the car, 3 hours drive to where the ship anchored for the night, RHIB ride out to the ship, up the rope ladder, about 10PM... fix it, you have until 6AM or you are sailing with us (for a week). That, my friends, is great motivation to work fast. To cap it off, there was a small fuel leak in the space outside the computer room: wonderful aroma to deal with. Tried to work out the obscure linkage between existing maintenance jobs and the system description that was causing the issue. Ultimately had to roll the database back to the pre-update state. Off the ship at 6 along with many bags of oil-soaked rags used on the fuel leak. Ship lost a few days of data and a day at sea: captain not happy... and we had to do the whole exercise again later.

Tape for data, $100. Airfare and accommodation, $600. Warship all at sea, priceless.

Not entirely my doing (what is these days) but I was the man that delivered the fun. No names, no pack drill over this.

--
Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button

Red Ring of Death by Sarusa · 2015-07-04 10:03 · Score: 1

No, not me, but it's worth noting that the XBox 360 Red Ring of Death was (according to EE Times) caused by someone at MS who thought he could save a couple million bucks by doing the graphics ASIC work in-house instead of paying someone with experience like ATI to do it. That cost $1.3 billion. As far as I know nobody involved in deciding that or doing the ASIC work has ever been named (and I wouldn't blame the poor ASIC guys), but I can only imagine what it would be like to know that was you.

Fried voice coil by peterofoz · 2015-07-04 10:12 · Score: 1

I fried a voice coil on a fairly expensive Hitachi 2.2 GB optical drive back in the late 1980's with a QA stress test while working for FileNet. This led to engineering improvements and I got to keep the burnt out coil as a trophy.

Spending $400 instead of $4,000. by aaarrrgggh · 2015-07-04 10:21 · Score: 1

Bought a Buffalo TeraStation. Went on vacation a year later to a country with limited internet access. On the trip, the one-year warranty expired and it died the next day, taking all data with it.

Fortunately, I had a copy of the server with me on a portable hard drive, so I could work remotely. That was our only backup. Sending the accounting database back to the office via GPRS was a lot of fun, but mailing that drive back to the office (after duplicating it of course) scared me to death.

The solution at the time was the right one; we didn't have the money for anything more. Ever since we have a hot backup server synchronized to the primary, for a small business. Like most screw-ups, what is important is how you move forward.

This one time... by wbr1 · 2015-07-04 10:28 · Score: 1

In 1998 I was working for an ISP in their NOC. One of our main AIX servers was filling. It housed home directories (and hence mail stores) for most of our customers. The engineers added a new array. I was supposed to write a script to move the directories to the new drive and change the home path in the passwd file.

I flubbed the script and while there was no data loss, I, by myself on the night shift, broke about 25k email accounts. I had a long night fixing it.

I still remember the frantic calls from the help desk as I was in panic mode trying to find out how bad it was.
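
The original script isn't shown, so this is only a minimal sketch of the passwd half of that job, assuming the standard seven-field, colon-separated passwd layout. The function name, paths and sample account are hypothetical. It works on strings rather than touching any files, which makes it easy to dry-run against a copy of the passwd file first.

```python
def rehome(passwd_line, old_prefix, new_prefix):
    """Return passwd_line with its home directory moved from under
    old_prefix to under new_prefix; other lines come back unchanged."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("unexpected passwd line: %r" % passwd_line)
    home = fields[5]
    # Match whole path components only, so /homework is not mistaken
    # for something under /home.
    if home == old_prefix or home.startswith(old_prefix + "/"):
        fields[5] = new_prefix + home[len(old_prefix):]
    return ":".join(fields)

# Hypothetical account and mount points.
before = "alice:!:1001:100:Alice Example:/home/alice:/bin/ksh"
after = rehome(before, "/home", "/home2")
print(after)
```

Comparing the rewritten file against the original with a diff before installing it is a cheap way to see how many accounts a change like this is about to touch.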

--
Silence is a state of mime.

Bricked a Samsung Galaxy by hambone142 · 2015-07-04 10:29 · Score: 1

Bought it on eBay. Had crappy Verizon firmware on it that wouldn't allow any kind of audio streaming (web page streaming or TuneIn). Loaded Cyanogen on it and it worked fine but still wouldn't stream due to some remnants of Verizon FW.

Backdated Cyanogen to older mod and that mod was corrupt. It destroyed the boot loader so I couldn't flash another copy of non-corrupt OS.

I still have the phone but no way to get an OS on it without a boot loader on it.

Re:Took an online trading company offline for a da by AmiMoJo · 2015-07-04 10:34 · Score: 2

I knew a guy who did support for a multi million pound company. They had many problems, mostly due to the fact that he was too scared to reboot their servers because he did all the support remotely and it would be a 100 mile trip up to their office if the machine didn't come back up. They insisted that he do maintenance in the evenings or at weekends to avoid disrupting their work.

So their terminal server was still running IE 7, because he was too afraid to update to IE 9 as it required a reboot. Someone actually got fired because they infected the server with a drive-by. Their mail server had a dodgy network card, but it took nearly a year to diagnose because he was terrified of updating the driver in case it didn't come back up, so that was just intermittently not responding or dropping incoming connections for over a year. The driver update fixed it in the end.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC

Threw away the wrong phone by Theovon · 2015-07-04 10:48 · Score: 1

Well, one time, I had a problem with my land line, and I erroneously accused the wrong phone and threw that one out instead of the one that was causing the problem. Then I ended up throwing away two phones.

Since then I've solved the problem more generally by not having a land line anymore.

Re:Threw away the wrong phone by Trax3001BBS · 2015-07-04 12:52 · Score: 1

Well, one time, I had a problem with my land line, and I erroneously accused the wrong phone and threw that one out instead of the one that was causing the problem.
I went for a swim with MyTouch cell phone in my back pocket. You could use it as a level by the water inside the glass.
I immediately went out and purchased a new phone, calls being so important that I couldn't miss one that might come my way.
Wrapping the insides of the MyTouch in toilet paper, and shoving it into the middle of a pan of dry rice for a few days fixed it right up.

The biggest tech mistake I ever saw... by VAXcat · 2015-07-04 10:59 · Score: 1

Wasn't mine, but it's too good not to share. Back in the mid 80s, I was working at (let's call it) SuperBigCorp's IT department. There was a fellow there who maintained the programs that handled the savings elections for employee 401K funds. One day, while making some changes to the COBOL programs that sent which funds to what investment vehicles....he made a little mistake. He got confused in a conditional statement, and all the funds that should have gone to stable investment selections went to the highly speculative vehicles, and vice versa. Even more unfortunately, this area of activity was not supervised and audited half as well as it should have been....by the time it was noticed, several months had gone by, and the stock market had suffered a bit of a setback. Millions of dollars were lost by SuperBigCorp getting it straightened out. They had to let the poor fellow go, in disgrace. The Chief of IT was reported to have said, that if the market had just moved the other way, the programmer would have been a hero...

--
There is no God, and Dirac is his prophet.

The story of David Alexander. by VAXcat · 2015-07-04 11:13 · Score: 1

Back in the mid 80s, I was fortunate enough to get my first programming job. I worked with an incredibly capable programmer, let's call him Dr. Bob. I learned a great deal about programming from kindly Dr. Bob - he was a whiz at PDP11 and VAX assembly coding, and a great mentor. One day we came back from lunch and he picked up his mail and messages from the department secretary on the way to his desk. He opened one of the envelopes he'd gotten, read the letter within briefly, then started cursing like a sailor and threw the letter in the trash. He stalked off in a rage. I retrieved the letter and saw it was a page from a phone book, with the name "David Alexander" circled. After a couple hours, when Dr. Bob had calmed down, I told him he had to tell me what was going on. It turns out that his very first assembly language programming gig had been at the local University. It involved managing the data for a planned 50 year long psychology experiment, tracking the names, addresses, and project info for all of the participants over time. Now this was the mid 70s, so there was no database, just a bunch of tape files and MACRO programs to do the updating and reporting. Dr. Bob really liked the work, and the folks in the Psych Dept were really friendly, it was a great atmosphere. One day, Bob made....the Big Mistake. Due to some typos, he inadvertently replaced the name and address info in every record in the files with the data from the first record....David Alexander's. This was a tape database and it only went back a few tapes worth....by the time it got noticed it was too late - all the good data was gone. The long range experiment was totally destroyed since they couldn't track the participants. He had to quit in disgrace - he said what really upset him was the way the Psych Dept folks were so nice about it and didn't want to fire him.
Anyway, that's bad enough...but when his "friends" caught wind of it, they started popping up David Alexander references everywhere they could - they'd leave him phone messages from David Alexander, they'd get mailings sent to his address to David Alexander, and so forth. By the time this event I saw occurred, it had been going on for years (for all I know it still is). Anyway, due to kindly Dr. Bob's David Alexander mistake, I always check my code just a leetle more carefully than I otherwise might be bothered to - I personally don't ever want to make my Big Mistake....

--
There is no God, and Dirac is his prophet.

interesting synchronicity by epine · 2015-07-04 11:17 · Score: 2

Just fifteen minutes ago I realized that my script to refactor the primary file server (newly converted to ZFS) into more sensible datasets had an irritating detail wrong (a path element was being duplicated in some paths).

I said to myself "oh, I'll just roll that whole thing back to the snapshot I made 30 minutes ago".

Then I go "zfs list -t snapshot" and discover that my snapshot was holding onto 0 GB because I forgot the -r switch to make the snapshot recursive.

Oh, well. By some impossible-to-separate mixture of good management and good fortune, it turns out I had a set of (different) snapshots from the last two days covering all datasets in question. I lost very little work (only scripts were executed against these datasets and I still have all the scripts).
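
One way to catch a non-recursive snapshot before relying on it is to check that every dataset actually has the snapshot. The sketch below is a hypothetical helper, not anything from the post: it takes plain lists of names, such as the output of `zfs list -H -o name -r <pool>` and `zfs list -H -o name -t snapshot`, and the pool and snapshot names are invented.

```python
def datasets_missing_snapshot(datasets, snapshots, snap_name):
    """Return the datasets that have no snapshot called snap_name.

    datasets  -- dataset names, e.g. "tank/home"
    snapshots -- snapshot names in dataset@snapshot form
    """
    suffix = "@" + snap_name
    covered = {s[: -len(suffix)] for s in snapshots if s.endswith(suffix)}
    return [d for d in datasets if d not in covered]

# Hypothetical pool layout: the snapshot was taken without -r,
# so only the top-level dataset has it.
datasets = ["tank", "tank/home", "tank/home/alice"]
snapshots = ["tank@pre-refactor"]

missing = datasets_missing_snapshot(datasets, snapshots, "pre-refactor")
print(missing)
```

An empty result is the condition to wait for before running anything destructive against the pool.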

My real screw up?

Back in my second co-op workterm job, I managed not to notice that a system I was backing up changed the order of the listed drives between two very similar screen requests that I made almost immediately one after the other. Unfortunately, on the second pass I selected the active system drive as the recipient of the system backup, picking from the position in the menu where the desired destination drive had appeared moments before.

I had become accustomed to my home system being deterministic in the order it listed things. My bad.

This is back at the very beginnings of the 4.77 MHz era, so my PC was actually not yet what we now know as a "PC" (its father had an S-100, and its mother had an itty-bitty CRT).

Thirty years later I still can't type dd of=/dev/ada3 without making three trips to the metaphorical bathroom.

Whenever I type a disk-level dd command, I leave the sudo off, until after the third proof-read and several console consultations in which at least two different programs give me the same view of the drive name.
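
The habit of requiring two independent views of the drive to agree can be written down as a small check. This is only a sketch with a hypothetical helper: each view is a dict from device name to whatever description one tool reports, and the device names and descriptions are invented.

```python
def views_agree(target, *views):
    """True only if every view lists target and all of them describe
    it identically. At least two views are required."""
    if len(views) < 2:
        raise ValueError("need at least two independent views")
    descriptions = [view.get(target) for view in views]
    first = descriptions[0]
    return first is not None and all(d == first for d in descriptions)

# Hypothetical output of two different tools, already parsed.
view_a = {"ada0": "system 500GB", "ada3": "backup 2TB"}
view_b = {"ada0": "system 500GB", "ada3": "backup 2TB"}
# A listing taken after the drives were enumerated in a different order.
view_c = {"ada0": "backup 2TB", "ada3": "system 500GB"}

print(views_agree("ada3", view_a, view_b))
print(views_agree("ada3", view_a, view_c))
```

The check refuses when a device is missing from any view, which errs on the side of not writing.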

In dollar costs I couldn't say. In psychic cost, it's indelibly etched onto my permanent record.

I had a co-worker once (EEng) who claimed that as a junior intern during the late 1990s back when laser gear for fiber optics was all the rage, he routinely fried extremely delicate $2000 DUTs while the old hands just shrugged their shoulders. Dotcom dollars. Who really gave a fuck? It was considered barely worse than ruining a nice chair.

$22M - 6 hrs of downtime by Anonymous Coward · 2015-07-04 11:24 · Score: 1

$22M - 6 hrs of downtime for 1 application due to a corrupted DB. I typed what the vendor told me to type into sqlplus. The vendor was clueless, obviously. Took about an hour to determine the root cause, took another hour to find a real DBA (on staff), then some more time to bring him up to speed and restore from daily backups.

Over 20K workers couldn't do anything that day.

The lead technical architect (hired gun), my team, and the direct business clients who knew protected me. S-VPs in the client organization all wanted to fire someone - me. They never found out who to fire. However, I've been stuck in the same position the last 8 yrs. No promotion since.

Re:$22M - 6 hrs of downtime by retchdog · 2015-07-05 08:49 · Score: 1

sounds like that $22M is a total bullshit figure, unless those 20K workers were each costing ~$300K/year and working solid days without wasting any time on coffee breaks, web browsing, etc.
get another job and quit falling for bullshit.
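
The arithmetic behind that objection can be checked directly. The $22M, 20K workers and 6 hours come from the parent post. The hours worked per year is an assumption, and the result is sensitive to it.

```python
downtime_cost = 22_000_000  # dollars, from the parent post
workers = 20_000            # from the parent post
hours_down = 6              # from the parent post
hours_per_year = 2_000      # assumption: roughly 50 weeks x 40 hours

# Cost attributed to each worker for each hour of downtime.
cost_per_worker_hour = downtime_cost / workers / hours_down

# Annual figure each worker would have to be worth for that to hold.
implied_annual = cost_per_worker_hour * hours_per_year

print(round(cost_per_worker_hour, 2))
print(round(implied_annual))
```

That works out to about $183 per worker per hour, or roughly $367K per worker per year under the 2,000-hour assumption, which is in the same ballpark as the ~$300K/year quoted above.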

--
"They were pure niggers." – Noam Chomsky

Re:Sligthly over 12 million USD - for couple hours by jonwil · 2015-07-04 11:43 · Score: 1

Not if that database insertion caused money to be moved somewhere else and database entries existed on a system belonging to someone else.

Fun with lasers by NormalVisual · 2015-07-04 11:54 · Score: 1

My personal best was when I was writing the firmware for a customer's laser marker system. It was a big industrial machine that moved the laser head on a very expensive gantry using 15-pound servos that could generate ungodly amounts of torque. I had a bug in the code that drove the servos, and I issued a command to home the gantry, after which the X-axis went zipping across as fast as it would go. Wouldn't have been a problem except there was a faulty limit switch on that end of the axis, so the 25-pound laser head got slammed into the stops at what we estimated was about 100 inches per second. Totally destroyed the laser head (there's nothing more disheartening to hear than the tinkling of broken steering mirrors and seeing a cracked flat field lens as a bonus), and caused some severe mechanical damage to the rest of the assembly. Fortunately the motors shut down automatically when the temperature sensor tripped, but it wasn't fun explaining to the boss that we had to replace about $30,000 of hardware.

My favorites are those I thankfully had nothing at all to do with - where I am now, we write and maintain the warehouse management software for a very, very large snack food vendor, and we have a VPN link to all of the plants to maintain and monitor what's going on. It's happened before where co-workers haven't paid close enough attention and have connected to live plants instead of the test systems, and accidentally shut down the warehouse, which means production gets shut down too since there's nowhere to put those thousands and thousands of bags of chips until the warehouse system comes back up, and it takes them hours to get stuff restarted and settled once that happens. I don't know how much it costs, but it can't be cheap. I'm also not sure why we don't have some kind of two-factor system with a unique key for each plant to keep that from happening. [shrug]

--
Please stand clear of the doors, por favor mantenganse alejado de las puertas

I nearly cost my company millions by PhilHibbs · 2015-07-04 12:19 · Score: 2

I nearly cost my employer several million by fixing a bug.

The first task I was given in my new job was to look at an old system that printed labels to be put on containers of car parts. A message would come in on a serial cable saying what part was going to be needed within a few hours at a car assembly line. The parts were packed into stillages (a frame designed to hold a certain number of a certain part, like bonnets, bumpers, door panels, etc.), and when a stillage was full, or when a certain amount of time had passed since the first part was picked, a label was printed and applied to the stillage, and it was dispatched over the road to the factory.

Every time the serial number rolled over 9999 to 0001, the system would go wrong and stop working. This happened about once a month, and the help desk had a sheet of instructions on how to fix the problem. Some of the staff knew the fix off by heart.

I looked at the code, found a roll-over bug, and fixed it. Everything was fine, and a couple of years went by with no problems.
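
A minimal sketch of that kind of roll-over handling, assuming a four-digit serial that runs from 0001 to 9999 and never uses 0000. The function names and the range check are illustrative, not the original code:

```python
MAX_SERIAL = 9999  # assumed range of the label serial: 0001..9999


def next_serial(current: int) -> int:
    """Advance the label serial, wrapping 9999 back to 1 (0 is never used)."""
    if not 1 <= current <= MAX_SERIAL:
        raise ValueError(f"serial out of range: {current}")
    # 9999 % 9999 == 0, so 9999 wraps to 1; every other value just adds 1.
    return current % MAX_SERIAL + 1


def format_serial(serial: int) -> str:
    """Zero-pad to four digits for printing on the label."""
    return f"{serial:04d}"
```

The point of writing the wrap as one expression is that the boundary case goes through the same path as every other increment, so it gets exercised constantly instead of once a month.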

Then, at 3 in the morning, the help desk called me and said that it had happened again. They didn't have the sheet of paper any more, and no-one could remember how to fix it. I rubbed the sleep from my eyes, and tried to get my brain into gear and remember what to do. It took me about an hour talking with a couple of help desk people, and between us we figured out what the fix was, and they called the warehouse and talked them through it.

The next day I talked with my colleagues, and found out that we had come within a few minutes of triggering a penalty clause for halting the production line that could have run into millions of pounds. This was back in the '90s when millions of pounds were a lot of money!

I looked back over the code, and found that there were actually two very similar bugs, one of which was triggered fairly regularly and one much less frequently, but the same fix worked for both of them.

Back when I first started working in IT, my boss told me, "One day, you will probably make your million pound mistake. In our business, we build systems that, over the course of our careers, will save millions of pounds in lots of small ways. Eventually you will make a mistake, and one of those systems will go wrong, and it might cost millions. Your employer will bear the cost of it, which is why we don't earn those millions ourselves. You have to be prepared for that eventuality. If it happens while you're working for me then I will kick your arse, and maybe I will fire you, but I'd be wrong to do so, that's just the nature of the business that we are in."

3k to replace a motherboard by Trax3001BBS · 2015-07-04 12:20 · Score: 1

Not sure if it counts as it was an Amiga 3000 and they came to my house to fix it for free.

I had a "friend" who brought over a new hard drive to get working on the Amiga. I did my best, then the system just quit. He then says, "Yep, did the same thing to mine."

sudo poweroff by Anonymous Coward · 2015-07-04 12:24 · Score: 1

Oops! Wrong terminal!

I was ssh'd into a production server and did a poweroff. Meant to run it on my own box. I didn't have authority with our host to ask them to turn it back on, and those who did had already left for the day. Probably didn't cost the company much since it was a small SaaS product, but if I pulled that stupidity elsewhere it could have.

Re:$180K mistake by greenreaper · 2015-07-04 12:56 · Score: 1

I don't think that counts: a) it wasn't your mistake; and b) the company should never have had that revenue in the first place, so it wasn't a "loss" but a restitution.

Fumbling around by lucm · 2015-07-04 13:15 · Score: 1

USB connectors also fit neatly in RJ45 ports, and this too can lead to interesting side-effects.

--
lucm, indeed.

Lightning fried network switches $8k by zerofoo · 2015-07-04 13:59 · Score: 1

I temporarily ran a copper network cable out of a window to another building while our building to building fiber was being installed.

Over a weekend we had huge lightning storms. The voltages induced in the unshielded twisted pair cable hanging outside 3 floors up fried both switches on either end of the cable.

That was an $8000 mistake.

Hmm yes. by Falconhell · 2015-07-04 14:03 · Score: 1

Back in the early 80's, I took off a little too fast in my company station wagon, and a $10k DTS Data Terminal hit the road hard. Ooops.

Telecoms classic by Falconhell · 2015-07-04 14:12 · Score: 1

Not me this one, but a classic.

One Friday afternoon a telecoms tech was checking a remote unmanned exchange. One of the checks was to measure the levels on the analog multiplexer for the trunks to the main exchange, which acted as the brains for the dumb remote.

The procedure was to plug a 6.5 mm phone jack, attached to a large fixed meter, into each channel in turn. Unfortunately, this chap grabbed the wrong hanging jack, this one having 50 V exchange battery on it. He then proceeded to plug into each channel of the carrier system, and was mystified when there were no readings. As he plugged in the last channel, the exchange went totally silent. Whole exchange was down for 2 days.

That's nothing... by Anonymous Coward · 2015-07-04 14:13 · Score: 1

What about the guy who sold Slashdot to Dice? :)

Everyone makes $1,000,000 mistakes by NothingWasAvailable · 2015-07-04 14:15 · Score: 2

During a panel discussion with very senior technical leads, the question came up: "How many of you have made a $1,000,000 mistake?"

Every single one raised their hand. This was a very large semiconductor company, and everyone had been involved in at least one instance where bad masks were made because a check was skipped or a step was botched in the design flow.

I worked on a chip design where it took six design revs to get clean masks. All five of the prior revs had avoidable (human) errors during the design and build process.

Pay me now (in time running checks) or pay me later (in nre: non-recoverable expense) for bad hardware.

Re:Everyone makes $1,000,000 mistakes by Macman408 · 2015-07-04 19:14 · Score: 1

SIX revisions? Hopefully only metal layers, or were some a full base spin too?
Where I work, we usually go into production on the second revision. Occasionally, the first one is good enough (usually if it's similar to a previous chip). The one I worked on most recently was a brand new design from the ground up with a new team of people, so we shipped the 3rd version (both spins were just metal layers). We (almost) never change the base layer - the case I heard about was when somebody in Marketing told someone in Engineering that there was no way they'd ever want to market a specific part to use >n MB of memory (probably 512 or so), because it was a low-end part. So they put enough address bits on the part for 512 MB - and then not too long after making it, Marketing decided that they needed a 1 GB version too. Then it just became a question of "is it worth a million dollars to be able to sell it with 1 GB?"
I'm in verification, so my whole job is to make sure we haven't made any million dollar mistakes. I produce no useful output, other than a thumbs up to management right before they start producing wafers. Some mistakes still get past us, but when a million dollars is on the line, some creative changes (often just in software) can help us keep the problem at bay.
And any time a big mistake gets by, another item gets added to our checklist. Being the first guy to make a particular mistake is usually professionally survivable; everybody makes mistakes sometimes. But being the second guy to make the same mistake does not bode well for your future...
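
The arithmetic behind the address-bit story above, as a small sketch. It assumes plain byte addressing, which is a simplification of how real memory interfaces work, and the 512 MB figure is only the recollection quoted above. The helper name is made up:

```python
MiB = 1024 ** 2


def address_bits_needed(size_bytes: int) -> int:
    """Smallest number of address bits that can reach size_bytes bytes."""
    if size_bytes < 1:
        raise ValueError("size must be positive")
    # The highest address is size_bytes - 1; its bit length is the answer.
    return (size_bytes - 1).bit_length()


print(address_bits_needed(512 * MiB))   # 29
print(address_bits_needed(1024 * MiB))  # 30
```

Under that assumption, doubling the supported memory costs exactly one more address bit.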

30K in 30s by VictorTango · 2015-07-04 14:17 · Score: 1

I once wrote a temperature monitoring system for a cargo airline flying 747s. The system would read the loadplan to determine if there was temperature-sensitive cargo onboard, then after takeoff, would send an ACARS message to an aircraft asking the ECS what the temperature was in each section of the aircraft. The rules table could be set to a different frequency of monitoring based on the exact cargo, so AVI (live animals) would be monitored every 5 minutes, pharmaceuticals every 10, etc. Once the temperature report came back, the system would compare that to determine if the temperature was within limits of the cargo onboard. Anyway, I accidentally put zero in the frequency table, and basically DoS'd 5 aircraft that were in-air carrying perishables. Realized the error pretty quickly when the monitoring system freaked out, but the data charges alone were about 30k in 30 seconds. ARINC was very nice and waived the fees though - thanks guys!
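
A sketch of the kind of check that would have rejected that zero before it reached an aircraft. The one-minute floor, the table layout, and the "pharmaceuticals" key are assumptions for illustration; only AVI and the 5 and 10 minute intervals come from the story above:

```python
from typing import Dict

MIN_INTERVAL_MINUTES = 1  # assumed floor; the real limit is an ops decision


def validate_interval(cargo_type: str, minutes: int) -> int:
    """Reject polling intervals that would flood the data link."""
    if minutes < MIN_INTERVAL_MINUTES:
        raise ValueError(
            f"{cargo_type}: interval {minutes} min is below the "
            f"{MIN_INTERVAL_MINUTES} min floor"
        )
    return minutes


def load_frequency_table(raw: Dict[str, int]) -> Dict[str, int]:
    """Validate every row before the table goes live."""
    return {cargo: validate_interval(cargo, m) for cargo, m in raw.items()}


table = load_frequency_table({"AVI": 5, "pharmaceuticals": 10})
```

Validating the whole table at load time means a bad row stops the update instead of being discovered in the air.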

Multiple... by xploraiswakco · 2015-07-04 14:47 · Score: 1

Warranty work: In the late 90's I was repairing a beige desktop Mac (early PPC), I needed to remove the logic board, and while attempting to pry up the logic board I slipped with the screwdriver, which ripped off a resistor in the process. As it was warranty work on behalf of the manufacturer (I was working for a service agent), all parties agreed it was a mistake that could have happened to any technician, so it continued to be covered.

Destroyed keyboard: I once spilt a Fanta on a white Apple keyboard, the clear plastic base with the full height keys, the last of its kind before the current flat aluminium keyboards came in.

Almost lost data: I was click happy once during the process of backing up a laptop for a staff member (planning to upgrade the OS), and instead of hitting backup, I hit erase. I was able to restore the data thanks to hard drive erasing only modifying the first block or two on the disk, instead of going to the time and trouble of erasing the entire disk.

Powerful mistake by gtarthur · 2015-07-04 15:26 · Score: 2

Back in the 70's when I was still a junior electrical design engineer working for a distribution transformer company, we used algorithms loaded into TI calculators to compute the electrical, heat, and mechanical stresses. I later got the task of modernizing those codes and merging them with a FORTRAN code that another engineer had written and abandoned because it was too expensive to run. Things went well at first, we saved a lot of time and used that as any good engineer would to optimize our designs using different parameters to reduce cost and improve efficiency, both very important to my company and its customers. Then one day we got a limiting case which we didn't recognize at the time. As usual, one of our engineering assistants used the computer generated design and the old methods to validate the design. The engineer always takes responsibility for the design. After the build, the unit, a 3 phase unit that had 76,000 volt inputs, was tested in our "hi pot" chamber - a voltage pulse of the rated voltage but with reduced current and only for a short pulse. The center core winding turned into shards of copper spaghetti in the 8 foot tall tank. It cost $25,000 to repair, and delayed delivery for 3 weeks. My heart rate hit about 200 when the engineering manager called me and my supervisor into his office. Then he explained that he had run the calculations also, and discovered that our methods had a flaw in the prediction of the axial forces on the center coil. It was a very subtle mistake, and he said it could have been much worse. We were able to revise the code within a few hours, and that incident led to further improvements in methods and automation. It also taught me my most important lesson about computers - human error is the greatest risk. Real tests of your code sometimes do "blow up".

--
Every change is not progress, but there is no progress without change.

Re:2400 (thanks HP) by Pubstar · 2015-07-04 15:29 · Score: 1

Can't be worse than the Kenwood TrueX DVD-ROM drives. Those things were fast as hell, but notorious for dying.

Comment removed by account_deleted · 2015-07-04 16:01 · Score: 1

Comment removed based on user account deletion

got a router to hang ... by oneiros27 · 2015-07-04 16:27 · Score: 1

I managed to flood it with enough data that it locked up, and required a manual reset. The second and third time that I did it, the network admins were getting much faster about fixing it, but my boss told me to stop doing it.

I have no idea how much it cost ... but it was the router that fed NASA Goddard's active missions, and I was told that the Hubble folks were getting upset when it kept happening.

I didn't get fired, as I was testing to ensure that we had sufficient bandwidth for SDO data transfers. (we didn't ... and I probably didn't need to run the additional tests to prove it). It did convince them to move us over to an isolated network when we moved offices, though.

--
Build it, and they will come^Hplain.

Got Lucky by Drethon · 2015-07-04 16:32 · Score: 1

I dropped a 50k sensor on the ground but it tested out fine afterward. It was used for development so if there was hidden damage it didn't really matter.

$100K in cabling that went unused by carlos92 · 2015-07-04 19:37 · Score: 1

I was tasked with a fiber cabling project for a new upstream connection at a small ISP. I documented the requirements, placed a purchase order, interviewed contractors, recommended one of them and went ahead with the project. My boss was downsized during this process, and when I informed my new boss that the cabling was completed and that his signature was required in some document in order for the contractor to be paid, he said something along the lines of "did nobody tell you that the upstream connection will not use that kind of fiber?" I wanted to die at that moment, but the fact was that it wasn't my fault - it was a consequence of the massive layoffs, the resulting chaos, and the deficient flow of information.

Convinced my boss not to use ColdFusion by Shag · 2015-07-04 20:50 · Score: 1

...by writing a simple page and putting it under load on a Sun E4500... which was the front end of our dot-com's website. We were only invisible to the rest of the world for a few minutes, thankfully...

--
Village idiot in some extremely smart villages.

Subprime Mortgage crash by EmperorOfCanada · 2015-07-05 00:08 · Score: 1

I read an article years ago about a guy who developed the software that made transacting CDOs (Collateralized debt obligations) much easier. Basically that led to the entire sub-prime mortgage industry which led to 2008. So I think that he wins this whole discussion.

Re:Didn't break but helped to fix... by KGIII · 2015-07-05 02:08 · Score: 1

I only know of two such instances where this happened or something similar happened. One was only about five years ago and the other was longer - it made the news. Assuming it was the latter then that grocery store chain either begins with an S or a K? I can not recall which one it is but I do recall hearing about a computer mishap that took out warehouse access for a major grocery chain. The more recent one was due to a malware infection that spread across their network (as I recall) and its primary goal had been collecting credit card data but it had spread much further. That one was covered in eWeek and noted, by me, simply due to its proximity to me.

--
"So long and thanks for all the fish."

Wasted Day by Hardhead_7 · 2015-07-05 02:29 · Score: 1

I worked for a 3PL (third party logistics) company. Years ago, they'd decided they were going to make $$$ with SaaS, basically selling our services to others. A huge undertaking had been embarked upon to make our system usable for other companies. They got a grand total of one client.

A few years later I was working there, and we got a second client! Bad news was, literally no one was still working there that had been when the first SaaS client had been set up. So there was a lot of guesswork trying to recreate it. I was a Junior Developer at the time, and was tracking down why some data loading wasn't working right. I knew the issue was almost definitely a trigger in the database, so that day I made some changes, loaded the day's data import into the Test DB, and checked if my fix worked. It didn't, so I cleared out the load, made another change, and did it again. OK, now it was kind of fixed, but there was a problem somewhere else. Wash, rinse, repeat.

I'm sure you see where things went wrong.

About the sixth or seventh time I did this, I accidentally ran it against production. I distinctly remember the panic that gripped me the moment I hit the F5 key to execute that SQL statement - I realized what I'd done immediately. The drivers (this was a logistics company, remember?) had been out on the road for about two hours at this point, and all of a sudden all their handheld devices just stopped working. Where's the next stop? As far as their handheld was concerned they didn't even have a route, much less anything on the truck. This happened for all of the Office Depot drivers in Florida. And we couldn't just reload the day either. After the initial import happened at around 1:00 am a lot of virtual paperwork was done by humans to optimize routes and such, work that couldn't be easily duplicated.

I spun around in my cubicle and told him what I'd done immediately (I was told later I looked white as a sheet) and he assured me it'd be OK. An hourly snapshot was taken by the database. We'd lose a bit of data, but it wasn't the end of the world. He went to talk to the DB Admin.

Those snapshots? It turned out six weeks ago they'd just stopped running. Why? I don't think we ever figured out for sure, but either way they weren't there. Now everyone was panicking a bit. This was a new client we'd just picked up and we didn't want to screw the pooch. In the end, they ended up doing an emergency purchase of some software that allowed them to roll the database back using the transaction logs. Fun times.
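
Snapshots that quietly stop running are the sort of thing a tiny freshness check can catch. A minimal sketch, assuming something can report the timestamp of the newest snapshot; the two-hour threshold and the function name are made up:

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed threshold: hourly snapshots plus an hour of slack.
MAX_SNAPSHOT_AGE = timedelta(hours=2)


def snapshot_is_stale(
    last_snapshot: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_SNAPSHOT_AGE,
) -> bool:
    """True if the newest snapshot is missing or older than max_age."""
    if last_snapshot is None:
        # No snapshot at all is the worst case, not a pass.
        return True
    return now - last_snapshot > max_age
```

Run from a scheduler and wired to an alert, a check like this turns "six weeks ago they'd just stopped" into a page within a couple of hours.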

Re:Biggest tech mistake by KGIII · 2015-07-05 02:47 · Score: 1

I spent about 32,000 USD upgrading to CD-Rs in ca. 1995. The worst part is that only covered eight of the computers in the office. At the end of the year there was an offering from HP that was under 1,000 USD. By the following summer they were half that. At the end of that year they were half again. Then, not more than a year and a half after that I could find SCSI CD-Rs for near 125 USD. Blank CDs were something like eight bucks when you bought in bulk... My mistake was adopting the tech that early. We were using large data sets (for the time) and the idea was portability. It worked, it *sort of* paid for itself. It would have paid much nicer to wait. I can not say that it lost us money but I can say it sure as hell did not make us any.

--
"So long and thanks for all the fish."

Beware the killall command in AIX by supremebob · 2015-07-05 03:35 · Score: 1

I was trying to fix a broken backup process on an AIX box, and found that there were a ton of stuck Legato processes on the system. Rather than kill each one individually, I entered the killall command to get the correct syntax to kill all of the processes with legato in the name.

In Linux, entering killall gives you the syntax on how the killall command works. In the old version of AIX this system was using, it killed EVERYTHING with no warning and basically rebooted the box. That's usually not a big deal, except that this was the primary SAP database server for a Fortune 500 company. It took the DBAs about a day to clean up the mess.

The system was clustered, thankfully, but it probably cost about 10K in labor to clean up the mess.

Netware was evil by leonbev · 2015-07-05 03:50 · Score: 1

I once built a Windows NT 4 system image that used an older version of a Novell Netware driver that was incompatible with the newer version of Netware that the file servers were using.

It seemed to work fine on the master system that I built, but after that image got deployed to 50 classroom computers it flooded the network with garbage traffic and caused the entire University network (about 500 computers at the time) to crash. It took the network team about two days to figure out what the problem was.

Fried two computers for the price of 1 by p0larity · 2015-07-05 04:19 · Score: 1

When I was 12 I put the BIOS chip from one motherboard (it was still the kind of EEPROM with pins) into another in an experiment.

Sadly I didn't know what the orientation of the pins was or what the little dot meant (pin 1) so I must have reversed them.

Put the BIOS chips back but I had fried both boards.

One-line classic Cisco network outage by Lorens · 2015-07-05 05:06 · Score: 1

Working on Cisco command line, I was in the habit of typing "no " and doing a double-click-middle-click on the line I wanted to delete. Worked very well except for
(IIRC)

redistribute bgp 100 metric 100 metric-type 1 subnets route-map BGP2OSPF

In this specific case, copying the entire line after "no " does not remove the line, it just removes the route-map limitation, and hey presto I was redistributing our full BGP into OSPF. Clincher was that it took some 20 minutes for the network to actually stop working, so by that time I had totally forgotten about it. It took an hour to find out what the problem was and to correct it, during which my ISP was basically off the network.

A few interesting ones by nukeade · 2015-07-05 06:34 · Score: 1

*I had an off-by-one error in a TopCoder problem (I used > instead of >= in a loop) that I didn't catch that cost me $3000 in prize money and a trip to the finals.

*I was working at an observatory on campus and left the huge, Peltier-cooled CCD for the telescope on a table but still plugged into a computer and left for the day. When I came back, I found that someone had tripped over the cable, smashing the CCD on the floor. They then sat the broken CCD next to the computer without a note or anything. $7000 CCD destroyed.

*Another time I was working with an AFM in a basement of the university, and left for the day. It stormed really hard that night, and when I came back the next day the basement had 6 inches of water in it. It turns out that the water had come from a leak directly above the AFM. I guess the AFM didn't like getting a shower in filthy storm water and it cost $20-$30K to replace.

*However, my biggest save was probably more important than all of that combined. Without divulging too many details, I was writing some tests and caught a serious data-loss bug in production before any customers were affected by it. The bug actually made the news: http://www.theregister.co.uk/2...

Approximately $80,000 by Funksaw · 2015-07-05 07:02 · Score: 1

I'm the amateur programmer who first programmed the code for Lawrence Lessig's Mayday PAC. I don't know if you remember this, but the site went down on May 2, for about 8 hours, when we were raising roughly $10,000/hr. I had built everything on a LAMP stack and sent everything through a single MySQL database, which just didn't scale. (I was - and still am - an amateur). Luckily, pro developers stepped up and staunched the bleeding, and eventually we moved onto a Ruby-on-Rails system for the front-end and a NodeJS/Google App Engine solution for the backend.

Expensive Mistakes by villageelder1 · 2015-07-05 07:07 · Score: 1

Back when Linux was much more primitive I had to set the video monitor parameters by hand coding configuration files. And, by accidentally over-specifying the maximum sync rates, I "smoked" the flyback (horizontal output) transformer in a new 21" Sun monitor in short order. I typed in one wrong number and $$$.

Packard Bell - Not even once by stolidobserver · 2015-07-05 07:22 · Score: 1

Somebody unhooked the cable from inside a cabinet to a spectrum analyzer I was trying to use to monitor a signal I was setting up to a satellite. I thought something was broken and was messing around with the controls to see if anything happened. I finally found the cable wasn't connected about the same time the satellite controller came across screaming that I was about to burn out the satellite. I didn't, but it was a very close almost. When I plugged in that cable there was a huge spike on the screen.

The billion dollar mistake that nearly killed UAL by Nonesuch · 2015-07-05 08:49 · Score: 1

Three people, working independently, made errors in programming and website updates which nearly bankrupted United Airlines when the errors came together on September 8, 2008. "Shares fell to about $3 from more than $12 in less than an hour, wiping more than $1 billion in value before trading was halted.".

When the market first opened that Monday, United Airlines was trading at over $12 a share. The public summary of the events states that the Chicago Tribune re-indexed their archives, resulting in a six-year-old story about United Airlines' bankruptcy being re-posted on the Web site of The South Florida Sun-Sentinel without a date. Google picked up the "new" article, saw the missing date, and inserted the current date of 9/8/2008. That article was picked up by a research firm, Income Securities Advisers, which then posted a link to it on a page on Bloomberg News, which sent a news alert based on the old article. The news alert triggered automated trading systems to issue sell orders. Nasdaq finally ordered a halt in trading the stock at 11:08 a.m., but the damage had been done: United Airlines stock had lost 75% of its value.
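
The step where a missing date silently became "today" is the one that is easiest to guard in code. A minimal sketch, not how any of the systems involved actually worked; the function name and the one-day window are assumptions:

```python
from datetime import date
from typing import Optional


def is_recent(published: Optional[date], today: date, max_age_days: int = 1) -> bool:
    """Treat an article as fresh news only if it has a real, recent date."""
    if published is None:
        # Unknown date: do NOT assume today; the story could be years old.
        return False
    age_days = (today - published).days
    # Also reject dates in the future (negative age).
    return 0 <= age_days <= max_age_days
```

The design choice is simply that "unknown" stays unknown and fails the freshness test, instead of being filled in with the most alarming possible value.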

--

I do not deploy Linux. Ever.

underestimates... by Creepy · 2015-07-05 10:14 · Score: 2

Underestimating time needed happens all the time in the software industry. It probably is worse in the gaming industry where publishing deadlines often get set 6 months or more in advance, but I still get hit with guaranteed release dates for customer commitments at my job now where I've put in ~100 hour weeks to fulfill (telecommuting many of these probably saved my marriage, as I would work 4 hours after my wife went to bed). Still, it is nothing like the 160 hour weeks in the office for a game release crunch (and no, that isn't all work - I slept on beanbag chairs in the testing room and they catered in meals, but at some point you're just so burned out and stinking of feet that you need a night sleeping at home and a long shower).

I can't think of any instance where I've cost a project, but I'm sure they exist. OTOH, I did have a workaround for a $5 million dollar contract where the customer was going to reject our Linux port due to a bug I found and reported. The developer and pubs person assigned the defect were laid off after 9/11 so the defect slipped through to the customer. Fortunately, I overheard a sales person talking about it and supplied the workaround, saving the contract.

Two embarassments... by zaywot · 2015-07-05 15:54 · Score: 1

When I was a junior programmer working on a mainframe, I was given a problem ticket for an intermittent issue. I stuck diagnostics into the code, but because my disk quota was far too small, I sent the output to a virtual printer that I looped back to my account. Unfortunately, after I got the whole testcase set up (couple hours) the mainframe crashed and I went for coffee along with the rest of the 300 users on the system, for the 10 mins it took to restart. After several days where I hadn't been able to make progress because of the suddenly frequent mainframe crashes, I got a message from the operator asking me to delete my large spool files, since the mainframe was crashing due to a lack of spool space. That's when the penny dropped that my testcase had been exhausting the system spool space, crashing the mainframe about 8 times. Probably $100,000 in lost labour.

Years later, working on extending some high reliability software, I found some bugs in pre-existing code. The system had some internal checks and watchdog timers that would force a restart if it thought some code was taking too long. Both bugs would trigger the restart system by making something take too long and triggering the watchdog timer. One was in very complicated code, but explained some intermittent issues we'd seen over the years. The other was in a newly released, still unused utility, that didn't work properly on old HW, but would need to be re-written to fix. I only had time to fix and test one bug before going on a month long vacation, so I fixed the complicated one. While I was on vacation, an alpha release of the product went out, and promptly started crashing intermittently with stack corruption issues. I got back, to find six such tickets on my desk. In the meantime, the broken utility had acquired some users, so I decided to spend a couple of days fixing the utility.

It turned out that the stack corruption issue was holding up the production release, worth many millions of dollars.

Of course, I wasn't able to reproduce the intermittent stack corruption.

I spent 3 weeks looking everywhere, trying anything to reproduce it, resorting to rebuilding the alpha load where I could sometimes reproduce it, but not if I loaded my diagnostics.

Meanwhile, management was getting very antsy about the revenue implications.

My boss was very good, and shielded me from the flames, but I didn't like seeing him getting fried, as the release date kept getting pushed.

I tried hunting around to see if anyone had been changing code in that area of the system, but of course, there were only my updates. I asked anyone I could find for suggestions, and nobody had any ideas until one person said it reminded them of one very old issue they'd worked on, and described the problem they'd had.

I went back and checked my archived output. Sure enough, I'd been a bit careless testing the broken utility before fixing it. I only checked that my testcase triggered a restart, not why. It turned out that long before it could trigger the watchdog timer, the utility corrupted the stacks of other processes.

I'd just spent 3 weeks holding up an important release, because I didn't realize I'd already fixed the bug.

Re:Took an online trading company offline for a da by AK+Marc · 2015-07-05 16:32 · Score: 1

Most people don't realize that 100/full on one side and Auto on the other will, working exactly as designed, end up as 100/full on the forced side and 100/half on the Auto side, which is a duplex mismatch. I've seen that problem many times.
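
A toy model of why that happens, assuming a 100 Mb/s link where both ends support full duplex. A forced port sends no negotiation signalling, so the Auto side can detect the speed but not the duplex, and falls back to half. The function names are made up for the sketch:

```python
from typing import Tuple

AUTO = "auto"
VALID = {AUTO, "full", "half"}


def resolve_duplex(side_a: str, side_b: str) -> Tuple[str, str]:
    """Return the duplex each side ends up using on a 100 Mb/s link."""
    for setting in (side_a, side_b):
        if setting not in VALID:
            raise ValueError(f"unknown setting: {setting}")
    if side_a == AUTO and side_b == AUTO:
        # Both ends negotiate and pick the best common mode.
        return ("full", "full")
    if side_a == AUTO:
        # Partner is forced: speed is detected, duplex defaults to half.
        return ("half", side_b)
    if side_b == AUTO:
        return (side_a, "half")
    # Both forced: each side uses exactly what it was told.
    return (side_a, side_b)


def is_mismatch(side_a: str, side_b: str) -> bool:
    a, b = resolve_duplex(side_a, side_b)
    return a != b
```

In this model is_mismatch("full", "auto") is True, while Auto on both ends, or matching forced settings on both ends, comes out clean.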

--
Learn to love Alaska

Re:Didn't break but helped to fix... by KGIII · 2015-07-05 16:46 · Score: 1

It is all good. I can not blame you for not commenting. You may well still work there or still be covered by some sort of contract such as an NDA. I wouldn't recommend violating any such things - a job is not worth losing for idle banter with random pixels nor are said random pixels worth a court case.

--
"So long and thanks for all the fish."

800 German Marks for pushing a button.. by MoarSauce123 · 2015-07-05 23:20 · Score: 1

...a second too early. Worked as broadcasting engineer and cut short a commercial by one second. Lucky me, that was in the middle of the night, so the damage was not that bad. As you may have guessed, that was in Germany and quite a while ago. When I watch commercials on US TV they get cut off constantly, seems as if the ad customers are more forgiving here. Working as broadcasting engineer was awesome except for the craptastic hours and the constant stress of not being allowed to make even a tiny mistake.

Drilling Rig by RockDoctor · 2015-07-06 00:26 · Score: 1

After 21 days on the job (24x7 cover, typically 20 hour working day) I had to identify one sample of dark grey claystone from one of two possible other types of dark grey claystone. I decided one way, then went to pack my bags to crew-change with my relief.

The implications of deciding one way not the other were a million dollars worth of ironmongery (9.925in OD liner pipe) being run and cemented into the hole. That operation occupied a rig crew of 90-odd people for 8 days while I was on leave. When we drilled ahead, it became clear that I had been wrong. Total unnecessary cost was about 2 million dollars.

These days, I don't lose sleep for less than ten million. The fact that I still do work for the client suggests that they figure it's better to have me around than not.

A couple of years ago I got some grief for pointing out a problem on day 10 of a job, which people upstairs from me decided wasn't likely to be a problem. So they shelved the problem, told me in writing to shut up, and continued with the well. 3 months of work later, we'd made a beautifully-tuned geo-steered well ... and had to wait on weather for a major storm. And when we came back on location, the problem I'd been making a fuss about had come back to haunt us and forty million dollars worth of ironmongery and effort was junk. Several embarrassed faces upstairs, but all my fellow contractors knew who had said "We need to deal with this problem, now." when we were five million into the project. Who needs advertising?

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"

Bug Hunting by Koutarou · 2015-07-06 02:05 · Score: 1

Found a gaping goatse-sized security vulnerability in a package that had been outsourced and the original contractors long since gone.

It was less expensive to just kill the product which we had been selling for about 3 years than to re-engineer the thing from the ground up with new staff.

I've done the opposite by phorm · 2015-07-06 04:21 · Score: 1

I've never heard of somebody *heating* a drive to recover a stuck head, but I've done the opposite.

Many a drive has been recovered by a day or two's stint in the freezer in a deflated ziplock bag. I'd imagine the principle is the same.
With cooling, you do have to watch out for condensation build-up as the drive defrosts. With the heating I'd worry about damaging the data on the disk (magnets in general do not like heat, so I'd imagine magnetic storage would similarly be a gamble).

Computer Numerically-Controlled Machine Tools by david_thornley · 2015-07-06 09:19 · Score: 1

Since I write software that writes software for machine tools, I have extra opportunities to break things.

There's a technology called Electrical discharge machining, which means putting stuff close together in a fluid, running current through them, and having sparks burn off little pieces of material until you've got what you want. One manufacturer makes machines that have sophisticated programming, but it's not at all safe. Once, with the support guy from the company we got these from looking over my shoulder, I made a slight mistake that caused the arm of the EDM machine to slam against the metal we were machining, for a $16K repair.

Another time, a variable contained a Z level (height) that was used for two different things, but for everything we'd done up to then the two different things shared the same value. I was the guy who made the change that made the difference significant, and so some of our CNC mills thought the metal being machined was significantly lower than it was, so the setup moves for the machining that assumed the endmill was moving through air tried slamming through the metal. Some of the results were spectacular, although I never did find the cost.
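The failure mode described above, one value quietly standing in for two meanings, can be sketched roughly as follows. This is only an illustration: the comment does not say what the two uses of the Z level were, so the names `part_top_z`, `stock_top_z`, and `CLEARANCE` are hypothetical, as is the assumption that one meaning was the top of the finished part and the other the top of the raw metal.

```python
CLEARANCE = 2.0  # hypothetical gap to keep above the metal during setup moves


def setup_height_shared(z_level):
    # Bug pattern: z_level is silently assumed to ALSO be the top of the raw
    # stock, which only holds while both meanings share the same value.
    return z_level + CLEARANCE


def setup_height_split(part_top_z, stock_top_z):
    # Fix pattern: each meaning gets its own name, and the setup move clears
    # whichever surface is higher.
    return max(part_top_z, stock_top_z) + CLEARANCE


def moves_through_air(setup_z, stock_top_z):
    # A setup move is only safe if it stays above the real top of the metal.
    return setup_z > stock_top_z


# While both meanings agree, the shared variable is harmless.
print(moves_through_air(setup_height_shared(10.0), stock_top_z=10.0))

# Once they differ, the shared variable puts the endmill inside the metal.
print(moves_through_air(setup_height_shared(10.0), stock_top_z=25.0))

# With separate values the move clears the stock again.
print(moves_through_air(setup_height_split(10.0, 25.0), stock_top_z=25.0))
```

Nothing about the shared version looks wrong until the two values diverge, which is presumably why it survived everything done up to that point.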

Fortunately, at least for my self-esteem, people more experienced than me were supervising each of these mistakes, so I didn't feel too stupid, and my colleagues were very understanding.

--
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes

'Coolest' mistake ever by RingDev · 2015-07-06 10:15 · Score: 1

A co-worker of mine had just finished implementing a new caching system for a legacy app that interfaced between multiple systems and the mainframe to track progress and shipping of pilot production runs. Due to a bug in his code, in a very specific use case, one of the cached systems would not get flushed. This was identified a few days after the production release when the company (a multi-billion dollar food sciences multi-national corporation) received a phone call from a Pastor in BFE, Minnesota asking why we had sent him almost 500 gallons of ice cream. Apparently, his church's address was in the system from some charity event we had sponsored. Since the ID and business type didn't flush from the previous transaction, when the pilot plant told the software to print labels for the next order, it pulled the shipping address from the wrong database, and the ID just happened to collide.
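For anyone who hasn't been bitten by this class of bug, here is a loose sketch of how state left over from a previous transaction can redirect a lookup. It is not the actual system: `ADDRESS_BOOKS`, `OrderContext`, the field names, and the addresses are all made up for illustration, and the real caching layer was surely more involved.

```python
# Hypothetical address stores, keyed by business type and then by record ID.
ADDRESS_BOOKS = {
    "customer": {42: "Pilot customer, 1 Factory Rd"},
    "charity": {42: "Church, BFE, Minnesota"},
}


class OrderContext:
    def __init__(self):
        self.fields = {}

    def begin_order_buggy(self, **fields):
        # Bug pattern: new fields are merged over the old ones, so anything
        # the new order does not set survives from the previous transaction.
        self.fields.update(fields)

    def begin_order_fixed(self, **fields):
        # Fix pattern: flush everything before loading the new order.
        self.fields = dict(fields)

    def shipping_address(self):
        # The business type selects which address store is consulted.
        book = ADDRESS_BOOKS[self.fields.get("business_type", "customer")]
        return book[self.fields["record_id"]]


buggy = OrderContext()
buggy.begin_order_buggy(business_type="charity", record_id=42)  # earlier transaction
buggy.begin_order_buggy(record_id=42)  # next order; business type left stale
print(buggy.shipping_address())

fixed = OrderContext()
fixed.begin_order_fixed(business_type="charity", record_id=42)
fixed.begin_order_fixed(record_id=42)
print(fixed.shipping_address())
```

In this toy version the stale business type points the lookup at the wrong store, and because record 42 exists in both, nothing fails loudly. The labels just come out with the wrong address.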

The cost of shipping the ice cream back for disposal was ridiculous. So the company told the Pastor to have a huge ice cream social.

The responsible developer was not fired, but there were running gags about him being the Ice Cream Man for the next year.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

Re:Most expensive mistake ever. by SharpFang · 2015-07-07 01:52 · Score: 1

Do you happen to work for RIAA? They tend to sue people for causing them losses like these.

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
